qemu-devel

Re: [PATCH 0/4] colo: Introduce resource agent and high-level test


From: Lukas Straub
Subject: Re: [PATCH 0/4] colo: Introduce resource agent and high-level test
Date: Wed, 18 Dec 2019 10:27:11 +0100

On Wed, 27 Nov 2019 22:11:34 +0100
Lukas Straub <address@hidden> wrote:

> On Fri, 22 Nov 2019 09:46:46 +0000
> "Dr. David Alan Gilbert" <address@hidden> wrote:
>
> > * Lukas Straub (address@hidden) wrote:
> > > Hello Everyone,
> > > These patches introduce a resource agent for use with the Pacemaker CRM
> > > and a high-level test utilizing it for testing qemu COLO.
> > >
> > > The resource agent manages qemu COLO including continuous replication.
> > >
> > > Currently the second test case (where the peer qemu is frozen) fails on
> > > primary failover, because qemu hangs while removing the
> > > replication-related block nodes. Note that this also happens in
> > > real-world tests when cutting power to the peer host, so this needs to
> > > be fixed.
> >
> > Do you understand why that happens? Is it trying to finish a
> > read/write to the dead partner?
> >
> > Dave
>
> I haven't looked into it too closely yet, but it's often hanging in
> bdrv_flush() while removing the replication blockdev, and of course that's
> probably because the nbd client waits for a reply. So I tried the workaround
> below, which actively kills the TCP connection, and with it the test passes,
> though I haven't tested it in the real world yet.
>

In the real cluster, qemu sometimes even hangs while connecting to QMP (after
a remote poweroff). But I currently don't have the time to look into it.
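
For reference, a hang like this could be detected from the outside by putting
a timeout on the QMP handshake. The following is only a minimal sketch and not
part of the patches: the helper name qmp_alive and the timeout value are
hypothetical, and it assumes an already-connected stream socket to the QMP
monitor. It relies only on the documented QMP behaviour that the server sends
a greeting containing a "QMP" key and answers "qmp_capabilities" with a
"return" object.

```python
import json
import socket


def qmp_alive(sock, timeout=5.0):
    """Return True if the QMP peer completes the handshake within `timeout`.

    `sock` is an already-connected stream socket (hypothetical helper; the
    caller decides how to connect, e.g. via a UNIX socket path).
    """
    sock.settimeout(timeout)
    reader = sock.makefile("r", encoding="utf-8")
    try:
        # The QMP server speaks first and sends a greeting object.
        greeting = json.loads(reader.readline())
        if "QMP" not in greeting:
            return False
        # Leave capabilities negotiation mode.
        request = json.dumps({"execute": "qmp_capabilities"})
        sock.sendall(request.encode("utf-8") + b"\r\n")
        reply = json.loads(reader.readline())
        return "return" in reply
    except (socket.timeout, ValueError):
        # Timeout: the peer is hung. ValueError: EOF or malformed JSON.
        return False
```

A caller such as a resource agent could treat a False result as "monitor
unresponsive" instead of blocking indefinitely.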

Still, a failing test is better than no test. Could we mark this test as
known-bad and fix this issue later? How should I mark it as known-bad? By tag?
Or with a warning in the log?
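
To illustrate what I mean by known-bad: if the test runner is compatible with
Python's unittest, one option would be the standard expectedFailure decorator.
This is just a sketch under that assumption; the class and method names are
hypothetical and the test bodies are placeholders, not the real COLO test.

```python
import unittest


class ColoKnownBadExample(unittest.TestCase):
    """Hypothetical stand-in for the high-level COLO test."""

    def test_primary_failover_peer_quit(self):
        # Placeholder for a case that passes today.
        self.assertTrue(True)

    @unittest.expectedFailure
    def test_primary_failover_peer_frozen(self):
        # Known-bad: qemu hangs while removing the replication block nodes.
        # Placeholder failure standing in for the real timeout.
        self.fail("primary failover did not complete")
```

With this, the failing case is still run and reported as an expected failure,
and the run as a whole stays green. Once the hang is fixed, the case shows up
as an unexpected success, which is a reminder to drop the decorator.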

Regards,
Lukas Straub
