qemu-devel

Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious replica


From: Lukas Straub
Subject: Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious replication
Date: Fri, 16 Aug 2019 20:20:07 +0200

On Fri, 16 Aug 2019 01:51:20 +0000
"Zhang, Chen" <address@hidden> wrote:

> > -----Original Message-----
> > From: Lukas Straub [mailto:address@hidden]
> > Sent: Friday, August 16, 2019 3:48 AM
> > To: Dr. David Alan Gilbert <address@hidden>
> > Cc: qemu-devel <address@hidden>; Zhang, Chen
> > <address@hidden>; Jason Wang <address@hidden>; Xie
> > Changlong <address@hidden>; Wen Congyang
> > <address@hidden>
> > Subject: Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious
> > replication
> >
> > On Thu, 15 Aug 2019 19:57:37 +0100
> > "Dr. David Alan Gilbert" <address@hidden> wrote:
> >
> > > * Lukas Straub (address@hidden) wrote:
> > > > Hello Everyone,
> > > > These Patches add support for continious replication to colo.
> > > > Please review.
> > >
> > >
> > > OK, for those who haven't followed COLO for so long; 'continuous
> > > replication' is when after the first primary fails, you can promote
> > > the original secondary to a new primary and start replicating again;
> > >
> > > i.e. current COLO gives you
> > >
> > >     p<->s
> > >     <primary fails>
> > >     s
> > >
> > > with your patches you can do
> > >
> > >     s becomes p2
> > >     p2<->s2
> > >
> > > and you're back to being resilient again.
> > >
> > > Which is great; because that was always an important missing piece.
> > >
> > > Do you have some test scripts/setup for this - it would be great to
> > > automate some testing.
> >
> > My plan is to write a Pacemaker Resource Agent[1] for qemu-colo and then
> > do some long-term testing in my small cluster here. Writing standalone
> > tests using that Resource Agent should be easy; it just needs to be
> > provided with the right arguments and environment variables.
>
> Thanks for the explanation, Dave.
> It looks good to me and I will test this series on my side.
>
> Another question: is a "Pacemaker Resource Agent[1]" like a heartbeat module?

It's a bit more than that. Pacemaker itself is a Cluster Resource Manager;
you can think of it as sysvinit, but for clusters. It controls where in the
cluster Resources run, in what state (master/slave), and what to do in case
of a Node or Resource failure. Resources can be anything, such as an SQL
server, a web server or a VM. Pacemaker doesn't control them directly; that's
the job of the Resource Agents. So a Resource Agent is like an init script,
but cluster-aware and with more actions, such as start, stop, monitor,
promote (to master) or migrate-to.
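
To make the init-script comparison concrete, here is a rough sketch of how
such an agent dispatches actions. It is not the planned qemu-colo agent:
ColoAgent and dispatch are hypothetical names, the state is kept in memory
only, and a real agent would drive qemu instead. The exit codes are the
ones the OCF resource agent API defines.

```python
# Exit codes defined by the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7
OCF_RUNNING_MASTER = 8


class ColoAgent:
    """Hypothetical agent; tracks state in memory instead of driving qemu."""

    def __init__(self):
        self.running = False
        self.master = False

    def start(self):
        self.running = True
        return OCF_SUCCESS

    def stop(self):
        self.running = False
        self.master = False
        return OCF_SUCCESS

    def monitor(self):
        # Pacemaker calls monitor periodically to learn the resource state.
        if not self.running:
            return OCF_NOT_RUNNING
        return OCF_RUNNING_MASTER if self.master else OCF_SUCCESS

    def promote(self):
        # Only a running resource can be promoted to master.
        if not self.running:
            return OCF_ERR_GENERIC
        self.master = True
        return OCF_SUCCESS


def dispatch(agent, action):
    """Map the action name passed by the cluster manager to a handler."""
    handlers = {
        "start": agent.start,
        "stop": agent.stop,
        "monitor": agent.monitor,
        "promote": agent.promote,
    }
    handler = handlers.get(action)
    if handler is None:
        return OCF_ERR_UNIMPLEMENTED
    return handler()
```

Pacemaker would invoke the agent once per action and act on the exit code,
e.g. a monitor result of OCF_NOT_RUNNING on a node where the resource
should be running counts as a failure.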

> I have written an internal heartbeat module running in QEMU. It lets COLO
> detect failure and trigger failover automatically, with no need for an
> external application to call the QMP command "x-colo-lost-heartbeat". If you
> need it, I can send an RFC version soon.

Cool, this should fail over faster than with Pacemaker.
What is the plan for cases like Primary-failover, which need to issue
multiple commands?
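
For comparison, this is roughly what the external approach looks like
today: something outside qemu connects to the QMP socket and issues the
command(s) itself. A minimal sketch, assuming a QMP unix socket; the
function names and the socket path are made up for illustration. The
command list is left open because Primary-failover needs more than one
command.

```python
import json
import socket


def qmp_messages(commands):
    """Build the newline-terminated JSON messages for one QMP session.

    QMP requires "qmp_capabilities" before any other command is accepted.
    """
    session = [{"execute": "qmp_capabilities"}]
    session += [{"execute": name} for name in commands]
    return [json.dumps(msg).encode() + b"\n" for msg in session]


def read_reply(reader):
    """Return the next command reply, skipping asynchronous events."""
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("QMP socket closed")
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg


def send_qmp(sock_path, commands):
    """Send the given QMP commands in order and collect their replies."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        reader = sock.makefile("r")
        reader.readline()  # discard the QMP greeting banner
        replies = []
        for msg in qmp_messages(commands):
            sock.sendall(msg)
            replies.append(read_reply(reader))
        return replies
```

Usage would be something like
send_qmp("/tmp/colo-qmp.sock", ["x-colo-lost-heartbeat"]), where the socket
path is whatever was passed to qemu's -qmp option.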

> Thanks
> Zhang Chen
> >
> > Regards,
> > Lukas Straub
> >
> > [1] 
> > https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/ra-dev-guide.asc#what-is-a-resource-agent



