Re: [Qemu-devel] [PATCH v3 0/4] Curling: KVM Fault Tolerance
From: Jules
Subject: Re: [Qemu-devel] [PATCH v3 0/4] Curling: KVM Fault Tolerance
Date: Wed, 23 Oct 2013 08:08:55 +0800
> On Tue, Oct 15, 2013 at 03:26:19PM +0800, Jules Wang wrote:
> > v2 -> v3:
> > * add documentation of new option in qapi-schema.
> >
> > * long option name: ft -> fault-tolerant
> >
> > v1 -> v2:
> > * cmdline: migrate curling:tcp:<address>:<port>
> > -> migrate -f tcp:<address>:<port>
> >
> > * sender: use QEMU_VM_FILE_MAGIC_FT as the header of the migration
> > to indicate this is a ft migration.
> >
> > * receiver: look for the signature:
> > QEMU_VM_EOF_MAGIC + QEMU_VM_FILE_MAGIC_FT(64bit total)
> > which indicates the end of one migration.
> > --
> > Jules Wang (4):
> > Curling: add doc
> > Curling: cmdline interface.
> > Curling: the sender
> > Curling: the receiver
>
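A minimal sketch of how a receiver might split the incoming stream at the 64-bit end-of-migration signature described in the cover letter. The numeric values are placeholders, because the real QEMU_VM_EOF_MAGIC and QEMU_VM_FILE_MAGIC_FT constants are defined in the patches and are not quoted in this thread. `SignatureScanner` is a hypothetical name, and the sketch assumes the signature is two big-endian 32-bit words. A plain byte search like this can also match the same eight bytes inside guest RAM data, so a real receiver would presumably need to look for the signature only at section boundaries of the parsed stream.

```python
import struct

# Placeholder values (assumption): the real constants are defined in the
# Curling patches and are not quoted in this thread.
QEMU_VM_EOF_MAGIC = 0xFEEDCAFE
QEMU_VM_FILE_MAGIC_FT = 0x46544654

# 64-bit end-of-migration signature: two big-endian 32-bit words (assumption).
SIGNATURE = struct.pack(">II", QEMU_VM_EOF_MAGIC, QEMU_VM_FILE_MAGIC_FT)


class SignatureScanner:
    """Hypothetical helper that splits a byte stream into complete snapshots."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Append newly received bytes; return a list of completed snapshots.

        Bytes after the last signature stay buffered, so a signature split
        across two reads is still found once the rest of it arrives.
        """
        self._buf += chunk
        completed = []
        while True:
            idx = self._buf.find(SIGNATURE)
            if idx < 0:
                break
            completed.append(bytes(self._buf[:idx]))
            del self._buf[: idx + len(SIGNATURE)]
        return completed
```

On failover, the receiver would then resume from the last completed snapshot and discard any partially received one.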
First of all, thanks for your superb and spot-on comments.
> It would be helpful to clarify the status of Curling in the cover letter
> email so reviewers know what to expect.
OK, but I'm not quite sure how to clarify the status. Would you
please give me an example?
>
> This series does not address I/O or failover. I guess you are aware of
> the missing topics that I mentioned, here are my thoughts on them:
>
> I/O needs to be held back until the destination host has acknowledged
> receiving the last full migration state. The outside world cannot
> witness state changes in the guest until the migration state has been
> successfully transferred to the destination host. Otherwise the guest
> may appear to act incorrectly when resuming execution from the last
> snapshot.
>
> The time period used by the FT sender thread determines how much latency
> is added to I/O requests.
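A minimal sketch of the I/O holdback described above, assuming guest output can be tagged with the number (epoch) of the checkpoint during which it was produced. `OutputBuffer`, `queue`, `checkpoint` and `ack` are hypothetical names, not QEMU APIs. Output produced before checkpoint N is taken is released only after the destination acknowledges checkpoint N, similar in spirit to the output buffering used by Remus.

```python
class OutputBuffer:
    """Hypothetical buffer that holds guest output until its checkpoint is acked."""

    def __init__(self):
        self._epoch = 0      # checkpoint currently being accumulated
        self._pending = []   # list of (epoch, packet) pairs not yet released

    def queue(self, packet):
        """Hold back output produced by the guest during the current epoch."""
        self._pending.append((self._epoch, packet))

    def checkpoint(self):
        """Start sending the current state; return the epoch number it covers."""
        sent = self._epoch
        self._epoch += 1
        return sent

    def ack(self, epoch):
        """Destination confirmed `epoch`: release all output covered by it."""
        released = [p for e, p in self._pending if e <= epoch]
        self._pending = [(e, p) for e, p in self._pending if e > epoch]
        return released
```

Output queued just after a checkpoint is taken has to wait for the next checkpoint to be sent and acknowledged, which is why the sender thread's period roughly bounds the latency added to I/O.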
Yes, there is added latency; that is inevitable.
I guess you mean the following situation:
if a message 'hello' is sent to a chat-room server just a few seconds
before failover happens, the message may be delivered to the other
users twice, or it may be lost.
Am I right?
>
> Failover functionality is missing from these patches. We cannot simply
> start executing on the destination host when the migration connection
> ends. If the guest disk image is located on shared storage then
> split-brain occurs when a network error terminates the migration
> connection -
> will both hosts begin accessing the shared disk?
YES
>
I have a simple way to handle that. In a word: a third point, the
gateway.
Both the sender and the receiver check their connectivity to the gateway
every X seconds. Let A and B stand for whether the sender and the
receiver, respectively, are connected to the gateway.
When the connection between the sender and the receiver is down,
A && B is false, i.e. at least one of them has lost the gateway.
If A is false, the VM instance on the sender will be stopped.
If B is false, the VM instance on the receiver will not be started.
a. A false, B false: 0 VMs run
b. A false, B true:  1 VM runs
c. A true,  B false: 1 VM runs
d. A true,  B true:  1 VM runs (normal case)
It becomes complicated when we consider the transitions between
these four states.
I suggest adding this feature to libvirt instead of QEMU.
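The gateway rule above can be sketched as a small decision function. This is only an illustration: `decide`, `Decision` and the `link_up` flag are hypothetical names, and the sketch encodes the premise that whenever the sender-receiver link is down, at least one host has also lost the gateway (A && B is false).

```python
from typing import NamedTuple


class Decision(NamedTuple):
    sender_runs: bool
    receiver_runs: bool


def decide(a: bool, b: bool, link_up: bool) -> Decision:
    """Hypothetical arbitration based on the last gateway checks.

    a: the sender can reach the gateway.
    b: the receiver can reach the gateway.
    link_up: the sender<->receiver migration connection is alive.
    """
    # The sender keeps its VM running only while it can see the gateway.
    sender_runs = a
    # The receiver starts its VM only if it can see the gateway and the
    # migration connection from the sender has gone away.
    receiver_runs = b and not link_up
    return Decision(sender_runs, receiver_runs)
```

Cases a, b and c of the table correspond to `link_up=False`, and case d to `link_up=True`; each yields at most one running VM. If the premise is violated (both hosts still see the gateway but the link between them is broken), this rule starts two VMs, i.e. split-brain, so the whole scheme depends on the gateway being on the path between sender and receiver.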
> What is your plan to address these issues?
>
> Stefan
>