[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank insta
From: |
Lukas Straub |
Subject: |
Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances |
Date: |
Mon, 25 Sep 2023 17:32:37 +0200 |
On Mon, 25 Sep 2023 09:20:58 -0300
Fabiano Rosas <farosas@suse.de> wrote:
> CC: Daniel for the QIOChannel discussion
>
> Lukas Straub <lukasstraub2@web.de> writes:
> > On Thu, 14 Sep 2023 10:57:47 -0400
> > Peter Xu <peterx@redhat.com> wrote:
> >
> >> On Thu, Sep 14, 2023 at 10:23:38AM -0300, Fabiano Rosas wrote:
> >> > Peter Xu <peterx@redhat.com> writes:
> >> >
> >> > > On Wed, Sep 13, 2023 at 06:53:20PM -0300, Fabiano Rosas wrote:
> >> > >> Peter Xu <peterx@redhat.com> writes:
> >> > >>
> >> > >> > On Mon, Sep 11, 2023 at 02:13:19PM -0300, Fabiano Rosas wrote:
> >> > >> >> The core yank code is strict about balanced registering and
> >> > >> >> unregistering of yank functions.
> >> > >> >>
> >> > >> >> This creates a difficulty because the migration code registers one
> >> > >> >> yank function per QIOChannel, but each QIOChannel can be
> >> > >> >> referenced by
> >> > >> >> more than one QEMUFile. The yank function should not be removed
> >> > >> >> until
> >> > >> >> all QEMUFiles have been closed.
> >> > >> >>
> >> > >> >> Keep a reference count of how many QEMUFiles are using a QIOChannel
> >> > >> >> that has a yank function. Only unregister the yank function when
> >> > >> >> all
> >> > >> >> QEMUFiles have been closed.
> >> > >> >>
> >> > >> >> This improves the current code by removing the need for the
> >> > >> >> programmer
> >> > >> >> to know which QEMUFile is the last one to be cleaned up and fixes
> >> > >> >> the
> >> > >> >> theoretical issue of removing the yank function while another
> >> > >> >> QEMUFile
> >> > >> >> could still be using the ioc and require a yank.
> >> > >> >>
> >> > >> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> > >> >> ---
> >> > >> >> migration/yank_functions.c | 81
> >> > >> >> ++++++++++++++++++++++++++++++++++----
> >> > >> >> migration/yank_functions.h | 8 ++++
> >> > >> >> 2 files changed, 81 insertions(+), 8 deletions(-)
> >> > >> >
> >> > >> > I worry this over-complicate things.
> >> > >>
> >> > >> It does. We ran out of simple options.
> >> > >>
> >> > >> > If you prefer the cleaness that we operate always on qemufile
> >> > >> > level, can we
> >> > >> > just register each yank function per-qemufile?
> >> > >>
> >> > >> "just" hehe
> >> > >>
> >> > >> we could, but:
> >> > >>
> >> > >> i) the yank is a per-channel operation, so this is even more
> >> > >> unintuitive;
> >> > >
> >> > > I mean we can provide something like:
> >> > >
> >> > > void migration_yank_qemufile(void *opaque)
> >> > > {
> >> > > QEMUFile *file = opaque;
> >> > > QIOChannel *ioc = file->ioc;
> >> > >
> >> > > qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
> >> > > }
> >> > >
> >> > > void migration_qemufile_register_yank(QEMUFile *file)
> >> > > {
> >> > > if (migration_ioc_yank_supported(file->ioc)) {
> >> > > yank_register_function(MIGRATION_YANK_INSTANCE,
> >> > > migration_yank_qemufile,
> >> > > file);
> >> > > }
> >> > > }
> >> >
> >> > Sure, this is what I was thinking as well. IMO it will be yet another
> >> > operation that happens on the channel, but it performed via the
> >> > file. Just like qio_channel_close() at qemu_fclose(). Not the end of the
> >> > world, of course, I just find it error-prone.
> >> >
> >> > >>
> >> > >> ii) multifd doesn't have a QEMUFile, so it will have to continue using
> >> > >> the ioc;
> >> > >
> >> > > We can keep using migration_ioc_[un]register_yank() for them if
> >> > > there's no
> >> > > qemufile attached. As long as the function will all be registered
> >> > > under
> >> > > MIGRATION_YANK_INSTANCE we should be fine having different yank func.
> >> > >
> >> >
> >> > ok
> >> >
> >> > >>
> >> > >> iii) we'll have to add a yank to every new QEMUFile created during the
> >> > >> incoming migration (colo, rdma, etc), otherwise the incoming side
> >> > >> will be left using iocs while the src uses the QEMUFile;
> >> > >
> >> > > For RDMA, IIUC it'll simply be a noop as migration_ioc_yank_supported()
> >> > > will be a noop for it for either reg/unreg.
> >> > >
> >> > > Currently it seems we will also unreg the ioc even for RDMA (even
> >> > > though we
> >> > > don't reg for it). But since unreg will be a noop it seems all fine
> >> > > even
> >> > > if not paired.. maybe we should still try to pair it, e.g. register
> >> > > also in
> >> > > rdma_start_outgoing_migration() for the rdma ioc so at least they're
> >> > > paired.
> >> > >
> >> > > I don't see why COLO is special here, though. Maybe I missed
> >> > > something.
> >> >
> >> > For colo I was thinking we'd have to register the yank just to be sure
> >> > that all paths unregistering it have something to unregister.
> >> >
> >> > Maybe I should move the register into qemu_file_new_impl() with a
> >> > matching unregister at qemu_fclose().
> >>
> >> Sounds good. Or...
> >>
> >> >
> >> > >>
> >> > >> iv) this is a functional change of the yank feature for which we have
> >> > >> no
> >> > >> tests.
> >> > >
> >> > > Having yank tested should be preferrable. Lukas is in the loop, let's
> >> > > see
> >> > > whether he has something. We can still smoke test it before a selftest
> >> > > being there.
> >> > >
> >
> > Hi All,
> > Sorry for the late reply.
> >
> > Yes, testing missing. I'll work on it.
> >
> >> > > Taking one step back.. I doubt whether anyone is using yank for
> >> > > migration?
> >> > > Knowing that migration already have migrate-cancel (for precopy) and
> >> > > migrate-pause (for postcopy).
> >> >
> >> > Right, both already call qio_channel_shutdown().
> >> >
> >> > > I never used it myself, and I don't think
> >> > > it's supported for RHEL. How's that in suse's case?
> >> >
> >> > Never heard mention of it and I don't see it in our virtualization
> >> > documentation.
> >> >
> >> > >
> >> > > If no one is using it, maybe we can even avoid registering migration to
> >> > > yank?
> >> > >
> >> >
> >> > Seems reasonable to me.
> >>
> >> ... let's wait for a few days from Lukas to see whether he as any more
> >> input, or I'd vote for dropping yank for migration as a whole. It caused
> >> mostly more crashes that I knew than benefits, so far..
> >>
> >> I also checked libvirt is not using yank.
> >>
> >
> > The main user for yank is COLO. It can't be replaced by 'migrate_pause'
> > or 'migrate_cancel', because:
> >
> > 1) It needs to work while the main lock is taken by the migration
> > thread, so it needs to be an OOB qmp command. There are places
> > where the migration thread can hang on a socket while the main lock
> > is taken. 'migrate_pause' is OOB, but not usable in the COLO case (it
> > doesn't support postcopy).
> >
> > 2) In COLO, it needs to work both on outgoing and on incoming side, since
> > both sides have a completely healthy and ready to takeover guest state.
> >
> > I agree that the migration yank code was not well thought out :(.
>
> I'd say the QIOChannel being referenced via multiple QEMUFiles throws a
> curve ball to the yank design.
>
> > I had the idea back then to create child class of the IOCs, e.g.
> > YankableQIOChannelSocket and YankableQIOChannelTLS. It's not
> > perfect, but then the lifetime of the yank functions is directly
> > coupled with the iochannel. Then the IOCs can be used just as usual in
> > the rest of the migration code.
>
> The yank really wants to be tied to the channel. We should do that.
>
> I'm just thinking whether a feature bit + setter would be simpler to
> implement. It wouldn't require changing any of the object creation code,
> just add a qio_channel_enable_yank() at the start of migration and let
> the channel take care of the rest.
I think Daniel was against adding external dependencies to QIO
(dependency on yank in this case). But now that I'm thinking about it:
@Daniel How about qio_channel_add_destroy_cb() or similar?
>
> > Another problem area was to be that there was no clear point in
> > migration code where all channels are closed to unregister the yank
> > instance itself. That seems to be solved now?
>
> I'm inclined to add reference counting all over instead of trying to
> squint at the code and figure out where these cleanups should
> go. Specially since we have these pause/recovery scenarios.
>
>
> That said, I haven't looked closely at the instance unregister, but I
> don't think this series changes anything that would help in that regard.
pgpG6crJZofkA.pgp
Description: OpenPGP digital signature
- Re: [PATCH v6 08/10] migration/yank: Use channel features, (continued)
- [PATCH v6 04/10] migration: Remove redundant cleanup of postcopy_qemufile_src, Fabiano Rosas, 2023/09/11
- [PATCH v6 09/10] migration/yank: Keep track of registered yank instances, Fabiano Rosas, 2023/09/11
- Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances, Peter Xu, 2023/09/13
- Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances, Fabiano Rosas, 2023/09/13
- Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances, Peter Xu, 2023/09/13
- Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances, Fabiano Rosas, 2023/09/14
- Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances, Peter Xu, 2023/09/14
- Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances, Lukas Straub, 2023/09/25
- Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances, Fabiano Rosas, 2023/09/25
- Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances,
Lukas Straub <=
[PATCH v6 10/10] migration: Add a wrapper to cleanup migration files, Fabiano Rosas, 2023/09/11