From: Greg Kurz
Subject: Re: [PATCH 1/1] 9pfs: avoid iterator invalidation in v9fs_mark_fids_unreclaim
Date: Tue, 27 Sep 2022 21:47:02 +0200

On Tue, 27 Sep 2022 19:14:33 +0200
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Dienstag, 27. September 2022 15:05:13 CEST Linus Heckemann wrote:
> > Christian Schoenebeck <qemu_oss@crudebyte.com> writes:
> > > Ah, you sent this fix as a separate patch on top. I actually just meant
> > > that you would take my already queued patch as the latest version (just
> > > because I had made some minor changes on my end) and adjust that patch
> > > further as v4.
> > > 
> > > Anyway, there are still some things to do here, so maybe you can send your
> > > patch squashed in the next round ...
> > 
> > I see, will do!
> > 
> > >> @Christian: I still haven't been able to reproduce the issue that this
> > >> commit is supposed to fix (I tried building KDE too, no problems), so
> > >> it's a bit of a shot in the dark. It certainly still runs and I think it
> > >> should fix the issue, but it would be great if you could test it.
> > > 
> > > No worries about reproduction, I will definitely test this thoroughly. ;-)
> > > 
> > >>  hw/9pfs/9p.c | 46 ++++++++++++++++++++++++++++++----------------
> > >>  1 file changed, 30 insertions(+), 16 deletions(-)
> > >> 
> > >> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > >> index f4c1e37202..825c39e122 100644
> > >> --- a/hw/9pfs/9p.c
> > >> +++ b/hw/9pfs/9p.c
> > >> @@ -522,33 +522,47 @@ static int coroutine_fn v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path)
> > >>      V9fsFidState *fidp;
> > >> 
> > >>      gpointer fid;
> > >>      GHashTableIter iter;
> > >> 
> > >> +    /*
> > >> +     * The most common case is probably that we have exactly one
> > >> +     * fid for the given path, so preallocate exactly one.
> > >> +     */
> > >> +    GArray *to_reopen = g_array_sized_new(FALSE, FALSE,
> > >> +                                          sizeof(V9fsFidState*), 1);
> > >> +    gint i;
> > > 
> > > Please use `g_autoptr(GArray)` instead of `GArray *`, that avoids the need
> > > for explicit calls to g_array_free() below.
> > 
> > Good call. I'm not familiar with glib, so I didn't know about this :)
> > 
> > >> -            fidp->flags |= FID_NON_RECLAIMABLE;
> > > 
> > > Why did you remove that? It should still be marked as FID_NON_RECLAIMABLE,
> > > no?
> > Indeed, that was an accident.
> > 
> > >> +            /*
> > >> +             * Ensure the fid survives a potential clunk request during
> > >> +             * v9fs_reopen_fid or put_fid.
> > >> +             */
> > >> +            fidp->ref++;
> > > 
> > > Hmm, bumping the refcount here makes sense, as the 2nd loop may be
> > > interrupted and the fid otherwise disappear in between, but ...
> > > 
> > >> +            g_array_append_val(to_reopen, fidp);
> > >> 
> > >>          }
> > >> 
> > >> +    }
> > >> 
> > >> -        /* We're done with this fid */
> > >> +    for (i=0; i < to_reopen->len; i++) {
> > >> +        fidp = g_array_index(to_reopen, V9fsFidState*, i);
> > >> +        /* reopen the file/dir if already closed */
> > >> +        err = v9fs_reopen_fid(pdu, fidp);
> > >> +        if (err < 0) {
> > >> +            put_fid(pdu, fidp);
> > >> +            g_array_free(to_reopen, TRUE);
> > >> +            return err;
> > > 
> > > ... this return would then leak all the remaining fids whose refcount
> > > you have already bumped above.
> > 
> > You're right. I think the best way around it, though it feels ugly, is
> > to add a third loop in an "out:".
> 
> Either that, or continuing the loop to the end. Not that this would become 
> much prettier. I must admit I also don't really have a good idea for a clean 
> solution in this case.
> 
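For what it's worth, the "continue the loop to the end" variant could look
roughly like the sketch below. It is only an illustration: struct fid,
reopen_fid() and put_fid() are simplified, hypothetical stand-ins for
V9fsFidState, v9fs_reopen_fid() and put_fid(), and the first loop is assumed
to have already selected the matching fids from the hash table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for V9fsFidState; only what the sketch needs. */
struct fid {
    int ref;          /* reference count */
    int fail_reopen;  /* illustration knob: make reopen_fid() fail */
};

/* Hypothetical stand-in for v9fs_reopen_fid(): 0 on success, < 0 on error. */
static int reopen_fid(struct fid *f)
{
    return f->fail_reopen ? -EIO : 0;
}

/* Hypothetical stand-in for put_fid(): drop one reference. */
static void put_fid(struct fid *f)
{
    assert(f->ref > 0);
    f->ref--;
}

/*
 * Phase 1 pins every collected fid. Phase 2 reopens them. After the
 * first failure no further reopen is attempted, but the loop still runs
 * to the end so that every reference taken in phase 1 is dropped.
 */
static int reopen_all(struct fid **fids, size_t n)
{
    int err = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        fids[i]->ref++;
    }
    for (i = 0; i < n; i++) {
        if (err == 0) {
            err = reopen_fid(fids[i]);
        }
        put_fid(fids[i]);
    }
    return err;
}
```

The first error is the one returned, and no fid keeps an extra reference
whether or not a reopen failed.
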
> > > Also: I noticed that your changes in virtfs_reset() would need the same
> > > 2-loop hack, as it also calls put_fid() inside the loop and would
> > > otherwise be prone to hash iterator invalidation.
> > Good point. Will do.
> > 
> > One more thing has occurred to me. I think the reclaiming/reopening
> > logic will misbehave in the following sequence of events:
> > 
> > 1. QEMU reclaims an open fid, losing the file handle
> > 2. The file referred to by the fid is replaced with a different file
> >    (e.g. via rename or symlink) outside QEMU
> > 3. The file is accessed again by the guest, causing QEMU to reopen a
> >    _different file_ from before without the guest having performed any
> >    operations that should cause this to happen.
> > 
> > This is neither introduced nor resolved by my changes. Am I overlooking
> > something that avoids this (be it documentation that directories exposed
> > via 9p should not be touched by the host), or is this a real issue? I'm
> > thinking one could at least detect it by saving inode numbers in
> > V9fsFidState and comparing them when reopening, but recovering from such
> > a situation seems difficult.
> 
> Well, in that specific scenario when rename/move happens outside of QEMU then 
> yes, this might happen unfortunately. The point of this "reclaim fid" stuff is
> to deal with the fact that there is an upper limit on systems for the max. 
> amount of open file descriptors a process might hold at a time. And on some 
> systems like macOS I think that limit is quite low by default (like 100?).
> 
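As an aside, the limit in question can be checked per process with the POSIX
getrlimit() call. A minimal sketch follows; the helper name get_fd_limits()
is made up for illustration and is not part of QEMU:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: fetch the soft and hard limits on the number of
 * open file descriptors for the calling process.
 * Returns 0 on success, -1 on failure (errno is set by getrlimit()).
 */
static int get_fd_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    assert(soft != NULL && hard != NULL);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        return -1;
    }
    *soft = rl.rlim_cur;  /* limit currently enforced */
    *hard = rl.rlim_max;  /* ceiling the soft limit may be raised to */
    return 0;
}
```

The soft limit is the one a process actually runs into. It can be raised up
to the hard limit with setrlimit().
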
> There is also another issue pending that affects pure inner-guest behaviour; 
> the infamous use-after-unlink() use pattern:
> https://wiki.qemu.org/Documentation/9p#Implementation_Plans
> https://gitlab.com/qemu-project/qemu/-/issues/103
> 
> It would make sense to look at how other file servers deal with the max. amount 
> of file descriptors limit before starting to just fight the symptoms. This 
> whole reclaim fid stuff in general is a PITA.
> 

Yes, this reclaim code is just a best-effort attempt to avoid
running out of file descriptors. But since its implementation is
path-based, it has the by-design limitation that nothing should
modify the backing fs outside of the current 9p session.
Note: just like the infamous use-after-unlink() pattern (I love
the wording), you can get this with a "pure inner-guest behaviour"
using two devices with overlapping backends (a shoot-yourself-in-the-foot
setup) :-)

Recovering from lost state is impossible but the server should
at least try to detect that and return EIO to the client, pretty
much like any storage device is expected to do if possible.

Cheers,

--
Greg

> Best regards,
> Christian Schoenebeck
> 
> 