
From: Greg Kurz
Subject: Re: [Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argument 'vii'
Date: Mon, 3 Jun 2019 08:57:15 +0200

On Wed, 22 May 2019 18:03:13 +0200
Christian Schoenebeck <address@hidden> wrote:

> On Monday, 20 May 2019 16:05:09 CEST Greg Kurz wrote:
> > Hi Christian,  
> 
> Hi Greg,
> 
> > On the other hand, I'm afraid that having a functional solution won't
> > motivate people to come up with a new spec... Anyway, I agree that the
> > data corruption/loss issues must be prevented, ie, the 9p server should
> > at least return an error to the client rather than returning a colliding
> > QID.   
> 
> Ok, I will extend Antonios' patch to log that error on host. I thought about 
> limiting that error message to once per session (to avoid flooding the logs), 
> but it is probably not worth it, so if you don't veto it I will simply log 
> that error on every file access.
> 

Please use error_report_once().
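
(A minimal sketch, not the actual patch: a hypothetical helper that the 9p
server could call before building a QID from an inode number. It compares the
file's st_dev against the device of the export root and uses
error_report_once() so the message is printed only on the first offending
access rather than on every file access. The helper name and the root_dev
parameter are made up for illustration.)

    #include "qemu/osdep.h"
    #include "qemu/error-report.h"

    static int check_same_device(dev_t root_dev, const struct stat *stbuf)
    {
        if (stbuf->st_dev != root_dev) {
            /* Reported only once per call site, so the log is not flooded. */
            error_report_once("9p: file resides on a different device than "
                              "the export root; QID paths may collide, "
                              "returning an error to the guest instead");
            return -ENODEV;
        }
        return 0;
    }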

> > A simple way to avoid that is to enforce a single device, ie. patch
> > 2 in Antonios's series. Of course this may break some setups where
> > multiple devices are involved, but it is pure luck if they aren't already
> > broken with the current code base.   
> 
> Yes, the worst thing you can have is this collision being silently ignored, 
> as it actually is right now, because you end up getting all sorts of 
> different misbehaviours on guests, and these are just symptoms that take a 
> while to debug before you realise that the actual root cause is an inode 
> collision. So enforcing a single device is still better than undefined behaviour.
> 
> > > Background: The concept of FS "data sets" combines the benefits of
> > > classical partitions (e.g. logical file space separation, independent fs
> > > configurations like compression on/off/algorithm, data deduplication
> > > on/off, snapshot isolation, snapshots on/off) without the disadvantages
> > > of classical real partitions (physical space is dynamically shared, no
> > > space wasted on fixed boundaries; physical device pool management is
> > > transparent for all data sets, configuration options can be inherited
> > > from parent data sets).  
> > 
> > Ok. I must admit my ignorance around ZFS and "data sets"... so IIUC, even
> > with a single underlying physical device you might end up with lstat()
> > returning different st_dev values on the associated files, correct ?
> > 
> > I take your word for the likelihood of the issue popping up in such 
> > setups. :)  
> 
> Yes, that is correct, you _always_ get a different stat::st_dev value for 
> each ZFS data set. Furthermore, each ZFS data set has its own inode number 
> sequence generator starting from one. So if you create two new ZFS data sets 
> and then one file on each data set, both files will have inode number 1.
> 
> That probably makes it clear why you hit this ID collision bug very easily 
> when using the combination of ZFS & 9p.
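
(Not part of the series, just a quick way to observe what Christian describes:
lstat() a file on each of two freshly created ZFS data sets and print the
(st_dev, st_ino) pairs. The two paths below are placeholders; on such a setup
both files typically report st_ino 1 and only st_dev tells them apart, which
is exactly the information a QID path built from st_ino alone cannot carry.)

    #include <stdio.h>
    #include <sys/stat.h>

    int main(void)
    {
        /* Placeholder paths: one file on each of two ZFS data sets. */
        const char *paths[] = { "/tank/ds1/file", "/tank/ds2/file" };

        for (int i = 0; i < 2; i++) {
            struct stat st;
            if (lstat(paths[i], &st) != 0) {
                perror(paths[i]);
                continue;
            }
            printf("%s: st_dev=%llu st_ino=%llu\n", paths[i],
                   (unsigned long long)st.st_dev,
                   (unsigned long long)st.st_ino);
        }
        return 0;
    }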
> 
> > > also a big difference giving the user the _optional_ possibility to define
> > > e.g. one path (not device) on guest said to be sensitive regarding high
> > > inode numbers on guest; and something completely different telling the
> > > user that he _must_ configure every single device from host that is ever
> > > supposed to pop up with 9p on guest and forcing the user to update that
> > > configuration whenever a new device is added or removed on host. The
> > > "vii" configuration feature does not require any knowledge of how many
> > > and which kinds of devices are actually ever used on host (nor on any
> > > higher level host in case of nested
> > > virtualization), nor does that "vii" config require any changes ever when
> > > host device setup changes. So 9p's core transparency feature would not be
> > > touched at all.  
> > 
> > I guess this all boils down to me finding some time to read/understand more
> > :)  
> 
> Yes, that helps sometimes. :)
> 
> > As usual, a series with smaller and simpler patches will be easier to
> > review, and more likely to be merged.  
> 
> Of course.
> 
> In the next patch series I will completely drop a) the entire QID persistency 
> feature code and b) that disputed "vii" feature. But I will still suggest the 
> variable inode suffix length patch as last patch in that new patch series.
> 
> That should make the amount of changes manageably small.
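
(For readers following the thread, a rough sketch of the device-prefix /
inode-suffix idea as I read it, not the actual patch: reserve the top bits of
the 64-bit QID path for a small per-device prefix and keep the remaining low
bits for the inode. With a variable suffix length the split could adapt to the
devices actually seen; here it is a fixed parameter for simplicity, and all
names are made up.)

    #include <stdint.h>
    #include <errno.h>

    static int qid_remap(uint64_t st_ino, uint16_t dev_index,
                         unsigned prefix_bits, uint64_t *qid_path)
    {
        unsigned suffix_bits;

        if (prefix_bits == 0 || prefix_bits >= 64) {
            return -EINVAL;
        }
        suffix_bits = 64 - prefix_bits;

        /* Refuse to remap rather than silently truncate either value. */
        if (dev_index >= (1ULL << prefix_bits) ||
            st_ino >= (1ULL << suffix_bits)) {
            return -ENOBUFS;
        }
        *qid_path = ((uint64_t)dev_index << suffix_bits) | st_ino;
        return 0;
    }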
> 
> > > Let me make a different suggestion: how about putting these fixes into a
> > > separate C unit for now and making the default behaviour (if you really
> > > want) not to use any of that code at all. So by default the user would
> > > just get an error message in the qemu log files if he tries to export
> > > several devices with one 9p device, suggesting that he either map them as
> > > separate 9p devices (like you suggested) or, alternatively, enable the
> > > automatic inode remapping code (from that separate C unit) instead via
> > > one convenient config option, if he/she really wants.  
> > It seems that we may be reaching some consensus here :)
> > 
> > I like the approach, provided this is configurable from the command line,
> > off by default and doesn't duplicate core 9p code. Not sure this needs to
> > sit in its own C unit though.  
> 
> Any preference for a command line argument name and/or libvirt XML config tag/
> attribute for switching the inode remapping code on?
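
(Purely as an illustration, not something settled in this thread: the option
name "remap_inodes" below is made up, since the actual name was still open at
this point. Such a switch would sit alongside the existing -fsdev/-virtfs
options on the QEMU command line.)

    -fsdev local,id=fs0,path=/srv/export,security_model=mapped-xattr,remap_inodes=on \
    -device virtio-9p-pci,fsdev=fs0,mount_tag=hostshare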
> 
> About that separate C unit: I leave that up to you to decide, it does not 
> matter to me. I just suggested it since you consider these patches a 
> temporary workaround until there are appropriate protocol changes, so clear 
> code separation might help to get rid of the entire code later on. Plus for 
> distribution maintainers it might be easier to cherry-pick them as 
> backports.
> 
> However since I will drop the persistency and "vii" feature in the next patch 
> series, it probably does not make a huge difference anyway. As you prefer.
> 
> > The 9p code has a long history of CVEs and limitations that prevented it
> > from reaching full production grade quality. Combined with the poor quality
> > of the code, this has scared off many experienced QEMU developers, who
> > prefer to work on finding an alternative solution.   
> 
> And I already wondered about the obviously low activity on this particular 
> qemu feature. I mean, it no longer seems contemporary to have guests run 
> their own file system emulated on a file on top of yet another file system, 
> losing essentially all benefits of the host's actual backend file system 
> features.
> 
> > Such alternative is virtio-fs, which is being actively worked on:
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2019-04/msg02746.html
> > 
> > Note: I'm not saying "stay away from 9p", but it is maybe worth taking a look,
> >       because if virtio-fs gets merged, it is likely to become the official
> >       and better supported way to share files between host and guest.  
> 
> Ah, good to know! That's new to me, thanks!
> 
> Makes sense to me, especially since its performance will certainly be much better.
> 
> > Please repost a series, possibly based on some of Antonios's patches, that
> > avoids the QID collision, returns an error to the client instead and
> > possibly prints out some useful messages in the QEMU log. Then, on top of
> > that, you can start introducing hashing and variable prefix length.  
> 
> So you want that as its own patch series first, or can I continue with my 
> suggestion to deliver the hash patch and variable suffix length patch as last 
> patches within the same series?
> 

Same series is okay.

> Best regards,
> Christian Schoenebeck

Cheers,

--
Greg


