On 09.10.20 14:55, Jakob Bohm wrote:
On 2020-10-09 10:48, Max Reitz wrote:
[...]
The error I got was specifically "Failed to lock byte 100" and the VM not
starting. The ISO file was on a R/W NFS3 share, but was itself R/O for
the user that root was mapped to by the Linux NFS server via /etc/exports
options; specifically, the ISO file was mode 0444 in a 0755
directory, and the exports line was (simplified)
/share1
xxxx:xxxx:xxxx:xxxx/64(ro,sync,mp,subtree_check,anonuid=1000,anongid=1000)
where xxxx:xxxx:xxxx:xxxx/64 is the numeric IPv6 prefix of the LAN.
The NFS kernel server ran the Debian Stretch kernel 4.19.0-0.bpo.8-amd64 #1 SMP
Debian 4.19.98-1~bpo9+1 (2020-03-09) x86_64 GNU/Linux.
NFS client mount options were:
rw,nosuid,nodev,noatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,
soft,proto=tcp6,timeo=600,retrans=6,sec=sys,mountaddr=xxxx:xxxx:xxxx:xxxx:xxxx:xxff:fexx:xxxx,
mountvers=3,mountport=45327,mountproto=udp6,local_lock=none,addr=xxxx:xxxx:xxxx:xxxx:xxxx:xxff:fexx:xxxx
The NFS client ran the Debian Buster kernel 4.19.0-0.bpo.6-amd64 #1 SMP Debian
4.19.67-2+deb10u2~bpo9+1 (2019-11-12) x86_64 with Debian qemu-system-x86
version 1:5.0-14~bpo10+1. Booting used SysV init, and libvirt
was not used.
Copying the ISO to a local drive (where qemu-as-root had full
capabilities to bypass file security) worked around the failure.
I hope these details help reproduce the bug.
I’ll try again, thanks.
Can you perchance reproduce the bug with a more recent upstream kernel
(e.g. 5.8)? I seem to recall there have been some locking bugs in the
NFS code; perhaps something has been fixed by now.
(Or at least 4.19.150, which seems to be the most recent 4.19.x
according to kernel.org.)
And I still have no idea why qemu tried to lock bytes in a read-only raw
image file: there is no block metadata to synchronize access to (unlike
with qcow2), and the option explicitly said ",format=raw" to avoid
attempts to access the ISO file as any of the advanced virtual disk
formats.
I already reasoned about that in my previous reply; see below. The
point is that just because an image file is read-only when it is opened
doesn’t mean it’s going to stay that way.
You’re correct that in the case of raw, this isn’t about metadata (as
there isn’t any), but about guest data, which needs to be protected from
concurrent access all the same.
(As for “why does qemu try to lock, when the option explicitly said
raw”; there is a dedicated option to turn off locking, and that is
file.locking=off. I’m not suggesting that as a realistic work-around,
I’m just adding that FYI in case you didn’t know and need something ASAP.)
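For reference, a sketch of how that could look on the command line (the ISO path here is a placeholder, and whether disabling locking is wise depends on your setup):

```shell
# Raw, read-only ISO attached with qemu's image locking disabled.
# /share1/some.iso is a placeholder path.
qemu-system-x86_64 \
  -drive file=/share1/some.iso,format=raw,media=cdrom,readonly=on,file.locking=off
```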
[...]
The error message itself seems meaningless, as there is no particular
reason to request file locks on a read-only raw disk image.
Yes, there is. We must prevent a concurrent instance from writing to
the image[1], and so we have to signal that somehow, which we do through
file locks.
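To make the mechanism concrete, here is a minimal Python sketch of that signaling (the helper name is made up, and qemu’s file-posix driver actually prefers OFD locks via F_OFD_SETLK; plain POSIX byte-range locks stand in for them here):

```python
import fcntl
import os
import tempfile

def take_read_valid_data_lock(path):
    """Illustrative stand-in for qemu's shared lock on byte 100."""
    fd = os.open(path, os.O_RDONLY)  # read-only open, like a CD-ROM ISO
    # Shared, non-blocking lock on exactly one byte at offset 100:
    # "this process relies on reading valid data from the image".
    fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB, 1, 100)
    return fd

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 512)

# On a local filesystem this succeeds; on the problematic NFS mount,
# the lockf() call is what fails and produces "Failed to lock byte 100".
fd = take_read_valid_data_lock(tmp.name)
os.close(fd)
os.unlink(tmp.name)
```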
I suppose it can be argued that if the image file itself is read-only
(outside of qemu), there is no need for locks, because nothing could
ever modify the image anyway. But wouldn’t it be possible to change the
permissions after qemu has opened the image, or to remount a read-only
filesystem R/W?
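That possibility is easy to demonstrate with a small sketch (the file is a temporary stand-in for the ISO): a file that was read-only when opened can become writable while the read-only handle is still open:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name
os.chmod(path, 0o444)               # mode 0444, like the ISO in the report

ro_fd = os.open(path, os.O_RDONLY)  # a long-lived read-only handle

# Nothing stops the file's owner (or root) from loosening the mode later:
os.chmod(path, 0o644)
with open(path, "wb") as f:         # a concurrent writer now succeeds
    f.write(b"changed after the reader opened the file")

readback = open(path, "rb").read()
os.close(ro_fd)
os.unlink(path)
```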
Perhaps we could automatically switch off file locks for a given image
file when taking the first one fails, and the image is read-only. But
first I’d rather know what exactly is causing the error you see to
appear.
[1] Technically, byte 100 is about being able to read valid data from
the image, which is a constraint that’s only very rarely broken. But
still, it’s a constraint that must be signaled. (You only see the
failure on this byte because, once the first lock fails, qemu doesn’t
even attempt to lock the later bytes, like byte 201, the one for not
preventing concurrent R/W access.)
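A two-process Python sketch of how that signaling plays out (one byte at offset 100, matching the description above; this is an illustration using plain POSIX byte-range locks, not qemu code):

```python
import fcntl
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 512)
    path = tmp.name

# Process A (this process): shared lock on byte 100, i.e.
# "I need to read valid data from this image".
fd_a = os.open(path, os.O_RDONLY)
fcntl.lockf(fd_a, fcntl.LOCK_SH | fcntl.LOCK_NB, 1, 100)

pid = os.fork()
if pid == 0:
    # Process B (child): a would-be writer probes the exclusive lock on
    # byte 100; it must fail while a reader depends on valid data.
    fd_b = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd_b, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 100)
        os._exit(0)  # lock acquired: no reader present (unexpected here)
    except OSError:
        os._exit(1)  # conflict detected: the constraint is visible

_, status = os.waitpid(pid, 0)
conflict = (os.WEXITSTATUS(status) == 1)
os.close(fd_a)
os.unlink(path)
```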
(As for other instances writing to the image, you can allow that by
setting the share-rw=on option on the guest device. This tells qemu
that the guest will accept modifications from the outside. But that
still won’t prevent qemu from having to take a shared lock on byte 100.)
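As a syntax sketch (node name, device model, and image path are placeholders), that option goes on the guest device, not the drive:

```shell
# Guest device that tolerates outside writers; qemu still takes its
# shared lock on byte 100. Names and the image path are placeholders.
qemu-system-x86_64 \
  -blockdev node-name=disk0,driver=raw,file.driver=file,file.filename=/share1/shared.img \
  -device virtio-blk-pci,drive=disk0,share-rw=on
```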
Max