Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
Date: Tue, 13 Aug 2019 13:51:15 +0200
User-agent: Mutt/1.11.3 (2019-02-01)

On 13.08.2019 at 13:14, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 12:33, Vladimir Sementsov-Ogievskiy wrote:
> > 13.08.2019 12:01, Vladimir Sementsov-Ogievskiy wrote:
> >> 13.08.2019 11:39, Vladimir Sementsov-Ogievskiy wrote:
> >>> 12.08.2019 22:50, Max Reitz wrote:
> >>>> On 12.08.19 21:46, Max Reitz wrote:
> >>>>> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
> >>>>>> Hi all!
> >>>>>>
> >>>>>> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
> >>>>>> broken. It should be either fixed like I propose (by Max's suggestion)
> >>>>>> or somehow forbidden (just forbid backing-file supporting node to be
> >>>>>> file child of raw-format node).
> >>>>>
> >>>>> I agree, I think only filters should return BDRV_BLOCK_RAW.
> >>>>>
> >>>>> (And not even them, they should just be handled transparently by
> >>>>> bdrv_co_block_status().  But that’s something for later.)
> >>>>>
> >>>>>> Vladimir Sementsov-Ogievskiy (2):
> >>>>>>    block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
> >>>>>>    iotests: test mirroring qcow2 under raw format
> >>>>>>
> >>>>>>   block/raw-format.c         |  2 +-
> >>>>>>   tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
> >>>>>>   tests/qemu-iotests/263.out | 12 ++++++++++
> >>>>>>   tests/qemu-iotests/group   |  1 +
> >>>>>>   4 files changed, 60 insertions(+), 1 deletion(-)
> >>>>>>   create mode 100755 tests/qemu-iotests/263
> >>>>>>   create mode 100644 tests/qemu-iotests/263.out
> >>>>>
> >>>>> Thanks, applied to my block-next branch:
> >>>>>
> >>>>> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next
> >>>>
> >>>> Oops, maybe not.  221 needs to be adjusted.
> >>>>
> >>>
> >>>
> >>> Hmm yes, I forgot to run the tests. Areas which were zero become
> >>> data|zero, which doesn't look good.
> >>>
> >>> So it's not quite right to report DATA | RECURSE; we should actually
> >>> report DATA_OR_ZERO | RECURSE, which is actually ALLOCATED | RECURSE,
> >>> as otherwise DATA will be set in the final result (the generic layer
> >>> must not drop it, obviously).
> >>>
> >>> ALLOCATED is never returned by drivers, but it seems it should be. I'll
> >>> think a bit and resend something new.
> >>>
> >>>
> >>
> >>
> >> Hmmm.. So, we have a raw node, and assume a backing chain under it. Who
> >> should loop through it, generic code or the raw driver?
> >>
> >> Right now it looks like generic code is responsible for looping through
> >> the filtered chain (backing files and filters), and the driver is
> >> responsible for all its children except for the filtered child.
> >>
> >> Or, the driver may return something that tells the generic code to handle
> >> the whole backing chain of the returned file at once, as it's another
> >> backing chain. And it seems even RECURSE doesn't work correctly, as it
> >> doesn't handle the backing chain in this recursion. Why does it work
> >> better than RAW? Just because we return it together with the DATA flag,
> >> and this DATA flag is kept anyway, independently of finding zeros or not.
> >>
> >>
> > 
> > 
> > Hmm, so, is it correct that we return DATA | RECURSE if we are not really
> > sure that it is data?
> > 
> > If we look at
> > 
> >   * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
> > 
> > it seems like we should report DATA only if there is an allocation.
> > 
> >   * DATA ZERO OFFSET_VALID
> >   *  t    t        t       sectors read as zero, returned file is zero at offset
> >   *  t    f        t       sectors read as valid from file at offset
> >   *  f    t        t       sectors preallocated, read as zero, returned file not
> > 
> > So, ZERO alone doesn't guarantee that we may safely read?
> > 
> > So, for qcow2 metadata-preallocated images, what about zero-init? We
> > report DATA, and probably get ZERO from file, and finally have
> > DATA | ZERO, which guarantees that a read will return zeros, but is
> > that true?
> > 
> > Finally, what does "DATA" mean? That space is allocated and occupies disk
> > space? Or does it only mean ALLOCATED, i.e. "read from this layer, not
> > from backing", and add additional meaning to ZERO when used together,
> > namely that a read will return zeros for sure?

I think DATA means that the data for this block is provided by *file. I
wouldn't necessarily understand it to mean that the data actually takes
up physical disk space there.

> Continuing the self-discussion.
> 
> Consider the following case more closely:
>  >   * DATA ZERO OFFSET_VALID
>  >   *  f    t        t       sectors preallocated, read as zero, returned file not
> 
> It actually means that we must not read, as read will return wrong
> data, when clusters are actually zero for guest.

It means that you need to read from bs itself to get the correct data
(which will be zero). Even though OFFSET_VALID is set, reading from
*file (typically bs->file->bs) at the returned offset might not give the
right result.
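
As a minimal sketch of the rule above, the following C helpers decide,
from a status word, whether reading *file at the returned offset is safe
and whether a read through bs is known to return zeros. It covers only the
three table rows quoted in this thread. The SKETCH_* names and their values
are hypothetical stand-ins, not the real BDRV_BLOCK_* definitions, and the
helper functions are not part of QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BDRV_BLOCK_* bits; values are illustrative. */
enum {
    SKETCH_DATA         = 0x01,
    SKETCH_ZERO         = 0x02,
    SKETCH_OFFSET_VALID = 0x04,
};

/*
 * True if reading *file at the returned offset yields the same data the
 * guest would see. Assumption: this requires both DATA and OFFSET_VALID.
 */
bool sketch_file_offset_is_readable(uint32_t flags)
{
    return (flags & SKETCH_DATA) && (flags & SKETCH_OFFSET_VALID);
}

/* True if a read through the node itself (bs) is known to return zeros. */
bool sketch_reads_as_zero(uint32_t flags)
{
    return (flags & SKETCH_ZERO) != 0;
}
```

For the row with DATA unset and ZERO | OFFSET_VALID set, the first helper
returns false and the second returns true: go through bs, not through *file.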

> It's OK, when for ex. qcow2 returns this combination and link to its
> file child: it means that if you read from qcow2 node, you'll see
> correct zeros, as qcow2 has special metadata which shows that these
> clusters are zero. But if you read from file directly at returned
> offset you'll see garbage, so don't do it.

Correct.

> But what if some node returns this combination with file == itself? It
> actually means that you must not read, but you should call
> block-status to understand that there are zeros. So, if some format
> can return this combination with file == itself it means that you must
> not read it directly, but only after checking block status.

This doesn't make sense to me. Reading from a node is always correct.

But you're right that DATA seems to mean something slightly different at
the protocol level because *file cannot have a meaningful value for the
lower layer there. In this case, DATA still seems to mean that the data
is fetched from the lower layer (i.e. the block device on which the file
system resides). For holes, this is not the case.

> And file-posix is an example of such a driver. But file-posix holes are
> guaranteed to read as zero, so we can report DATA | ZERO.
> But this will break the user experience, which assumes that DATA means
> occupation of real disk space.

With the above explanation, DATA shouldn't be set for holes.

But it's still kind of inconsistent because OFFSET_VALID and the offset
refer to bs itself and not to the lower layer.

> ...
> Let me go and re-read what we've documented in the NBD protocol about
> block status...
> 
> "DATA" turns into NBD_STATE_HOLE, which formally means nothing, and just
> notes that there is probably no disk space occupation for this region. So
> it's about disk space allocation and says nothing about correctness of
> reads. And NBD_STATE_ZERO guarantees that the region reads as all zeroes.
> 
> Looking at the code in nbd/server.c.. Aha, it calls block_status_above and
> turns !ALLOCATED into HOLE. Which means that it will never return HOLE for
> file-posix..
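
The mapping described in the quoted paragraph can be sketched in C as
follows. This is an illustration under stated assumptions, not the code in
nbd/server.c: the SKETCH_* constants and sketch_status_to_nbd() are
hypothetical names with illustrative values. The only logic taken from the
thread is that a missing ALLOCATED bit becomes HOLE and that ZERO maps to
NBD_STATE_ZERO.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the block-status bits; values are illustrative. */
enum {
    SKETCH_BLOCK_ZERO      = 0x02,
    SKETCH_BLOCK_ALLOCATED = 0x10,
};

/* Hypothetical stand-ins for the NBD extent state bits. */
enum {
    SKETCH_NBD_STATE_HOLE = 1 << 0,
    SKETCH_NBD_STATE_ZERO = 1 << 1,
};

/* Translate a block-status word into NBD extent state flags. */
uint32_t sketch_status_to_nbd(uint32_t status)
{
    uint32_t nbd = 0;

    if (!(status & SKETCH_BLOCK_ALLOCATED)) {
        nbd |= SKETCH_NBD_STATE_HOLE;   /* !ALLOCATED becomes HOLE */
    }
    if (status & SKETCH_BLOCK_ZERO) {
        nbd |= SKETCH_NBD_STATE_ZERO;   /* region reads as all zeroes */
    }
    return nbd;
}
```

Under this mapping, a region whose status has ZERO and ALLOCATED both set
(assumed here to be what a file-posix hole ends up with) is reported as
ZERO without HOLE, which matches the observation in the quoted text.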

Hm... This is a mess. :-)

Kevin


