qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] qcow2: Skip copy-on-write when allocating a zero cluster


From: Kevin Wolf
Subject: Re: [PATCH v2] qcow2: Skip copy-on-write when allocating a zero cluster
Date: Thu, 10 Sep 2020 14:04:39 +0200

Am 09.09.2020 um 21:23 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 27.08.2020 17:53, Alberto Garcia wrote:
> > Since commit c8bb23cbdbe32f5c326365e0a82e1b0e68cdcd8a when a write
> > request results in a new allocation QEMU first tries to see if the
> > rest of the cluster outside the written area contains only zeroes.
> > 
> > In that case, instead of doing a normal copy-on-write operation and
> > writing explicit zero buffers to disk, the code zeroes the whole
> > cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK.
> > 
> > This improves performance very significantly but it only happens when
> > we are writing to an area that was completely unallocated before. Zero
> > clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and
> > are therefore slower to allocate.
> > 
> > This happens because the code uses bdrv_is_allocated_above() rather
> > bdrv_block_status_above(). The former is not as accurate for this
> > purpose but it is faster. However in the case of qcow2 the underlying
> > call does already report zero clusters just fine so there is no reason
> > why we cannot use that information.
> > 
> > After testing 4KB writes on an image that only contains zero clusters
> > this patch results in almost five times more IOPS.
> > 
> > Signed-off-by: Alberto Garcia <berto@igalia.com>
> > ---
> > v2:
> > - Add new, simpler API: bdrv_is_unallocated_or_zero_above()
> > 
> >   include/block/block.h |  2 ++
> >   block/io.c            | 24 ++++++++++++++++++++++++
> >   block/qcow2.c         | 37 +++++++++++++++++++++----------------
> >   3 files changed, 47 insertions(+), 16 deletions(-)
> > 
> > diff --git a/include/block/block.h b/include/block/block.h
> > index 6e36154061..477a72b2e9 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> > @@ -496,6 +496,8 @@ int bdrv_is_allocated(BlockDriverState *bs, int64_t 
> > offset, int64_t bytes,
> >   int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
> >                               bool include_base, int64_t offset, int64_t 
> > bytes,
> >                               int64_t *pnum);
> > +int bdrv_is_unallocated_or_zero_above(BlockDriverState *bs, int64_t offset,
> > +                                      int64_t bytes);
> >   bool bdrv_is_read_only(BlockDriverState *bs);
> >   int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only,
> > diff --git a/block/io.c b/block/io.c
> > index ad3a51ed53..94faa3f9d7 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -2557,6 +2557,30 @@ int bdrv_block_status(BlockDriverState *bs, int64_t 
> > offset, int64_t bytes,
> >                                      offset, bytes, pnum, map, file);
> >   }
> > +/*
> > + * Check @bs (and its backing chain) to see if the range defined
> > + * by @offset and @bytes is unallocated or known to read as zeroes.
> > + * Return 1 if that is the case, 0 otherwise and -errno on error.
> > + * This test is meant to be fast rather than accurate so returning 0
> > + * does not guarantee non-zero data.
> > + */
> > +int bdrv_is_unallocated_or_zero_above(BlockDriverState *bs, int64_t offset,
> > +                                      int64_t bytes)
> 
> _above prefix for block-status functions usually assume existing of "base"
> parameter, otherwise, it's not clear "above what?"
> 
> Also, actually the caller doesn't care about it it allocated or not. It only 
> wants to know is it read-as-zero or not. So, probably better name is 
> bdrv_is_zero_fast()
> 
> > +{
> > +    int ret;
> > +    int64_t pnum = bytes;
> > +
> > +    ret = bdrv_common_block_status_above(bs, NULL, false, offset,
> > +                                         bytes, &pnum, NULL, NULL);
> > +
> > +    if (ret < 0) {
> > +        return ret;
> > +    }
> > +
> > +    return (pnum == bytes) &&
> > +        ((ret & BDRV_BLOCK_ZERO) || (!(ret & BDRV_BLOCK_ALLOCATED)));
> 
> Note that some protocol drivers returns unallocated status when it doesn't 
> read-as-zero, so in general, we can't use this function as is_zero.
> 
> I think, that better to keep only (pnum == bytes) && (ret & BDRV_BLOCK_ZERO) 
> here

Ah, very good, you already mentioned the main points I had, and more. (I
didn't realise that using BDRV_BLOCK_ALLOCATED is actually wrong, just
that it's more complicated than necessary.)

What I would like to add is that a bdrv_co_is_zero_fast() would be even
better. is_zero_cow() isn't marked as such yet, but semantically it's a
coroutine_fn, so we have no reason to go through the synchronous
wrappers.

> , and to make it work correctly improve bdrv_co_block_status like this:
> 
> diff --git a/block/io.c b/block/io.c
> index ad3a51ed53..33b2e91bcd 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2408,15 +2408,15 @@ static int coroutine_fn 
> bdrv_co_block_status(BlockDriverState *bs,
>      if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
>          ret |= BDRV_BLOCK_ALLOCATED;
> -    } else if (want_zero && bs->drv->supports_backing) {
> -        if (bs->backing) {
> +    } else if (bs->drv->supports_backing) {
> +        if (bs->backing && want_zero) {
>              BlockDriverState *bs2 = bs->backing->bs;
>              int64_t size2 = bdrv_getlength(bs2);
>              if (size2 >= 0 && offset >= size2) {
>                  ret |= BDRV_BLOCK_ZERO;
>              }
> -        } else {
> +        } else if (!bs->backing) {
>              ret |= BDRV_BLOCK_ZERO;
>          }
>      }
> 
> - we can always add ZERO flag to backing-supporting formats if range is 
> unallocated and there is no backing file.

This makes sense to me, though it should be a separate patch. This one
wouldn't become incorrect without it, but it would be less effective.

Kevin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]