qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PULL 5/7] file-posix: Support BDRV_REQ_NO_FALLBACK for


From: Eric Blake
Subject: Re: [Qemu-devel] [PULL 5/7] file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes
Date: Wed, 14 Aug 2019 21:44:28 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0

On 3/26/19 10:51 AM, Kevin Wolf wrote:
> We know that the kernel implements a slow fallback code path for
> BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it.
> The other operations we call in the context of .bdrv_co_pwrite_zeroes
> should usually be quick, so no modification should be needed for them.
> If we ever notice that there are additional problematic cases, we can
> still make these conditional as well.

Are there cases where fallocate(FALLOC_FL_ZERO_RANGE) falls back to slow
writes?  It may be fast on some file systems, but when used on a block
device, that may equally trigger slow fallbacks.  The man page is not
clear on that fact; I suspect that there may be cases in there that need
to be made conditional (it would be awesome if the kernel folks would
give us another FALLOC_ flag when we want to guarantee no fallback).

By the way, is there an easy setup to prove (maybe some qemu-img convert
command on a specially-prepared source image) whether the no fallback
flag makes a difference?  I'm about to cross-post a series of patches to
nbd/qemu/nbdkit/libnbd that adds a new NBD_CMD_FLAG_FAST_ZERO which fits
the bill of BDRV_REQ_NO_FALLBACK, but would like to include some
benchmark numbers in my cover letter if I can reproduce a setup where it
matters.

And this patch has a bug:

> +++ b/block/file-posix.c
> @@ -652,7 +652,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
> *options,
>      }
>  #endif
>  
> -    bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
> +    bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
>      ret = 0;
>  fail:
>      if (filename && (bdrv_flags & BDRV_O_TEMPORARY)) {
> @@ -1500,14 +1500,19 @@ static ssize_t 
> handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
{
    int ret = -ENOTSUP;
    BDRVRawState *s = aiocb->bs->opaque;

    if (!s->has_write_zeroes) {
        return -ENOTSUP;
>      }

At this point, ret is -ENOTSUP.

>  
>  #ifdef BLKZEROOUT
> -    do {
> -        uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
> -        if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
> -            return 0;
> -        }
> -    } while (errno == EINTR);
> +    /* The BLKZEROOUT implementation in the kernel doesn't set
> +     * BLKDEV_ZERO_NOFALLBACK, so we can't call this if we have to avoid slow
> +     * fallbacks. */
> +    if (!(aiocb->aio_type & QEMU_AIO_NO_FALLBACK)) {
> +        do {
> +            uint64_t range[2] = { aiocb->aio_offset, aiocb->aio_nbytes };
> +            if (ioctl(aiocb->aio_fildes, BLKZEROOUT, range) == 0) {
> +                return 0;
> +            }
> +        } while (errno == EINTR);
>  
> -    ret = translate_err(-errno);
> +        ret = translate_err(-errno);
> +    }

If the very first call to this function is with NO_FALLBACK, then this
'if' is skipped,

>  #endif
>  
>      if (ret == -ENOTSUP) {
        s->has_write_zeroes = false;
    }

and we set s->has_write_zeroes to false, permanently disabling any
BLKZEROOUT attempts in future calls, even if the future calls no longer
pass the NO_FALLBACK flag.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]