qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] block: file-posix: Fail unmap with NO_FALLBACK on block devi


From: Eric Blake
Subject: Re: [PATCH] block: file-posix: Fail unmap with NO_FALLBACK on block device
Date: Tue, 16 Jun 2020 15:09:31 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.9.0

On 6/16/20 10:32 AM, Kevin Wolf wrote:
Am 15.06.2020 um 21:32 hat Nir Soffer geschrieben:
We can zero 2.3 g/s:

# time blkdiscard -z test-lv

real 0m43.902s
user 0m0.002s
sys 0m0.130s

We can write 445m/s:

# dd if=/dev/zero bs=2M count=51200 of=test-lv oflag=direct conv=fsync
107374182400 bytes (107 GB, 100 GiB) copied, 241.257 s, 445 MB/s

So using FALLOC_FL_PUNCH_HOLE _is_ faster after all. What might not be
faster is zeroing out the whole device and then overwriting a
considerable part of it again.

Yeah, there can indeed be a difference between a pre-zeroing which can be super-fast (on a posix file, truncate to 0 and back to the desired size, for example), and where it is faster than writes but still slower than a single pass.


I think this means that we shouldn't fail write_zeroes at the file-posix
level even if BDRV_REQ_NO_FALLBACK is given. Instead, qemu-img convert
is where I see a fix.

Is the kernel able to tell us reliably when we can perform a fast pre-zero pass? If it can't, it's that much harder to define when BDRV_REQ_NO_FALLBACK makes a difference.


Certainly qemu-img could be cleverer and zero out more selectively. The
idea of doing a blk_make_zero() first seems to have caused some
problems, though of course its introduction was also justified with
performance, so improving one case might hurt another if we're not
careful.

However, when Peter Lieven introduced this (commit 5a37b60a61c), we
didn't use write_zeroes yet during the regular copy loop (we do since
commit 690c7301600). So chances are that blk_make_zero() doesn't
actually help any more now.

Can you run another test with the patch below? I think it should perform
the same as yours. Eric, Peter, do you think this would have a negative
effect for NBD and/or iscsi?

I'm still hoping to revive my work on making bdrv_make_zero a per-driver callback with smarts for the fastest possible pre-zeroing that driver is capable of, or fast failure when BDRV_REQ_NO_FALLBACK is set and it is no faster to pre-zero than it is to just write zeroes when needed. I can certainly construct NBD scenarios in either direction (where a pre-zeroing pass is faster because of less network traffic, or where a pre-zeroing pass is slower because of increased I/O - in fact, that was part of my KVM Forum 2019 demo on why the NBD protocol added a FAST_ZERO flag mirroring the idea of qemu's BDRV_REQ_NO_FALLBACK).


The other option would be providing an option and making it Someone
Else's Problem.

Matching what we recently did with --target-is-zero.


Kevin


diff --git a/qemu-img.c b/qemu-img.c
index d7e846e607..bdb9f6aa46 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2084,15 +2084,6 @@ static int convert_do_copy(ImgConvertState *s)
          s->has_zero_init = bdrv_has_zero_init(blk_bs(s->target));
      }

-    if (!s->has_zero_init && !s->target_has_backing &&
-        bdrv_can_write_zeroes_with_unmap(blk_bs(s->target)))
-    {
-        ret = blk_make_zero(s->target, BDRV_REQ_MAY_UNMAP | 
BDRV_REQ_NO_FALLBACK);
-        if (ret == 0) {
-            s->has_zero_init = true;
-        }
-    }
-
      /* Allocate buffer for copied data. For compressed images, only one 
cluster
       * can be copied at a time. */
      if (s->compressed) {


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




reply via email to

[Prev in Thread] Current Thread [Next in Thread]