[PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate
From: Nir Soffer
Subject: [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate
Date: Mon, 31 Aug 2020 17:01:27 +0300
If fallocate() is not supported, posix_fallocate() falls back to
inefficient allocation, writing one byte for every 4k bytes[1]. This is
very slow compared with writing zeros. In oVirt we measured ~400%
improvement in allocation time when replacing posix_fallocate() with
manually writing zeroes[2].
We also know that posix_fallocate() does not work well when using OFD
locks[3]. We don't know the reason for this issue yet.
Change preallocate_falloc() to use fallocate() instead of
posix_fallocate(), and fall back to full preallocation if not supported.
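The behaviour described above can be illustrated with a small standalone sketch. It is not the QEMU code: sketch_preallocate() and sketch_write_zeroes() are hypothetical names, the 64 KiB buffer size is an assumption, and treating ENOSYS like EOPNOTSUPP is a choice made for this example only. It tries fallocate() first and writes zeroes only when the filesystem reports that the call is not supported.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* fallocate() is a Linux/glibc extension */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write zeroes over [start, end).
 * Returns 0 on success or a negative errno value.
 */
static int sketch_write_zeroes(int fd, off_t start, off_t end)
{
    static const char buf[64 * 1024];   /* zero-initialized; size is an assumption */
    off_t pos = start;

    while (pos < end) {
        size_t len = sizeof(buf);
        if ((off_t)len > end - pos) {
            len = (size_t)(end - pos);
        }
        ssize_t n = pwrite(fd, buf, len, pos);
        if (n < 0) {
            if (errno == EINTR) {
                continue;               /* interrupted, retry the same range */
            }
            return -errno;
        }
        if (n == 0) {
            return -EIO;                /* no progress; avoid looping forever */
        }
        pos += n;
    }
    return 0;
}

/*
 * Hypothetical helper: grow the allocation from current_length to
 * new_length, preferring fallocate() and falling back to writing zeroes
 * when the filesystem does not support it.
 */
static int sketch_preallocate(int fd, off_t current_length, off_t new_length)
{
    int ret;

    assert(new_length >= current_length);
    if (new_length == current_length) {
        return 0;
    }

    do {
        ret = fallocate(fd, 0, current_length, new_length - current_length);
    } while (ret < 0 && errno == EINTR);

    if (ret == 0) {
        return 0;
    }
    if (errno != EOPNOTSUPP && errno != ENOSYS) {
        return -errno;                  /* real failure, e.g. ENOSPC */
    }
    /* Not supported here: fall back to full preallocation. */
    return sketch_write_zeroes(fd, current_length, new_length);
}
```

A caller would open the image file read-write and call sketch_preallocate(fd, 0, size). Errors such as ENOSPC are returned as-is, so only the "not supported" case triggers the slower zero-writing path.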
Here are quick test results with this change.
Before (qemu-img-5.1.0-2.fc32.x86_64):
$ time qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g
Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc
real 0m42.100s
user 0m0.602s
sys 0m4.137s
NFS stats:
calls retrans authrefrsh write
1571583 0 1572205 1571321
After:
$ time ./qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g
Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc
real 0m15.551s
user 0m0.070s
sys 0m2.623s
NFS stats:
calls retrans authrefrsh write
24620 0 24624 24567
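As a rough cross-check of the numbers above, the write counts can be compared with the image size. The helper names below are hypothetical and the sketch only does arithmetic on the figures quoted in this message: 6442450944 bytes in 4096-byte steps needs 1572864 writes, close to the 1571321 writes seen before the change, while the 24567 writes seen after the change average about 256 KiB each. Why the NFS client ends up with that write size is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Number of writes needed to cover `size` bytes in `chunk`-byte steps. */
static uint64_t writes_needed(uint64_t size, uint64_t chunk)
{
    assert(chunk > 0);
    return (size + chunk - 1) / chunk;  /* ceiling division */
}

/* Average number of bytes covered per write call, rounded down. */
static uint64_t bytes_per_write(uint64_t size, uint64_t calls)
{
    assert(calls > 0);
    return size / calls;
}
```

With the figures from the test runs, writes_needed(6442450944, 4096) gives 1572864, and bytes_per_write(6442450944, 24567) gives 262240, slightly more than 256 KiB (262144 bytes).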
[1] https://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html#96
[2] https://bugzilla.redhat.com/1850267#c25
[3] https://bugzilla.redhat.com/1851097
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
---
block/file-posix.c | 32 +++++++++-----------------
docs/system/qemu-block-drivers.rst.inc | 11 +++++----
docs/tools/qemu-img.rst | 11 +++++----
qapi/block-core.json | 4 ++--
4 files changed, 25 insertions(+), 33 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index 341ffb1cb4..eac3c0b412 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1835,36 +1835,24 @@ static int allocate_first_block(int fd, size_t max_size)
static int preallocate_falloc(int fd, int64_t current_length, int64_t offset,
Error **errp)
{
-#ifdef CONFIG_POSIX_FALLOCATE
+#ifdef CONFIG_FALLOCATE
int result;
if (offset == current_length)
return 0;
- /*
- * Truncating before posix_fallocate() makes it about twice slower on
- * file systems that do not support fallocate(), trying to check if a
- * block is allocated before allocating it, so don't do that here.
- */
-
- result = -posix_fallocate(fd, current_length,
- offset - current_length);
+ result = do_fallocate(fd, 0, current_length, offset - current_length);
if (result != 0) {
- /* posix_fallocate() doesn't set errno. */
- error_setg_errno(errp, -result,
- "Could not preallocate new data");
+ error_setg_errno(errp, -result, "Could not preallocate new data");
return result;
}
if (current_length == 0) {
/*
- * posix_fallocate() uses fallocate() if the filesystem supports
- * it, or fallback to manually writing zeroes. If fallocate()
- * was used, unaligned reads from the fallocated area in
- * raw_probe_alignment() will succeed, hence we need to allocate
- * the first block.
+ * Unaligned reads from the fallocated area in raw_probe_alignment()
+ * will succeed, hence we need to allocate the first block.
*
- * Optimize future alignment probing; ignore failures.
+ * Optimizes future alignment probing; ignore failures.
*/
allocate_first_block(fd, offset);
}
@@ -1973,10 +1961,12 @@ static int handle_aiocb_truncate(void *opaque)
}
switch (prealloc) {
-#ifdef CONFIG_POSIX_FALLOCATE
+#ifdef CONFIG_FALLOCATE
case PREALLOC_MODE_FALLOC:
result = preallocate_falloc(fd, current_length, offset, errp);
- goto out;
+ if (result != -ENOTSUP)
+ goto out;
+ /* If fallocate() is not supported, fall back to full preallocation. */
#endif
case PREALLOC_MODE_FULL:
result = preallocate_full(fd, current_length, offset, errp);
@@ -3080,7 +3070,7 @@ static QemuOptsList raw_create_opts = {
.name = BLOCK_OPT_PREALLOC,
.type = QEMU_OPT_STRING,
.help = "Preallocation mode (allowed values: off"
-#ifdef CONFIG_POSIX_FALLOCATE
+#ifdef CONFIG_FALLOCATE
", falloc"
#endif
", full)"
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index b052a6d14e..8e4acf397e 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -25,11 +25,12 @@ This section describes each format and the options that are supported for it.
.. program:: raw
.. option:: preallocation
- Preallocation mode (allowed values: ``off``, ``falloc``,
- ``full``). ``falloc`` mode preallocates space for image by
- calling ``posix_fallocate()``. ``full`` mode preallocates space
- for image by writing data to underlying storage. This data may or
- may not be zero, depending on the storage location.
+ Preallocation mode (allowed values: ``off``, ``falloc``, ``full``).
+ ``falloc`` mode preallocates space for image by calling
+ ``fallocate()``, and falling back to ``full`` mode if not supported.
+ ``full`` mode preallocates space for image by writing data to
+ underlying storage. This data may or may not be zero, depending on
+ the storage location.
.. program:: image-formats
.. option:: qcow2
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index c35bd64822..a2089bd1b7 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -750,11 +750,12 @@ Supported image file formats:
Supported options:
``preallocation``
- Preallocation mode (allowed values: ``off``, ``falloc``,
- ``full``). ``falloc`` mode preallocates space for image by
- calling ``posix_fallocate()``. ``full`` mode preallocates space
- for image by writing data to underlying storage. This data may or
- may not be zero, depending on the storage location.
+ Preallocation mode (allowed values: ``off``, ``falloc``, ``full``).
+ ``falloc`` mode preallocates space for image by calling
+ ``fallocate()``, and falling back to ``full`` mode if not supported.
+ ``full`` mode preallocates space for image by writing data to
+ underlying storage. This data may or may not be zero, depending on
+ the storage location.
``qcow2``
diff --git a/qapi/block-core.json b/qapi/block-core.json
index db08c58d78..681d79ec63 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5021,8 +5021,8 @@
#
# @off: no preallocation
# @metadata: preallocate only for metadata
-# @falloc: like @full preallocation but allocate disk space by
-# posix_fallocate() rather than writing data.
+# @falloc: try to allocate disk space by fallocate(), and fall back to
+# @full preallocation if not supported.
# @full: preallocate all data by writing it to the device to ensure
# disk space is really available. This data may or may not be
# zero, depending on the image format and storage.
--
2.26.2