Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
From: Chun Yan Liu
Date: Thu, 26 Jun 2014 21:36:37 -0600
Hi, Stefan & Kevin,
Could you take a look at this version? We discussed this last November,
and this version now uses QemuOpts.
Thanks,
Chunyan
>>> On 6/23/2014 at 05:17 PM, in message
<address@hidden>, Chunyan Liu
<address@hidden> wrote:
> Add a 'nocow' option so that users can set the NOCOW flag on newly created
> files. This is useful on btrfs to improve performance.
>
> Btrfs performs poorly when hosting VM images, and even more so when the
> guests in those VMs also use btrfs as their file system. One way to mitigate
> this is to turn off COW on the VM image files. Generally, there are two ways
> to turn off COW on btrfs: a) mount the file system with nodatacow, so that
> all newly created files are NOCOW; b) set the NOCOW attribute per file,
> which only works on empty or new files.
>
> This patch takes the second approach: when the option is set, the NOCOW
> flag is added to the newly created file.
>
> For most block drivers, the file is created in raw-posix.c, so the NOCOW
> ioctl only needs to be issued there.
>
> There are some exceptions, however: block/vpc.c and block/vdi.c create the
> file by calling qemu_open() directly, so they issue the same NOCOW ioctl
> themselves.
>
> Signed-off-by: Chunyan Liu <address@hidden>
> ---
> Changes to v2:
> * based on QemuOpts instead of old QEMUOptionParameters
> * add nocow description in man page and html doc
>
> Old v2 is here:
> http://lists.gnu.org/archive/html/qemu-devel/2013-11/msg02429.html
>
> ---
> block/cow.c | 5 +++++
> block/qcow.c | 5 +++++
> block/qcow2.c | 5 +++++
> block/qed.c | 11 ++++++++---
> block/raw-posix.c | 25 +++++++++++++++++++++++++
> block/vdi.c | 29 +++++++++++++++++++++++++++++
> block/vhdx.c | 5 +++++
> block/vmdk.c | 11 ++++++++---
> block/vpc.c | 29 +++++++++++++++++++++++++++++
> include/block/block_int.h | 1 +
> qemu-doc.texi | 16 ++++++++++++++++
> qemu-img.texi | 16 ++++++++++++++++
> 12 files changed, 152 insertions(+), 6 deletions(-)
>
> diff --git a/block/cow.c b/block/cow.c
> index a05a92c..43b537c 100644
> --- a/block/cow.c
> +++ b/block/cow.c
> @@ -401,6 +401,11 @@ static QemuOptsList cow_create_opts = {
> .type = QEMU_OPT_STRING,
> .help = "File name of a base image"
> },
> + {
> + .name = BLOCK_OPT_NOCOW,
> + .type = QEMU_OPT_BOOL,
> + .help = "Turn off copy-on-write (valid only on btrfs)"
> + },
> { /* end of list */ }
> }
> };
> diff --git a/block/qcow.c b/block/qcow.c
> index 1f2bac8..5b23540 100644
> --- a/block/qcow.c
> +++ b/block/qcow.c
> @@ -928,6 +928,11 @@ static QemuOptsList qcow_create_opts = {
> .help = "Encrypt the image",
> .def_value_str = "off"
> },
> + {
> + .name = BLOCK_OPT_NOCOW,
> + .type = QEMU_OPT_BOOL,
> + .help = "Turn off copy-on-write (valid only on btrfs)"
> + },
> { /* end of list */ }
> }
> };
> diff --git a/block/qcow2.c b/block/qcow2.c
> index b9d2fa6..3a4cc8a 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2382,6 +2382,11 @@ static QemuOptsList qcow2_create_opts = {
> .help = "Postpone refcount updates",
> .def_value_str = "off"
> },
> + {
> + .name = BLOCK_OPT_NOCOW,
> + .type = QEMU_OPT_BOOL,
> + .help = "Turn off copy-on-write (valid only on btrfs)"
> + },
> { /* end of list */ }
> }
> };
> diff --git a/block/qed.c b/block/qed.c
> index 092e6fb..460ac92 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -567,7 +567,7 @@ static void bdrv_qed_close(BlockDriverState *bs)
> static int qed_create(const char *filename, uint32_t cluster_size,
> uint64_t image_size, uint32_t table_size,
> const char *backing_file, const char *backing_fmt,
> - Error **errp)
> + QemuOpts *opts, Error **errp)
> {
> QEDHeader header = {
> .magic = QED_MAGIC,
> @@ -586,7 +586,7 @@ static int qed_create(const char *filename, uint32_t cluster_size,
> int ret = 0;
> BlockDriverState *bs;
>
> - ret = bdrv_create_file(filename, NULL, &local_err);
> + ret = bdrv_create_file(filename, opts, &local_err);
> if (ret < 0) {
> error_propagate(errp, local_err);
> return ret;
> @@ -682,7 +682,7 @@ static int bdrv_qed_create(const char *filename, QemuOpts *opts, Error **errp)
> }
>
> ret = qed_create(filename, cluster_size, image_size, table_size,
> - backing_file, backing_fmt, errp);
> + backing_file, backing_fmt, opts, errp);
>
> finish:
> g_free(backing_file);
> @@ -1644,6 +1644,11 @@ static QemuOptsList qed_create_opts = {
> .type = QEMU_OPT_SIZE,
> .help = "L1/L2 table size (in clusters)"
> },
> + {
> + .name = BLOCK_OPT_NOCOW,
> + .type = QEMU_OPT_BOOL,
> + .help = "Turn off copy-on-write (valid only on btrfs)"
> + },
> { /* end of list */ }
> }
> };
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index dacf4fb..825a0c8 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -55,6 +55,9 @@
> #include <linux/cdrom.h>
> #include <linux/fd.h>
> #include <linux/fs.h>
> +#ifndef FS_NOCOW_FL
> +#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
> +#endif
> #endif
> #ifdef CONFIG_FIEMAP
> #include <linux/fiemap.h>
> @@ -1278,12 +1281,14 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
> int fd;
> int result = 0;
> int64_t total_size = 0;
> + bool nocow = false;
>
> strstart(filename, "file:", &filename);
>
> /* Read out options */
> total_size =
> qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) / BDRV_SECTOR_SIZE;
> + nocow = qemu_opt_get_bool(opts, BLOCK_OPT_NOCOW, false);
>
> fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
> 0644);
> @@ -1291,6 +1296,21 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
> result = -errno;
> error_setg_errno(errp, -result, "Could not create file");
> } else {
> + if (nocow) {
> +#ifdef __linux__
> + /* Set NOCOW flag to solve performance issue on fs like btrfs.
> + * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value
> + * will be ignored since any failure of this operation should not
> + * block the left work.
> + */
> + int attr;
> + if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
> + attr |= FS_NOCOW_FL;
> + ioctl(fd, FS_IOC_SETFLAGS, &attr);
> + }
> +#endif
> + }
> +
> if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) {
> result = -errno;
> error_setg_errno(errp, -result, "Could not resize file");
> @@ -1477,6 +1497,11 @@ static QemuOptsList raw_create_opts = {
> .type = QEMU_OPT_SIZE,
> .help = "Virtual disk size"
> },
> + {
> + .name = BLOCK_OPT_NOCOW,
> + .type = QEMU_OPT_BOOL,
> + .help = "Turn off copy-on-write (valid only on btrfs)"
> + },
> { /* end of list */ }
> }
> };
> diff --git a/block/vdi.c b/block/vdi.c
> index 01fe22e..197bd77 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -53,6 +53,13 @@
> #include "block/block_int.h"
> #include "qemu/module.h"
> #include "migration/migration.h"
> +#ifdef __linux__
> +#include <linux/fs.h>
> +#include <sys/ioctl.h>
> +#ifndef FS_NOCOW_FL
> +#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
> +#endif
> +#endif
>
> #if defined(CONFIG_UUID)
> #include <uuid/uuid.h>
> @@ -683,6 +690,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
> VdiHeader header;
> size_t i;
> size_t bmap_size;
> + bool nocow = false;
>
> logout("\n");
>
> @@ -699,6 +707,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
> image_type = VDI_TYPE_STATIC;
> }
> #endif
> + nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false);
>
> if (bytes > VDI_DISK_SIZE_MAX) {
> result = -ENOTSUP;
> @@ -716,6 +725,21 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
> goto exit;
> }
>
> + if (nocow) {
> +#ifdef __linux__
> + /* Set NOCOW flag to solve performance issue on fs like btrfs.
> + * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value will
> + * be ignored since any failure of this operation should not block the
> + * left work.
> + int attr;
> + if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
> + attr |= FS_NOCOW_FL;
> + ioctl(fd, FS_IOC_SETFLAGS, &attr);
> + }
> +#endif
> + }
> +
> /* We need enough blocks to store the given disk size,
> so always round up. */
> blocks = (bytes + block_size - 1) / block_size;
> @@ -818,6 +842,11 @@ static QemuOptsList vdi_create_opts = {
> .def_value_str = "off"
> },
> #endif
> + {
> + .name = BLOCK_OPT_NOCOW,
> + .type = QEMU_OPT_BOOL,
> + .help = "Turn off copy-on-write (valid only on btrfs)"
> + },
> /* TODO: An additional option to set UUID values might be useful. */
> { /* end of list */ }
> }
> diff --git a/block/vhdx.c b/block/vhdx.c
> index fedcf9f..7bdb456 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -1909,6 +1909,11 @@ static QemuOptsList vhdx_create_opts = {
> .type = QEMU_OPT_BOOL,
> .help = "Force use of payload blocks of type 'ZERO'. Non-standard."
> },
> + {
> + .name = BLOCK_OPT_NOCOW,
> + .type = QEMU_OPT_BOOL,
> + .help = "Turn off copy-on-write (valid only on btrfs)"
> + },
> { NULL }
> }
> };
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 83dd6fe..94e1ff7 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1529,7 +1529,7 @@ static int coroutine_fn vmdk_co_write_zeroes(BlockDriverState *bs,
>
> static int vmdk_create_extent(const char *filename, int64_t filesize,
> bool flat, bool compress, bool zeroed_grain,
> - Error **errp)
> + QemuOpts *opts, Error **errp)
> {
> int ret, i;
> BlockDriverState *bs = NULL;
> @@ -1539,7 +1539,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
> uint32_t *gd_buf = NULL;
> int gd_buf_size;
>
> - ret = bdrv_create_file(filename, NULL, &local_err);
> + ret = bdrv_create_file(filename, opts, &local_err);
> if (ret < 0) {
> error_propagate(errp, local_err);
> goto exit;
> @@ -1845,7 +1845,7 @@ static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
> path, desc_filename);
>
> if (vmdk_create_extent(ext_filename, size,
> - flat, compress, zeroed_grain, errp)) {
> + flat, compress, zeroed_grain, opts, errp)) {
> ret = -EINVAL;
> goto exit;
> }
> @@ -2153,6 +2153,11 @@ static QemuOptsList vmdk_create_opts = {
> .help = "Enable efficient zero writes "
> "using the zeroed-grain GTE feature"
> },
> + {
> + .name = BLOCK_OPT_NOCOW,
> + .type = QEMU_OPT_BOOL,
> + .help = "Turn off copy-on-write (valid only on btrfs)"
> + },
> { /* end of list */ }
> }
> };
> diff --git a/block/vpc.c b/block/vpc.c
> index 798d854..8b376a4 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -29,6 +29,13 @@
> #if defined(CONFIG_UUID)
> #include <uuid/uuid.h>
> #endif
> +#ifdef __linux__
> +#include <linux/fs.h>
> +#include <sys/ioctl.h>
> +#ifndef FS_NOCOW_FL
> +#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
> +#endif
> +#endif
>
> /**************************************************************/
>
> @@ -751,6 +758,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
> int64_t total_size;
> int disk_type;
> int ret = -EIO;
> + bool nocow = false;
>
> /* Read out options */
> total_size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
> @@ -767,6 +775,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
> } else {
> disk_type = VHD_DYNAMIC;
> }
> + nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false);
>
> /* Create the file */
> fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
> 0644);
> @@ -775,6 +784,21 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
> goto out;
> }
>
> + if (nocow) {
> +#ifdef __linux__
> + /* Set NOCOW flag to solve performance issue on fs like btrfs.
> + * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value will
> + * be ignored since any failure of this operation should not block the
> + * left work.
> + int attr;
> + if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
> + attr |= FS_NOCOW_FL;
> + ioctl(fd, FS_IOC_SETFLAGS, &attr);
> + }
> +#endif
> + }
> +
> /*
> * Calculate matching total_size and geometry. Increase the number of
> * sectors requested until we get enough (or fail). This ensures that
> @@ -884,6 +908,11 @@ static QemuOptsList vpc_create_opts = {
> "Type of virtual hard disk format. Supported formats are "
> "{dynamic (default) | fixed} "
> },
> + {
> + .name = BLOCK_OPT_NOCOW,
> + .type = QEMU_OPT_BOOL,
> + .help = "Turn off copy-on-write (valid only on btrfs)"
> + },
> { /* end of list */ }
> }
> };
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 7aa2213..4e5022a 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -54,6 +54,7 @@
> #define BLOCK_OPT_LAZY_REFCOUNTS "lazy_refcounts"
> #define BLOCK_OPT_ADAPTER_TYPE "adapter_type"
> #define BLOCK_OPT_REDUNDANCY "redundancy"
> +#define BLOCK_OPT_NOCOW "nocow"
>
> typedef struct BdrvTrackedRequest {
> BlockDriverState *bs;
> diff --git a/qemu-doc.texi b/qemu-doc.texi
> index 88ec9bb..ad92c85 100644
> --- a/qemu-doc.texi
> +++ b/qemu-doc.texi
> @@ -589,6 +589,22 @@ check -r all} is required, which may take some time.
>
> This option can only be enabled if @code{compat=1.1} is specified.
>
> +@item nocow
> +If this option is set to @code{on}, it will turn off COW of the file. It is only
> +valid on btrfs and has no effect on other file systems.
> +
> +Btrfs performs poorly when hosting a VM image file, especially when the guest
> +in the VM also uses btrfs as its file system. Turning off COW is a way to mitigate
> +this. Generally there are two ways to turn off COW on btrfs:
> +a) Disable it by mounting with nodatacow; then all newly created files will be
> +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this option
> +does.
> +
> +Note: this option is only valid for new or empty files. If there is an existing
> +file which is COW and already has data blocks, it cannot be changed to NOCOW
> +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check if
> +the NOCOW flag is set or not (capital 'C' is the NOCOW flag).
> +
> @end table
>
> @item qed
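The documentation note above suggests checking the result with `lsattr`, which shows the NOCOW attribute as a capital 'C'. The same check can be made programmatically by reading the inode flags with `FS_IOC_GETFLAGS`. A minimal sketch, assuming a Linux host; `file_is_nocow()` is a hypothetical helper, not part of QEMU.

```c
/* Hypothetical sketch: report whether path carries the NOCOW attribute,
 * i.e. what lsattr shows as 'C'. */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#ifdef __linux__
#include <linux/fs.h>
#include <sys/ioctl.h>
#ifndef FS_NOCOW_FL
#define FS_NOCOW_FL 0x00800000 /* fallback for older kernel headers */
#endif
#endif

/* Returns 1 if FS_NOCOW_FL is set, 0 if it is not, -errno on failure. */
static int file_is_nocow(const char *path)
{
#ifdef __linux__
    int attr;
    int ret;
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        return -errno;
    }
    if (ioctl(fd, FS_IOC_GETFLAGS, &attr) < 0) {
        ret = -errno; /* file system does not expose inode flags */
    } else {
        ret = (attr & FS_NOCOW_FL) ? 1 : 0;
    }
    close(fd);
    return ret;
#else
    (void)path;
    return -ENOTSUP;
#endif
}
```

Because the attribute cannot be applied to a file that already has data blocks, a check like this is most useful right after image creation, to confirm that `nocow=on` actually took effect on the target file system.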
> diff --git a/qemu-img.texi b/qemu-img.texi
> index c68b541..8496f3b 100644
> --- a/qemu-img.texi
> +++ b/qemu-img.texi
> @@ -474,6 +474,22 @@ check -r all} is required, which may take some time.
>
> This option can only be enabled if @code{compat=1.1} is specified.
>
> +@item nocow
> +If this option is set to @code{on}, it will turn off COW of the file. It is only
> +valid on btrfs and has no effect on other file systems.
> +
> +Btrfs performs poorly when hosting a VM image file, especially when the guest
> +in the VM also uses btrfs as its file system. Turning off COW is a way to mitigate
> +this. Generally there are two ways to turn off COW on btrfs:
> +a) Disable it by mounting with nodatacow; then all newly created files will be
> +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this option
> +does.
> +
> +Note: this option is only valid for new or empty files. If there is an existing
> +file which is COW and already has data blocks, it cannot be changed to NOCOW
> +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check if
> +the NOCOW flag is set or not (capital 'C' is the NOCOW flag).
> +
> @end table
>
> @item Other