qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 09/17] block/io: support int64_t bytes in bdrv_co_p{read,w


From: Eric Blake
Subject: Re: [PATCH v3 09/17] block/io: support int64_t bytes in bdrv_co_p{read,write}v_part()
Date: Fri, 22 May 2020 14:34:23 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0

On 4/30/20 6:10 AM, Vladimir Sementsov-Ogievskiy wrote:
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their
remaining dependencies now.

Series: 64bit-block-status
Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
---
  include/block/block_int.h |  4 ++--
  block/io.c                | 16 ++++++++--------
  block/trace-events        |  4 ++--
  3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index c8daba608b..3c2a1d741a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -975,13 +975,13 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
      int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
      BdrvRequestFlags flags);
  int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
-    int64_t offset, unsigned int bytes,
+    int64_t offset, int64_t bytes,
      QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags);
  int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
      int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
      BdrvRequestFlags flags);
  int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
-    int64_t offset, unsigned int bytes,
+    int64_t offset, int64_t bytes,
      QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags);

Callers for these two functions:

block-backend.c:blk_do_pwritev_part() - currently passes unsigned int

filter-compress.c:compress_co_preadv_part() - passes uint64_t from .bdrv_co_preadv_part, which in turn is called from: - io.c:bdrv_driver_preadv() - callers analyzed earlier this series, where we know we are currently capped at <2G - qcow2-cluster.c:do_perform_cow_read() - passes size_t qiov->size, but we further know qcow2 cow is limited to cluster size of 2M - qcow2.c:qcow2_load_vmstate() - passes size_t qiov->size, tracing whether that ever exceeds 32 bits (on 64-bit platforms) is harder

filter-compress.c:compress_co_pwritev_part() - ditto, but for .bdrv_co_pwritev_part, which in turn is called from: - io.c:bdrv_driver_pwritev() - callers analyzed earlier this series, where we know we are currently capped at <2G - qcow2.c:qcow2_save_vmstate() - passes size_t qiov->size, tracing whether that ever exceeds 32 bits (on 64-bit platforms) is harder

io.c:bdrv_co_preadv() - currently passes unsigned int

io.c:bdrv_co_pwritev() - currently passes unsigned int

qcow2.c:qcow2_co_preadv_task() - passes uint64_t from qcow2_co_preadv_part(), which in turn is called from:
 - .bdrv_co_preadv_part - analyzed above

qcow2.c:qcow2_co_pwritev_task() - passes uint64_t from qcow2_co_pwritev_part(), which in turn is called from:
 - .bdrv_co_pwritev_part - analyzed above
- qcow2_co_truncate() - passes uint64_t but it is clamped to 1 cluster, at most 2M

In summary, it looks like even with our new 64-bit bytes parameter, most (all?) callers are still clamped to 32 bits. But if we later relax callers, we want to see how bytes is used within these functions.

static inline int coroutine_fn bdrv_co_pread(BdrvChild *child,
diff --git a/block/io.c b/block/io.c
index d336e4e691..d7fd429345 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1488,7 +1488,7 @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
   */
  static bool bdrv_pad_request(BlockDriverState *bs,
                               QEMUIOVector **qiov, size_t *qiov_offset,
-                             int64_t *offset, unsigned int *bytes,
+                             int64_t *offset, int64_t *bytes,
                               BdrvRequestPadding *pad)
  {
      if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {

Callers:
bdrv_do_preadv_part() - adjusted to int64_t in this patch
bdrv_do_pwritev_part() - adjusted to int64_t in this patch

Usage:
    if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {
 - takes int64_t, but now has to be checked for 64-bit safety below

    qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
                             *qiov, *qiov_offset, *bytes,
pad->buf + pad->buf_len - pad->tail, pad->tail); - takes size_t, risky on 32-bit platforms if any of our callers ever pass in a value larger than 32 bits. I'd feel much better with an assertion that bytes <= SIZE_MAX.

    *bytes += pad->head + pad->tail;
- corner-case risk of overflow for an image near 63-bit limits (nbdkit can generate such an image, but real images do not tickle this); the risk can be mitigated if we insist that no images are larger than QEMU_ALIGN_DOWN(INT64_MAX, request_alignment), as we would be unable to access the unaligned tail bytes of such an image


bdrv_init_padding():
    sum = pad->head + bytes + pad->tail;
- ouch: this sets 'size_t sum' to what has always been an int64_t parameter, but which prior to this patch was initialized by callers passing 32-bit values (with this patch, callers now pass in actual int64_t). But even pre-patch, there are values of bytes that fit in 32 bits but where this sum can overflow on 32-bit platforms, with that potential overflow not changed here. We need a separate patch to fix this (sum needs to be int64_t), preferably earlier in the series. (patch 6/17 also mentioned bdrv_init_padding as needing a fix)


@@ -1515,7 +1515,7 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
/* To be called between exactly one pair of bdrv_inc/dec_in_flight() */
  static int coroutine_fn bdrv_do_preadv_part(BdrvChild *child,
-    int64_t offset, unsigned int bytes,
+    int64_t offset, int64_t bytes,
      QEMUIOVector *qiov, size_t qiov_offset,
      BdrvRequestFlags flags)
  {

Callers:
bdrv_co_preadv_part() - adjusted this patch
bdrv_rw_co_entry() called by bdrv_prwv_co called by:
 - bdrv_pwrite_zeroes() - int bytes
 - bdrv_pwritev() - size_t bytes from qiov
 - bdrv_preadv() - size_t bytes from qiov

This fixes a latent issue where pre-patch callers could pass size_t and suffer truncation on a 64-bit platform; but in practice we never tickled that bug because of our insistence on <2G read/write.

Usage:

@@ -1524,7 +1524,7 @@ static int coroutine_fn bdrv_do_preadv_part(BdrvChild 
*child,
      BdrvRequestPadding pad;
      int ret;
- trace_bdrv_co_preadv(bs, offset, bytes, flags);
+    trace_bdrv_co_preadv_part(bs, offset, bytes, flags);
ret = bdrv_check_byte_request(bs, offset, bytes);

takes int64_t (patch 3/17 in this series), and ensures the transaction is <2G for the rest of the function. If we ever relax bdrv_check_byte_request, we also have to worry about:

    tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_READ);
 - takes int64_t, audited earlier in this series

    ret = bdrv_aligned_preadv(child, &req, offset, bytes,
                              bs->bl.request_alignment,
                              qiov, qiov_offset, flags);
 - takes int64_t, audited earlier in this series


      if (ret < 0) {
@@ -1562,7 +1562,7 @@ static int coroutine_fn bdrv_do_preadv_part(BdrvChild 
*child,
  }
int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
-    int64_t offset, unsigned int bytes,
+    int64_t offset, int64_t bytes,
      QEMUIOVector *qiov, size_t qiov_offset,
      BdrvRequestFlags flags)
  {

Callers analyzed above, usage within the function:

ret = bdrv_do_preadv_part(child, offset, bytes, qiov, qiov_offset, flags);
 - analyzed above

@@ -1866,7 +1866,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
                                                  int64_t offset,
-                                                unsigned int bytes,
+                                                int64_t bytes,
                                                  BdrvRequestFlags flags,
                                                  BdrvTrackedRequest *req)
  {

Caller: bdrv_do_pwritev_part(), adjusted in this patch

Usage:
    padding = bdrv_init_padding(bs, offset, bytes, &pad);
 - analyzed above

ret = bdrv_aligned_pwritev(child, req, offset, aligned_bytes, align,
                                   NULL, 0, flags);
 - takes int64_t, audited earlier in the series
        assert(align == pad.tail + bytes);
- changes from 'uint64_t == size_t + unsigned int' to 'uint64_t == size_t + int64_t', but does not change in semantics


@@ -1941,7 +1941,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
/* To be called between exactly one pair of bdrv_inc/dec_in_flight() */
  static int coroutine_fn bdrv_do_pwritev_part(BdrvChild *child,
-    int64_t offset, unsigned int bytes, QEMUIOVector *qiov, size_t qiov_offset,
+    int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset,
      BdrvRequestFlags flags)
  {
      BlockDriverState *bs = child->bs;

Callers:
bdrv_co_pwritev_part() - adjusted this patch
bdrv_do_pwrite_zeroes() - passes 'int'
bdrv_rw_co_entry() analyzed above

Usage:

@@ -1950,7 +1950,7 @@ static int coroutine_fn bdrv_do_pwritev_part(BdrvChild 
*child,
      BdrvRequestPadding pad;
      int ret;
- trace_bdrv_co_pwritev(child->bs, offset, bytes, flags);
+    trace_bdrv_co_pwritev_part(child->bs, offset, bytes, flags);
if (!bs->drv) {
          return -ENOMEDIUM;

    ret = bdrv_check_byte_request(bs, offset, bytes);
 - analyzed above; if we ever relax it, we must also worry about:

    tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_WRITE);
 - analyzed above

        ret = bdrv_co_do_zero_pwritev(child, offset, bytes, flags, &req);
 - analyzed above

    if (bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad)) {
 - analyzed above

    ret = bdrv_aligned_pwritev(child, &req, offset, bytes, align,
                               qiov, qiov_offset, flags);
 - takes int64_t, analyzed earlier this series


@@ -2009,7 +2009,7 @@ out:
  }
int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
-    int64_t offset, unsigned int bytes, QEMUIOVector *qiov, size_t qiov_offset,
+    int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset,
      BdrvRequestFlags flags)
  {
      int ret;

Callers analzyed above, usage:

ret = bdrv_do_pwritev_part(child, offset, bytes, qiov, qiov_offset, flags);
 - analyzed above

diff --git a/block/trace-events b/block/trace-events
index 179b47bf63..dd367a9b19 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -11,8 +11,8 @@ blk_root_attach(void *child, void *blk, void *bs) "child %p blk %p 
bs %p"
  blk_root_detach(void *child, void *blk, void *bs) "child %p blk %p bs %p"
# io.c
-bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset 
%"PRId64" nbytes %"PRId64" flags 0x%x"
-bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset 
%"PRId64" nbytes %"PRId64" flags 0x%x"
+bdrv_co_preadv_part(void *bs, int64_t offset, int64_t bytes, int flags) "bs %p offset %" PRId64 
" bytes %" PRId64 " flags 0x%x"
+bdrv_co_pwritev_part(void *bs, int64_t offset, int64_t bytes, int flags) "bs %p offset %" PRId64 
" bytes %" PRId64 " flags 0x%x"

Interesting that the trace always took 64-bit numbers, but you are renaming it to match the fact that they are now called from different functions (since commit 1acc3466a2). I'd feel better if 'flags' were kept unsigned int, even though semantically it doesn't matter.

  bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset 
%"PRId64" count %d flags 0x%x"
  bdrv_co_do_copy_on_readv(void *bs, int64_t offset, int64_t bytes, int64_t cluster_offset, int64_t cluster_bytes) 
"bs %p offset %" PRId64 " bytes %" PRId64 " cluster_offset %" PRId64 " cluster_bytes 
%" PRId64
  bdrv_co_copy_range_from(void *src, uint64_t src_offset, void *dst, uint64_t dst_offset, uint64_t bytes, int 
read_flags, int write_flags) "src %p offset %"PRIu64" dst %p offset %"PRIu64" bytes 
%"PRIu64" rw flags 0x%x 0x%x"


I think if you fix bdrv_init_padding in a separate patch, then everything else here is safe.
Reviewed-by: Eric Blake <address@hidden>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




reply via email to

[Prev in Thread] Current Thread [Next in Thread]