qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PULL 02/17] block: Collapse padded I/O vecs exceeding IOV_MAX


From: Hanna Czenczek
Subject: Re: [PULL 02/17] block: Collapse padded I/O vecs exceeding IOV_MAX
Date: Tue, 6 Jun 2023 10:45:13 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

On 06.06.23 10:00, Michael Tokarev wrote:
05.06.2023 18:45, Hanna Czenczek wrote:
When processing vectored guest requests that are not aligned to the
storage request alignment, we pad them by adding head and/or tail
buffers for a read-modify-write cycle.

The guest can submit I/O vectors up to IOV_MAX (1024) in length, but
with this padding, the vector can exceed that limit.  As of
4c002cef0e9abe7135d7916c51abce47f7fc1ee2 ("util/iov: make
qemu_iovec_init_extended() honest"), we refuse to pad vectors beyond the
limit, instead returning an error to the guest.

To the guest, this appears as a random I/O error.  We should not return
an I/O error to the guest when it issued a perfectly valid request.

Before 4c002cef0e9abe7135d7916c51abce47f7fc1ee2, we just made the vector
longer than IOV_MAX, which generally seems to work (because the guest
assumes a smaller alignment than we really have, file-posix's
raw_co_prw() will generally see bdrv_qiov_is_aligned() return false, and
so emulate the request, so that the IOV_MAX does not matter). However,
that does not seem exactly great.

Why it is not "exactly great"?  To me, it seems to be the best solution.
I'd say we should be able to tolerate "slight" increase over IOV_MAX for
"internal purposes", so to say.  We can limit guest-supplied vector to
IOV_MAX, but internally guard against integer overflow only.

That’s a good point that may have been worth investigating.

I considered it not great because I assumed that that 4c002cef0e9abe7135d7916c51abce47f7fc1ee2 was written with intent, i.e. that the IOV_MAX limit was put in because we just generally assume in the block layer that it isn’t exceeded.  It may have worked out fine before for one specific protocol driver (file-posix) most of the time[1], but I think it’s reasonable to assume that code in the block layer has generally been written under the assumption that vectors will not exceed the IOV_MAX limit (or otherwise we wouldn’t use that constant in the block layer).  So in addition to file-posix, we’d also need to inspect other code (like the blkio driver that will submit vectored requests to an external library) what the implications of exceeding that limit are in all places.

That is not to say that it might not have been the simpler solution to agree to exceed the limit internally, but it is to say that the full implications would need to be investigated first.  In contrast, the solution added here is more complicated in code, but is localized.

[1] It won’t be fine if all buffers are 4k in size and 4k-aligned, which I admit is unlikely for a request whose offset isn’t aligned, but which could happen with a partition that isn’t aligned to 4k.

I see two ways to fix this problem:
1. We split such long requests into two requests.
2. We join some elements of the vector into new buffers to make it
    shorter.

This seems to be over-complicated, both of them, no?

I would have preferred to have this discussion while the patch was still on-list for review (this specific version was for two months, counting from the first version was three).  Do you think it is so complicated and thus bug-prone that we must revert this series now and try the other route?

I can agree that perhaps the other route could have been simpler, but now we already have patches that are reviewed and in master, which solve the problem.  So I won’t spend more time on tackling this issue from another angle.  If you are happy to do so, patches are always welcome.

Hanna




reply via email to

[Prev in Thread] Current Thread [Next in Thread]