Re: dma_blk helpers and infinite dma_memory_map retries
From: John Snow
Subject: Re: dma_blk helpers and infinite dma_memory_map retries
Date: Thu, 30 Jul 2020 14:26:42 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0
On 7/29/20 7:43 AM, Paolo Bonzini wrote:
> This is the question: what is special about this address_space_map, that
> makes it never succeed? The usual case where address_space_map fails is
> simply two or more concurrent users of the bounce buffer, and these
> solve by themselves when address_space_unmap calls
> cpu_notify_map_clients. Do we never call address_space_unmap?
TLDR: the dma-helpers handle alignment ... poorly, and it's not clear
what the right way to handle this might be for a few reasons.
Read on for my usual thinking-out-loud analysis tracing this problem,
though I do have a question for Kevin/Paolo at the end. (See the
numbered list 1-4 at the bottom.)
First, the (partially bogus, fuzzer-generated) IDE command:
1. Requests a DMA write of 259 sectors starting at sector 1.
2. Provides a PRDT at addr 0x00 whose first PRD entry describes a data
buffer at 0xffffffff of length 0x10000. [a]
3. The remaining 8,191 PRD entries are uninitialized memory that all
wind up describing the same data buffer at 0x00 of length 0x10000.
Generally, the entire PRDT is going to be read, but truncated into an
SGList that's exactly as long as the IDE command. Here, that's 0x20600
bytes.
Yadda, yadda, yadda, that winds up turning into these map requests:
addr 0xffffffff; len 0x10000
-- mapped length: 0x01 (normal map return)
addr 0x100000000; len 0xffff
-- mapped length: 0x1000 (bounce buffer return)
addr 0x100001000; len 0xefff
-- bounce buffer is busy, cannot map
Then it proceeds and calls the iofunc. We return to dma_blk_cb and then:
unmap 0xffffffff; len 0x01; access_len 0x01;
... That unmaps the "direct" one, but we seemingly fail to unmap the
indirect one.
Uh, cool. When we build the IOV, we build it with two entries; but
qemu_iovec_discard_back discards the second entry entirely without
unmapping it.
IDE asks for an alignment of BDRV_SECTOR_SIZE (512 bytes). The IDE state
machine transfers an entire sector or nothing at all. The total IOV size
we have built thus far is 0x1001 bytes, which, as you might have
noticed, is not sector-aligned.
So, we try to align it:
qemu_iovec_discard_back(&dbs->iov, QEMU_ALIGN_DOWN(4097, 512))
... I think we probably wanted to ask to shave off one byte instead of
asking to shave off 4096 bytes.
So, a few perceived problems with dma_blk_cb:
1. Our alignment math is wrong. discard_back takes as an argument the
number of bytes to discard, not the number of bytes you want to have
afterwards.
2. qemu_iovec_discard_back will happily unwind entire IO vectors that we
would need to unmap and have now lost. Worse, whenever we do any kind of
truncating at all, those bytes are not re-added to the source SGList, so
subsequent transfers will have skipped some bytes in the guest SGList.
3. The dma_blk_io interfaces never check whether the SGList is an even
multiple of the alignment. They don't return a synchronous error, and no
callers check for an error case. (Though BMDMA does carefully prepare
the SGList such that it is aligned in this way. AHCI does too, IIRC.)
This means we might have an unaligned tail that we will just drop or
ignore, leading to another partial DMA.
4. There's no guarantee that any given individual IO vector will have an
entire sector's worth of data in it. It is theoretically valid to
describe a series of vectors of two bytes each. If we can only map 1-2
vectors at a time, depending, we're never going to be able to scrounge
up enough buffer real estate to transfer an entire sector.
Hm, Ow.
Before I go on some quest to ascend to a higher plane of being, I wanted
to solicit feedback on the problem as described thus far and see if
anyone has a suggestion for a fix that's not too involved.
--js
[a] This is against the BMDMA spec. The address must be aligned to 0x02
and cannot cross a 64K boundary. bit0 is documented as always being
zero, but it's not clear what should happen if the boundary constraint
is violated. Absent other concerns, it might just be easiest to fulfill
the transfer if it's possible.