qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: dma_blk helpers and infinite dma_memory_map retries


From: John Snow
Subject: Re: dma_blk helpers and infinite dma_memory_map retries
Date: Thu, 30 Jul 2020 14:26:42 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 7/29/20 7:43 AM, Paolo Bonzini wrote:
This is the question: what is special about this address_space_map, that
makes it never succeed?  The usual case where address_space_map fails is
simply two or more concurrent users of the bounce buffer, and these
solve by themselves when address_space_unmap calls
cpu_notify_map_clients.  Do we never call address_space_unmap?

TLDR: the dma-helpers handle alignment ... poorly, and it's not clear what the right way to handle this might be for a few reasons.


Read on for my usual thinking-out-loud analysis tracing this problem, though I do have a question for Kevin/Paolo at the end. (See the numbered list 1-4 at the bottom.)


First, the (partially bogus, fuzzer-generated) IDE command wants to:

1. dma write 259 sectors starting at sector 1

2. Provides a PRDT at addr 0x00 whose first PRDT describes a data buffer at 0xffffffff of length 0x10000. [a]

3. The remaining 8,191 PRD entries are uninitialized memory that all wind up describing the same data buffer at 0x00 of length 0x10000.

Generally, the entire PRDT is going to be read, but truncated into an SGList that's exactly as long as the IDE command. Here, that's 0x20600 bytes.

Yadda, yadda, yadda, that winds up turning into these map requests:

addr 0xffffffff; len 0x10000
  -- mapped length: 0x01 (normal map return)

addr 0x100000000; len 0xffff
  -- mapped length: 0x1000 (bounce buffer return)

addr 0x100001000; len 0xefff
  -- bounce buffer is busy, cannot map

Then it proceeds and calls the iofunc. We return to dma_blk_cb and then:

unmap 0xffffffff; len 0x01; access_len 0x01;

... That unmaps the "direct" one, but we seemingly fail to unmap the indirect one.

Uh, cool. When we build the IOV, we build it with two entries; but qemu_iovec_discard_back discards the second entry entirely without unmapping it.

IDE asks for an alignment of BDRV_SECTOR_SIZE (512 bytes). The IDE state machine transfers an entire sector or nothing at all. The total IOV size we have build thus far is 0x1001 bytes, which is not aligned as you might have noticed.

So, we try to align it:

qemu_iovec_discard_back(&dbs->iov, QEMU_ALIGN_DOWN(4097, 512))

... I think we probably wanted to ask to shave off one byte instead of asking to shave off 4096 bytes.


So, a few perceived problems with dma_blk_cb:

1. Our alignment math is wrong. discard_back takes as an argument the number of bytes to discard, not the number of bytes you want to have afterwards.

2. qemu_iovec_discard_back will happily unwind entire IO vectors that we would need to unmap and have now lost. Worse, whenever we do any kind of truncating at all, those bytes are not re-added to the source SGList, so subsequent transfers will have skipped some bytes in the guest SGList.

3. the dma_blk_io interfaces don't ever check to see if the sg list is an even multiple of the alignment. They don't return synchronous error and no callers check for an error case. (Though BMDMA does carefully prepare the SGList such that it is aligned in this way. AHCI does too, IIRC.) This means we might have an unaligned tail that we will just drop or ignore, leading to another partial DMA.

4. There's no guarantee that any given individual IO vector will have an entire sector's worth of data in it. It is theoretically valid to describe a series of vectors of two bytes each. If we can only map 1-2 vectors at a time, depending, we're never going to be able to scrounge up enough buffer real estate to transfer an entire sector.


Hm, Ow.


Before I go on some quest to ascend to a higher plane of being, I wanted to solicit feedback on the problem as described thus far and see if anyone has a suggestion for a fix that's not too involved.

--js


[a] This is against the BMDMA spec. The address must be aligned to 0x02 and cannot cross a 64K boundary. bit0 is documented as always being zero, but it's not clear what should happen if the boundary constraint is violated. Absent other concerns, it might just be easiest to fulfill the transfer if it's possible.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]