[Qemu-ppc] [Qemu-devel] [PATCH v4 00/15] Invert Endian bit in SPARCv9 MMU TTE
From: tony.nguyen
Subject: [Qemu-ppc] [Qemu-devel] [PATCH v4 00/15] Invert Endian bit in SPARCv9 MMU TTE
Date: Thu, 25 Jul 2019 09:51:14 +0000
This patchset implements the IE (Invert Endian) bit in SPARCv9 MMU TTE.
It is an attempt to follow the instructions outlined by Richard Henderson to
Mark Cave-Ayland.
Tested with OpenBSD on sun4u. Solaris 10 is my actual goal, but unfortunately a
separate keyboard issue remains in the way.
On 01/11/17 19:15, Mark Cave-Ayland wrote:
>On 15/08/17 19:10, Richard Henderson wrote:
>
>> [CC Peter re MemTxAttrs below]
>>
>> On 08/15/2017 09:38 AM, Mark Cave-Ayland wrote:
>>> Working through an incorrect endian issue on qemu-system-sparc64, it has
>>> become apparent that at least one OS makes use of the IE (Invert Endian)
>>> bit in the SPARCv9 MMU TTE to map PCI memory space without the
>>> programmer having to manually endian-swap accesses.
>>>
>>> In other words, to quote the UltraSPARC specification: "if this bit is
>>> set, accesses to the associated page are processed with inverse
>>> endianness from what is specified by the instruction (big-for-little and
>>> little-for-big)".
A good explanation by Mark of why the IE bit is required.
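To make the quoted behaviour concrete, here is a minimal sketch, not QEMU
code. The helper name sk_load32 and its parameters are hypothetical. It only
models the rule that the TTE's IE bit inverts whatever endianness the
instruction asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a QEMU function.
 * insn_le: 1 if the instruction requests a little-endian access, else 0.
 * tte_ie:  1 if the page's TTE has the IE (Invert Endian) bit set, else 0.
 */
static uint32_t sk_load32(const uint8_t *p, int insn_le, int tte_ie)
{
    assert(p != NULL);

    /* The IE bit flips the endianness requested by the instruction. */
    int le = insn_le ^ tte_ie;

    if (le) {
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

With the bytes 12 34 56 78 in memory, a big-endian load on a page without IE
sees 0x12345678, and the same instruction on a page with IE set sees
0x78563412.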
>>>
>>> Looking through various bits of code, I'm trying to get a feel for the
>>> best way to implement this in an efficient manner. From what I can see
>>> this could be solved using an additional MMU index, however I'm not
>>> overly familiar with the memory and softmmu subsystems.
>>
>> No, it can't be solved with an MMU index.
>>
>>> Can anyone point me in the right direction as to what would be the best
>>> way to implement this feature within QEMU?
>>
>> It's definitely tricky.
>>
>> We definitely need some TLB_FLAGS_MASK bit set so that we're forced through
>> the memory slow path. There is no other way to bypass the endianness that
>> we've already encoded from the target instruction.
>>
>> Given the tlb_set_page_with_attrs interface, I would think that we need a new
>> bit in MemTxAttrs, so that the target/sparc tlb_fill (and subroutines) can
>> pass along the TTE bit for the given page.
>>
>> We have an existing problem in softmmu_template.h,
>>
>> /* ??? Note that the io helpers always read data in the target
>> byte ordering. We should push the LE/BE request down into io. */
>> res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr);
>> res = TGT_BE(res);
>>
>> We do not want to add a third(!) byte swap along the i/o path. We need to
>> collapse the two that we have already before considering this one.
>>
>> This probably takes the form of:
>>
>> (1) Replacing the "int size" argument with "TCGMemOp memop" for
>> a) io_{read,write}x in accel/tcg/cputlb.c,
>> b) memory_region_dispatch_{read,write} in memory.c,
>> c) adjust_endianness in memory.c.
>> This carries size+sign+endianness down to the next level.
>>
>> (2) In memory.c, adjust_endianness,
>>
>> if (memory_region_wrong_endianness(mr)) {
>> - switch (size) {
>> + memop ^= MO_BSWAP;
>> + }
>> + if (memop & MO_BSWAP) {
>>
>> For extra credit, re-arrange memory_region_wrong_endianness
>> to something more explicit -- "wrong" isn't helpful.
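The point of steps (1) and (2) can be shown with a small standalone sketch.
This is not the code in memory.c. The SK_* constants and function names are
hypothetical stand-ins for the MemOp size field and MO_BSWAP. The idea is that
every layer toggles one "swap pending" bit with XOR, and the value is swapped
at most once at the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding, loosely modelled on the description above. */
#define SK_SIZE_MASK 3u          /* log2 of the access size in bytes, 0..3 */
#define SK_BSWAP     (1u << 3)   /* a byte swap is pending */

/* Reverse the low (1 << size_log2) bytes of val. */
static uint64_t sk_bswap(uint64_t val, unsigned size_log2)
{
    unsigned size = 1u << size_log2;
    uint64_t out = 0;

    for (unsigned i = 0; i < size; i++) {
        out = (out << 8) | (val & 0xff);
        val >>= 8;
    }
    return out;
}

/*
 * memop carries the size and the swap requested by the instruction.
 * device_mismatch says the device's endianness differs from the target's.
 * Two requests cancel out, so no swap is performed at all in that case.
 */
static uint64_t sk_adjust_endianness(uint64_t val, unsigned memop,
                                     bool device_mismatch)
{
    assert((memop & ~(SK_SIZE_MASK | SK_BSWAP)) == 0);

    if (device_mismatch) {
        memop ^= SK_BSWAP;
    }
    if (memop & SK_BSWAP) {
        val = sk_bswap(val, memop & SK_SIZE_MASK);
    }
    return val;
}
```

Because the toggle is an XOR, adding another source of byte swapping later
does not add another swap along the path. It only flips the same bit again.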
>
>Finally I've had a bit of spare time to experiment with this approach,
>and from what I can see there are currently 2 issues:
>
>
>1) Using TCGMemOp in memory.c means it is no longer accelerator agnostic
>
>For the moment I've defined a separate MemOp in memory.h and provided a
>mapping function in io_{read,write}x to map from TCGMemOp to MemOp and
>then pass that into memory_region_dispatch_{read,write}.
>
>Other than not referencing TCGMemOp in the memory API, another reason
>for doing this was that I wasn't convinced that all the MO_ attributes
>were valid outside of TCG. I do, of course, strongly defer to other
>people's knowledge in this area though.
>
>
>2) The above changes to adjust_endianness() fail when
>memory_region_dispatch_{read,write} are called recursively
>
>Whilst booting qemu-system-sparc64 I see that
>memory_region_dispatch_{read,write} get called recursively - once via
>io_{read,write}x and then again via flatview_read_continue() in exec.c.
>
>The net effect of this is that we perform the bswap correctly at the
>tail of the recursion, but then as we travel back up the stack we hit
>memory_region_dispatch_{read,write} once again causing a second bswap
>which means the value is returned with the incorrect endian again.
>
>
>My understanding from your softmmu_template.h comment above is that the
>memory API should do the endian swapping internally allowing the removal
>of the final TGT_BE/TGT_LE applied to the result, or did I get this wrong?
>
>> (3) In tlb_set_page_with_attrs, notice attrs.byte_swap and set
>> a new TLB_FORCE_SLOW bit within TLB_FLAGS_MASK.
>>
>> (4) In io_{read,write}x, if iotlbentry->attrs.byte_swap is set,
>> then memop ^= MO_BSWAP.
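Steps (3) and (4) can be sketched as follows. This is a simplified model under
stated assumptions, not the cputlb.c implementation: the page size, the bit
position of the flag, the struct and every sk_* name are hypothetical. It
relies on the flag living in the low bits of the TLB address, so that the
fast-path comparison fails whenever the flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_BITS      13                          /* assumed 8 KiB pages */
#define SK_PAGE_MASK      (~(((uint64_t)1 << SK_PAGE_BITS) - 1))
#define SK_TLB_FORCE_SLOW ((uint64_t)1 << 0)          /* hypothetical position */
#define SK_BSWAP          (1u << 3)                   /* hypothetical swap bit */

struct sk_tlb_entry {
    uint64_t addr;   /* page address, flag bits kept in the low bits */
    bool byte_swap;  /* stands in for attrs.byte_swap */
};

/* Step (3): notice byte_swap and set the force-slow flag. */
static struct sk_tlb_entry sk_tlb_set_page(uint64_t vaddr, bool byte_swap)
{
    struct sk_tlb_entry e;

    e.addr = vaddr & SK_PAGE_MASK;
    e.byte_swap = byte_swap;
    if (byte_swap) {
        e.addr |= SK_TLB_FORCE_SLOW;
    }
    return e;
}

/* The fast path only hits when no flag bit is set in the entry. */
static bool sk_fast_path_hit(const struct sk_tlb_entry *e, uint64_t vaddr)
{
    assert(e != NULL);
    return e->addr == (vaddr & SK_PAGE_MASK);
}

/* Step (4): on the slow path, toggle the pending swap for such pages. */
static unsigned sk_slow_path_memop(const struct sk_tlb_entry *e, unsigned memop)
{
    assert(e != NULL);
    if (e->byte_swap) {
        memop ^= SK_BSWAP;
    }
    return memop;
}
```

An entry created with byte_swap set never matches on the fast path, so every
access to that page reaches the slow path, where the memop is adjusted.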
Thanks all for the v1 and v2 feedback.
v2:
- Moved size+sign+endianness attributes from TCGMemOp into MemOp.
In v1 TCGMemOp was re-purposed entirely into MemOp.
- Replaced MemOp MO_{8|16|32|64} with TCGMemOp MO_{UB|UW|UL|UQ} alias.
This is to avoid warnings on comparing and coercing different enums.
- Renamed get_memop to get_tcgmemop for clarity.
- MEMOP is now SIZE_MEMOP, which is just ctzl(size).
- Split patch 3/4 so one memory_region_dispatch_{read|write} interface
is converted per patch.
- Do not reuse TLB_RECHECK, use new TLB_FORCE_SLOW instead.
- Split patch 4/4 so adding the MemTxAttrs parameters and converting
tlb_set_page() to tlb_set_page_with_attrs() is separate from usage.
- CC'd maintainers.
v3:
- Like v1, the entire TCGMemOp enum is now MemOp.
- MemOp target dependent attributes are conditional upon NEED_CPU_H
v4:
- Added Paolo Bonzini as include/exec/memop.h maintainer
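As a note on the v2 item about SIZE_MEMOP being ctzl(size): for a
power-of-two access size in bytes, counting trailing zeros gives log2 of the
size. A minimal portable sketch follows. The function name sk_size_memop is
hypothetical and a plain loop is used in place of ctzl so that the example
compiles on its own.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the size-to-MemOp mapping described above.
 * size must be a non-zero power of two (1, 2, 4 or 8 bytes in practice).
 * Returns the number of trailing zero bits, which equals log2(size).
 */
static unsigned sk_size_memop(unsigned long size)
{
    unsigned n = 0;

    assert(size != 0 && (size & (size - 1)) == 0);

    while ((size & 1) == 0) {
        size >>= 1;
        n++;
    }
    return n;
}
```

So access sizes of 1, 2, 4 and 8 bytes map to 0, 1, 2 and 3 respectively.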
Tony Nguyen (15):
tcg: TCGMemOp is now accelerator independent MemOp
memory: Access MemoryRegion with MemOp
target/mips: Access MemoryRegion with MemOp
hw/s390x: Access MemoryRegion with MemOp
hw/intc/armv7m_nvic: Access MemoryRegion with MemOp
hw/virtio: Access MemoryRegion with MemOp
hw/vfio: Access MemoryRegion with MemOp
exec: Access MemoryRegion with MemOp
cputlb: Access MemoryRegion with MemOp
memory: Access MemoryRegion with MemOp semantics
memory: Single byte swap along the I/O path
cpu: TLB_FLAGS_MASK bit to force memory slow path
cputlb: Byte swap memory transaction attribute
target/sparc: Add TLB entry with attributes
target/sparc: sun4u Invert Endian TTE bit
MAINTAINERS | 1 +
accel/tcg/cputlb.c | 71 +++++++++--------
exec.c | 6 +-
hw/intc/armv7m_nvic.c | 12 ++-
hw/s390x/s390-pci-inst.c | 8 +-
hw/vfio/pci-quirks.c | 5 +-
hw/virtio/virtio-pci.c | 7 +-
include/exec/cpu-all.h | 10 ++-
include/exec/memattrs.h | 2 +
include/exec/memop.h | 112 +++++++++++++++++++++++++++
include/exec/memory.h | 9 ++-
memory.c | 37 +++++----
memory_ldst.inc.c | 18 ++---
target/alpha/translate.c | 2 +-
target/arm/translate-a64.c | 48 ++++++------
target/arm/translate-a64.h | 2 +-
target/arm/translate-sve.c | 2 +-
target/arm/translate.c | 32 ++++----
target/arm/translate.h | 2 +-
target/hppa/translate.c | 14 ++--
target/i386/translate.c | 132 ++++++++++++++++----------------
target/m68k/translate.c | 2 +-
target/microblaze/translate.c | 4 +-
target/mips/op_helper.c | 5 +-
target/mips/translate.c | 8 +-
target/openrisc/translate.c | 4 +-
target/ppc/translate.c | 12 +--
target/riscv/insn_trans/trans_rva.inc.c | 8 +-
target/riscv/insn_trans/trans_rvi.inc.c | 4 +-
target/s390x/translate.c | 6 +-
target/s390x/translate_vx.inc.c | 10 +--
target/sparc/cpu.h | 2 +
target/sparc/mmu_helper.c | 40 ++++++----
target/sparc/translate.c | 14 ++--
target/tilegx/translate.c | 10 +--
target/tricore/translate.c | 8 +-
tcg/README | 2 +-
tcg/aarch64/tcg-target.inc.c | 26 +++----
tcg/arm/tcg-target.inc.c | 26 +++----
tcg/i386/tcg-target.inc.c | 24 +++---
tcg/mips/tcg-target.inc.c | 16 ++--
tcg/optimize.c | 2 +-
tcg/ppc/tcg-target.inc.c | 12 +--
tcg/riscv/tcg-target.inc.c | 20 ++---
tcg/s390/tcg-target.inc.c | 14 ++--
tcg/sparc/tcg-target.inc.c | 6 +-
tcg/tcg-op.c | 38 ++++-----
tcg/tcg-op.h | 86 ++++++++++-----------
tcg/tcg.c | 2 +-
tcg/tcg.h | 99 ++----------------------
trace/mem-internal.h | 4 +-
trace/mem.h | 4 +-
52 files changed, 562 insertions(+), 488 deletions(-)
create mode 100644 include/exec/memop.h
--
1.8.3.1