Re: [PATCH v3 4/4] x86-disas: add x86-mini disassembler implementation


From: Michael Clark
Subject: Re: [PATCH v3 4/4] x86-disas: add x86-mini disassembler implementation
Date: Thu, 15 May 2025 00:52:52 +1200
User-agent: Mozilla Thunderbird

On 5/14/25 21:33, Daniel P. Berrangé wrote:
> On Wed, May 14, 2025 at 09:23:58PM +1200, Michael Clark wrote:
> > On 5/14/25 20:17, Daniel P. Berrangé wrote:
> > > On Wed, May 14, 2025 at 07:39:27PM +1200, Michael Clark wrote:
> > > > diff --git a/disas/x86-core.c b/disas/x86-core.c
> > > > new file mode 100644
> > > > index 000000000000..c4f7034e3420
> > > > --- /dev/null
> > > > +++ b/disas/x86-core.c
> > > > @@ -0,0 +1,2716 @@
> > > > +/*
> > > > + * Copyright (c) 2024-2025 Michael Clark
> > > > + *
> > > > + * SPDX-License-Identifier: MIT
> > >
> > > Note that we expect contributions to be under GPL-2.0-or-later, unless
> > > derived from existing code that forces use of a different license, which
> > > needs to be explained in the commit message.
> >
> > okay no problem, I can do that. there is a freestanding external origin:
> >
> > https://github.com/michaeljclark/x86
>
> IIUC, that would only apply to the x86-core.c file - the other files
> tagged with MIT look like they were written just for QEMU inclusion.

there are two files that should stay MIT licensed:

- disas/x86.h
- disas/x86-core.c

# which bits are neutral

half of x86.h is neutral: it expresses, as C structures and
enumerations, the structures and enumerations in the Intel SDM for
the core encoding. things like prefixes, ModRM, SIB, VEX, EVEX, and
the VEX maps are general and would come out the same had someone
else transcribed them from the Intel SDM, given that the enum values
map precisely to the binary encoding.
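
as an illustration of why such definitions are neutral, here is a
minimal sketch of the ModRM and SIB field layout as the Intel SDM
defines it. the demo_* names are hypothetical and are not the
identifiers used in disas/x86.h; only the bit positions come from the
SDM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the identifiers in disas/x86.h. */
typedef struct {
    uint8_t mod; /* bits 7:6, addressing mode; 3 means register-direct */
    uint8_t reg; /* bits 5:3, register operand or opcode extension /0../7 */
    uint8_t rm;  /* bits 2:0, register or memory base */
} demo_modrm;

typedef struct {
    uint8_t scale; /* bits 7:6, index is multiplied by 1 << scale */
    uint8_t index; /* bits 5:3 */
    uint8_t base;  /* bits 2:0 */
} demo_sib;

/* Split a ModRM byte into its three fields. */
static inline demo_modrm demo_modrm_unpack(uint8_t b)
{
    demo_modrm m = { (uint8_t)(b >> 6), (uint8_t)((b >> 3) & 7),
                     (uint8_t)(b & 7) };
    return m;
}

/* Reassemble a ModRM byte; the fields must already be in range. */
static inline uint8_t demo_modrm_pack(demo_modrm m)
{
    assert(m.mod < 4 && m.reg < 8 && m.rm < 8);
    return (uint8_t)((m.mod << 6) | (m.reg << 3) | m.rm);
}

/* Split a SIB byte into scale, index and base. */
static inline demo_sib demo_sib_unpack(uint8_t b)
{
    demo_sib s = { (uint8_t)(b >> 6), (uint8_t)((b >> 3) & 7),
                   (uint8_t)(b & 7) };
    return s;
}

/* With 32-bit or 64-bit addressing, a SIB byte follows ModRM when
 * mod != 3 and rm == 4. */
static inline int demo_modrm_has_sib(demo_modrm m)
{
    return m.mod != 3 && m.rm == 4;
}
```

for example, the byte pair 01 D8 (add eax, ebx) carries ModRM 0xD8,
which unpacks to mod=3, reg=3, rm=0. anyone transcribing the SDM
would arrive at the same shifts and masks.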

# which bits are unique

on the other hand there is a rather unique compression scheme for the
opcode encoding metadata, related to densely packing the encodings
found in the Intel CSV metadata, as well as a completely new LEX
format. LEX is unusual in that it is how Intel might sensibly have
encoded the metadata in the first place, given some reflection on the
VEX and EVEX encodings. it took a surprising amount of time to do
this: judging from my home directory I started on it about 5 years
ago, in May 2020, and I had several false starts where I completely
discarded prior work.

- x86_enc_*, x86_opr_*, x86_ord_*, and x86_codec are unique and
  represent a very densely packed encoding of x86 codec metadata.

it was quite weird to write because I wrote no code for three months,
August to October 2024, just metadata. and I started from scratch
and completely threw out previous attempts which had included some
code from TCG. you can see that the emitter is radically different.

see x86-core.c:x86_codec_write

# what is QEMU-specific

the disassembly stub could change to GPL-2.0-or-later no problem:

- disas/x86-disas.c

# tangent on MIT licensed TCG headers

tangential to this, I have extracted the MIT-licensed TCG headers
from QEMU and have a separate goal to write a new TCG compiler with
the same API but using this new x86 back-end. I have an unsent draft
with some licensing questions but I decided to just believe the MIT
license.

I am choosing to use the interface portion for a new freestanding
TCG-workalike compiler. the Google LLC v. Oracle America Inc.
Supreme Court ruling on fair-use doctrine in relation to interface
header portions of existing works seems to make that plausible.

# tangent on instruction selection

there is an exhaustively complete encoding of AVX-512 that has been
fuzz tested against LLVM and it is small in comparison to capstone.
it could potentially be used as an EVEX emitter inside of QEMU.

but I don't have instruction selection yet. I note the metadata has
been de-duplicated compared to NASM. it does not use data from NASM
but I adopted a consistent coding scheme because NASM has been most
faithful to the Intel SDM metadata, which makes it very easy to add
new instructions because we can just copy-paste from the Intel SDM.

in this way LEX seems like something that should have been there
in the first place, because we don't have extraneous opcode bytes.
it ends up as a 2-byte OPC+ModRM with masks, plus maps and prefixes,
either legacy or via VEX/EVEX, which makes the decoder very uniform.
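
a minimal sketch of what a uniform "OPC+ModRM with masks" match could
look like. the demo_opc_entry layout, the map numbering and the table
contents are assumptions for illustration and are not the structures
in x86-core.c. prefix handling (legacy versus VEX/EVEX) is left out.
the opcode bytes and /digit extensions are the ones listed in the
Intel SDM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table row: an opcode map plus two bytes (opcode, ModRM),
 * each with a mask selecting the bits that must match. */
typedef struct {
    const char *name;
    uint8_t map;    /* assumed numbering: 0 = primary opcode map */
    uint8_t opc[2]; /* expected opcode byte and ModRM bits */
    uint8_t opm[2]; /* masks applied before comparing */
} demo_opc_entry;

static const demo_opc_entry demo_table[] = {
    { "add rw/mw,rw", 0, { 0x01, 0x00 }, { 0xff, 0x00 } }, /* 01 /r */
    { "add rw,rw/mw", 0, { 0x03, 0x00 }, { 0xff, 0x00 } }, /* 03 /r */
    { "add rw/mw,iw", 0, { 0x81, 0x00 }, { 0xff, 0x38 } }, /* 81 /0 */
    { "adc rw/mw,iw", 0, { 0x81, 0x10 }, { 0xff, 0x38 } }, /* 81 /2 */
};

/* Linear scan for the first row whose masked bytes match. A real decoder
 * would index or sort the table; this only shows the comparison. */
static inline const demo_opc_entry *demo_lookup(uint8_t map, uint8_t opcode,
                                                uint8_t modrm)
{
    for (size_t i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
        const demo_opc_entry *e = &demo_table[i];
        if (e->map == map &&
            (opcode & e->opm[0]) == e->opc[0] &&
            (modrm & e->opm[1]) == e->opc[1]) {
            return e;
        }
    }
    return NULL;
}

/* Usage: 81 D0 has ModRM.reg == 2, so it selects adc and not add. */
static inline void demo_lookup_selfcheck(void)
{
    assert(demo_lookup(0, 0x81, 0xD0) == &demo_table[3]);
    assert(demo_lookup(0, 0x81, 0xC0) == &demo_table[2]);
}
```

the point of the sketch is that every row is compared the same way:
rows that use ModRM.reg as an opcode extension set a 0x38 mask on the
second byte, and rows that use it as a register operand set no mask.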

for instruction selection I plan to do a combinatorial expansion
that generates enums mapping to subsets of the encodings: memory or
register operands, or other options like broadcast. the result would
resemble the denormalized NASM metadata, which has thousands more
entries, but it would be auto-generated, and it would come both with
type sizes and without, for selection based on best fit. the enums in
the right-hand column of the listings below are work-in-progress
output from a new generator that I am writing so that I can add
instruction selection. until those enums exist it can't be used as
an emitter, because at the moment the emitter requires the opcode
from decode in order to round trip, as opposed to having it populated
by instruction selection code.

# typed instruction selection enum expansions

  add rw,rw/mw                ADD_r32_r32
  add rw,rw/mw                ADD_r32_m32
  add rw,rw/mw                ADD_r64_r64
  add rw,rw/mw                ADD_r64_m64

  adc rw/mw,iw                ADC_r32_i32
  adc rw/mw,iw                ADC_m32_i32
  adc rw/mw,iw                ADC_r64_i32
  adc rw/mw,iw                ADC_m64_i32

  vxorps xmm,xmm,xmm/m128     VXORPS_v128_v128_v128
  vxorps xmm,xmm,xmm/m128     VXORPS_v128_v128_m128

# untyped instruction selection enum expansions

  add rw,rw/mw                ADD_rr
  add rw,rw/mw                ADD_rm
  adc rw/mw,iw                ADC_ri
  adc rw/mw,iw                ADC_mi

  vxorps xmm,xmm,xmm/m128     VXORPS_vvv
  vxorps xmm,xmm,xmm/m128     VXORPS_vvm
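
a minimal sketch of how the two expansions listed above could be
generated. this is not the work-in-progress generator; demo_expand
and demo_fragment are hypothetical, and the token rules (rw and mw
expand to 32 and 64 bits, iw stays i32, xmm becomes v128) are
assumptions inferred only from the rows shown above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_OPS 4
#define DEMO_MAX_ALT 4
#define DEMO_NAME_LEN 96

/* Map one operand token to its enum fragment. Assumed rules, inferred
 * from the listing: iw stays i32 at both widths, xmm is written v128. */
static void demo_fragment(const char *tok, int width, int typed,
                          char *out, size_t cap)
{
    if (!strcmp(tok, "xmm")) {
        snprintf(out, cap, "%s", typed ? "v128" : "v");
    } else if (!strcmp(tok, "m128")) {
        snprintf(out, cap, "%s", typed ? "m128" : "m");
    } else if (!strcmp(tok, "iw")) {
        snprintf(out, cap, "%s", typed ? "i32" : "i");
    } else if (typed) {
        snprintf(out, cap, "%c%d", tok[0], width); /* rw, mw */
    } else {
        snprintf(out, cap, "%c", tok[0]);
    }
}

/* Expand e.g. ("add", "rw,rw/mw") into enum names. Returns the number of
 * names written. typed != 0 selects the sized form. */
static size_t demo_expand(const char *mnemonic, const char *operands,
                          int typed, char names[][DEMO_NAME_LEN], size_t cap)
{
    static const int widths[2] = { 32, 64 };
    char buf[DEMO_NAME_LEN], *alts[DEMO_MAX_OPS][DEMO_MAX_ALT];
    size_t nalt[DEMO_MAX_OPS] = { 0 }, op = 0, count = 0;

    /* Split in place: ',' separates operands, '/' separates alternatives. */
    assert(strlen(operands) < sizeof(buf));
    strcpy(buf, operands);
    alts[0][nalt[0]++] = buf;
    for (char *p = buf; *p; p++) {
        if (*p == ',') {
            *p = '\0'; op++;
            assert(op < DEMO_MAX_OPS);
            alts[op][nalt[op]++] = p + 1;
        } else if (*p == '/') {
            *p = '\0';
            assert(nalt[op] < DEMO_MAX_ALT);
            alts[op][nalt[op]++] = p + 1;
        }
    }
    size_t nops = op + 1;
    assert(strlen(mnemonic) + 16 * nops < DEMO_NAME_LEN);

    /* Simplification: any 'w' in the pattern means a width-dependent token. */
    size_t nw = (typed && strchr(operands, 'w')) ? 2 : 1;
    for (size_t w = 0; w < nw; w++) {
        size_t idx[DEMO_MAX_OPS] = { 0 };
        for (;;) {
            assert(count < cap);
            char *name = names[count++];
            size_t len = 0;
            for (const char *m = mnemonic; *m; m++) {
                name[len++] = (char)toupper((unsigned char)*m);
            }
            name[len] = '\0';
            for (size_t i = 0; i < nops; i++) {
                char frag[16];
                demo_fragment(alts[i][idx[i]], widths[w], typed,
                              frag, sizeof(frag));
                if (typed || i == 0) {
                    strcat(name, "_");
                }
                strcat(name, frag);
            }
            /* Odometer step over the alternatives, last operand fastest. */
            size_t i = nops;
            while (i > 0 && ++idx[i - 1] == nalt[i - 1]) {
                idx[i - 1] = 0;
                i--;
            }
            if (i == 0) {
                break;
            }
        }
    }
    return count;
}
```

calling demo_expand("add", "rw,rw/mw", 1, names, 8) fills names with
the four typed ADD rows in the order listed above, and passing 0 for
typed yields ADD_rr and ADD_rm.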

Michael.
