Re: [RFC PATCH] target/arm: use x86 intrinsics to implement AES instruct
From: Ard Biesheuvel
Subject: Re: [RFC PATCH] target/arm: use x86 intrinsics to implement AES instructions
Date: Tue, 30 May 2023 18:58:53 +0200

On Tue, 30 May 2023 at 18:43, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 5/30/23 06:52, Ard Biesheuvel wrote:
> > +#ifdef __x86_64__
> > +    if (have_aes()) {
> > +        __m128i *d = (__m128i *)rd;
> > +
> > +        *d = decrypt ? _mm_aesdeclast_si128(rk.vec ^ st.vec, (__m128i){})
> > +                     : _mm_aesenclast_si128(rk.vec ^ st.vec, (__m128i){});
>
> Do I correctly understand that the ARM xor is pre-shift
>
> > +        return;
> > +    }
> > +#endif
> > +
> >      /* xor state vector with round key */
> >      rk.l[0] ^= st.l[0];
> >      rk.l[1] ^= st.l[1];
>
> (like so)
>
> whereas the x86 xor is post-shift
>
> > void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
> > {
> >     int i;
> >     Reg st = *v;
> >     Reg rk = *s;
> >
> >     for (i = 0; i < 8 << SHIFT; i++) {
> >         d->B(i) = rk.B(i) ^ (AES_sbox[st.B(AES_shifts[i & 15] + (i & ~15))]);
> >     }
>
> (like so, from target/i386/ops_sse.h)?
>
Indeed. Using the primitive operations defined in the AES paper, we
basically have the following for n rounds of AES (for n in {10, 12,
14}):

  for (n-1 rounds) {
    AddRoundKey
    ShiftRows
    SubBytes
    MixColumns
  }
  AddRoundKey
  ShiftRows
  SubBytes
  AddRoundKey

AddRoundKey is just XOR, but it is incorporated into the instructions
that combine a couple of these steps.
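
For illustration only (this is not code from the patch), the round
structure above can be written out in portable C for AES-128 (n = 10).
All function names here are made up for the example, and the S-box is
computed at run time just to keep the listing short. It is a sketch to
make the ordering concrete, not a hardened or constant-time
implementation.

```c
#include <stdint.h>
#include <string.h>

static uint8_t sbox[256];

#define ROTL8(x, n) ((uint8_t)(((x) << (n)) | ((x) >> (8 - (n)))))

/* Multiply by x (i.e. by 2) in GF(2^8) modulo the AES polynomial. */
static uint8_t xtime(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1b : 0));
}

/* Build the AES S-box at run time instead of embedding a table. */
static void init_sbox(void)
{
    uint8_t p = 1, q = 1;
    do {
        p = (uint8_t)(p ^ xtime(p));        /* p *= 3 */
        q ^= (uint8_t)(q << 1);             /* q /= 3 */
        q ^= (uint8_t)(q << 2);
        q ^= (uint8_t)(q << 4);
        q ^= (q & 0x80) ? 0x09 : 0;
        /* here q is the inverse of p; apply the affine transform */
        sbox[p] = (uint8_t)(q ^ ROTL8(q, 1) ^ ROTL8(q, 2) ^
                            ROTL8(q, 3) ^ ROTL8(q, 4) ^ 0x63);
    } while (p != 1);
    sbox[0] = 0x63;
}

static void add_round_key(uint8_t s[16], const uint8_t *rk)
{
    for (int i = 0; i < 16; i++)
        s[i] ^= rk[i];
}

static void sub_bytes(uint8_t s[16])
{
    for (int i = 0; i < 16; i++)
        s[i] = sbox[s[i]];
}

/* State is column-major: byte r + 4*c holds row r, column c. */
static void shift_rows(uint8_t s[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * c] = s[r + 4 * ((c + r) % 4)];
    memcpy(s, t, 16);
}

static void mix_columns(uint8_t s[16])
{
    for (int c = 0; c < 4; c++) {
        uint8_t *a = s + 4 * c;
        uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
        a[0] = (uint8_t)(xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3);
        a[1] = (uint8_t)(a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3);
        a[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3);
        a[3] = (uint8_t)(xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3));
    }
}

/* AES-128 key schedule: 11 round keys of 16 bytes each. */
static void expand_key128(const uint8_t key[16], uint8_t rk[176])
{
    uint8_t rcon = 1;
    memcpy(rk, key, 16);
    for (int i = 16; i < 176; i += 4) {
        uint8_t t[4];
        memcpy(t, rk + i - 4, 4);
        if (i % 16 == 0) {
            /* RotWord, SubWord, then XOR in the round constant */
            uint8_t t0 = t[0];
            t[0] = (uint8_t)(sbox[t[1]] ^ rcon);
            t[1] = sbox[t[2]];
            t[2] = sbox[t[3]];
            t[3] = sbox[t0];
            rcon = xtime(rcon);
        }
        for (int j = 0; j < 4; j++)
            rk[i + j] = (uint8_t)(rk[i + j - 16] ^ t[j]);
    }
}

/* Encrypt one block using exactly the step ordering listed above. */
void aes128_encrypt(const uint8_t key[16], const uint8_t in[16],
                    uint8_t out[16])
{
    uint8_t rk[176], s[16];

    init_sbox();
    expand_key128(key, rk);
    memcpy(s, in, 16);
    for (int round = 0; round < 9; round++) {   /* n - 1 rounds */
        add_round_key(s, rk + 16 * round);
        shift_rows(s);
        sub_bytes(s);
        mix_columns(s);
    }
    add_round_key(s, rk + 16 * 9);
    shift_rows(s);
    sub_bytes(s);
    add_round_key(s, rk + 16 * 10);
    memcpy(out, s, 16);
}
```

ShiftRows and SubBytes commute, since one only moves bytes around and
the other only substitutes each byte in place, which is why this
grouping gives the same result as the more common description that
starts with a lone AddRoundKey.
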
So on x86, we have

  aesenc:
    ShiftRows
    SubBytes
    MixColumns
    AddRoundKey

  aesenclast:
    ShiftRows
    SubBytes
    AddRoundKey

and on ARM we have

  aese:
    AddRoundKey
    ShiftRows
    SubBytes

  aesmc:
    MixColumns

> What might help: could we do the reverse -- emulate the x86 aesdeclast
> instruction with
> the aarch64 aesd instruction?
>
Help in what sense? To emulate the x86 instructions on an ARM host?
But yes, aesenclast can be implemented using aese in a similar way,
i.e., by passing a {0} vector as the round key into the instruction,
and performing the XOR explicitly using the real round key afterwards.
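
To make the zero-round-key trick concrete in both directions, here is a
small portable C model. It is only a sketch: x86_aesenclast() and
arm_aese() are hypothetical software stand-ins for the two
instructions, not the real intrinsics and not QEMU helpers, and the
shift table is written out by hand for the example.

```c
#include <stdint.h>
#include <string.h>

static uint8_t sbox[256];

#define ROTL8(x, n) ((uint8_t)(((x) << (n)) | ((x) >> (8 - (n)))))

/* Build the AES S-box at run time instead of embedding a table. */
static void init_sbox(void)
{
    uint8_t p = 1, q = 1;
    do {
        p = (uint8_t)(p ^ (p << 1) ^ ((p & 0x80) ? 0x1b : 0)); /* p *= 3 */
        q ^= (uint8_t)(q << 1);                                 /* q /= 3 */
        q ^= (uint8_t)(q << 2);
        q ^= (uint8_t)(q << 4);
        q ^= (q & 0x80) ? 0x09 : 0;
        sbox[p] = (uint8_t)(q ^ ROTL8(q, 1) ^ ROTL8(q, 2) ^
                            ROTL8(q, 3) ^ ROTL8(q, 4) ^ 0x63);
    } while (p != 1);
    sbox[0] = 0x63;
}

/* ShiftRows as a byte permutation: output byte i comes from shifts[i]. */
static const uint8_t shifts[16] = {
    0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11
};

/* ShiftRows followed by SubBytes; safe when d and s alias. */
static void shift_sub(uint8_t d[16], const uint8_t s[16])
{
    uint8_t t[16];
    if (sbox[0] == 0)           /* S-box not built yet */
        init_sbox();
    for (int i = 0; i < 16; i++)
        t[i] = sbox[s[shifts[i]]];
    memcpy(d, t, 16);
}

/* Model of x86 aesenclast: ShiftRows, SubBytes, then AddRoundKey. */
void x86_aesenclast(uint8_t d[16], const uint8_t st[16],
                    const uint8_t rk[16])
{
    uint8_t t[16];
    shift_sub(t, st);
    for (int i = 0; i < 16; i++)
        d[i] = (uint8_t)(t[i] ^ rk[i]);
}

/* Model of ARM aese: AddRoundKey first, then ShiftRows, SubBytes. */
void arm_aese(uint8_t d[16], const uint8_t st[16], const uint8_t rk[16])
{
    uint8_t t[16];
    for (int i = 0; i < 16; i++)
        t[i] = (uint8_t)(st[i] ^ rk[i]);
    shift_sub(d, t);
}

/* aese via aesenclast: XOR the key in up front, pass a zero round key. */
void aese_via_aesenclast(uint8_t d[16], const uint8_t st[16],
                         const uint8_t rk[16])
{
    uint8_t x[16], zero[16] = { 0 };
    for (int i = 0; i < 16; i++)
        x[i] = (uint8_t)(st[i] ^ rk[i]);
    x86_aesenclast(d, x, zero);
}

/* aesenclast via aese: zero round key, XOR the real key afterwards. */
void aesenclast_via_aese(uint8_t d[16], const uint8_t st[16],
                         const uint8_t rk[16])
{
    uint8_t t[16], zero[16] = { 0 };
    arm_aese(t, st, zero);
    for (int i = 0; i < 16; i++)
        d[i] = (uint8_t)(t[i] ^ rk[i]);
}
```

For any state and round key, arm_aese() should match
aese_via_aesenclast(), and x86_aesenclast() should match
aesenclast_via_aese(). The first pairing is the direction the patch
takes; the second is the reverse direction asked about above.
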