Re: [Qemu-arm] [PATCH 1/2] target/arm: Vectorize USHL and SSHL

qemu-arm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-arm] [PATCH 1/2] target/arm: Vectorize USHL and SSHL

From:	Richard Henderson
Subject:	Re: [Qemu-arm] [PATCH 1/2] target/arm: Vectorize USHL and SSHL
Date:	Mon, 3 Jun 2019 12:23:29 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0

On 5/23/19 8:03 AM, Peter Maydell wrote:
> On Thu, 23 May 2019 at 13:44, Peter Maydell <address@hidden> wrote:
>>
>> On Sat, 18 May 2019 at 20:19, Richard Henderson
>> <address@hidden> wrote:
>>>
>>> These instructions shift left or right depending on the sign
>>> of the input, and 7 bits are significant to the shift.  This
>>> requires several masks and selects in addition to the actual
>>> shifts to form the complete answer.
>>>
>>> That said, the operation is still a small improvement even for
>>> two 64-bit elements -- 13 vector operations instead of 2 * 7
>>> integer operations.
>>>
>>> +void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
>>> +{
>>> +    TCGv_i32 lval = tcg_temp_new_i32();
>>> +    TCGv_i32 rval = tcg_temp_new_i32();
>>> +    TCGv_i32 lsh = tcg_temp_new_i32();
>>> +    TCGv_i32 rsh = tcg_temp_new_i32();
>>> +    TCGv_i32 zero = tcg_const_i32(0);
>>> +    TCGv_i32 max = tcg_const_i32(32);
>>> +
>>> +    /*
>>> +     * Perform possibly out of range shifts, trusting that the operation
>>> +     * does not trap.  Discard unused results after the fact.
>>> +     */
>>
>> This comment reads to me like we're relying on a guarantee
>> that TCG doesn't make, but in fact the readme says it does:
>> out-of-range shifts are "unspecified behavior" which may give
>> bogus results but won't crash. Perhaps phrasing the comment
>> as "relying on the TCG guarantee that these are only
>> 'unspecified behavior' and not 'undefined behavior' and so
>> won't crash" would be clearer ?

I've adjusted the comment along these lines.

> I had a look through the rest of the patch, but there is
> too much code here and I don't have enough context to
> figure out how all the various new gvec helpers are
> called and what jobs they are doing compared to the
> actual instruction operation. Maybe I'll have another try later.
If the host supports vectors, then the .fniv expander will be called.
Otherwise, .fni4 or .fni8 will be used to expand with 32-bit or 64-bit
integers.  Finally, the .fno expander calls an out-of-line helper.

Not strictly kosher perhaps, but in some places we Know that MAX_UNROLL is set
to 4 in tcg/tcg-op-gvec.c, and that since AdvSIMD uses 128-bit vectors, 4 *
32-bit will always be expanded inline, and so omit the out-of-line helper.

I'm not sure why I didn't do this here, since this is not one of the cases in
which we could share helpers with SVE.  (SVE dropped the right-shift as
negative shift count thing.)

> Why can we get rid of the 8-bit, 32-bit and 64-bit old shift
> helpers, but not 16-bit ?

It's still used by gen_neon_shift_narrow.


r~

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-arm] [PATCH 1/2] target/arm: Vectorize USHL and SSHL, Richard Henderson <=

Prev by Date: Re: [Qemu-arm] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load
Next by Date: [Qemu-arm] [PULL v2 01/14] hw/acpi: Consolidate build_mcfg to pci.c
Previous by thread: [Qemu-arm] [RFC PATCH] cputlb: use uint64_t for interim values for unaligned load
Next by thread: [Qemu-arm] [PULL v2 01/14] hw/acpi: Consolidate build_mcfg to pci.c
Index(es):
- Date
- Thread