From: David Hildenbrand
Subject: Re: [qemu-s390x] [Qemu-devel] [PATCH v1 06/33] s390x/tcg: Implement VECTOR GENERATE BYTE MASK
Date: Tue, 26 Feb 2019 22:23:26 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

On 26.02.19 20:23, David Hildenbrand wrote:
> On 26.02.19 20:12, Richard Henderson wrote:
>> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>>> +static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o)
>>> +{
>>> +    const uint16_t i2 = get_field(s->fields, i2);
>>> +    TCGv_i32 ones = tcg_const_i32(-1u);
>>> +    TCGv_i32 zeroes = tcg_const_i32(0);
>>> +    int i;
>>> +
>>> +    for (i = 0; i < 16; i++) {
>>> +        if (extract32(i2, 15 - i, 1)) {
>>> +            write_vec_element_i32(ones, get_field(s->fields, v1), i, MO_8);
>>> +        } else {
>>> +            write_vec_element_i32(zeroes, get_field(s->fields, v1), i, MO_8);
>>> +        }
>>> +    }
>>> +    tcg_temp_free_i32(ones);
>>> +    tcg_temp_free_i32(zeroes);
>>> +    return DISAS_NEXT;
>>> +}
>>
>> While this works, it's not in the spirit of
>>
>>> Programming Note: VECTOR GENERATE BYTE
>>> MASK is the preferred method for setting a vector
>>> register to all zeroes or ones.
>
> Good point, I skipped that note so far.
>
>>
>> Better, I think, with
>
> Many instructions to implement, so little time to fine tune stuff so
> far. However I have tests for VGBM, so I can easily get it working. Will
> play with it!
>
>>
>> uint64_t generate_byte_mask(uint8_t mask)
>> {
>>     uint64_t r = 0;
>>     int i;
>>     for (i = 0; i < 8; i++) {
>>         if ((mask >> i) & 1) {
>>             r |= 0xffull << (i * 8);
>>         }
>>     }
>>     return r;
>> }
>>
>> if (i2 == (i2 & 0xff) * 0x0101) {
>>     /* masks for both halves of the vector are the same.
>>        trust tcg to produce a good constant loading. */
>>     tcg_gen_gvec_dup64i(vec_full_reg_offset(s, v1), 16, 16,
>>                         generate_byte_mask(i2 & 0xff));
>> } else {
>>     TCGv_i64 t = tcg_temp_new_i64();
>>     tcg_gen_movi_i64(t, generate_byte_mask(i2 >> 8));
>>     write_vec_element_i64(t, v1, 0, MO_64);
>>     tcg_gen_movi_i64(t, generate_byte_mask(i2 & 0xff));
>>     write_vec_element_i64(t, v1, 1, MO_64);
>>     tcg_temp_free_i64(t);
>> }
>>
>> Somewhere behind tcg_gen_gvec_dup64i, I check to see if the constant can be
>> decomposed further, which will eventually bottom out at
>>
>> vpxor %xmm0,%xmm0,%xmm0 // all zeros
>> vpcmpeq %xmm0,%xmm0,%xmm0 // all ones
>>
Just tested with minor adaptations, works like a charm!
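
For anyone who wants to convince themselves that the two-halves approach yields the same bytes as the original per-byte loop, here is a minimal standalone sketch in plain C, with no TCG involved. Only generate_byte_mask() is taken from the suggestion above; vgbm_bytewise(), vgbm_halves(), store_be64(), halves_equal() and check_all_immediates() are hypothetical names made up for this illustration. It assumes that byte 0 of the vector is the leftmost byte and that each 64-bit element is laid out big-endian, as on s390x.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Expand each bit of an 8-bit mask into one full byte of a 64-bit value. */
uint64_t generate_byte_mask(uint8_t mask)
{
    uint64_t r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        if ((mask >> i) & 1) {
            r |= 0xffull << (i * 8);
        }
    }
    return r;
}

/* Reference model: one byte per mask bit, as in the original patch.
 * Bit 15 of i2 controls byte 0 (assumed to be the leftmost byte). */
void vgbm_bytewise(uint16_t i2, uint8_t out[16])
{
    int i;

    for (i = 0; i < 16; i++) {
        out[i] = ((i2 >> (15 - i)) & 1) ? 0xff : 0x00;
    }
}

/* Hypothetical helper: store a 64-bit value in big-endian byte order. */
void store_be64(uint8_t *p, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Suggested model: two 64-bit elements, one per half of the immediate. */
void vgbm_halves(uint16_t i2, uint8_t out[16])
{
    store_be64(out, generate_byte_mask((uint8_t)(i2 >> 8)));
    store_be64(out + 8, generate_byte_mask((uint8_t)(i2 & 0xff)));
}

/* Nonzero when both halves of the immediate carry the same 8-bit mask,
 * i.e. when a single 64-bit constant can be replicated. */
int halves_equal(uint16_t i2)
{
    return i2 == (i2 & 0xff) * 0x0101;
}

/* Compare both models for every possible 16-bit immediate. */
void check_all_immediates(void)
{
    uint8_t a[16], b[16];
    uint32_t i2;

    for (i2 = 0; i2 <= 0xffff; i2++) {
        vgbm_bytewise((uint16_t)i2, a);
        vgbm_halves((uint16_t)i2, b);
        assert(memcmp(a, b, sizeof(a)) == 0);
    }
}
```

Calling check_all_immediates() walks all 65536 immediates and asserts that both models produce identical 16-byte results. halves_equal() holds for 0x0000 and 0xffff, the all-zeroes and all-ones cases singled out by the programming note, so those would take the replicate-one-constant path in the suggestion above.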
--
Thanks,
David / dhildenb