
From: David Hildenbrand
Subject: Re: [qemu-s390x] [Qemu-devel] [PATCH v1 1/5] s390x/tcg: Implement VECTOR FIND ANY ELEMENT EQUAL
Date: Thu, 23 May 2019 09:50:54 +0200

On 22.05.19 20:46, Richard Henderson wrote:
> On 5/22/19 2:16 PM, David Hildenbrand wrote:
>> On 22.05.19 17:59, Richard Henderson wrote:
>>> On Wed, 22 May 2019 at 07:16, David Hildenbrand <address@hidden> wrote:
>>>>> Also plausible.  I guess it would be good to know, anyway.
>>>>
>>>> I'll dump the parameters when booting Linux. My gut feeling is that the
>>>> cc option is basically never used ...
>>>
>>> It looks like our intuition is wrong about that.
>>
>> Thanks for checking!
>>
>>>
>>> address@hidden:~/glibc/src/sysdeps/s390$ grep -r vfaezbs * | wc -l
>>> 15
>>>
>>> These set cc, use zs, and do not use rt.
>>>
>>> address@hidden:~/glibc/src/sysdeps/s390$ grep -r 'vfaeb' * | wc -l
>>> 3
>>>
>>> These do not set cc, do not use zs, and do use rt.
>>>
>>> Those are the only two VFAE forms used by glibc (note that the same
>>> variants as 'f' are used by the wide-character strings).
>>>
>>
>> I guess "rt" and "cc" make the biggest difference. Maybe special case
>> these two, result in 4 variants for each of the 3 element sizes?
> 
> Sounds good.
> 

So ... after all, special-casing might not be necessary, at least not for
this helper :) Using your crazy helper functions, I have this right now:

/*
 * Returns the number of bits composing one element.
 */
static uint8_t get_element_bits(uint8_t es)
{
    return (1 << es) * BITS_PER_BYTE;
}

/*
 * Returns the bitmask for a single element.
 */
static uint64_t get_single_element_mask(uint8_t es)
{
    return -1ull >> (64 - get_element_bits(es));
}

/*
 * Returns the bitmask for a single element (excluding the MSB).
 */
static uint64_t get_single_element_lsbs_mask(uint8_t es)
{
    return -1ull >> (65 - get_element_bits(es));
}

/*
 * Returns the bitmasks for multiple elements (excluding the MSBs).
 */
static uint64_t get_element_lsbs_mask(uint8_t es)
{
    return dup_const(es, get_single_element_lsbs_mask(es));
}

static int vfae(void *v1, const void *v2, const void *v3, bool in,
                bool rt, bool zs, uint8_t es)
{
    const uint64_t mask = get_element_lsbs_mask(es);
    const int bits = get_element_bits(es);
    uint64_t a0, a1, b0, b1, e0, e1, t0, t1, z0, z1;
    uint64_t first_zero = 16;
    uint64_t first_equal;
    int i;

    a0 = s390_vec_read_element64(v2, 0);
    a1 = s390_vec_read_element64(v2, 1);
    b0 = s390_vec_read_element64(v3, 0);
    b1 = s390_vec_read_element64(v3, 1);
    e0 = 0;
    e1 = 0;
    /* compare each element of v2 against every element of v3 */
    for (i = 0; i < 64; i += bits) {
        t0 = i ? rol64(b0, i) : b0;
        t1 = i ? rol64(b1, i) : b1;
        e0 |= zero_search(a0 ^ t0, mask);
        e0 |= zero_search(a0 ^ t1, mask);
        e1 |= zero_search(a1 ^ t0, mask);
        e1 |= zero_search(a1 ^ t1, mask);
    }
    /* invert the result if requested - invert only the MSBs */
    if (in) {
        e0 = ~e0 & ~mask;
        e1 = ~e1 & ~mask;
    }
    first_equal = match_index(e0, e1);

    if (zs) {
        z0 = zero_search(a0, mask);
        z1 = zero_search(a1, mask);
        first_zero = match_index(z0, z1);
    }

    if (rt) {
        e0 = (e0 >> (bits - 1)) * get_single_element_mask(es);
        e1 = (e1 >> (bits - 1)) * get_single_element_mask(es);
        s390_vec_write_element64(v1, 0, e0);
        s390_vec_write_element64(v1, 1, e1);
    } else {
        s390_vec_write_element64(v1, 0, MIN(first_equal, first_zero));
        s390_vec_write_element64(v1, 1, 0);
    }

    if (first_zero == 16 && first_equal == 16) {
        return 3; /* no match */
    } else if (first_zero == 16) {
        return 1; /* matching elements, no match for zero */
    } else if (first_equal < first_zero) {
        return 2; /* matching elements before match for zero */
    }
    return 0; /* match for zero */
}
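
The listing above relies on zero_search() and match_index(), which are not
shown in this mail. The following is a minimal sketch of how they could look,
assuming zero_search() sets the MSB of every all-zero element and
match_index() returns the byte index of the leftmost set MSB (16 if there is
none). dup_elem() and clz64_sketch() are hypothetical stand-ins for
dup_const() and clz64(), and vfae_helpers_self_check() exists only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE 8

static uint8_t get_element_bits(uint8_t es)
{
    return (1 << es) * BITS_PER_BYTE;
}

/* Hypothetical stand-in for dup_const(): replicate the low element of v. */
static uint64_t dup_elem(uint8_t es, uint64_t v)
{
    const int bits = get_element_bits(es);
    const uint64_t elem = v & (-1ull >> (64 - bits));
    uint64_t r = 0;
    int i;

    for (i = 0; i < 64; i += bits) {
        r |= elem << i;
    }
    return r;
}

static uint64_t get_element_lsbs_mask(uint8_t es)
{
    return dup_elem(es, -1ull >> (65 - get_element_bits(es)));
}

/* Assumed behaviour: set the MSB of each element of a that is zero. */
static uint64_t zero_search(uint64_t a, uint64_t mask)
{
    return ~(((a & mask) + mask) | a | mask);
}

/* Hypothetical stand-in for clz64(); returns 64 for an input of 0. */
static int clz64_sketch(uint64_t v)
{
    int n = 0;

    if (!v) {
        return 64;
    }
    while (!(v >> 63)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Assumed behaviour: byte index of the leftmost match, 16 if no match. */
static int match_index(uint64_t c0, uint64_t c1)
{
    return (c0 ? clz64_sketch(c0) : clz64_sketch(c1) + 64) >> 3;
}

/* Illustration only: byte elements, equal at byte index 1 and 7. */
static void vfae_helpers_self_check(void)
{
    const uint64_t mask = get_element_lsbs_mask(0);
    const uint64_t a = 0x6162636465666768ull;
    const uint64_t b = 0x0062000000000068ull;
    const uint64_t e = zero_search(a ^ b, mask);

    assert(mask == 0x7f7f7f7f7f7f7f7full);
    assert(e == 0x0080000000000080ull);
    assert(match_index(e, 0) == 1);
    assert(match_index(0, 0) == 16);
}
```

XORing two doublewords turns equal elements into zero elements, so the same
zero_search() serves both the equality comparison and the zs search.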


At least the kernel boots with it - am I missing something or does this
indeed work?
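
One way to sanity-check the two trickier parts in isolation is to pull them
out into small functions. This is only a sketch: expand_match_bits() and
vfae_cc() are hypothetical names that mirror the rt path and the final
condition-code selection of vfae() above, and vfae_tail_self_check() is
there purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirror of the rt path: e has the MSB set in every matching element.
 * Shifting moves each flag to the element's LSB; multiplying by the
 * single-element mask then fills the whole element with ones.
 */
static uint64_t expand_match_bits(uint64_t e, int bits)
{
    const uint64_t elem_mask = -1ull >> (64 - bits);

    return (e >> (bits - 1)) * elem_mask;
}

/* Mirror of the condition-code selection at the end of vfae(). */
static int vfae_cc(uint64_t first_equal, uint64_t first_zero)
{
    if (first_zero == 16 && first_equal == 16) {
        return 3; /* no match */
    } else if (first_zero == 16) {
        return 1; /* matching elements, no match for zero */
    } else if (first_equal < first_zero) {
        return 2; /* matching elements before match for zero */
    }
    return 0; /* match for zero */
}

/* Illustration only. */
static void vfae_tail_self_check(void)
{
    /* byte elements: matches at byte index 1 and 3 */
    assert(expand_match_bits(0x0080008000000000ull, 8) ==
           0x00ff00ff00000000ull);
    /* halfword elements: match in the leftmost element */
    assert(expand_match_bits(0x8000000000000000ull, 16) ==
           0xffff000000000000ull);

    assert(vfae_cc(16, 16) == 3);
    assert(vfae_cc(3, 16) == 1);
    assert(vfae_cc(2, 5) == 2);
    assert(vfae_cc(5, 2) == 0);
}
```

Since first_zero stays at 16 whenever zs is not set, only cc 1 and cc 3 are
reachable in that case.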

Cheers!


-- 

Thanks,

David / dhildenb


