tinycc-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Tinycc-devel] vectorize the curent hash implementation


From: Vladimir Vissoultchev
Subject: Re: [Tinycc-devel] vectorize the curent hash implementation
Date: Sat, 23 Apr 2016 11:26:26 +0300

Hi,

I did some test with the original rolling hash slightly tweaked to this:

#define TOK_HASH_INIT 0
#define TOK_HASH_PRIME 263
#define TOK_HASH_FUNC(h, c) ((h)*TOK_HASH_PRIME + (c))

... and was able to replace this hash calculating loop:

    h = TOK_HASH_INIT;
    for(i=0;i<len;i++)
        h = TOK_HASH_FUNC(h, ((unsigned char *)str)[i]);
    h &= (TOK_HASH_SIZE - 1);

... with this equivalent SSE2 calculation with step of 8 bytes:
    
    short *pp = hash_primepow + STRING_MAX_SIZE - (len);
    __m128i vh = _mm_setzero_si128(), v;
    for (i = 0; i < len; i += 8) {
        v = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)(str + i)),
_mm_setzero_si128());
        vh = _mm_add_epi32(vh, _mm_madd_epi16(_mm_loadu_si128((const __m128i
*)(pp + i)), v));
    }
    vh = _mm_add_epi32(vh, _mm_srli_si128(vh, 8));
    vh = _mm_add_epi32(vh, _mm_srli_si128(vh, 4));
    h = _mm_cvtsi128_si32(vh) & (TOK_HASH_SIZE - 1);

... where `hash_primepow` is declared as

static short hash_primepow[STRING_MAX_SIZE + 9] = { 0 };

... and is initialized in `preprocess_new` like this
    
    /* init rolling hash prime powers table in reverse */
    for (h = 1, i = STRING_MAX_SIZE - 1; i >= 0; i--) {
        hash_primepow[i] = h;
        h = h * TOK_HASH_PRIME & (TOK_HASH_SIZE - 1);
    }

(Note: SSE calc loop oversteps input string so file buffer has to
overallocated by 8 bytes in `tcc_open_bf`)

Unfortunately there was no noticable improvement, which leads me to beleave
that hash calculation is not the main culprit for the performance of
`next_nomacro1`

cheers,
</wqw>

-----Original Message-----
From: Tinycc-devel [mailto:address@hidden
On Behalf Of Sergey Korshunoff
Sent: Saturday, April 23, 2016 10:38 AM
To: address@hidden
Subject: [Tinycc-devel] vectorize the curent hash implementation

Hi!
> IMO, the way to go w/ performance improvement is to vectorize the curent
hash implementation. Instead of computing the hash one character at a time,
take next 4 characters, process these in parallel and compute 4 hashes

How to implement this?
There is some hash optimization for the tcc compiled by gcc.
https://github.com/seyko2/tinycc/commit/b89f0d63af4f494a83c91fb0360d7d37c0f6
f9a3

_______________________________________________
Tinycc-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/tinycc-devel




reply via email to

[Prev in Thread] Current Thread [Next in Thread]