tinycc-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Tinycc-devel] vectorize the curent hash implementation


From: Vladimir Vissoultchev
Subject: Re: [Tinycc-devel] vectorize the curent hash implementation
Date: Sat, 23 Apr 2016 13:43:17 +0300

Rolling hashes have a nice property -- associativity -- i.e. hash(concat(a,
b)) = [hash(a) * TOK_HASH_PRIME^len(b) + hash(b)] in GF(TOK_HASH_SIZE)

What I tried based on this was to pre-hash all of file->buffer and calc hash
on current (pos, len) with no loop like this:

    h = (ph[len] - (ph[0] - TOK_HASH_INIT) * hash_prime_powers[len]) &
(TOK_HASH_SIZE - 1)

... where ph is pointing to current pos in pre-hash buffer.

Unfortunately complete file->buffer pre-hashing is slow, I'm currently
working on impl pre-hash part in SSE2 but it looks like it's going to be too
slow again as there is no performant way to calculate all intermediate
hashes.

cheers,
</wqw>

-----Original Message-----
From: Tinycc-devel [mailto:address@hidden
On Behalf Of KHMan
Sent: Saturday, April 23, 2016 12:56 PM
To: address@hidden
Subject: Re: [Tinycc-devel] vectorize the curent hash implementation

On 4/23/2016 4:26 PM, Vladimir Vissoultchev wrote:
> [snip snip snip]
> (Note: SSE calc loop oversteps input string so file buffer has to 
> overallocated by 8 bytes in `tcc_open_bf`)
>
> Unfortunately there was no noticable improvement, which leads me to 
> beleave that hash calculation is not the main culprit for the 
> performance of `next_nomacro1`

IIRC there are fast LZ compressors that are trying to use vector registers
to speed up hashing. But those have nice and small inner loops.

But compilers branch all over the place, there are no nice inner loops to
let L1 caches go hot-rodding, there are many data dependencies which limit a
CPU's ability to schedule multiple ops. 
Branch and cache mispredictions are like train crashes these days.

Perhaps someone can test the limits on one of those academic CPU simulators
that has an 'oracle' cache setting. How much speed can we further squeeze?
I'm not optimistic there is a lot of performance left to squeeze...






reply via email to

[Prev in Thread] Current Thread [Next in Thread]