Re: [PATCH] cksum: Implement Chorba algorithm in PCLMUL


From: Sam Russell
Subject: Re: [PATCH] cksum: Implement Chorba algorithm in PCLMUL
Date: Wed, 25 Dec 2024 20:07:04 +0100

I agree. Also, looking over CPU specs, it looks like this is actually going
to be a regression on older hardware, as a lot of 5-10 year old CPUs have
32-64 kB of L1 cache and not much more L2 (whereas AMD is shipping 3 MB L2
caches, which explains the boost there).

I have some old laptops at home I can play around with, so I'll tune on
those and submit again when I have more confidence in the speed boost.

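For the tuning itself, a minimal sketch of the kind of harness that could
keep the algorithm fixed while only the buffer size varies. It assumes a
plain bitwise CRC (polynomial 0x04C11DB7, MSB-first) as a stand-in for the
routine under test; crc_update, crc_chunked and time_chunked are
hypothetical names for illustration, not functions from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Bitwise MSB-first CRC update, polynomial 0x04C11DB7.
   Slow reference stand-in for the routine being tuned (assumption).  */
static uint32_t
crc_update (uint32_t crc, const unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      crc ^= (uint32_t) buf[i] << 24;
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
  return crc;
}

/* Feed DATA through crc_update in chunks of at most BUFSIZE bytes.
   BUFSIZE must be nonzero.  The result does not depend on BUFSIZE.  */
static uint32_t
crc_chunked (const unsigned char *data, size_t len, size_t bufsize)
{
  uint32_t crc = 0;
  for (size_t off = 0; off < len; off += bufsize)
    {
      size_t n = len - off < bufsize ? len - off : bufsize;
      crc = crc_update (crc, data + off, n);
    }
  return crc;
}

/* Return the CPU seconds used by one chunked pass and store the
   resulting CRC in *CRC_OUT so the work cannot be optimized away.  */
static double
time_chunked (const unsigned char *data, size_t len, size_t bufsize,
              uint32_t *crc_out)
{
  clock_t start = clock ();
  *crc_out = crc_chunked (data, len, bufsize);
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}
```

Calling time_chunked over the same input with, say, 32 KiB, 64 KiB and
256 KiB buffers, and checking that the returned CRC is identical each time,
would show whether a change in speed comes from the buffer size rather than
from the algorithm.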
On Wed, Dec 25, 2024, 19:57 Pádraig Brady <P@draigbrady.com> wrote:

> On 25/12/2024 16:55, Sam Russell wrote:
> > Thanks for the results, looks like I'll need to get access to some older
> > hardware and try some different combinations. There's a few things I can
> > tune (loading all 8 values at the start vs loading one per fold, different
> > BUFSIZE values), I'd be interested in finding a setup that definitely
> > offers an improvement across the board.
> >
> > Did you test this with the first patch or the second patch? At a minimum
> > cutting out the final table-based fold should be a consistent ~5%
> > improvement on any platform.
>
> It would be good to test chorba without also increasing the buffer size
> so we're comparing just the algorithms.
>
> We can tweak the buffer sizes after,
> though note ioblksize.h is currently set to 256KiB
> so it would be good to be <= that.
>
> cheers,
> Pádraig
>

