coreutils

[PATCH] cksum: use pclmul instead of slice-by-32 for final bytes


From: Sam Russell
Subject: [PATCH] cksum: use pclmul instead of slice-by-32 for final bytes
Date: Sun, 24 Nov 2024 12:19:26 +0100

The current implementation reads 64kB blocks and uses lookup tables for the
final 0-31 bytes (normally 16 bytes, meaning 16 lookups). I've replaced
this with the smaller folds and Barrett reduction from the Intel paper.
Benchmarking is hard as there's a lot of variance, but it appears to give
a noticeable improvement for a 4GB ISO (fastest user time is 0.215s with
the patch, compared with a fastest 0.451s without it, on an AMD Ryzen 5 5600).
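
To illustrate the idea (this is not the code in the attached patch), here is
a portable sketch that models the two ways of consuming tail bytes: one table
lookup per byte, versus a carry-less multiply followed by Barrett reduction.
It assumes the cksum polynomial 0x04C11DB7 with MSB-first bit order. All
function names are hypothetical, clmul() is a plain software stand-in for the
PCLMULQDQ instruction, and the 16-byte folds, the appended length and the
final complement that cksum applies are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLY33 0x104C11DB7ULL   /* P(x), including the x^32 term */

static uint32_t crc_table[256];

/* crc_table[n] = n(x) * x^32 mod P(x) */
static void make_crc_table(void)
{
  for (uint32_t n = 0; n < 256; n++)
    {
      uint32_t c = n << 24;
      for (int k = 0; k < 8; k++)
        c = (c & 0x80000000u) ? (c << 1) ^ (uint32_t) POLY33 : c << 1;
      crc_table[n] = c;
    }
}

/* Table method: one lookup per remaining byte. */
static uint32_t crc_tail_table(uint32_t crc, const unsigned char *p, size_t n)
{
  while (n--)
    crc = (crc << 8) ^ crc_table[(crc >> 24) ^ *p++];
  return crc;
}

/* Software carry-less multiply; a is at most 32 bits, b at most 33 bits,
   so the product fits in 64 bits. */
static uint64_t clmul(uint64_t a, uint64_t b)
{
  assert((a >> 32) == 0 && (b >> 33) == 0);
  uint64_t r = 0;
  for (int i = 0; i < 33; i++)
    if ((b >> i) & 1)
      r ^= a << i;
  return r;
}

/* mu = floor(x^64 / P(x)), computed by polynomial long division. */
static uint64_t barrett_mu(void)
{
  uint64_t q = 1ULL << 32;                        /* first quotient bit */
  uint64_t r = (POLY33 & 0xFFFFFFFFULL) << 32;    /* x^64 xor P(x)*x^32 */
  for (int i = 31; i >= 0; i--)
    if ((r >> (i + 32)) & 1)
      {
        q |= 1ULL << i;
        r ^= POLY33 << i;
      }
  return q;
}

/* Barrett method: consume 4 bytes with two multiplies and no lookups.
   R(x) = (crc xor word) * x^32, and the new CRC is R(x) mod P(x). */
static uint32_t crc_word_barrett(uint32_t crc, const unsigned char *p,
                                 uint64_t mu)
{
  uint32_t w = (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16
               | (uint32_t) p[2] << 8 | p[3];
  uint64_t a = crc ^ w;                      /* floor(R / x^32) */
  uint64_t t1 = clmul(a, mu);                /* estimate of the quotient */
  uint64_t t2 = clmul(t1 >> 32, POLY33);     /* quotient * P(x) */
  return (uint32_t) t2;                      /* low 32 bits of R are zero */
}
```

Calling make_crc_table() once and then comparing crc_tail_table(crc, p, 4)
with crc_word_barrett(crc, p, barrett_mu()) should give the same value for
any starting crc and any 4 input bytes. A hardware version would do the
multiplies on 128-bit registers rather than in a bit loop.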

Future work is to remove this final reduction from the loop completely:
since we're reading in multiples of 32 bytes, we can use the 4-fold method
exclusively until we get to the end of the file stream.

Open to any feedback, especially as I've probably violated the code style
somewhere along the line.

Copyright: this is all my own work and I have completed the GNU copyright
paperwork. The algorithm is based on the Intel paper that the rest of the
implementation is also based on.

Attachment: 0001-cksum-use-pclmul-instead-of-slice-by-32-for-final-by.patch
Description: Binary data

