[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug-idutils] using %= in hash_find_slot, and optimized STRING_HASH_1
From: |
Sami Farin |
Subject: |
[bug-idutils] using %= in hash_find_slot, and optimized STRING_HASH_1 |
Date: |
Sat, 4 Dec 2010 03:24:47 +0200 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
Seems like you have power of two's as the size.
The patch to idu-hash.c makes mkid run 15% faster.
Tested with icudt42l_dat.s (25631066 bytes) from Chrome.
%=
97,173,499,117 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=34,788,193,052,
run=34,788,193,052)
17,018,624,785 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=34,788,193,052,
run=34,788,193,052)
&=
81,802,113,759 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=29,281,007,448,
run=29,281,007,448)
18,096,072,339 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=29,281,007,448,
run=29,281,007,448)
Still pretty slow at 3191 cycles/byte.
I checked out STRING_HASH_1, it's not very clever.
After some twiddling, here's the results:
9,667,074,280 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=3,458,109,799,
run=3,458,109,799)
6,793,017,489 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=3,458,109,799,
run=3,458,109,799)
A little better, 377 cycles/byte.
When there are not that many tokens, it can get slower.
Tested mkid with current git repository:
original STRING_HASH_1
1,222,419,290 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=437,088,646,
run=437,088,646)
1,240,023,877 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=437,088,646,
run=437,088,646)
new STRING_HASH_1
1,312,536,207 PERF_COUNT_HW_CPU_CYCLES (0.00% scaling, ena=470,344,153,
run=470,344,153)
1,303,048,530 PERF_COUNT_HW_INSTRUCTIONS (0.00% scaling, ena=470,344,153,
run=470,344,153)
I have x86_64 arch, Pentium D.
On some other systems rotate with variable constants can be slow.
--- idutils-4.5/libidu/idu-hash.c.bak 2010-01-03 20:17:22.000000000 +0200
+++ idutils-4.5/libidu/idu-hash.c 2010-12-04 01:38:51.308570025 +0200
@@ -89,7 +89,7 @@ hash_find_slot (struct hash_table* ht, v
ht->ht_lookups++;
for (;;)
{
- hash_1 %= ht->ht_size;
+ hash_1 &= (ht->ht_size - 1);
slot = &ht->ht_vec[hash_1];
if (*slot == 0)
--- a/libidu/idu-hash.h
+++ b/libidu/idu-hash.h
@@ -70,8 +70,13 @@ extern void *hash_deleted_item;
#define STRING_HASH_1(_key_, _result_) { \
unsigned char const *kk = (unsigned char const *) (_key_) - 1; \
- while (*++kk) \
- (_result_) += (*kk << (kk[1] & 0xf)); \
+ unsigned int __rot = 30097; \
+ while (*++kk) { \
+ __rot += kk[1]; \
+ (_result_) += (unsigned long)*kk << (__rot & 15); \
+ __rot = (__rot << 1) | (__rot >> 31); \
+ (_result_) <<= 1; \
+ } \
} while (0)
#define return_STRING_HASH_1(_key_) do { \
unsigned long result = 0; \
--
Do what you love because life is too short for anything else.
- [bug-idutils] using %= in hash_find_slot, and optimized STRING_HASH_1,
Sami Farin <=