[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Chicken-hackers] better hash function
From: |
Alex Shinn |
Subject: |
[Chicken-hackers] better hash function |
Date: |
Wed, 19 Mar 2008 14:26:07 +0900 |
User-agent: |
Gnus/5.11 (Gnus v5.11) Emacs/22.1.50 (darwin) |
The current Chicken hash function is very bad - it computes
quickly but doesn't do a good job of shifting the bits
around. More importantly, it ignores all but the last 8
bytes of the input. I compared it against two other
functions, the function described as "good" at
http://burtleburtle.net/bob/hash/hashfaq.html
and the FNV hash, a well known function also used in at
least MIT Scheme.
Benchmarking hash tables is tricky because it's very data
dependent. If you use an even distribution of inputs
(e.g. all 4-letter words) then even bad hash functions may
distribute them evenly, and the benchmark may thus reward
the bad hash function if it just computes faster. On the
other hand, you can generally deliberately attack bad hash
functions. For example, the simple benchmark:
;; let's count bunnies!
(do ((i 0 (+ i 1)))
((>= i 100000))
(string->symbol
(string-append (number->string i 16) " bunnies")))
took more than 36 seconds on average with Chicken's current
hash, as opposed to 0.47 seconds with FNV (almost a 100x
improvement).
For a more everyday life example, I also benchmarked just
putting /usr/share/dict/words into a hash-table:
#!/usr/local/bin/csi -script
(use extras)
(define dict (make-hash-table string-ci=?))
(define (load-dict . o)
(let ((in (if (pair? o) (car o) (current-input-port))))
(let lp ()
(let ((line (read-line in)))
(unless (eof-object? line)
(hash-table-update! dict line add1 (constantly 0))
(lp))))))
(if (pair? (command-line-arguments))
(call-with-input-file (car (command-line-arguments)) load-dict)
(load-dict))
(print (hash-table-size dict))
The average running times (running four times and discarding
the initial) were:
Original: 1.573 seconds
FAQ Hash: 1.220 seconds
FNV Hash: 1.162 seconds
--
Alex
runtime.c.diff
Description: hash improvement
- [Chicken-hackers] better hash function,
Alex Shinn <=