chicken-hackers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Chicken-hackers] better hash function


From: Alex Shinn
Subject: [Chicken-hackers] better hash function
Date: Wed, 19 Mar 2008 14:26:07 +0900
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1.50 (darwin)

The current Chicken hash function is very bad - it computes
quickly but doesn't do a good job of shifting the bits
around.  More importantly, it ignores all but the last 8
bytes of the input.  I compared it against two other
functions, the function described as "good" at

  http://burtleburtle.net/bob/hash/hashfaq.html

and the FNV hash, a well known function also used in at
least MIT Scheme.

Benchmarking hash tables is tricky because it's very data
dependent.  If you use an even distribution of inputs
(e.g. all 4-letter words) then even bad hash functions may
distribute them evenly, and the benchmark may thus reward
the bad hash function if it just computes faster.  On the
other hand, you can generally deliberately attack bad hash
functions.  For example, the simple benchmark:

    ;; let's count bunnies!
    (do ((i 0 (+ i 1)))
        ((>= i 100000))
      (string->symbol
       (string-append (number->string i 16) " bunnies")))

took more than 36 seconds on average with Chicken's current
hash, as opposed to 0.47 seconds with FNV (almost a 100x
improvement).

For a more everyday life example, I also benchmarked just
putting /usr/share/dict/words into a hash-table:

#!/usr/local/bin/csi -script

(use extras)

(define dict (make-hash-table string-ci=?))

(define (load-dict . o)
  (let ((in (if (pair? o) (car o) (current-input-port))))
    (let lp ()
      (let ((line (read-line in)))
        (unless (eof-object? line)
          (hash-table-update! dict line add1 (constantly 0))
          (lp))))))

(if (pair? (command-line-arguments))
    (call-with-input-file (car (command-line-arguments)) load-dict)
    (load-dict))
(print (hash-table-size dict))

The average running times (running four times and discarding
the initial) were:

    Original: 1.573 seconds
    FAQ Hash: 1.220 seconds
    FNV Hash: 1.162 seconds

-- 
Alex

Attachment: runtime.c.diff
Description: hash improvement


reply via email to

[Prev in Thread] Current Thread [Next in Thread]