Poor hashtable collision performance

bug-m4

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Poor hashtable collision performance

From:	Eric Blake
Subject:	Poor hashtable collision performance
Date:	Sat, 17 Apr 2021 07:38:54 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

Consider the following program I wrote to divide a 50,000 byte string
into a stack of one character per definition (this is supposed to be an
O(n log n) task, because it repeatedly cuts the input in half, and
should be more efficient than iterating on
substr(`$1',0,1)$0(substr(`$1',1) with its O(n^2) complexity):

$ cat tmp
define(`chew', `ifelse($1, 1, `pushdef(`stack', substr(`$2',0, 1))',
  `$0(eval($1/2), substr(`$2', 0, eval($1/2))_)$0(eval($1 - $1/2),
  substr(`$2', eval($1/2)))')')dnl
chew(50000, format(`%050000d', 0))dnl
$ time m4 tmp

real    0m22.264s
user    0m22.180s
sys     0m0.010s
$ time m4 -H517 tmp

real    0m0.321s
user    0m0.316s
sys     0m0.004s

What gives?  It turns out that with the default hash size of 509,
'stack' and 'substr' hash to the same bucket, and as the pushdef depth
of stack increases, the lookup time for 'substr' gets progressively
worse.  With -H517, the two names hash to different buckets and no
longer interfere.  I'm working on a patch to separate the pushdef chain
from the hash bucket chain, so that hash collisions only have to visit
one instance of a colliding name, not the entire stack.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

[Prev in Thread]

Current Thread

[Next in Thread]

Poor hashtable collision performance, Eric Blake <=

Next by Date: Re: Latest M4 fails M4 checks
Next by thread: Re: Latest M4 fails M4 checks
Index(es):
- Date
- Thread