[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Gcl-devel] efficiency of package symbol tables
From: |
Peter Wood |
Subject: |
Re: [Gcl-devel] efficiency of package symbol tables |
Date: |
Tue, 9 Mar 2004 05:42:58 +0100 |
User-agent: |
Mutt/1.4i |
Hi
On Thu, Mar 04, 2004 at 10:05:04PM +0100, Bruno Haible wrote:
> Hi,
>
> Sam Steingold and I found a performance problem in clisp, and it appears
> that GCL has the same problem: When tens of thousands of symbols are added
> to a package, further package lookups and symbol additions become slower
> and slower. This is of course not the proper behaviour for good hash tables.
>
> Here is the test code:
>
> (defun random-symbol ()
> (intern (write-to-string (random 1000000000000000000) :base 36)
> *package*))
> (compile 'random-symbol)
>
> (defun fill-symtab (n)
> (dotimes (i n) (random-symbol)))
> (compile 'fill-symtab)
...
> What helped us in order to understand the problem, was to look at the
> internal distribution, how many symbols in each bucket of the hash table
> on average...
GCL computes the hash index based on the length of the
symbol-name... Collecting some statistics on your #'random-symbol:
(defun symlen-stat (n)
(let ((statarr (make-array 13 :element-type 'fixnum :initial-element 0)))
(dotimes (i n)
(incf (aref statarr (length (write-to-string (random 1000000000000000000)
:base 36)))))
statarr))
>(symlen-stat 10)
#(0 0 0 0 0 0 0 0 0 0 0 1 9)
>(symlen-stat 100)
#(0 0 0 0 0 0 0 0 0 0 0 15 85)
>(symlen-stat 1000)
#(0 0 0 0 0 0 0 0 0 0 1 130 869)
>(symlen-stat 10000)
#(0 0 0 0 0 0 0 0 0 3 41 1299 8657)
>(symlen-stat 100000)
#(0 0 0 0 0 0 0 0 0 14 379 12797 86810)
So although the symbols generated by #'random-symbol differ randomly
in their symbol-name they don't do so in their length, but are heavily
weighted towards the longest length possible. Presumably, one could
work around this bug (if it is a bug) by generating symbols which
differed randomly in their length over a wide range.
In the best case, with a variation of 1 to 12 chars per symbol, you
will still only be storing the symbols in 12 different buckets.
Eg, if it doesn't matter if some symbols will be ||, you could do:
(defun random-symbol ()
(let ((symlen (random 50)))
(intern (map 'string #'(lambda (c) (setf c (code-char (+ 48 (random (- 122
48))))))
(make-string symlen)))))
Now I get the following results (with gc notification turned on so we
know where the time is really being spent).
>(time (fill-symtab 10000))
[GC for 0 RELOCATABLE-BLOCKS pages..(T=1).GC finished]
real time : 1.000 secs
run time : 0.390 secs
NIL
>(time (fill-symtab 10000))
real time : 0.000 secs
run time : 0.420 secs
NIL
>(time (fill-symtab 10000))
[GC for 0 RELOCATABLE-BLOCKS pages..(T=10).GC finished]
real time : 0.000 secs
run time : 0.510 secs
NIL
>(time (fill-symtab 10000))
real time : 0.000 secs
run time : 0.400 secs
NIL
>(time (fill-symtab 10000))
real time : 1.000 secs
run time : 0.410 secs
NIL
>(time (fill-symtab 10000))
[GC for 1000 SYMBOL pages..(T=12).GC finished]
real time : 0.000 secs
run time : 0.530 secs
NIL
>(time (fill-symtab 10000))
[GC for 0 CONTIGUOUS-BLOCKS pages..(T=18).GC finished]
[GC for 0 CONTIGUOUS-BLOCKS pages..(T=16).GC finished]
real time : 1.000 secs
run time : 0.790 secs
NIL
>(time (fill-symtab 10000))
real time : 1.000 secs
run time : 0.400 secs
NIL
>(time (fill-symtab 10000))
[GC for 1500 SYMBOL pages..(T=14).GC finished]
real time : 1.000 secs
run time : 0.550 secs
NIL
>(time (fill-symtab 10000))
real time : 0.000 secs
run time : 0.410 secs
NIL
>>(time (fill-symtab 10000))
real time : 0.000 secs
run time : 0.410 secs
NIL
>(time (fill-symtab 10000))
real time : 0.000 secs
run time : 0.410 secs
NIL
>(time (fill-symtab 10000))
[GC for 1000 STRING pages..(T=22).GC finished]
[GC for 2250 SYMBOL pages..(T=20).GC finished]
real time : 0.000 secs
run time : 0.830 secs
NIL
>(time (fill-symtab 10000))
[GC for 0 CONTIGUOUS-BLOCKS pages..(T=33).GC finished]
[GC for 0 CONTIGUOUS-BLOCKS pages..(T=29).GC finished]
real time : 1.000 secs
run time : 1.170 secs
NIL
>(time (fill-symtab 10000))
real time : 1.000 secs
run time : 0.400 secs
NIL
Regards,
Peter
- Re: [Gcl-devel] efficiency of package symbol tables, (continued)
- Re: [Gcl-devel] efficiency of package symbol tables, Bruno Haible, 2004/03/21
- Re: [Gcl-devel] efficiency of package symbol tables, Camm Maguire, 2004/03/21
- Re: [Gcl-devel] efficiency of package symbol tables, Bruno Haible, 2004/03/22
- Re: [Gcl-devel] efficiency of package symbol tables, Camm Maguire, 2004/03/22
- Re: [Gcl-devel] efficiency of package symbol tables, Bruno Haible, 2004/03/22
- Re: [Gcl-devel] efficiency of package symbol tables, Camm Maguire, 2004/03/22
- Re: [Gcl-devel] efficiency of package symbol tables, Bruno Haible, 2004/03/23
- Re: [Gcl-devel] efficiency of package symbol tables, Camm Maguire, 2004/03/23
- Re: [Gcl-devel] efficiency of package symbol tables, Bruno Haible, 2004/03/24
- Re: [Gcl-devel] efficiency of package symbol tables, Camm Maguire, 2004/03/30
Re: [Gcl-devel] efficiency of package symbol tables,
Peter Wood <=
Re: [Gcl-devel] efficiency of package symbol tables, Peter Wood, 2004/03/10