From: Peter Bex
Subject: [Chicken-hackers] [PATCH] Use heuristic for analysis database size
Date: Sat, 11 Feb 2012 18:53:48 +0100
User-agent: Mutt/1.4.2.3i

Hi!

I've had another crack at the numbers test performance.  All the profiling
I did kept pointing at ##sys#hash-table-ref as the source of most of the
run-time overhead.  In fact, there was a rather big difference between the
time spent in ##sys#hash-table-ref and in ##sys#hash-symbol, which suggested
to me that the hash table was overcrowded: otherwise a lookup would spend
its time hashing the symbol, fetching the bucket and returning the item in
the bucket, and the two runtimes would be close.  Only if it needed to
traverse a rather long list of bucket contents would I expect a difference
between the runtimes.
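
To make that concrete, here is a toy sketch of such a lookup (made-up
names and hash function, not CHICKEN's actual code):

;; Toy sketch of a bucket-based lookup; the table is a vector of
;; association lists.
(define (toy-hash sym limit)
  ;; hashing is cheap: a single pass over the symbol's name
  (let loop ((cs (string->list (symbol->string sym))) (h 0))
    (if (null? cs)
        (modulo h limit)
        (loop (cdr cs) (+ (* h 31) (char->integer (car cs)))))))

(define (sketch-ref buckets sym default)
  (let ((bucket (vector-ref buckets (toy-hash sym (vector-length buckets)))))
    ;; this scan is linear in the bucket's length; with ~37 entries
    ;; per bucket it dominates the lookup time
    (cond ((assq sym bucket) => cdr)
          (else default))))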

These hash tables store the analysis database, which holds properties
deduced for all identifiers in the program.

I printed the number of entries stored in the analysis hash table while
compiling the numbers test, and it turns out that the initial number of
entries is 111486, whereas the hash table is initialized with only 3001
buckets.  This means an expected bucket population of about 37 items,
which is way overcrowded.
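
That is:

(/ 111486.0 3001)   ; => ~37.15 entries per bucket on average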

Since the numbers test's bottleneck was worst in analysis phase 2, this
gave me the idea of re-using the previous run's total number of hash
table entries as a predictor of the next run's size; the database only
shrinks during the compiler's run, as variables are eliminated or
replaced by constants, so it is normally a good, if slightly optimistic,
guess.
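
In sketch form, the idea looks something like this (illustrative names,
not the actual patch code; the factor 3 comes from the timings below):

;; Keep the entry count of the previous pass's database and use it to
;; size the next one (sketch only).
(define default-db-size 3001)   ; the old fixed size; used for pass 1
(define previous-entries #f)

(define (fresh-analysis-db)
  ;; about three buckets per expected entry keeps the buckets short
  (make-vector (if previous-entries
                   (* 3 previous-entries)
                   default-db-size)
               '()))

(define (count-db-entries db)
  (let loop ((i 0) (n 0))
    (if (= i (vector-length db))
        n
        (loop (+ i 1) (+ n (length (vector-ref db i)))))))

(define (finish-pass! db)
  ;; the database only shrinks between passes, so the previous count
  ;; is a good, slightly optimistic predictor
  (set! previous-entries (count-db-entries db)))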

After experimenting with different hash table sizes, I found that to
minimize collisions the hash table should have about three times as many
buckets as the number of entries it is expected to hold; here are the
compilation times for the numbers test with various multiplication
factors:

1: 95.75s
2: 81.84s
3: 69.22s
4: 69.37s
5: 69.35s

For comparison, the original compilation time with the fixed number of
buckets (3001) was 102.34s, so this patch roughly halves it.

This also explains why the individual compilation times of the tests,
added together, were still a lot lower than the compilation time of the
full numbers test (which simply includes all the other tests).  Right
now the numbers are much closer to a simple sum:

numbers-test-ashinn.scm  0.87s
numbers-test-gauche.scm 12.82s
r4rstest.scm             3.86s
string-conversion.scm   24.66s
numbers-test.scm         4.31s
sum-of-exponents.scm     0.12s +
==============================
total                   46.64s

run.scm:                69.22s

This is a whole lot closer to the expected time than 102.34s versus 51s
(51s being the sum of the compilation times of these files before this
patch).

I think there's still something to be gained elsewhere; hash-table-ref
is still high up in the profiling output.  Of course the compiler calls
it a *lot* (all the "get" and "put!" calls that populate the analysis
database invoke ##sys#hash-table-ref), so it's to be expected that most
of the bottlenecks show up here.  I'll investigate some more, and maybe
do some tests with different hashing algorithms, too.
Meanwhile, I think this patch is a good improvement.
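
For reference, the accessors have roughly this shape (just a sketch
reusing sketch-ref from above; the real get/put! differ in detail):

;; Sketch only: each symbol maps to a property list, so every "get"
;; (and every "put!", which needs the same lookup first) goes through
;; one full hash table lookup.
(define (db-get db sym prop default)
  (let ((plist (sketch-ref db sym '())))
    (cond ((assq prop plist) => cdr)
          (else default))))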

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                                                        -- Donald Knuth

Attachment: 0001-Use-previous-run-s-identifier-database-size-as-a-heu.patch
Description: Text document

