gcl-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gcl-devel] hash redux


From: Camm Maguire
Subject: [Gcl-devel] hash redux
Date: Mon, 11 Nov 2013 12:49:41 -0500

Greetings!

Just a followup -- the remaining time appears to be in sethash for an
'equal hash-table:

=============================================================================
index % time    self  children    called     name
                             103979625             sethash [1]
[1]     84.2    2.11   49.03       0+103979625 sethash [1]
               22.58    6.16 167566772/167566772     fShash_equal [2]
                0.00   20.28  119656/131885      alloc_relblock [6]
                0.01    0.00  119656/205048      alloc_object [47]
                             103979625             sethash [1]
-----------------------------------------------
               22.58    6.16 167566772/167566772     sethash [1]
[2]     47.3   22.58    6.16 167566772         fShash_equal [2]
                5.25    0.00 363849475/363849475     hash_eql [12]
                0.91    0.00 1174935219/1174940911     eql1 [18]
                0.00    0.00      12/2577623     Fand <cycle 2> [151]
-----------------------------------------------
                0.29    5.78       3/14          make_cons [9]
                1.06   21.19      11/14          alloc_relblock [6]
[3]     46.6    1.35   26.97      14         GBC [3]
               26.93    0.00 25304834/25331171     sgc_mark_object1 <cycle 1> 
[5]
=============================================================================

This is somewhat remarkable, as the 'eql gethash calls which greatly
dominate in number are no longer on the radar.  Presumably the algorithm
makes some complex cons, (definitely not your grandmother's '(1 2 3)
list), uses an 'equal hashtable to make it equal-unique, and then uses
that as a key in an 'eql hashtable for the real heavy work.

This just reminded me of the work we did earlier regarding the loading
of complex conses in compiled files, which overloaded the #= reader
until we memoized the routine calculating the hash-equal index.  This is
barely necessary to the gcl compiler -- the point is to catch errors
where the constant list to be compiled in changes during compilation.
And as I indicated earlier, we flush the memoizing hash tables on each
compile-file.  This, together with the implementation of the 'hybrid' #=
algorithm suggested by Jared, made the loading of these conses very
fast.

My question is if we've learned anything which might make the above
results yet faster.  By default, the hash-equal index descends no more
than three levels, car and cdr, into a cons to xor up the index.  It
does not attempt to descend the entire structure memoizing as one goes
like the compiler version.  There the depth limit is much greater (1000)
due to its purpose and the absence of any table.  My intuition tells me
that there is no way a memoized version of the generic hash-equal would
pay off.  It seems we would have to flush on each call, or never.  It
would only speed up index calculations of great depth, which is only
useful in hash tables if your index is insufficiently random at the
default depth of 3.  This does not appear the case, as #'equal itself is
absent from the profiling report, implying that the hit rate to the
index is good.

I suppose an 'equal hashtable could keep an 'eq hashtable internally for
the life of the table.  That might be interesting.

In any case, I don't want to waste a lot of time reinventing some
wheel.  If you or any of the other hashtable experts have some wisdom
here, I'd be most appreciative.

Take care,
--
Camm Maguire                                        address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah




reply via email to

[Prev in Thread] Current Thread [Next in Thread]