
Re: [Chicken-hackers] [PATCH] Hash all the record slots


From: Peter Bex
Subject: Re: [Chicken-hackers] [PATCH] Hash all the record slots
Date: Sat, 16 Jul 2016 20:35:56 +0200
User-agent: Mutt/1.5.23 (2014-03-12)

On Sun, May 29, 2016 at 10:27:25PM +0200, lemonboy wrote:
> Hello,
> While investigating ticket #1293 [1] I noticed that the hashing
> procedures stop hashing vectors at the 4th element.  While that's a
> valuable way to speed up the hashing process, it causes some
> problems in the wild.

Hi Lemonboy,

Sorry for getting back to this mail so late.

> If you use a record with N > 4 slots as a key and then set! one of
> the slots after the 4th, equal?-hash returns the same hash even
> though equal? definitely fails.

This is not really a problem, is it?  If the hash is identical, these
two objects would simply be put in the same bucket of the hash table,
but it wouldn't confuse the two on lookup or insertion AFAICT; that's
what the "test" procedure is for.  Upping the limit simply trades
one cost (lookup/insertion) for another (hashing).  Removing it
altogether just removes the upper bound on hashing time, which doesn't
seem like a great idea.  The limit was probably put in there for a
reason.  Maybe Felix can enlighten us here?

On the other hand, limiting the hash calculation at an arbitrarily low
value like we do seems broken, too.  However, if we fix this, we should
also fix it for vectors.  I don't really see why vectors should be
treated fundamentally differently.

> The attached patch fixes this problem by lifting the limit when
> hashing tagged vectors, a.k.a. records.  The rationale behind this
> decision is that the number of slots a record has is usually very
> small, and a record is more likely to be used as a key than a huge
> vector is.

The record wouldn't typically be very huge, but it *could* be.  And
a vector can be any size too, and it has the same problem if it is
5 or 6 slots big with a distinction only in the later slots.  Wouldn't
it make more sense to just increase this limit to 8, 16 or some other
arbitrarily chosen value that's a little better for larger objects?

> While inspecting this I also came across another issue: eq?-hash
> crunches some plain values by just xor-ing them with a random seed,
> while it calls hash-equal? for complex objects like records/vectors.
> That's IMO quite a convoluted way to implement it (and it might also
> be wrong, as eq? performs a shallow comparison only); it could be
> replaced by stirring [2] the raw C_word value, as that's faster (and
> incidentally also what Guile does [3]).
> (Rinse and repeat for the eqv? version.)

I don't think that would work in general, because the raw C_word can be
a pointer which gets changed when the target object is moved around by
the GC.  But I'm sure you know this.  We could do this for immediates,
of course.  But wouldn't the typical structure of these words (with some
key bits set for all values of this type) interfere with this "stirring"?

In any case, we can all agree that the srfi-69 module is pretty
(almost hopelessly) broken.  That's one reason we made it into an egg
for CHICKEN 5.  Might you be interested in taking over maintainership
of this egg?  It sounds like you have some interesting ideas, and it
makes more sense to iterate quickly on an egg than to keep tweaking
this code in core, just before what will (hopefully) be the final
release of the CHICKEN 4 series.

What do you think?

Cheers,
Peter

