chicken-hackers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Chicken-hackers] [PATCH] Fix #1173 (weak symbol GC)


From: Peter Bex
Subject: [Chicken-hackers] [PATCH] Fix #1173 (weak symbol GC)
Date: Wed, 24 Aug 2016 22:06:08 +0200
User-agent: Mutt/1.5.23 (2014-03-12)

Hi all,

After several days immersion in the garbage collector code, I fixed the
weak symbol GC.  Originally, because rewriting code is easier than
understanding it, I wrote a patch that refactors and considerably
simplifies the GC by removing the whole weak symbol table kerfuffle.
I still think this is the way forward, but since it subtly changes
how symbols need to be handled, I think it's better to delay this for
CHICKEN 5.

While preparing said patch I actually did build up an understanding
about what was going wrong in the first place, with the weak reference
table.  The attached patch explains and fixes the situation, and I also
explain it a bit in the ticket, but let's try again in other words:

First, during minor GC and reallocing GC, the weak symbol table is
ignored, so symbols and buckets (the "pairs" that form the linked list
chain in the symbol hash table buckets) are simply treated as regular
Scheme objects, no problem.

So we're dealing only with major GC.  There's special casing going on
here.  The basic idea is to perform a reference count of the symbol,
and if only one reference exists (i.e., the global symbol table),
we can drop the symbol.  In slightly more detail, really_mark() has
these special-cased actions for symbols and buckets:

- If we are marking a symbol, something referred to it, so we look it
   up in the weak item table and, if found, increment its counter by
   one (to a maximum of 2).  If the symbol has no entry, we don't do
   anything.
- If we are marking a bucket containing a symbol, we create an entry
   in the weak item table for the symbol with a counter of 0, unless
   the symbol inside was already forwarded: then the counter is
   "saturated" at 2.  That's because we obviously already marked the
   symbol, so we missed the previous bullet an unknown number of times.
- After the major GC, walk the entire symbol table and check if it has
   a corresponding entry in the weak item table with a counter that's
   less than 2.  If it does, and the symbol has no global binding or
   plist, we can drop it: only one reference exists (the symbol itself).

Unfortunately, this doesn't work in all cases.  The reason is that
when we store the item in the forwarding table, we store the pointer
that's inside the bucket's first slot.  Because we are inside a major
GC, this pointer is typically a heap (fromspace) pointer, even if the
symbol was previously moved from the nursery during minor GC.

Let's say the bucket's slot was also updated itself during the minor GC,
so the bucket itself refers to fromspace as well.  But, there may be
other unmarked references which still refer to the symbol's original
location in the nursery.  Because the major GC step 1 above checks the
item's pointer before chasing forwarding pointers, it will not even
see that we're marking a symbol because the header bits are for a
forwarding pointer, not a symbol.

Alternatively, the reference may be to the symbol's location in the
fromspace, but the symbol has already moved to the tospace.

The patch "simply" adds another check of the header type after chasing
the forwarding pointer.  If it's a symbol, we look up the *original*
location in the weak table and increment the counter.  This also happens
if there are two forwarding pointers, then we need to look up the
original pointer *or* the intermediate pointer in the weak table.

This is all extremely fiddly, hard to understand and even if you
understand it (somewhat), it's hard to explain too (it is *very* clever
and impressive code, though!).  That's why I think we're better off
refactoring the whole thing, for which a patch will follow.  Besides,
after *that* patch, I think we can get rid of the special BUCKET_TYPE,
which means we can free up a reserved tag number, re: the discussion
about my extended numbers refactoring patch.

But first, this patch should be applied to master.  It also applies
cleanly to the chicken-5 branch, so I think to preserve sanity in the
meantime we should apply it to both branches: having a change only in
master is too confusing.

PS: Testing this seems tricky, but a simple way is to simply compile
with "SYMBOLGC=1" on make, which will unconditionally enable weak
symbol GC.  Go ahead, try it with an unpatched CHICKEN and watch the
test suite crash and burn :)

PPS: The "knucleotide" benchmark is a great one to test performance of
weak symbol GC, it creates boatloads of symbols.

Cheers,
Peter

Attachment: 0001-Fix-symbol-GC-add-wep-lookup-after-fptr-chasing.patch
Description: Text Data

Attachment: signature.asc
Description: Digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]