pika-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Pika-dev] issues in choosing a broken heart


From: Matthew Dempsky
Subject: [Pika-dev] issues in choosing a broken heart
Date: Fri, 16 Jan 2004 19:04:26 +0100
User-agent: Gnus/5.1002 (Gnus v5.10.2) Emacs/21.3 (gnu/linux)

Just so that these issues don't get lost into the void, I thought I
would document the issues involved in the implementation of Pika's
copying GC.  Those familiar with copying GCs can probably skip over
the first few paragraphs -- I include them only for completeness.
This is really more in depth than needed, but I've found Tom's
thoroughness on the SRFI-50 mailing to be crucial in understanding
many aspects of Pika I took for granted, so maybe others can benefit
from this similarly.

Whenever Pika tries to allocate a new object, it first checks if
memory is available from a singly linked list.  If so, it simply
claims that memory and removes it from the list.  If not available, it
runs a the gc routines and if memory is still not available, allocates
a new block of memory and creates a new linked list from it.
(Initially the GC routines were just stubs, so it would always
allocate a new page.)

The GC routines begin by walking the stack frames and pushing pointers
to all the non-immediate t_scm_word's (pairs, boxed doubles, and
vtable objects) into an array.  After this is done, we enter a loop
until the array is empty.  One at a time, elements are removed from
the array and the memory address they point to is checked for a
'broken heart' (just a constant value).  If not found, in a new arena,
space for the object is allocated and the old value copied over (along
with any t_scm_words held by the object being added to the array);
then the broken heart value is written over the original memory
followed by a pointer to the new memory address and the t_scm_word on
the stack is updated with a new value.  If the broken heart value was
found, then the local value is updated without copying the object.
Once the array is empty, we free the old arena's pages and copy the
new ones in its place.

Now for the issues involved in choosing a broken heart:

We need a broken heart value that will never be written to where we're
looking for it except by the GC.

For the vtable objects this idiom is fine - the broken heart value
currently gets written to where a vtable pointer is normally found so
all we need is a value which will never be used as a vtable pointer.

For pairs, the broken heart overwrites the car address which is a
t_scm_word, not a struct vtable *.  We need to be more careful in our
choice of the broken heart value now, because, for example, if we
chose 0x12344321, then once a user performs (cons 76353736 <x>) and
then the GC runs, any pointers to that will be randomly relocated.  In
the best case, the program crashes -- worst case, the user finds some
way to exploit the new "cons cell".  It gets worse though...

What about for boxed doubles?

On x86 platforms at least, the first word holds the lower 32-bits of
the mantissa of the double so essentially, any 32-bit bitstring is
valid and naturally occuring there.  Crap...

One possibility is, that since boxed doubles are only assigned once it
would be feasible to check if the 32-bits is equal to our broken heart
and if so toggle the least significant bit.  This would be a
relatively simple fix to make, but has the ugliness that it loses
precision once every 4294967296 floating point operations.

Another possibility is to reorder our broken heart and new pointer
value - it would be just as well to use the second word for the broken
heart and first for the new pointer.  IEEE doubles define a range of
2^52nd NaNs which can be uniquely identified by the upper word, which
(on x86) is stored second.  Without being able to find any
documentation explaining the significance of different NaNs, it seems
like it would be a safe bet to claim a certain range of these values
for ourselves and either silently change specific binary
representations of NaN into others (or return an error).  However,
this implementation then would depend on byte order, word size, and
wouldn't work in the presence of vtable objects who claim the memory
following the vtable pointer as binary data since we have _no_
control over that (unless we special cased boxed double handling).

So arguably the best (and at least simplest) solution is to allocate
four words for boxed doubles and store the double in the second memory
address.  (Comments in reps/tags.h imply that boxed doubles should
take four words of memory anyways.)

Note: as of now (address@hidden/scm--devo--0.1--patch-10) this
isn't correctly handled, but it should be soon.  (And once we have a
better GC none of this will matter anymore.)

-jivera




reply via email to

[Prev in Thread] Current Thread [Next in Thread]