[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Chicken-hackers] [PATCH] More efficient numbers integration through
From: |
Peter Bex |
Subject: |
Re: [Chicken-hackers] [PATCH] More efficient numbers integration through "scratchspace" |
Date: |
Tue, 21 Apr 2015 13:27:00 +0200 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Tue, Apr 21, 2015 at 12:11:18PM +0200, Felix Winkelmann wrote:
> > The scratch area is seen as an extension of the nursery: the
> > C_demand() and C_stack_probe() checks will add the size of the scratch
> > space to the stack's size, and trigger a minor GC when the sum of the
> > two exceeds the maximum nursery size.
>
> So this is just a heuristic, as the data in scratch-space never makes
> it into the nursery, but will end up in the heap, right?
That's right. That's why I described it as "an extension of the nursery":
it isn't actually stored on the stack, but you could see the nursery as the
(virtual) union of the data on the stack and the data in the scratchspace.
> > Because numbers are "atomic" in Scheme, we
> > can also assume that nothing else is going to mutate this slot.
>
> Ingenious.
Thank you :)
> > To visualise this, the memory layout of the scratch space looks like this:
> >
> > -------------------------------------------------------------------
> > LO | size_0 | slot_0 | ..digits.. | ... | size_n | slot_n | ..digits.. | HI
> > -------------------------------------------------------------------
> >
> > If we take "LO" to be the starting address of the scratch space and look at
> > the first word that's stored there, we first see a "size_0" which indicates
> > how many words the first object uses. If we calculate (LO + size_0 + 2),
> > we reach the location of size_1, and so on, until we reach the address of
> > size_n. If we add size_n + 2 to the address of size_n, we go beyond "HI",
> > which is the scratch space's upper bound (stored in C_scratchspace_limit).
>
> Are the "..digits.." parts full data-objects, including header?
Yes. They are the complete internal strings pointed to by the data slot
of the bignum record object. This keeps everything relatively simple, and
makes it easy to let the minor GC copy it from the pseudo-nursery into the
heap: the only thing I had to tweak in the GC itself is the pointer checks
to whether it's in the nursery or heap.
> > When bignum X must be "freed" (which is done manually in the algorithms
> > written in C), we can simply update slot_X to NULL. This way, when the
> > next reallocation of the scratch space occurs, we can simply move over
> > this object and skip copying it to the new scratch space.
>
> Sp the scratchspace for this particular object ("X") is then lost,
> until the next GC occurs, which will clear the scratch-space
> completely?
Right. It is still sitting there, taking up space, until either the
scratch space needs to be resized, or until the next GC occurs.
> > We also need to do some bookkeeping so that when a function returns,
> > it will transfer all its "live" objects to buffers allocated by the
> > caller, and "free" any objects that are no longer needed, even if the
> > next function's local data would overwrite the stack anyway. My initial
> > implementation didn't do that, but I quickly ran into trouble, because
> > sometimes another machine word may end up exactly on the place where a
> > bignum lived in the previous function, and the object had the same bit
> > pattern as the bignum, so on realloc the system thought it could replace
> > the machine word with the new location of the stale bignum.
>
> What functions do you mean here? All functions in general, or only
> numerical primitives?
Any function which allocates scratch data in a local buffer which is then
dropped (usually by returning). Generated C functions don't do this
because they never return, and in fact most C functions accept an
allocation pointer, which means the buffer is under the caller's control,
so they won't need to do this either.
In short: only special primitives that use the scratch space need to do
this.
> > So, to move objects around, there's a "C_migrate_buffer_object" function
> > which receives the boundaries of the local buffer and an allocation
> > pointer. It scans for objects still living in the local buffer and
> > moves those to the allocation pointer. Any slots pointing to objects
> > in the scratch area are detected, and the "slot_N" word immediately
> > preceding the object in the scratch area is updated to point to the
> > moved object's slot. This is done recursively, so that for example
> > a cplxnums containing two ratnums, which both contain two bignums will
> > be correctly updated, regardless of where any of the 7 objects lives.
> > To keep things manageable, C_migrate_buffer_object also accepts a NULL
> > allocation pointer to indicate that any objects in scratch space must
> > be freed.
>
> I thought wrappers exist in the stack and only immutable (raw) data is
> located in the scratch-space, or is that assumption wrong? Can "obj"
> in C_migrate_buffer_object refer to any kind of object, or only to
> data-blocks (i.e. bignum digit strings)? As only bignum-digits can be
> of unknown size, wouldn't it be sufficient to restrict the use of
> scratchspace for these?
You're right about the wrapper objects: Currently, only bignum data
string objects are allocated in scratchspace. However, any object
can be passed to C_migrate_buffer_object: for example, a complex number
allocated in a local buffer will need to be traversed, because it may
contain fixnums, flonums, bignum wrapper objects or ratnums (which
themselves may contain fixnums or bignum wrapper objects) that may or
may not also exist in the local buffer. These objects need to be
migrated to the other buffer, recursively.
> The performance impact looks acceptable. That runtime.c has grown to
> 12000 lines is unfortunate and I just hope you'll never get run over
> by a bus, because a) you are a fine chap, and b) nobody will want to
> maintain the horrible mess of code that is required for all this to
> work (I understand that there isn't much one can do, but I just wanted
> to mention it.) I do take a certain pride in the fact that we have a
> function called "bignum_divide_burnikel_ziegler" in our codebase,
> though.
Yay! :)
I do want to point out that there's a whole boatload of code in
runtime.c that I marked "OBSOLETE", which can be removed after we
have a working snapshot tarball. This will reduce the size of
runtime.c again, considerably. But yeah, it's messy and big and
complicated. That's the biggest disadvantage of making everything
inlineable. In the non-inlineable codebase, a lot of this stuff
was shorter and simpler and beautiful Scheme code. Now it's just
a big ball of hairy C that nobody wants to touch.
> So merging this into chicken-5 is fine for me, even if it's just for
> the work you put into that stuff. But I would really like to
> understand the scratchspace thing better.
Of course, it wouldn't be useful if we merged it but nobody else
understood it! I'm here to answer any question y'all may have.
I'm also very happy that you find this version acceptable.
Hopefully we may find other places that could make use of the scratch
space, to get more performance wins to (sort of) make up for the
minor performance loss this change introduces.
Cheers,
Peter
signature.asc
Description: Digital signature
Re: [Chicken-hackers] [PATCH] More efficient numbers integration through "scratchspace", Felix Winkelmann, 2015/04/21
- Re: [Chicken-hackers] [PATCH] More efficient numbers integration through "scratchspace",
Peter Bex <=