Re: [Pika-dev] SCM_LSET question


From: Tom Lord
Subject: Re: [Pika-dev] SCM_LSET question
Date: Fri, 6 Feb 2004 08:50:52 -0800 (PST)


    > From: Andreas Rottmann <address@hidden>

    > I'm not totally clear about this macro. It's to be used for
    > setting locals, which are protected by
    > SCM_PROTECT_FRAME. However, those can be set by function calls,
    > like scm_make_false (&l.bar, arena). Since the code in
    > scm_make_false doesn't use SCM_LSET for setting l.bar, this
    > leads me to the question if there shouldn't be a macro like
    > SCM_RSET (for result-set) or if SCM_LSET can't be dropped...

Ok, here's the deal.

The FFI is designed so that:

  ~ GC can be precise
  ~ GC can be concurrent
  ~ GC can be asynchronous
  ~ GC can be copying
  ~ GC can be incremental
  ~ The representation of values, t_scm_word, can be any C type

To achieve those goals, the FFI has to have some properties:

* No t_scm_word rvalues

   The FFI interface must never _require_ that Scheme values
   (t_scm_word values) wind up in anonymous storage known only to the
   compiler.

   For example, the interface to CAR can _not_ be:

        l.answer = scm_car (arena, pair)

   because after `scm_car' returns and before the result is stored
   in `answer', the `t_scm_word' value would exist only in a register
   or similar anonymous location managed by the compiler.   GC would
   not be able to find it -- so it would not be GC protected and would
   not be updated if the value were relocated by a copying GC.

   Instead, the interface is:

        scm_car (&l.answer, arena, pair)

   We regard `scm_car' as a primitive: internally, it has to coordinate
   with GC.   But to a caller, that interface means that we never have
   the return value as an "intermediate value" -- by the time
   `scm_car' returns, the result is stored in `l.answer' where the GC
   can reliably find it.
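
   As a rough illustration of that calling convention, here is a toy
   model.  Everything in it is an assumption made up for the example
   (`toy_word', `toy_pair', `toy_arena', `toy_car'); it is not Pika's
   implementation, only a sketch of "the result goes into a location
   the caller names":

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these definitions are Pika's. */
struct toy_pair;
typedef struct toy_pair *toy_word;        /* stand-in for t_scm_word */
struct toy_pair { toy_word car; toy_word cdr; };
struct toy_arena { int unused; };         /* stand-in for the arena */

/* The result is written straight into a caller-supplied location;
   no Scheme value is ever returned by value. */
void
toy_car (toy_word *result, struct toy_arena *arena, toy_word *pair)
{
  (void) arena;                 /* unused in this toy */
  *result = (*pair)->car;       /* only "primitive" code touches values */
}

/* Returns 1 if the CAR ended up in the caller's location. */
int
toy_car_demo (void)
{
  struct toy_arena arena = { 0 };
  struct toy_pair inner = { NULL, NULL };
  struct toy_pair outer = { &inner, NULL };
  struct { toy_word pair; toy_word answer; } l = { &outer, NULL };

  toy_car (&l.answer, &arena, &l.pair);

  /* Peeking at the raw value is for the demo's check only. */
  assert (l.answer == &inner);
  return l.answer == &inner;
}
```

   The caller only ever passes addresses (`&l.answer', `&l.pair');
   the one place that dereferences them is the primitive itself.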

   This rule affects some other idioms, too.   You can't copy local
   variables or parameters in a way that creates an rvalue.  
   So you can _not_ say:

        l.tail = l.fast

   or

        *answer = l.tail

   or

        l.list = *parameter

   but have to instead say:

        SCM_LSET (&l.tail, &l.fast)

   and

        SCM_LSET (answer, &l.tail)

   and

        SCM_LSET (&l.list, parameter)


   To better understand that, you can think of Scheme memory as being
   abstractly described as a set of Locations, each of which holds a
   single Scheme value.  For example, a cons pair contains two
   Locations.  A local Scheme variable in your C code is a Location.

   In the FFI, the type `t_scm_word' is, in essence, the type of
   a Scheme Location.

   So when you declare some local variables:

     struct length_locals
         {
           SCM_FRAME;
           t_scm_word slow;
           t_scm_word fast;
           t_scm_word tail;
         } l;

     SCM_PROTECT_FRAME (l);
     [...]

   what you're doing is creating some new Locations (slow, fast, tail,
   in this case) which happen to be GC roots.

   Everywhere in the FFI, Locations are referred to by _address_, not
   name.   So, for example, one of the locations created by that 
   declaration is called:

        &l.slow

   All of the primitives in the FFI operate on locations, not scheme
   values.  So:

        SCM_LSET (to, from)

   means, "copy the value in location `from' to location `to'"
   and 

        scm_car (&l.answer, arena, pair)

   means "store the CAR of the value in location `pair' in the 
   location `&l.answer'"

   When you write a function that takes `t_scm_word *' input
   parameters, what it is really taking as parameters are some
   locations to operate on.   Its `t_scm_word *' output
   parameters are the (possibly overlapping) locations where results
   should be stored.

   You could think of the FFI primitives as the instruction set of
   a virtual machine that has no registers -- only locations.  When
   you write new functions in libscm, you're creating "macro
   instructions" out of those primitives.
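
   To make the Location idea concrete, here is a minimal sketch under
   loud assumptions: `toy_word', `TOY_LSET', `toy_protect' and
   `toy_relocate' are all invented for this example and say nothing
   about how SCM_LSET or SCM_PROTECT_FRAME are really implemented.
   It only shows why a collector that knows every Location can move a
   value and fix up every copy of it:

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_word;                  /* stand-in for t_scm_word */

/* Hypothetical location-to-location copy, in the spirit of SCM_LSET. */
#define TOY_LSET(to, from) (*(to) = *(from))

/* Hypothetical root registry: the "GC" knows every protected location. */
#define TOY_MAX_ROOTS 16
toy_word *toy_roots[TOY_MAX_ROOTS];
size_t toy_n_roots;

void
toy_protect (toy_word *location)
{
  assert (toy_n_roots < TOY_MAX_ROOTS);
  toy_roots[toy_n_roots++] = location;
}

/* Pretend a copying collector moved the object named `old_value' to
   `new_value': every registered location holding it is rewritten. */
void
toy_relocate (toy_word old_value, toy_word new_value)
{
  size_t i;
  for (i = 0; i < toy_n_roots; ++i)
    if (*toy_roots[i] == old_value)
      *toy_roots[i] = new_value;
}

/* Returns 1 if both locations were updated by the "collector". */
int
toy_lset_demo (void)
{
  struct { toy_word fast; toy_word tail; } l = { 100, 0 };

  toy_n_roots = 0;
  toy_protect (&l.fast);
  toy_protect (&l.tail);

  TOY_LSET (&l.tail, &l.fast);          /* copy between locations */
  toy_relocate (100, 200);              /* the "GC" moves the object */

  return l.fast == 200 && l.tail == 200;
}
```

   Had the value been copied into an unregistered C temporary
   instead, `toy_relocate' would have had no way to find and update
   it, which is the failure the "no rvalues" rule is there to prevent.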


* Treat t_scm_word as an opaque type

  This is really implied by the "no rvalues" rule, but it's worth
  mentioning separately.

  You mustn't write code like:

        if (l.fast == l.slow) /* circular list? */

  For one thing, you're using `t_scm_word' as an rvalue, violating the
  rules above.    But for another thing, you're treating `t_scm_word'
  as a non-opaque type and assuming that it can be compared for
  equality using `=='.

  Instead, that has to be written:

        if (scm_values_eq (arena, &l.fast, &l.slow)) /* circular list? */
   
  which asks the FFI implementation to tell you if the values stored
  in locations `&l.fast' and `&l.slow' are EQ?.    Among other things,
  that allows the equality test to be more complicated than a plain
  `==' (as it might have to be in, say, an incrementally copying GC).
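
  One way such a test could end up more involved than `==', sketched
  purely as an assumption: suppose a copying collector leaves
  forwarding pointers behind, so two locations can briefly hold the
  old and the new address of the same object.  The names below
  (`toy_object', `toy_resolve', `toy_values_eq') are hypothetical,
  and the arena argument is left out for brevity:

```c
#include <assert.h>
#include <stddef.h>

/* Toy object with a forwarding pointer, set once it has been copied. */
struct toy_object { struct toy_object *forward; int payload; };
typedef struct toy_object *toy_word;    /* stand-in for t_scm_word */

/* Follow forwarding pointers to the current copy of an object. */
toy_word
toy_resolve (toy_word value)
{
  while (value != NULL && value->forward != NULL)
    value = value->forward;
  return value;
}

/* Identity test on the values held in two locations. */
int
toy_values_eq (toy_word *a, toy_word *b)
{
  return toy_resolve (*a) == toy_resolve (*b);
}

/* Returns 1 if raw `==' disagrees with the location-based test. */
int
toy_eq_demo (void)
{
  struct toy_object new_copy = { NULL, 7 };
  struct toy_object old_copy = { &new_copy, 7 };
  struct { toy_word fast; toy_word slow; } l = { &old_copy, &new_copy };

  int raw_equal = (l.fast == l.slow);            /* different addresses */
  int eq = toy_values_eq (&l.fast, &l.slow);     /* same object */

  assert (eq);
  return !raw_equal && eq;
}
```

  In this toy, comparing the raw words says "different" while the
  location-based test says "same object", so client code that used
  `==' directly would get the wrong answer mid-collection.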


Those rules are consistent with the goals:

  ~ GC can be precise

    Because the GC always knows where all of the Locations are and 
    Values are stored only in Locations.

  ~ GC can be concurrent
  ~ GC can be asynchronous

    Because Values are only ever stored in Locations -- never 
    in registers or "elsewhere".

  ~ GC can be copying
  ~ GC can be incremental

    Because Values are only in Locations and Locations are never read
    or written to directly.  The GC is free to impose a read or write
    barrier on Locations and to update them at any time.

  ~ The representation of values, t_scm_word, can be any C type

    FFI-using code never does anything but take the address of a
    Location.  It doesn't care what C-type is used to represent that
    Location.



Now the tricky part:

   > Since the code in scm_make_false doesn't use SCM_LSET for setting
   > l.bar, this leads me to the question if there shouldn't be a
   > macro like SCM_RSET (for result-set) or if SCM_LSET can't be
   > dropped...

No.   You have to understand that the code of Pika is being split into
two layers:

                
        everything else 
        ---------------
             reps

and their function is:

        everything else:
           use the "core FFI" and extend it
        -----------------------------------
        reps:
           implement the "core FFI"

For example, you earlier fixed a bug by moving the declaration for
`t_scm_word' from the "everything else" part of the code to the "reps"
part where it belongs.   That declaration is part of the
implementation of the core FFI.

SCM_LSET is another part of the core FFI -- defined in reps.

`scm_make_false' is a part of the core FFI.

The reps part, the core FFI, is designed to be "swappable".   For
example, if you want to use a different GC, you can do that just by
modifying or replacing the reps layer -- all the other code remains
unchanged.

The current "reps" layer is being designed with the primary goals:

  ~ get it working quickly and easily
  ~ use it for bootstrapping the project

It _doesn't_ try to implement the core FFI in a way that is thread
safe or that has incremental GC.   Internally, it doesn't have to
follow the strict rules about manipulating `t_scm_word' values.

So, for example, `scm_make_false' consists of just the code:

        *result = scm_false;

because that's the simplest thing given the primary goals.

Later on, another version of `scm_make_false' in a fancier
implementation of REPS might look more like:

        block_gc_tracer ();
        *result = scm_false;
        unblock_gc_tracer ();

or might look like:

        mutator_yield ();
        *result = scm_false;

or

        *result = scm_false;
        mutator_yield ();

or

        suspicious_location_begin (result);
        *result = scm_false;
        suspicious_location_end (result);

or

        SCM__REPS_STORE_IMMEDIATE (result, scm_false);

or
        (who knows ...)
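
To illustrate the "swappable" point with something that runs, here is
a toy with two interchangeable versions of one primitive.  All names
are hypothetical (`toy_reps', `toy_simple_make_false',
`toy_fancy_make_false', `toy_barrier_events'), and the function-pointer
table is only a device for exercising both versions in one program; in
the design described above, the swap happens by replacing the reps
source, not through a table.

```c
#include <assert.h>

typedef int toy_word;                 /* stand-in for t_scm_word */
enum { TOY_FALSE = 0 };               /* stand-in for scm_false */

int toy_barrier_events;               /* counts hypothetical GC hooks */

/* Bootstrap-style implementation: just store. */
void
toy_simple_make_false (toy_word *result)
{
  *result = TOY_FALSE;
}

/* "Fancier" implementation: same contract, extra coordination. */
void
toy_fancy_make_false (toy_word *result)
{
  ++toy_barrier_events;               /* e.g. block a tracer */
  *result = TOY_FALSE;
  ++toy_barrier_events;               /* e.g. unblock it */
}

/* A reps layer reduced to a single entry point. */
struct toy_reps { void (*make_false) (toy_word *result); };

/* "Everything else": identical whichever layer is plugged in. */
void
toy_client (const struct toy_reps *reps, toy_word *out)
{
  reps->make_false (out);
}

/* Returns 1 if both layers store false and only the fancy one
   triggered the hooks.  Peeking at raw words is for the check only. */
int
toy_reps_demo (void)
{
  const struct toy_reps simple = { toy_simple_make_false };
  const struct toy_reps fancy = { toy_fancy_make_false };
  toy_word a = 42;
  toy_word b = 42;

  toy_barrier_events = 0;
  toy_client (&simple, &a);
  assert (toy_barrier_events == 0);
  toy_client (&fancy, &b);

  return a == TOY_FALSE && b == TOY_FALSE && toy_barrier_events == 2;
}
```

`toy_client' never changes between the two runs, which is the
property that lets callers like the one in the question stay as they
are while the layer underneath is replaced.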

-t




