chicken-hackers

[Chicken-hackers] "argvector" chicken (was: ABI woes)


From: felix . winkelmann
Subject: [Chicken-hackers] "argvector" chicken (was: ABI woes)
Date: Tue, 21 Jul 2015 18:28:19 +0200

Hello!

I have implemented an alternative approach for compiling CPS calls in CHICKEN
to avoid the problems with our current way of doing CPS calls.

To recapitulate the situation: Apple uses a modification of the ARM64 ABI that
ruthlessly punishes C code that assumes that non-vararg function calls are
compatible with functions declared as varargs (and vice versa, actually even
depending on the exact numbers of vararg/non-vararg arguments.)

Previously, we passed fixed arguments via normal C function parameters, in the
hope of generating faster code, as more arguments can be passed in registers
(on machines that have sufficient registers, that is.) This worked for several
years, but C compilers have recently started to cut corners by exploiting
anything not explicitly defined in the C standard. As we need generic function
pointers (where the call-site may not know the exact type of the function being
called), something else needs to be done.

The new approach passes all arguments in a stack-allocated C_word array. Since
CPS calls never return, the array just gets popped after the next minor
garbage collection. The advantage is that CPS calls become much simpler
(including the code that compiles this) as every CPS function is of type

  void (func)(C_word c, C_word *av) [noreturn]

The disadvantage is more allocation in the nursery. This doesn't increase GC
time as such, because only live data is traced during a reclamation, but
may increase the number of minor collections (the nursery fills up faster.)
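
To make the calling convention concrete, here is a minimal, self-contained
sketch. It is not code from the branch: "word" stands in for C_word, and the
closure layout, the names cps_add, k_done and run_add, and the use of
setjmp/longjmp to get back to a test driver are all assumptions made for
illustration.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

typedef intptr_t word;                     /* stand-in for C_word (assumption) */
typedef void (*cps_fn)(word c, word *av);  /* the one type shared by all CPS functions */

/* Hypothetical closure layout: only a code pointer. */
typedef struct { cps_fn fn; } closure;

static jmp_buf toplevel;  /* lets the demo driver regain control */
static word result;       /* value received by the final continuation */

/* Final continuation: av[0] = self, av[1] = value. Does not return. */
static void k_done(word c, word *av) {
    assert(c == 2);
    result = av[1];
    longjmp(toplevel, 1);
}

/* CPS addition: av[0] = self, av[1] = continuation, av[2] and av[3] = operands. */
static void cps_add(word c, word *av) {
    assert(c == 4);
    closure *k = (closure *)av[1];
    word av2[2];               /* fresh argvector, allocated on the C stack */
    av2[0] = (word)k;
    av2[1] = av[2] + av[3];
    k->fn(2, av2);             /* CPS call: control never comes back here */
}

/* Demo driver: adds a and b in CPS and returns the sum. */
word run_add(word a, word b) {
    static closure add_c = { cps_add }, done_c = { k_done };
    if (setjmp(toplevel) == 0) {
        word av[4] = { (word)&add_c, (word)&done_c, a, b };
        add_c.fn(4, av);       /* the call site only needs to know cps_fn */
    }
    return result;
}
```

Calling run_add(2, 3) yields 5. The call site never needs the callee's
arity in its C type, only the count c passed at run time.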

The system seems to work: I was able to run the complete test suite. I have
not tested any other code so far. The performance is, surprisingly (and
according to my experiments, which may be flawed), quite good. Actually not
significantly slower and in some cases even faster. This is strange, and more
real-world testing with long-running, heavily-allocating code may have
different results. On the other hand CPS calls are much simpler, there is no
need to use varargs (with a few small exceptions), the implementation of
multiple values and argument-save/-restore is vastly simpler and the code is
smaller, as "trampolines" (C functions that take arguments saved in a
previously triggered GC and unpack them, calling the original function again)
can be completely dropped.
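
A rough sketch of why save/restore gets simpler, under stated assumptions:
since every function takes (c, av), saving the arguments can be a single
copy into a temporary stack, and one generic restart step can re-enter any
function. The names save_and_reclaim, temp_stack, sum3 and run_sum3 are
hypothetical here, and the "GC" is only simulated by a longjmp.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>
#include <string.h>

typedef intptr_t word;                     /* stand-in for C_word (assumption) */
typedef void (*cps_fn)(word c, word *av);

#define TEMP_STACK_SIZE 64                 /* arbitrary size for this sketch */
static word temp_stack[TEMP_STACK_SIZE];
static word saved_count;
static cps_fn saved_fn;

static jmp_buf restart;
static word result;
static int reclaimed;                      /* has the pretend GC already run? */

/* Save the function and its argvector, then unwind the C stack. */
static void save_and_reclaim(cps_fn fn, word c, word *av) {
    assert(c <= TEMP_STACK_SIZE);          /* argument count bounded by temp stack */
    memcpy(temp_stack, av, (size_t)c * sizeof(word));
    saved_count = c;
    saved_fn = fn;
    reclaimed = 1;
    longjmp(restart, 1);
}

/* A CPS function that asks for a "collection" the first time it is entered. */
static void sum3(word c, word *av) {
    assert(c == 3);
    if (!reclaimed) save_and_reclaim(sum3, c, av);
    result = av[0] + av[1] + av[2];
    longjmp(restart, 2);                   /* hand the result to the driver */
}

word run_sum3(word a, word b, word d) {
    reclaimed = 0;
    switch (setjmp(restart)) {
    case 0: {                              /* first entry */
        word av[3] = { a, b, d };
        sum3(3, av);                       /* does not return */
        break;
    }
    case 1:                                /* generic restart, same for any function */
        saved_fn(saved_count, temp_stack);
        break;
    default:                               /* result is ready */
        break;
    }
    return result;
}
```

The restart step in case 1 does not depend on which function was
interrupted, which is the property that makes per-function trampolines
unnecessary in this sketch.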

There is even some room for more optimization: "av"s (argument vectors) may be
reused from call to call (if the following call doesn't use more arguments than
the current one), or we could even use the same av for all calls (effectively using
global variables for call arguments). Also, multiple value handling could in
some cases be inlined, I think, reducing the overhead of multiple value forms
quite a lot (and they were quite slow with the old way of compiling stuff.)
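
The av-reuse idea could look roughly like this. It is a sketch of the
optimization as described above, not something the branch is known to do:
the closure layout and the names cps_square, k_done, run_square and
av_reused are made up for the example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

typedef intptr_t word;                     /* stand-in for C_word (assumption) */
typedef void (*cps_fn)(word c, word *av);
typedef struct { cps_fn fn; } closure;     /* hypothetical closure layout */

static jmp_buf toplevel;
static word result;
static word *callers_av;   /* address of the first argvector, for the check below */
int av_reused;             /* set to 1 if the continuation saw the same array */

/* Final continuation: av[0] = self, av[1] = value. */
static void k_done(word c, word *av) {
    assert(c == 2);
    result = av[1];
    av_reused = (av == callers_av);
    longjmp(toplevel, 1);
}

/* CPS square: av[0] = self, av[1] = continuation, av[2] = x. */
static void cps_square(word c, word *av) {
    assert(c == 3);
    /* Read every incoming argument before overwriting any slot. */
    closure *k = (closure *)av[1];
    word x = av[2];
    /* The next call needs 2 slots and this av has 3, so it can be reused.
       A compiler would know both counts statically. */
    av[0] = (word)k;
    av[1] = x * x;
    k->fn(2, av);
}

word run_square(word x) {
    static closure sq_c = { cps_square }, done_c = { k_done };
    if (setjmp(toplevel) == 0) {
        word av[3] = { (word)&sq_c, (word)&done_c, x };
        callers_av = av;
        sq_c.fn(3, av);
    }
    return result;
}
```

The one thing to watch is ordering: all incoming arguments have to be
loaded into locals before the first slot is overwritten.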

Some notable changes in the source code:

- The "apply hack" is gone, completely.

- The hackery for AMD64 is gone, as is the evil way we generate C_procXXX
  types and the generic apply code in chicken.h/runtime.c.

- The maximal number of arguments is limited by the "temporary stack". Note
  that this is not fixed (and depends on temp-stack usage), and I had to remove
  some code in "apply-test.scm", as it assumed a fixed limit. The "official"
  arg-limit is 2000 now.

- I have pushed two branches: "argvector" and "argvector-bootstrap" (containing
  only the changes in the C compiler backend.)

- To compile it, you need a modified bootstrapping compiler. The simplest way
  is to checkout "argvector-bootstrap", make a static "boot-chicken", checkout
  "argvector", touch all *.scm files and recompile with the static bootstrapping
  compiler.

Feedback is welcome. As this seems to run well, is not significantly slower,
solves current and future ABI problems, and simplifies the runtime-system quite
a lot, I strongly recommend considering changing CHICKEN's code generation
generally to use this approach. Porting this to CHICKEN 5 will take some work,
but is doable. The changes for hand-coded CPS functions (in runtime.c, which grew
considerably in CHICKEN 5) are straightforward, but still need manual adaptation.
I can help here, but would like to hear Peter's opinion about this, since he
wrote the bignum code (the largest part of the changes in runtime.c.) 


felix


