Re: [Chicken-hackers] "argvector" chicken (was: ABI woes)
From: Ivan Raikov
Subject: Re: [Chicken-hackers] "argvector" chicken (was: ABI woes)
Date: Tue, 21 Jul 2015 09:52:00 -0700

Hi Felix,
If you are interested in further testing for potential performance
impact, the rb-tree and kd-tree libraries rely heavily on CPS calls
for tree traversal, and the test data size can be easily increased to
millions of elements. Unfortunately I won't have time to test this
until mid-August at the earliest.
-Ivan
On Tue, Jul 21, 2015 at 9:28 AM, <address@hidden> wrote:
> Hello!
>
> I have implemented an alternative approach for compiling CPS calls in CHICKEN
> to avoid the problem with our current way of doing CPS calls.
>
> To recapitulate the situation: Apple uses a modification of the ARM64 ABI that
> ruthlessly punishes C code that assumes that non-vararg function calls are
> compatible with functions declared as varargs (and vice versa, actually even
> depending on the exact numbers of vararg/non-vararg arguments.)
>
> Previously, we passed fixed arguments via normal C function parameters, in the
> hope of generating faster code, as more arguments can be passed in registers
> (on machines that have sufficient registers, that is.) This worked for several
> years but C compilers have recently been starting to cut corners by exploiting
> anything not explicitly defined in the C standard. As we need generic function
> pointers (where the call-site may not know the exact type of function being
> called), something else needs to be done.
>
> The new approach passes all arguments in a stack-allocated C_word array. Since
> CPS calls never return, the array just gets popped after the next minor
> garbage collection. The advantage is that CPS calls become much simpler
> (including the code that compiles them), as every CPS function is of type
>
> void (func)(C_word c, C_word *av) [noreturn]
>
> The disadvantage is more allocation in the nursery. This doesn't increase GC
> time as such, because only live data is traced during a reclamation, but
> may increase the number of minor collections (the nursery fills up faster.)
>
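A minimal sketch of the calling convention described above, not taken from the CHICKEN sources. The names `word`, `cps_fn`, `halt_k`, `add_cps` and `run_add` are hypothetical stand-ins, `word` only loosely models `C_word`, and the slot layout (continuation in `av[0]`) is an assumption made for illustration. Unlike real CPS code, these toy functions do return, because there is no garbage collector here to unwind the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for C_word: a slot holds a number or a code pointer. */
typedef union word word;
typedef void (*cps_fn)(intptr_t c, word *av); /* same shape as quoted, minus noreturn */
union word { intptr_t num; cps_fn fn; };

static intptr_t last_result; /* written by the final continuation */

/* Final continuation: expects one argument, av[0] = value. */
static void halt_k(intptr_t c, word *av) {
    assert(c == 1);
    last_result = av[0].num;
}

/* Adder: av[0] = continuation, av[1] and av[2] = operands (assumed layout). */
static void add_cps(intptr_t c, word *av) {
    assert(c == 3);
    word next[1]; /* fresh stack-allocated argument vector for the next call */
    next[0].num = av[1].num + av[2].num;
    av[0].fn(1, next); /* the callee is reached only through the generic type */
}

/* Generic call site: it needs to know only the (c, av) shape, never the arity
   encoded in a C prototype, so no varargs are involved. */
static intptr_t run_add(intptr_t a, intptr_t b) {
    word av[3];
    av[0].fn = halt_k;
    av[1].num = a;
    av[2].num = b;
    cps_fn f = add_cps;
    f(3, av);
    return last_result;
}
```

Calling `run_add(2, 40)` from a test driver exercises the whole chain. Every function has one and the same C type, which is what removes the dependence on how a given ABI treats vararg versus non-vararg calls.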
> The system seems to work; I was able to run the test suite completely. I have
> not tested any other code so far. The performance is, surprisingly (and
> according to my experiments, which may be flawed), quite good. Actually not
> significantly slower and in some cases even faster. This is strange, and more
> real-world testing with long-running, heavily-allocating code may have
> different results. On the other hand CPS calls are much simpler, there is no
> need to use varargs (with a few small exceptions), the implementation of
> multiple values and argument-save/-restore is vastly simpler and the code is
> smaller, as "trampolines" (C functions that take arguments saved in a
> previously triggered GC and unpack them, calling the original function again)
> can be completely dropped.
>
> There is even some room for more optimization: "av"s (argument vectors) may be
> reused from call to call (if the following call doesn't use more arguments
> than the current one), or we could even use the same av for all calls
> (effectively using global variables for call arguments). Also, multiple value
> handling could in
> some cases be inlined, I think, reducing the overhead of multiple value forms
> quite a lot (and they were quite slow with the old way of compiling stuff.)
>
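The first reuse idea above could look roughly like the following. This is only a sketch of the proposed optimization under simplifying assumptions, not something the branch is stated to do: `word`, `report_k`, `sum3_cps` and `sum3_reuses_av` are hypothetical names, slots hold plain integers, and the continuation is chosen statically to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t word; /* hypothetical stand-in for C_word */

static word *reported_av;   /* which vector the continuation received */
static word reported_value; /* which value it found in slot 0 */

/* Continuation taking a single argument in av[0]. */
static void report_k(intptr_t c, word *av) {
    assert(c == 1);
    reported_av = av;
    reported_value = av[0];
}

/* Receives three arguments. The next call needs only one, which is no more
   than the current count, so the incoming vector is overwritten and passed
   on instead of allocating a new one. */
static void sum3_cps(intptr_t c, word *av) {
    assert(c == 3);
    word total = av[0] + av[1] + av[2]; /* read all inputs before overwriting */
    av[0] = total;
    report_k(1, av);
}

/* Returns 1 if the continuation saw the caller's own vector and the sum. */
static int sum3_reuses_av(void) {
    word av[3] = {1, 2, 3};
    sum3_cps(3, av);
    return reported_av == av && reported_value == 6;
}
```

The one thing to be careful about in such a scheme is ordering: all incoming arguments have to be read before any slot of the shared vector is overwritten.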
> Some notable changes in the source code:
>
> - The "apply hack" is gone, completely.
>
> - The hackery for AMD64 is gone, as is the evil way we generate C_procXXX
> types and the generic apply code in chicken.h/runtime.c.
>
> - The maximal number of arguments is limited by the "temporary stack". Note
> that this is not fixed (and depends on temp-stack usage), and I had to
> remove some code in "apply-test.scm", as it assumed a fixed limit. The
> "official" arg-limit is 2000 now.
>
> - I have pushed two branches: "argvector" and "argvector-bootstrap" (containing
> only the changes in the C compiler backend.)
>
> - To compile it, you need a modified bootstrapping compiler. The simplest way
> is to check out "argvector-bootstrap", make a static "boot-chicken", check
> out "argvector", touch all *.scm files and recompile with the static
> bootstrapping compiler.
>
> Feedback is welcome. As this seems to run well, is not significantly slower,
> solves current and future ABI problems, and simplifies the runtime system
> quite a lot, I strongly recommend that we consider changing CHICKEN's code
> generation generally to use this approach. Porting this to CHICKEN 5 should
> be some work, but doable. The changes for hand-coded CPS functions (in
> runtime.c, which grew considerably in CHICKEN 5) are straightforward, but
> still need manual adaptation.
> I can help here, but would like to hear Peter's opinion about this, since he
> wrote the bignum code (the largest part of the changes in runtime.c.)
>
>
> felix