chicken-hackers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-hackers] Release blocker? Stack checks completely FUBAR? Or


From: Peter Bex
Subject: Re: [Chicken-hackers] Release blocker? Stack checks completely FUBAR? Or am I just paranoid?
Date: Mon, 29 Feb 2016 22:28:39 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Feb 25, 2016 at 07:37:46PM +0100, Peter Bex wrote:
> I'm a bit ill at the moment, so maybe it's just my fevered brain
> talking, but if I understand correctly there are two problems here:
> 
> 1) Generated CPS functions will allocate what they need on the
>    stack for local variables up front, then do a C_stack_probe()
>    or a C_demand, but they will allocate an arbitrary amount of
>    words for the argvectors *WITHOUT ANY CHECK WHETHER IT WILL FIT*.

OK, here we go!

Attached are two patches.  The first patch reworks the code generation
a bit to make the second patch possible.  The second patch adds argvector
sizes to the calculation performed by C_demand().

This problem was indeed caused by argvectors being allocated, but it
was exacerbated by the fact that the code generation would only include
a stack check for  "(and (not direct) (> demand 0))".  In other words,
only CPS functions that actually *allocated* data would get a stack
check.

This is not exactly correct, because a non-allocating CPS function might
call another non-allocating CPS function or a direct function which calls
another direct function, and so on.  In other words, non-allocating CPS
functions can still build up stack!  This wasn't much of a problem in the
past (pre-argvector) because the buildup was small and C_STACK_RESERVE
big enough in most situations to avoid running out of stack.

With argvectors the buildup can be much bigger, of course.  Anyway, the
first patch reworks the code to use C_demand() everywhere for simplicity
and to make the second patch possible, but it *also* removes the
"(> demand 0)" part of the aforementioned check.

This can have a performance impact (I haven't benchmarked how big the
impact is yet), but I'm afraid we can't do without; I kept getting
random stack overflow and "recursion too deep or circular data encountered"
errors if I left (> demand 0) in there.  This also explains why #1236
suddenly cropped up in the "numbers" tests after the argvector change.
I think/hope this patch will reduce the urgency of fixing #1236.


Please note that the "max-av" calculation is not very precise: it will
be somewhat pessimistic; if a called function is customizable, it will
not cause an argvector to be allocated, but we don't know that where the
signatures are collected.  To limit the impact/scope/size of the patch
I re-used the signature machinery (which seems to have been unused until
now?).  A better patch might be more precise and actually count the
exact argvector size.

Also, the signature collection doesn't know about the "selfarg" that is
used in "push-args" to determine whether to add 1 to the argvector size,
so it will sometimes be slightly optimistic.  I think this is acceptable,
because the one word stack use will be neglible when compared to the red
zone, return address and arguments that gcc will allocate.

(speaking of which, has anyone tried to find out what the performance
 benefit is, if any, of -mno-red-zone?)

Finally, I'm not terribly happy with the C_needed_stack() macro; it's
rather ugly.  But at least it doesn't require a too big change (we can't
mess with the signature of C_demand() I think, for compat reasons).

> 2) C_stack_probe() and C_demand() will check if the stack pointer exceeds
>    C_stack_limit and trigger a GC.  However, C_stack_overflow_check
>    will call C_stack_check1() which adds C_STACK_RESERVE.
>    That means that a C_demand() check might say "yeah, sure, it fits in
>    the stack", whereas the function that it calls will do a
>    C_stack_overflow_check which says "nope, it won't fit in the stack
>    minus the stack reserve".

This was fever-driven bullshit.  The C_stack_probe and C_demand checks
will ensure we only allocate up to C_stack_limit, and C_stack_check1
allows the stack usage to go beyond C_stack_limit, up to C_STACK_RESERVE
bytes extra.  This is correct and by design.

The brainfart I made was that I assumed that C_stack_limit was the
absolute limit and that we had to stay inside it at all costs, with a
margin of C_STACK_RESERVE on the *inside*, which is untrue.  The stack
growing upwards or downwards is an extra mindfuck that makes this code
extra tricky to follow, that's why I questioned it.

Cheers,
Peter

Attachment: 0001-Simplify-code-generation-by-only-using-C_demand.chicken-5.patch
Description: Text Data

Attachment: 0002-Include-argvector-size-in-C_demand-calculations.chicken-5.patch
Description: Text Data

Attachment: 0001-Simplify-code-generation-by-only-using-C_demand.master.patch
Description: Text Data

Attachment: 0002-Include-argvector-size-in-C_demand-calculations.master.patch
Description: Text Data

Attachment: signature.asc
Description: Digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]