|
From: | Peter Bex |
Subject: | Re: [Chicken-hackers] Release blocker? Stack checks completely FUBAR? Or am I just paranoid? |
Date: | Mon, 29 Feb 2016 22:28:39 +0100 |
User-agent: | Mutt/1.5.21 (2010-09-15) |
On Thu, Feb 25, 2016 at 07:37:46PM +0100, Peter Bex wrote: > I'm a bit ill at the moment, so maybe it's just my fevered brain > talking, but if I understand correctly there are two problems here: > > 1) Generated CPS functions will allocate what they need on the > stack for local variables up front, then do a C_stack_probe() > or a C_demand, but they will allocate an arbitrary amount of > words for the argvectors *WITHOUT ANY CHECK WHETHER IT WILL FIT*. OK, here we go! Attached are two patches. The first patch reworks the code generation a bit to make the second patch possible. The second patch adds argvector sizes to the calculation performed by C_demand(). This problem was indeed caused by argvectors being allocated, but it was exacerbated by the fact that the code generation would only include a stack check for "(and (not direct) (> demand 0))". In other words, only CPS functions that actually *allocated* data would get a stack check. This is not exactly correct, because a non-allocating CPS function might call another non-allocating CPS function or a direct function which calls another direct function, and so on. In other words, non-allocating CPS functions can still build up stack! This wasn't much of a problem in the past (pre-argvector) because the buildup was small and C_STACK_RESERVE big enough in most situations to avoid running out of stack. With argvectors the buildup can be much bigger, of course. Anyway, the first patch reworks the code to use C_demand() everywhere for simplicity and to make the second patch possible, but it *also* removes the "(> demand 0)" part of the aforementioned check. This can have a performance impact (I haven't benchmarked how big the impact is yet), but I'm afraid we can't do without; I kept getting random stack overflow and "recursion too deep or circular data encountered" errors if I left (> demand 0) in there. This also explains why #1236 suddenly cropped up in the "numbers" tests after the argvector change. I think/hope this patch will reduce the urgency of fixing #1236. Please note that the "max-av" calculation is not very precise: it will be somewhat pessimistic; if a called function is customizable, it will not cause an argvector to be allocated, but we don't know that where the signatures are collected. To limit the impact/scope/size of the patch I re-used the signature machinery (which seems to have been unused until now?). A better patch might be more precise and actually count the exact argvector size. Also, the signature collection doesn't know about the "selfarg" that is used in "push-args" to determine whether to add 1 to the argvector size, so it will sometimes be slightly optimistic. I think this is acceptable, because the one word stack use will be neglible when compared to the red zone, return address and arguments that gcc will allocate. (speaking of which, has anyone tried to find out what the performance benefit is, if any, of -mno-red-zone?) Finally, I'm not terribly happy with the C_needed_stack() macro; it's rather ugly. But at least it doesn't require a too big change (we can't mess with the signature of C_demand() I think, for compat reasons). > 2) C_stack_probe() and C_demand() will check if the stack pointer exceeds > C_stack_limit and trigger a GC. However, C_stack_overflow_check > will call C_stack_check1() which adds C_STACK_RESERVE. > That means that a C_demand() check might say "yeah, sure, it fits in > the stack", whereas the function that it calls will do a > C_stack_overflow_check which says "nope, it won't fit in the stack > minus the stack reserve". This was fever-driven bullshit. The C_stack_probe and C_demand checks will ensure we only allocate up to C_stack_limit, and C_stack_check1 allows the stack usage to go beyond C_stack_limit, up to C_STACK_RESERVE bytes extra. This is correct and by design. The brainfart I made was that I assumed that C_stack_limit was the absolute limit and that we had to stay inside it at all costs, with a margin of C_STACK_RESERVE on the *inside*, which is untrue. The stack growing upwards or downwards is an extra mindfuck that makes this code extra tricky to follow, that's why I questioned it. Cheers, Peter
0001-Simplify-code-generation-by-only-using-C_demand.chicken-5.patch
Description: Text Data
0002-Include-argvector-size-in-C_demand-calculations.chicken-5.patch
Description: Text Data
0001-Simplify-code-generation-by-only-using-C_demand.master.patch
Description: Text Data
0002-Include-argvector-size-in-C_demand-calculations.master.patch
Description: Text Data
signature.asc
Description: Digital signature
[Prev in Thread] | Current Thread | [Next in Thread] |