tinycc-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Tinycc-devel] Proposal for handling alloca(). Anyone see a problem


From: Rob Landley
Subject: Re: [Tinycc-devel] Proposal for handling alloca(). Anyone see a problem with it?
Date: Mon, 7 May 2007 17:33:51 -0400
User-agent: KMail/1.9.1

On Monday 07 May 2007 12:31 pm, David A. Wheeler wrote:
> Please speak up ASAP if you see a big problem, or think of a better idea.
> And if you have a better idea, please let us know if you're willing to code
> it :-).  

Oh I just have stupid questions. :)

> =============================================
> 
> The function alloca allocates memory (like malloc): given a size (in
> chars/bytes), it allocates that memory and returns a pointer to it.  The
> difference is that on function return, all memory allocated by alloca() is
> supposed to be automatically released.   A number of programs REQUIRE
> alloca.    
> 
> Many C compilers implement alloca() by simply manipulating the stack
> pointer.  Indeed, grischka's patch does this (though only for x86-32
> Windows).  But in tinycc's case, the obvious way to do this will fail
> horrifically in many cases, because alloca() can be embedded inside an
> expression, like this:    
>   a( b(4+c()), alloca(x*g()), h()+n());

I always mentally translated alloca() to something like (typecast)(char 
anonymous[size]).  Does c99 allow you to do something like that in 
expressions?  (I know it allows it in for loops...)

I.E. I treated it more as an anonymous variable declaration than an allocation 
function, but it's entirely possible I've misunderstood how it's supposed to 
work...

> In this example, a naive alloca implementation would be in the middle of
> computing this expression, with some intermediate values on the stack, and
> the alloca() call would then screw up the stack DURING the expression
> calculation... causing the whole expression to miscalculate.

And we haven't got any reordering to speak of, which would make it easier to 
move thing just enough to not stomp intermediate values...

> Trying to handle alloca() greatly complicates declaring other values later
> (including nested {...} with declarations in them).

More so than c99's ability to declare local variables anywhere?

> grischk's approach also appears to silently fail for other reasons in some
> cases (e.g., when you declare large variables after the alloca), but I'm
> beating a dead horse.        

He lost me at "only worked on windows".

> In other C compilers, handling alloca() in the middle of an expression is a
> mere matter of a few more lines of code that split up handling of the
> allocation through alloca() separately from the handling of expressions,
> declarations, etc.   That's no big deal in a multi-pass compiler.  But
> tinycc doesn't HAVE an extra processing phase where that kind of logic can
> be easily inserted - we've ALREADY generated code by that point, and we
> won't know till later what declarations are ahead.  There are possible
> tricks, like modifying already-generated code to self-patch, but they're
> hideous, hard to get right, and easy to get wrong.

For the intermediate results problem, we only care about the current 
statement, right?  Is there any way to scan ahead to the end of this 
statement and go "

In terms of the code generator I've thought for a while of checkpointing the 
stream and being able to back up.  (Possibly by outputting to a temporary 
buffer.)  The main reason is dead code elimination, and the main complication 
seems to be also checkpointing new symbols and yanking 'em back out of the 
variable arrays.

But I need to read the rest of the code and then think about it a long time 
before worrying about that much.

> It's very painful to 
> handle {..} inside blocks, or later declarations in the same function, in
> this way.

Huh?  When you say "{..}" do you mean somehting other than adding another 
nesting level (a new block)?

Could you elaborate there?

> Something SIMPLE and CORRECT is important; absolute maximum 
> performance is NOT required, it just has to have 'reasonable' performance.    
>        
> 
> Another naive approach is to simply map alloca() to malloc().  At least that
> gets programs running, and it IS simple, but it leaks memory like no
> tomorrow.  Not nice.  We can do MUCH better.  

In some cases we can replace
  thingy *x=alloca(blah);
with
  char _12345[blah]; thingy *x=(thingy *)_12345;

Just not in all cases.  Is it easy to detect when we can do that, maybe set a 
flag or something to say "alloca becomes hard now".

> Instead, I intend to start with public-domain code by D.A. Gwyn.  It uses
> malloc() to obtain memory, and maintains a linked list containing stack
> depth and allocation information.  Every time you use alloca(), it first
> frees all previous alloca() allocations that were from a deeper stack
> position.  This means that alloca() allocations get freed once a less-deep
> alloca() call occurs.     

So:
  for (i=0; i<999999; i++) {
    int i=snprintf(NULL, 0, stuff);
    char *c=alloca(i);
    write(fd, c, i);
  }

Is going to SUCK then.

> It's not perfect, but for a lot of circumstances this is "more than good
> enough".  Though if all the alloca allocations are at the same level, an
> occasional call to alloca(0) at the "main loop" may be needed to fully free
> them all.    

I need to read Griscka's alloca patch to see where it's inserting stuff.

> I think I can easily tweak this slightly so that alloca() would REALLY keep
> its semantics.   In essence, modify the code generators so that any function
> that calls alloca() would use a DIFFERENT function epilogue. (You can easily
> detect an alloca() use and set a flag, and then look at this flag when
> generating the epilogue.  This means the single-pass nature of tinycc is no
> problem here.)  The alloca-specialized epilogue would walk the linked list
> and free all the allocations at that stack depth or greater before
> returning.  I would also modify longjmp/siglongjmp so that after they reset
> the stack pointer, they'd free everything allocated at a deeper location in
> the stack.  The advantages of this approach are:         

Function epilogue, or block epilogue?

> 1. alloca() would work EXACTLY correctly.  ALL memory allocations would
> ALWAYS be freed, immediately, on function return or stack-unwind due to
> longjmp/siglongjmp.  And you can use alloca() anywhere it's legal to use,
> including deeply embedded inside expressions.  Nice!

That does neutralize my earlier objection, yes.

> 2. It doesn't require major brain surgery to tinycc.  The changes are
> relatively local, and much of the functionality can be in the tinycc library
> (easing maintenance).  
> 3. It only impacts functions that USE alloca().  Functions that don't call
> alloca() don't pay the cost of checking if there's anything to free().  If
> you don't use alloca, it only adds to longjmp a quick check of a pointer (if
> the pointer is non-null, you have work to do anyway).   
> 
> [Oh, why have a single linked list, instead of a per-function linked list? 
> Because on a longjmp you may unwind many functions; it's easier to unwind
> starting from a single master list.  When  freeing, you stop at the first
> item that isn't relevant, so it's probably faster too.]   

No argument.

> There are minor downsides. It's not QUITE as efficient as a traditional
> alloca(), because you're fiddling with malloc() instead of changing the
> stack pointer.  If threading were introduced, these alloca lists would need
> to be stored per-thread and malloc would have to be thread-safe, which is
> doable though annoying.  This approach could introduce memory fragmentation
> via malloc() in theory, though in practice I think that's a non-issue.  I
> think it's worth it.

How do we know when someone is and isn't using the code to build something 
threaded?

> 
> A possible future optimization would be noticing special cases where it'd be 
> okay to use stack pointer manipulations directly, and do that instead.  
> That'd be nifty.  But it's quite complicated even outside complex 
> expressions, because blocks can LATER contain declarators (even standard C 
> requires support for this, via embedded {...}).

Yeah, but isn't it a statement we have to care about, rather than a block?

Nested function calls seem to be the failure case where stuff on the stack 
goes wonky...

> I think this approach will be "efficient enough", it will work correctly in 
> ALL cases, the implementation will be small, and it should be fairly easy to 
> show correct.  It's quite likely that no one will bother to implement that 
> future optimization, since I expect this approach to work "good enough" in 
> all cases.  I'd rather support the general case first, and then only if it 
> MATTERS worry about optimizing special cases.

I thought the original reason malloc() and alloca() differed was one allocated 
on the heap, and one on the stack.  You're proposing emulating allocating on 
the stack with allocating on the heap, in the name of correctness.

Okay...

> Interestingly, once alloca() is implemented, we can implement extensions 
> like non-constant-length automatic arrays.  E.G., we could implement:
> double f(int n) {
>     double x[n];
>     ....
> }
> by having the compiler detect that "n" is not constant and in such cases 
> generate code equivalent to:
>  double * const x = alloca(sizeof(double)*n);

I'm still a touch unclear on what's wrong with going the other way instead, 
and emulating alloca() with an anonymous local char array...

> This would mean that non-constant-length automatic arrays would NOT be 
> stored in the stack but on the heap; code that DEPENDS on exact placement
> for non-constant-length arrays is already insane anyway.
> 
> --- David A. Wheeler 

Rob




reply via email to

[Prev in Thread] Current Thread [Next in Thread]