tinycc-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Tinycc-devel] [PATCH] arm: Handle __builtin_frame_address(1) correc


From: Kirill Smelkov
Subject: Re: [Tinycc-devel] [PATCH] arm: Handle __builtin_frame_address(1) correctly
Date: Thu, 13 Dec 2012 20:55:41 +0400
User-agent: Mutt/1.5.21 (2010-09-15)

Daniel, Thomas, first of all I'm sorry it took me so long to reply.

( Also, if there are some stupid mistakes wrt arm assembly, please forgive
  me. I've tried to make my analisys resilent to maybe-present
  off-by-ones, and reply sooner than later, not spending more
  20min*several-days to be 100% correct. )

On Thu, Dec 06, 2012 at 02:23:25PM +0100, Thomas Preud'homme wrote:
> Le jeudi 6 décembre 2012 04:50:55, vous avez écrit :
> > > 
> > > Hi Kirill,
> > > 
> > > did you make any progress on the issue since Daniel's comments? Could you
> > > let me know when you push a patch so that I can test it and bump master
> > > to equal the mob branch?
> > 
> > Hi Thomas,
> > 
> > Unfortunately not that much. I was trying to investigate
> > bounds-checking failures for `tcc -b -run tcc ...` tests first and it looks
> > I'm close to understand what is going on (old bcheck code is not adapted
> > to /proc/sys/kernel/randomize_va_space = 2). The plan was to go to arm
> > issues then, but I'm going very slow, because for tcc I have only 15-20
> > minutes per day and it is all in underground. Sorry. Could this issue
> > please wait me for some time? (probably week or so...)
> 
> That's a worthwhile bug to fix. The ARM issue can wait, sure. We can also 
> disable the test for ARM as this code never worked for ARM and it's thus not 
> a 
> regression.


On Fri, Nov 30, 2012 at 12:43:07AM +0100, Daniel Glöckner wrote:
> On Thu, Nov 29, 2012 at 09:43:33PM +0400, Kirill Smelkov wrote:
> > tcc first saves r0 & r1, and only then fp:
> > 
> >     $ ./arm-eabi-tcc -c y.c
> >     $ arm-linux-gnueabi-objdump -d y.o
> > 
> >     00000000 <f>:
> >        0:   e1a0c00d        mov     ip, sp
> >        4:   e92d0003        push    {r0, r1}
> >        8:   e92d5800        push    {fp, ip, lr}
> >        c:   e28db00c        add     fp, sp, #12
> >       10:   e1a00000        nop                     ; (mov r0, r0)
> >       14:   e91ba800        ldmdb   fp, {fp, sp, pc}
> 
> I chose this stack frame format because it simplifies access to the
> arguments from within tinycc. At that time I didn't think of trying to
> be compatible to anything just to allow stack traces.

I see. Let's first see what is happenning with gcc and decide how it
should be, then come back here to maybe fixing/adjusting tcc.


> > gcc does not, but it will save r5,r6,etc. before fp(=r13 iirc) ...
> > 
> >     $ arm-linux-gnueabihf-gcc-4.7 -marm -c y.c 
> >     $ arm-linux-gnueabihf-objdump  -d y.o
> >     
> >     00000000 <f>:
> >        0:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
> >        4:   e28db000        add     fp, sp, #0
> >        8:   e24dd00c        sub     sp, sp, #12
> >        c:   e50b0008        str     r0, [fp, #-8]
> >       10:   e50b100c        str     r1, [fp, #-12]
> >       14:   e28bd000        add     sp, fp, #0
> >       18:   e8bd0800        pop     {fp}
> >       1c:   e12fff1e        bx      lr
> 
> You should not look at a leaf function to derive the GCC stack frame.
> It is probably different from the generic stack frame because GCC
> knows this function will never be part of a stack trace done by
> another function of the final program.

Fair enough, thanks.


> Take a look at arch/arm/kernel/stacktrace.c inside the Linux kernel.
> They assume that every function pushes {fp, sp, lr, pc} and that
> fp points to the address where pc is stored. I don't know why
> GCC pushes pc. I can only imagine this being done to keep the
> stack aligned to 8 bytes. Or maybe it is for exception handling
> or association of stack frame debug info..

from arch/arm/kernel/stacktrace.c:

    * With framepointer enabled, a simple function prologue looks like this:
    *   mov     ip, sp
    *   stmdb   sp!, {fp, ip, lr, pc}
    *   sub     fp, ip, #4
    *
    * A simple function epilogue looks like this:
    *   ldm     sp, {fp, sp, pc}

so since fp is initialized from ip-4 (old_sp-4) it points to (near ?)
fp, not pc, right?

also linux kernel is compiled with

    -mapcs -mno-sched-prolog -mabi=apcs-gnu

and this changes frame layout compared to what gcc produces by default.

(sorry for that near - i don't remember stack and stmdb arm details
 good enough)

> 
> >        8:   e92d0870        push    {r4, r5, r6, fp}        ; NOTE r4... go 
> > before fp
> 
> >       14:   e28db00c        add     fp, sp, #12
> 
> >       5c:   e59b0000        ldr     r0, [fp]                ; BUG here!
> 
> Why is this a bug in GCC? It will load the old value of fp to r0.

Hmm, after push here is how stack looks like:

        r4  <- old_sp
        r5  <- sp + 12
        r6
        fp
            <- sp

(again, off-by-one error probably, but anywayy that sp+12 can't point to
old fp).

~~~~

Anyway, I've prepared more simple complete example to demonstrate the
bug:

    ---- 8< ---- fp.c
    #include <stdio.h>
    
    void *func(int x)
    {
        char *fp1 = __builtin_frame_address(1);
    
        printf("Hello World! %i\n", x);
    
        fp1 += x;
        return fp1;
    }
    
    int main()
    {
        void *fp0 = __builtin_frame_address(0);
        void *fp1 = func(0);
    
        printf("fp0: %p  fp1: %p\n", fp0, fp1);
        return 0;
    }

The code should output

    Hello World! 0
    fp0: <addr>  fp1: <the-same-addr!>

right? on i386:

    $ gcc -o fp_i386 -O2 fp.c
    $ ./fp_i386 
    Hello World! 0
    fp0: 0xafeda538  fp1: 0xafeda538


but it does not on arm, look:

    $ arm-linux-gnueabi-gcc-4.4 -marm -o fp_arm -O2 fp.c
    $ qemu-arm -L /usr/arm-linux-gnueabi ./fp_arm 
    Hello World! 0
    fp0: 0x4080010c  fp1: 0x8414

and the problem here is that gcc, as I think, incorrectly setups fp:

    $ arm-linux-gnueabi-gcc-4.4 -marm -o fp.s -S -O2 fp.c
    $ cat fp.s
            .cpu arm9tdmi
            .fpu softvfp
            .eabi_attribute 20, 1
            .eabi_attribute 21, 1
            .eabi_attribute 23, 3
            .eabi_attribute 24, 1
            .eabi_attribute 25, 1
            .eabi_attribute 26, 2
            .eabi_attribute 30, 2
            .eabi_attribute 18, 4
            .file   "fp.c"
            .text
            .align  2
            .global func
            .type   func, %function
    func:
            @ Function supports interworking.
            @ args = 0, pretend = 0, frame = 0
            @ frame_needed = 1, uses_anonymous_args = 0
            stmfd   sp!, {r4, r5, fp, lr}
            add     fp, sp, #12
            mov     r4, r0
            ldr     r5, [fp, #0]        ; <- bug here, should be [fp, #-4]
            mov     r1, r4
            ldr     r0, .L3
            bl      printf
            add     r0, r5, r4
            sub     sp, fp, #12
            ldmfd   sp!, {r4, r5, fp, lr}
            bx      lr
    .L4:
            .align  2
    .L3:
            .word   .LC0
            .size   func, .-func
            .align  2
            .global main
            .type   main, %function
    main:
            @ Function supports interworking.
            @ args = 0, pretend = 0, frame = 0
            @ frame_needed = 1, uses_anonymous_args = 0
            stmfd   sp!, {fp, lr}
            mov     r0, #0
            add     fp, sp, #4
            bl      func
            mov     r1, fp
            mov     r2, r0
            ldr     r0, .L7
            bl      printf
            mov     r0, #0
            sub     sp, fp, #4
            ldmfd   sp!, {fp, lr}
            bx      lr
    .L8:
            .align  2
    .L7:
            .word   .LC1
            .size   main, .-main
            .section        .rodata.str1.4,"aMS",%progbits,1
            .align  2
    .LC0:
            .ascii  "Hello World! %i\012\000"
            .space  3
    .LC1:
            .ascii  "fp0: %p  fp1: %p\012\000"
            .ident  "GCC: (Debian 4.4.6-11) 4.4.6"
            .section        .note.GNU-stack,"",%progbits


and if I apply the following patch

    diff --git a/t/fp.s b/t/fp1.s
    index 2047b05..85bb402 100644
    --- a/t/fp.s
    +++ b/t/fp1.s
    @@ -20,7 +20,7 @@ func:
            stmfd   sp!, {r4, r5, fp, lr}
            add     fp, sp, #12
            mov     r4, r0
    -       ldr     r5, [fp, #0]
    +       ldr     r5, [fp, #-4]
            mov     r1, r4
            ldr     r0, .L3
            bl      printf

it does work:

    $ arm-linux-gnueabi-gcc-4.4 -o fp fp.s
    $ arm-linux-gnueabi-gcc-4.4 -o fp1 fp1.s
    $ qemu-arm -L /usr/arm-linux-gnueabi ./fp
    Hello World! 0
    fp0: 0x4080012c  fp1: 0x8414

    $ qemu-arm -L /usr/arm-linux-gnueabi ./fp1
    Hello World! 0
    fp0: 0x4080012c  fp1: 0x4080012c

that's why I think gcc generated wrong code here.

And to the reference, here is how code would look like if being part of linux:

    $ arm-linux-gnueabi-gcc-4.4 -o fpapcs.s     -marm -mapcs -mno-sched-prolog  
-S -O2 fp.c
    $ arm-linux-gnueabi-gcc-4.4 -o fpapcs_gnu.s -marm -mapcs -mno-sched-prolog 
-mabi=apcs-gnu -S -O2 fp.c
    $ diff -u fp.s fpapcs.s
    --- fp.s    2012-12-13 20:34:09.000000000 +0400
    +++ fpapcs.s        2012-12-13 20:34:09.000000000 +0400
    @@ -17,16 +17,17 @@
        @ Function supports interworking.
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, uses_anonymous_args = 0
    -   stmfd   sp!, {r4, r5, fp, lr}
    -   add     fp, sp, #12
    +   mov     ip, sp
    +   stmfd   sp!, {r4, r5, fp, ip, lr, pc}
    +   sub     fp, ip, #4
        mov     r4, r0
        ldr     r5, [fp, #0]
        mov     r1, r4
        ldr     r0, .L3
        bl      printf
        add     r0, r5, r4
    -   sub     sp, fp, #12
    -   ldmfd   sp!, {r4, r5, fp, lr}
    +   sub     sp, fp, #20
    +   ldmfd   sp, {r4, r5, fp, sp, lr}
        bx      lr
     .L4:
        .align  2
    @@ -40,17 +41,18 @@
        @ Function supports interworking.
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, uses_anonymous_args = 0
    -   stmfd   sp!, {fp, lr}
    +   mov     ip, sp
    +   stmfd   sp!, {fp, ip, lr, pc}
    +   sub     fp, ip, #4
        mov     r0, #0
    -   add     fp, sp, #4
        bl      func
        mov     r1, fp
        mov     r2, r0
        ldr     r0, .L7
        bl      printf
        mov     r0, #0
    -   sub     sp, fp, #4
    -   ldmfd   sp!, {fp, lr}
    +   sub     sp, fp, #12
    +   ldmfd   sp, {fp, sp, lr}
        bx      lr
     .L8:
        .align  2

    $ diff -u fpapcs.s fpapcs_gnu.s
    --- fpapcs.s    2012-12-13 20:34:09.000000000 +0400
    +++ fpapcs_gnu.s        2012-12-13 20:32:51.000000000 +0400
    @@ -1,13 +1,3 @@
    -       .cpu arm9tdmi
    -       .fpu softvfp
    -       .eabi_attribute 20, 1
    -       .eabi_attribute 21, 1
    -       .eabi_attribute 23, 3
    -       .eabi_attribute 24, 1
    -       .eabi_attribute 25, 1
    -       .eabi_attribute 26, 2
    -       .eabi_attribute 30, 2
    -       .eabi_attribute 18, 4
            .file   "fp.c"
            .text
            .align  2


but it does not work either:

    $ arm-linux-gnueabi-gcc-4.4 -o fpapcs -mapcs -mno-sched-prolog fpapcs.s
    $ qemu-arm -L /usr/arm-linux-gnueabi ./fpapcs
    Hello World! 0
    fp0: 0x4080011c  fp1: 0x83e0


~~~~

There are many arm abis and maybe I'm doing something stupid, but to me
all this looks like gcc generates incorrect code.



> > ... as probably required by arm calling convention (not looked at the spec
> > yet, but it seems reasanoble, given how push is really a block str and
> > that str stores register in ascending order).
> 
> I have once seen a page in MSDN describing the ARM stack frame of
> Windows CE. I don't know if ARM specified how stack frames should look
> like.

>From gcc manual:

`-mapcs-frame'
     Generate a stack frame that is compliant with the ARM Procedure
     Call Standard for all functions, even if this is not strictly
     necessary for correct execution of the code.  Specifying
     `-fomit-frame-pointer' with this option will cause the stack
     frames not to be generated for leaf functions.  The default is
     `-mno-apcs-frame'.

so at least there is "ARM Procedure Call Standard". Sorry, no time to
search for it...


Thanks,
Undertimed Kirill



reply via email to

[Prev in Thread] Current Thread [Next in Thread]