tinycc-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Tinycc-devel] [PATCH] arm: Handle __builtin_frame_address(1) correc


From: Thomas Preud'homme
Subject: Re: [Tinycc-devel] [PATCH] arm: Handle __builtin_frame_address(1) correctly
Date: Sat, 5 Jan 2013 15:36:49 +0100
User-agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; )

Le jeudi 13 décembre 2012 17:55:41, vous avez écrit :

[SNIP]

> 
> from arch/arm/kernel/stacktrace.c:
> 
>     * With framepointer enabled, a simple function prologue looks like
> this: *       mov     ip, sp
>     * stmdb   sp!, {fp, ip, lr, pc}
>     * sub     fp, ip, #4
>     *
>     * A simple function epilogue looks like this:
>     * ldm     sp, {fp, sp, pc}
> 
> so since fp is initialized from ip-4 (old_sp-4) it points to (near ?)
> fp, not pc, right?

stmdb means sp is decremented first (sp = sp - 4 * nbregs) and then register 
are saved in ascending order from this new location. So, pc will end up at 
initial_sp - 4 :) Hence ip points at pc.

> 
> also linux kernel is compiled with
> 
>     -mapcs -mno-sched-prolog -mabi=apcs-gnu
> 
> and this changes frame layout compared to what gcc produces by default.
> 
> (sorry for that near - i don't remember stack and stmdb arm details
>  good enough)

stmdb stands for STore Multiple Decrement Before

> 
> > >        8:   e92d0870        push    {r4, r5, r6, fp}        ; NOTE
> > >        r4... go before fp
> > >       
> > >       14:   e28db00c        add     fp, sp, #12
> > >       
> > >       5c:   e59b0000        ldr     r0, [fp]                ; BUG here!
> > 
> > Why is this a bug in GCC? It will load the old value of fp to r0.
> 
> Hmm, after push here is how stack looks like:
> 
>         r4  <- old_sp
>         r5  <- sp + 12
>         r6
>         fp
>             <- sp
> 
> (again, off-by-one error probably, but anywayy that sp+12 can't point to
> old fp).

Nope. push store register in ascending order from the lowest address to the 
highest address. push {regs} is actually the same as stmdb sp! {regs}. So, 
after that initial push, you'd have:

x  <- old_sp
fp <- sp + 12
r6
r5
r4 <- new_sp

so in fp is stored sp+12 that is the address where is stored the old fp :)

Hence the ldr does load the address of the previous frame in the return 
register (r0).

> 
> ~~~~
> 
> Anyway, I've prepared more simple complete example to demonstrate the
> bug:
> 
>     ---- 8< ---- fp.c
>     #include <stdio.h>
> 
>     void *func(int x)
>     {
>         char *fp1 = __builtin_frame_address(1);
> 
>         printf("Hello World! %i\n", x);
> 
>         fp1 += x;
>         return fp1;
>     }
> 
>     int main()
>     {
>         void *fp0 = __builtin_frame_address(0);
>         void *fp1 = func(0);
> 
>         printf("fp0: %p  fp1: %p\n", fp0, fp1);
>         return 0;
>     }
> 
> The code should output
> 
>     Hello World! 0
>     fp0: <addr>  fp1: <the-same-addr!>
> 
> right? on i386:
> 
>     $ gcc -o fp_i386 -O2 fp.c
>     $ ./fp_i386
>     Hello World! 0
>     fp0: 0xafeda538  fp1: 0xafeda538
> 
> 
> but it does not on arm, look:
> 
>     $ arm-linux-gnueabi-gcc-4.4 -marm -o fp_arm -O2 fp.c
>     $ qemu-arm -L /usr/arm-linux-gnueabi ./fp_arm
>     Hello World! 0
>     fp0: 0x4080010c  fp1: 0x8414
> 
> and the problem here is that gcc, as I think, incorrectly setups fp:
> 
>     $ arm-linux-gnueabi-gcc-4.4 -marm -o fp.s -S -O2 fp.c
>     $ cat fp.s
>             .cpu arm9tdmi
>             .fpu softvfp
>             .eabi_attribute 20, 1
>             .eabi_attribute 21, 1
>             .eabi_attribute 23, 3
>             .eabi_attribute 24, 1
>             .eabi_attribute 25, 1
>             .eabi_attribute 26, 2
>             .eabi_attribute 30, 2
>             .eabi_attribute 18, 4
>             .file   "fp.c"
>             .text
>             .align  2
>             .global func
>             .type   func, %function
>     func:
>             @ Function supports interworking.
>             @ args = 0, pretend = 0, frame = 0
>             @ frame_needed = 1, uses_anonymous_args = 0
>             stmfd   sp!, {r4, r5, fp, lr}
>             add     fp, sp, #12
>             mov     r4, r0
>             ldr     r5, [fp, #0]        ; <- bug here, should be [fp, #-4]
>             mov     r1, r4
>             ldr     r0, .L3
>             bl      printf
>             add     r0, r5, r4
>             sub     sp, fp, #12
>             ldmfd   sp!, {r4, r5, fp, lr}
>             bx      lr
>     .L4:
>             .align  2
>     .L3:
>             .word   .LC0
>             .size   func, .-func
>             .align  2
>             .global main
>             .type   main, %function
>     main:
>             @ Function supports interworking.
>             @ args = 0, pretend = 0, frame = 0
>             @ frame_needed = 1, uses_anonymous_args = 0
>             stmfd   sp!, {fp, lr}
>             mov     r0, #0
>             add     fp, sp, #4
>             bl      func
>             mov     r1, fp
>             mov     r2, r0
>             ldr     r0, .L7
>             bl      printf
>             mov     r0, #0
>             sub     sp, fp, #4
>             ldmfd   sp!, {fp, lr}
>             bx      lr
>     .L8:
>             .align  2
>     .L7:
>             .word   .LC1
>             .size   main, .-main
>             .section        .rodata.str1.4,"aMS",%progbits,1
>             .align  2
>     .LC0:
>             .ascii  "Hello World! %i\012\000"
>             .space  3
>     .LC1:
>             .ascii  "fp0: %p  fp1: %p\012\000"
>             .ident  "GCC: (Debian 4.4.6-11) 4.4.6"
>             .section        .note.GNU-stack,"",%progbits

Agreed, that's weird. I fails equally on my machine (though the code is 
different, maybe because I'm using armv7-a as machine arch).

> 
> 
> and if I apply the following patch
> 
>     diff --git a/t/fp.s b/t/fp1.s
>     index 2047b05..85bb402 100644
>     --- a/t/fp.s
>     +++ b/t/fp1.s
>     @@ -20,7 +20,7 @@ func:
>             stmfd   sp!, {r4, r5, fp, lr}
>             add     fp, sp, #12
>             mov     r4, r0
>     -       ldr     r5, [fp, #0]
>     +       ldr     r5, [fp, #-4]
>             mov     r1, r4
>             ldr     r0, .L3
>             bl      printf
> 
> it does work:
> 
>     $ arm-linux-gnueabi-gcc-4.4 -o fp fp.s
>     $ arm-linux-gnueabi-gcc-4.4 -o fp1 fp1.s
>     $ qemu-arm -L /usr/arm-linux-gnueabi ./fp
>     Hello World! 0
>     fp0: 0x4080012c  fp1: 0x8414
> 
>     $ qemu-arm -L /usr/arm-linux-gnueabi ./fp1
>     Hello World! 0
>     fp0: 0x4080012c  fp1: 0x4080012c
> 
> that's why I think gcc generated wrong code here.

In this case I agree. It seems that no matter what, __builtin_frame_address 
loads fp, no matter where it points.
 
> 
> And to the reference, here is how code would look like if being part of
> linux:
> 
>     $ arm-linux-gnueabi-gcc-4.4 -o fpapcs.s     -marm -mapcs
> -mno-sched-prolog  -S -O2 fp.c $ arm-linux-gnueabi-gcc-4.4 -o fpapcs_gnu.s
> -marm -mapcs -mno-sched-prolog -mabi=apcs-gnu -S -O2 fp.c $ diff -u fp.s
> fpapcs.s
>     --- fp.s  2012-12-13 20:34:09.000000000 +0400
>     +++ fpapcs.s      2012-12-13 20:34:09.000000000 +0400
>     @@ -17,16 +17,17 @@
>       @ Function supports interworking.
>       @ args = 0, pretend = 0, frame = 0
>       @ frame_needed = 1, uses_anonymous_args = 0
>     - stmfd   sp!, {r4, r5, fp, lr}
>     - add     fp, sp, #12
>     + mov     ip, sp
>     + stmfd   sp!, {r4, r5, fp, ip, lr, pc}
>     + sub     fp, ip, #4
>       mov     r4, r0
>       ldr     r5, [fp, #0]
>       mov     r1, r4
>       ldr     r0, .L3
>       bl      printf
>       add     r0, r5, r4
>     - sub     sp, fp, #12
>     - ldmfd   sp!, {r4, r5, fp, lr}
>     + sub     sp, fp, #20
>     + ldmfd   sp, {r4, r5, fp, sp, lr}
>       bx      lr
>      .L4:
>       .align  2
>     @@ -40,17 +41,18 @@
>       @ Function supports interworking.
>       @ args = 0, pretend = 0, frame = 0
>       @ frame_needed = 1, uses_anonymous_args = 0
>     - stmfd   sp!, {fp, lr}
>     + mov     ip, sp
>     + stmfd   sp!, {fp, ip, lr, pc}
>     + sub     fp, ip, #4

thus fp points to the location where pc is stored

>       mov     r0, #0
>     - add     fp, sp, #4
>       bl      func
>       mov     r1, fp
>       mov     r2, r0
>       ldr     r0, .L7
>       bl      printf
>       mov     r0, #0
>     - sub     sp, fp, #4
>     - ldmfd   sp!, {fp, lr}
>     + sub     sp, fp, #12
>     + ldmfd   sp, {fp, sp, lr}
>       bx      lr
>      .L8:
>       .align  2
> 
>     $ diff -u fpapcs.s fpapcs_gnu.s
>     --- fpapcs.s    2012-12-13 20:34:09.000000000 +0400
>     +++ fpapcs_gnu.s        2012-12-13 20:32:51.000000000 +0400
>     @@ -1,13 +1,3 @@
>     -       .cpu arm9tdmi
>     -       .fpu softvfp
>     -       .eabi_attribute 20, 1
>     -       .eabi_attribute 21, 1
>     -       .eabi_attribute 23, 3
>     -       .eabi_attribute 24, 1
>     -       .eabi_attribute 25, 1
>     -       .eabi_attribute 26, 2
>     -       .eabi_attribute 30, 2
>     -       .eabi_attribute 18, 4
>             .file   "fp.c"
>             .text
>             .align  2
> 
> 
> but it does not work either:
> 
>     $ arm-linux-gnueabi-gcc-4.4 -o fpapcs -mapcs -mno-sched-prolog fpapcs.s
>     $ qemu-arm -L /usr/arm-linux-gnueabi ./fpapcs
>     Hello World! 0
>     fp0: 0x4080011c  fp1: 0x83e0

Of course since it's the return address which is displayed

> 
> 
> ~~~~
> 
> There are many arm abis and maybe I'm doing something stupid, but to me
> all this looks like gcc generates incorrect code.

Are you sure the code you compiled for ARM doesn't contain 
__builtin_return_address instead of __builtin_frame_address?

> 
> > > ... as probably required by arm calling convention (not looked at the
> > > spec yet, but it seems reasanoble, given how push is really a block
> > > str and that str stores register in ascending order).
> > 
> > I have once seen a page in MSDN describing the ARM stack frame of
> > Windows CE. I don't know if ARM specified how stack frames should look
> > like.
> 
> From gcc manual:
> 
> `-mapcs-frame'
>      Generate a stack frame that is compliant with the ARM Procedure
>      Call Standard for all functions, even if this is not strictly
>      necessary for correct execution of the code.  Specifying
>      `-fomit-frame-pointer' with this option will cause the stack
>      frames not to be generated for leaf functions.  The default is
>      `-mno-apcs-frame'.
> 
> so at least there is "ARM Procedure Call Standard". Sorry, no time to
> search for it...
> 
> 
> Thanks,
> Undertimed Kirill

Sorry for taking so much time to answer you.

Best regards,

Thomas



reply via email to

[Prev in Thread] Current Thread [Next in Thread]