[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Tinycc-devel] [PATCH] arm: Handle __builtin_frame_address(1) correc
From: |
Kirill Smelkov |
Subject: |
Re: [Tinycc-devel] [PATCH] arm: Handle __builtin_frame_address(1) correctly |
Date: |
Sat, 5 Jan 2013 19:35:44 +0400 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
On Sat, Jan 05, 2013 at 03:36:49PM +0100, Thomas Preud'homme wrote:
> Le jeudi 13 décembre 2012 17:55:41, vous avez écrit :
>
> [SNIP]
>
> >
> > from arch/arm/kernel/stacktrace.c:
> >
> > * With framepointer enabled, a simple function prologue looks like
> > this: * mov ip, sp
> > * stmdb sp!, {fp, ip, lr, pc}
> > * sub fp, ip, #4
> > *
> > * A simple function epilogue looks like this:
> > * ldm sp, {fp, sp, pc}
> >
> > so since fp is initialized from ip-4 (old_sp-4) it points to (near ?)
> > fp, not pc, right?
>
> stmdb means sp is decremented first (sp = sp - 4 * nbregs) and then register
> are saved in ascending order from this new location. So, pc will end up at
> initial_sp - 4 :) Hence ip points at pc.
>
> >
> > also linux kernel is compiled with
> >
> > -mapcs -mno-sched-prolog -mabi=apcs-gnu
> >
> > and this changes frame layout compared to what gcc produces by default.
> >
> > (sorry for that near - i don't remember stack and stmdb arm details
> > good enough)
>
> stmdb stands for STore Multiple Decrement Before
>
> >
> > > > 8: e92d0870 push {r4, r5, r6, fp} ; NOTE
> > > > r4... go before fp
> > > >
> > > > 14: e28db00c add fp, sp, #12
> > > >
> > > > 5c: e59b0000 ldr r0, [fp] ; BUG here!
> > >
> > > Why is this a bug in GCC? It will load the old value of fp to r0.
> >
> > Hmm, after push here is how stack looks like:
> >
> > r4 <- old_sp
> > r5 <- sp + 12
> > r6
> > fp
> > <- sp
> >
> > (again, off-by-one error probably, but anywayy that sp+12 can't point to
> > old fp).
>
> Nope. push store register in ascending order from the lowest address to the
> highest address. push {regs} is actually the same as stmdb sp! {regs}. So,
> after that initial push, you'd have:
>
> x <- old_sp
> fp <- sp + 12
> r6
> r5
> r4 <- new_sp
>
> so in fp is stored sp+12 that is the address where is stored the old fp :)
>
> Hence the ldr does load the address of the previous frame in the return
> register (r0).
>
> >
> > ~~~~
> >
> > Anyway, I've prepared more simple complete example to demonstrate the
> > bug:
> >
> > ---- 8< ---- fp.c
> > #include <stdio.h>
> >
> > void *func(int x)
> > {
> > char *fp1 = __builtin_frame_address(1);
> >
> > printf("Hello World! %i\n", x);
> >
> > fp1 += x;
> > return fp1;
> > }
> >
> > int main()
> > {
> > void *fp0 = __builtin_frame_address(0);
> > void *fp1 = func(0);
> >
> > printf("fp0: %p fp1: %p\n", fp0, fp1);
> > return 0;
> > }
> >
> > The code should output
> >
> > Hello World! 0
> > fp0: <addr> fp1: <the-same-addr!>
> >
> > right? on i386:
> >
> > $ gcc -o fp_i386 -O2 fp.c
> > $ ./fp_i386
> > Hello World! 0
> > fp0: 0xafeda538 fp1: 0xafeda538
> >
> >
> > but it does not on arm, look:
> >
> > $ arm-linux-gnueabi-gcc-4.4 -marm -o fp_arm -O2 fp.c
> > $ qemu-arm -L /usr/arm-linux-gnueabi ./fp_arm
> > Hello World! 0
> > fp0: 0x4080010c fp1: 0x8414
> >
> > and the problem here is that gcc, as I think, incorrectly setups fp:
> >
> > $ arm-linux-gnueabi-gcc-4.4 -marm -o fp.s -S -O2 fp.c
> > $ cat fp.s
> > .cpu arm9tdmi
> > .fpu softvfp
> > .eabi_attribute 20, 1
> > .eabi_attribute 21, 1
> > .eabi_attribute 23, 3
> > .eabi_attribute 24, 1
> > .eabi_attribute 25, 1
> > .eabi_attribute 26, 2
> > .eabi_attribute 30, 2
> > .eabi_attribute 18, 4
> > .file "fp.c"
> > .text
> > .align 2
> > .global func
> > .type func, %function
> > func:
> > @ Function supports interworking.
> > @ args = 0, pretend = 0, frame = 0
> > @ frame_needed = 1, uses_anonymous_args = 0
> > stmfd sp!, {r4, r5, fp, lr}
> > add fp, sp, #12
> > mov r4, r0
> > ldr r5, [fp, #0] ; <- bug here, should be [fp, #-4]
> > mov r1, r4
> > ldr r0, .L3
> > bl printf
> > add r0, r5, r4
> > sub sp, fp, #12
> > ldmfd sp!, {r4, r5, fp, lr}
> > bx lr
> > .L4:
> > .align 2
> > .L3:
> > .word .LC0
> > .size func, .-func
> > .align 2
> > .global main
> > .type main, %function
> > main:
> > @ Function supports interworking.
> > @ args = 0, pretend = 0, frame = 0
> > @ frame_needed = 1, uses_anonymous_args = 0
> > stmfd sp!, {fp, lr}
> > mov r0, #0
> > add fp, sp, #4
> > bl func
> > mov r1, fp
> > mov r2, r0
> > ldr r0, .L7
> > bl printf
> > mov r0, #0
> > sub sp, fp, #4
> > ldmfd sp!, {fp, lr}
> > bx lr
> > .L8:
> > .align 2
> > .L7:
> > .word .LC1
> > .size main, .-main
> > .section .rodata.str1.4,"aMS",%progbits,1
> > .align 2
> > .LC0:
> > .ascii "Hello World! %i\012\000"
> > .space 3
> > .LC1:
> > .ascii "fp0: %p fp1: %p\012\000"
> > .ident "GCC: (Debian 4.4.6-11) 4.4.6"
> > .section .note.GNU-stack,"",%progbits
>
> Agreed, that's weird. I fails equally on my machine (though the code is
> different, maybe because I'm using armv7-a as machine arch).
>
> >
> >
> > and if I apply the following patch
> >
> > diff --git a/t/fp.s b/t/fp1.s
> > index 2047b05..85bb402 100644
> > --- a/t/fp.s
> > +++ b/t/fp1.s
> > @@ -20,7 +20,7 @@ func:
> > stmfd sp!, {r4, r5, fp, lr}
> > add fp, sp, #12
> > mov r4, r0
> > - ldr r5, [fp, #0]
> > + ldr r5, [fp, #-4]
> > mov r1, r4
> > ldr r0, .L3
> > bl printf
> >
> > it does work:
> >
> > $ arm-linux-gnueabi-gcc-4.4 -o fp fp.s
> > $ arm-linux-gnueabi-gcc-4.4 -o fp1 fp1.s
> > $ qemu-arm -L /usr/arm-linux-gnueabi ./fp
> > Hello World! 0
> > fp0: 0x4080012c fp1: 0x8414
> >
> > $ qemu-arm -L /usr/arm-linux-gnueabi ./fp1
> > Hello World! 0
> > fp0: 0x4080012c fp1: 0x4080012c
> >
> > that's why I think gcc generated wrong code here.
>
> In this case I agree. It seems that no matter what, __builtin_frame_address
> loads fp, no matter where it points.
>
> >
> > And to the reference, here is how code would look like if being part of
> > linux:
> >
> > $ arm-linux-gnueabi-gcc-4.4 -o fpapcs.s -marm -mapcs
> > -mno-sched-prolog -S -O2 fp.c $ arm-linux-gnueabi-gcc-4.4 -o fpapcs_gnu.s
> > -marm -mapcs -mno-sched-prolog -mabi=apcs-gnu -S -O2 fp.c $ diff -u fp.s
> > fpapcs.s
> > --- fp.s 2012-12-13 20:34:09.000000000 +0400
> > +++ fpapcs.s 2012-12-13 20:34:09.000000000 +0400
> > @@ -17,16 +17,17 @@
> > @ Function supports interworking.
> > @ args = 0, pretend = 0, frame = 0
> > @ frame_needed = 1, uses_anonymous_args = 0
> > - stmfd sp!, {r4, r5, fp, lr}
> > - add fp, sp, #12
> > + mov ip, sp
> > + stmfd sp!, {r4, r5, fp, ip, lr, pc}
> > + sub fp, ip, #4
> > mov r4, r0
> > ldr r5, [fp, #0]
> > mov r1, r4
> > ldr r0, .L3
> > bl printf
> > add r0, r5, r4
> > - sub sp, fp, #12
> > - ldmfd sp!, {r4, r5, fp, lr}
> > + sub sp, fp, #20
> > + ldmfd sp, {r4, r5, fp, sp, lr}
> > bx lr
> > .L4:
> > .align 2
> > @@ -40,17 +41,18 @@
> > @ Function supports interworking.
> > @ args = 0, pretend = 0, frame = 0
> > @ frame_needed = 1, uses_anonymous_args = 0
> > - stmfd sp!, {fp, lr}
> > + mov ip, sp
> > + stmfd sp!, {fp, ip, lr, pc}
> > + sub fp, ip, #4
>
> thus fp points to the location where pc is stored
>
> > mov r0, #0
> > - add fp, sp, #4
> > bl func
> > mov r1, fp
> > mov r2, r0
> > ldr r0, .L7
> > bl printf
> > mov r0, #0
> > - sub sp, fp, #4
> > - ldmfd sp!, {fp, lr}
> > + sub sp, fp, #12
> > + ldmfd sp, {fp, sp, lr}
> > bx lr
> > .L8:
> > .align 2
> >
> > $ diff -u fpapcs.s fpapcs_gnu.s
> > --- fpapcs.s 2012-12-13 20:34:09.000000000 +0400
> > +++ fpapcs_gnu.s 2012-12-13 20:32:51.000000000 +0400
> > @@ -1,13 +1,3 @@
> > - .cpu arm9tdmi
> > - .fpu softvfp
> > - .eabi_attribute 20, 1
> > - .eabi_attribute 21, 1
> > - .eabi_attribute 23, 3
> > - .eabi_attribute 24, 1
> > - .eabi_attribute 25, 1
> > - .eabi_attribute 26, 2
> > - .eabi_attribute 30, 2
> > - .eabi_attribute 18, 4
> > .file "fp.c"
> > .text
> > .align 2
> >
> >
> > but it does not work either:
> >
> > $ arm-linux-gnueabi-gcc-4.4 -o fpapcs -mapcs -mno-sched-prolog fpapcs.s
> > $ qemu-arm -L /usr/arm-linux-gnueabi ./fpapcs
> > Hello World! 0
> > fp0: 0x4080011c fp1: 0x83e0
>
> Of course since it's the return address which is displayed
>
> >
> >
> > ~~~~
> >
> > There are many arm abis and maybe I'm doing something stupid, but to me
> > all this looks like gcc generates incorrect code.
>
> Are you sure the code you compiled for ARM doesn't contain
> __builtin_return_address instead of __builtin_frame_address?
Thomas, thanks for correcting my arm assembly errors. I agree I did make
mistakes, but I'm sure __builtin_frame_address() is used, not
__builtin_return_address(), and that __builtin_frame_address(1) is wrong.
Here it is once again:
$ cat fp.c
#include <stdio.h>
void *func(int x)
{
char *fp1 = __builtin_frame_address(1);
printf("Hello World! %i\n", x);
fp1 += x;
return fp1;
}
int main()
{
void *fp0 = __builtin_frame_address(0);
void *fp1 = func(0);
printf("fp0: %p fp1: %p\n", fp0, fp1);
return 0;
}
$ arm-linux-gnueabi-gcc-4.4 -ofpapcs -marm -mapcs -mno-sched-prolog -O2 fp.c
$ qemu-arm -L /usr/arm-linux-gnueabi ./fpapcs
Hello World! 0
fp0: 0x4080010c fp1: 0x83e0
> > > > ... as probably required by arm calling convention (not looked at the
> > > > spec yet, but it seems reasanoble, given how push is really a block
> > > > str and that str stores register in ascending order).
> > >
> > > I have once seen a page in MSDN describing the ARM stack frame of
> > > Windows CE. I don't know if ARM specified how stack frames should look
> > > like.
> >
> > From gcc manual:
> >
> > `-mapcs-frame'
> > Generate a stack frame that is compliant with the ARM Procedure
> > Call Standard for all functions, even if this is not strictly
> > necessary for correct execution of the code. Specifying
> > `-fomit-frame-pointer' with this option will cause the stack
> > frames not to be generated for leaf functions. The default is
> > `-mno-apcs-frame'.
> >
> > so at least there is "ARM Procedure Call Standard". Sorry, no time to
> > search for it...
> >
> >
> > Thanks,
> > Undertimed Kirill
>
> Sorry for taking so much time to answer you.
Nevermind... and thanks,
Kirill