Re: [Tinycc-devel] Non PIC-code and TCC's GOT strategy for TCC_OUTPUT_MEMORY

From:	grischka
Subject:	Re: [Tinycc-devel] Non PIC-code and TCC's GOT strategy for TCC_OUTPUT_MEMORY
Date:	Wed, 7 Dec 2022 11:16:52 +0100
User-agent:	Mozilla/5.0 (Windows NT 6.0; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

On 05.12.2022 22:10, Macoy Madson wrote:

Hi,

I'm working on using TCC as the core of run-time modifiable applications. I am 
using TCC as a library, and compiling/linking straight to memory, i.e. 
tcc_set_output_type(state, TCC_OUTPUT_MEMORY). I have mob c03d59e.

I was having issues statically linking SDL2 with TCC, where SDL2 was built 
with GCC -O3. SDL2 would segfault as soon as I tried to initialize it. I 
tracked it down to the log function using stderr, whose address was bogus.

I found the issue:

- GCC generates R_X86_64_PC32 relocations to unknown symbols like stderr in 
SDL2.

- TCC, while relocating, sees that the PC32 reference is undefined, so creates an 
"AUTO_GOTPLT_ENTRY" and updates the relocation to go to "stderr@plt" instead of 
"stderr".

* Note that stderr is a dynamic symbol whose final address comes from GNU 
libc, because the TCC "environment" itself is linked against GNU libc.


Hi,

I think the conclusion is that "tcc -run" using .o/.a from gcc (without -fPIC)
is not well supported.

Of course, redirecting to a jump slot "stderr@plt" is wrong because "stderr"
is not a function.  The thing is that R_X86_64_PC32 can occur for both
functions and data.

But in order to know whether the symbol is STT_FUNC or STT_OBJECT, tcc would
need to actually load libc.so (via tcc_load_dll()) and scan its tables,
which it (traditionally) doesn't do for tcc -run.  See build_got_entries():
                    if (s1->dynsym) {
                        /* dynsym isn't set for -run :-/  */
If tcc had that information, it could create a PLT slot or an
R_X86_64_COPY entry respectively, for -run too.
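
As a rough illustration of that decision (not tcc code): a minimal C sketch,
assuming the st_info byte of the symbol has already been read from the shared
library's dynamic symbol table. The ELF constants are redefined locally so the
sketch builds without <elf.h>; Sym64, enum undef_fixup and classify_undef()
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ELF symbol-type constants, redefined locally (values per the ELF spec). */
#define STT_OBJECT 1                      /* symbol is a data object */
#define STT_FUNC   2                      /* symbol is a function */
#define ELF64_ST_TYPE(info) ((info) & 0xf)

/* Same layout as Elf64_Sym; declared here to keep the sketch standalone. */
typedef struct {
    uint32_t      st_name;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t      st_shndx;
    uint64_t      st_value;
    uint64_t      st_size;
} Sym64;

/* Hypothetical result type, not part of tcc. */
enum undef_fixup { FIXUP_PLT_SLOT, FIXUP_COPY_RELOC, FIXUP_UNKNOWN };

/* Decide how an undefined symbol referenced through R_X86_64_PC32 would
   have to be satisfied, based on the type recorded in the library. */
enum undef_fixup classify_undef(const Sym64 *dynsym_entry)
{
    assert(dynsym_entry != NULL);
    switch (ELF64_ST_TYPE(dynsym_entry->st_info)) {
    case STT_FUNC:   return FIXUP_PLT_SLOT;   /* calls can go through a stub */
    case STT_OBJECT: return FIXUP_COPY_RELOC; /* data needs a reachable copy */
    default:         return FIXUP_UNKNOWN;    /* e.g. STT_NOTYPE */
    }
}
```

For "stderr" the entry would be STT_OBJECT, so the sketch picks the copy
relocation rather than a jump slot.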

The reason it mostly works is that TCC itself never uses R_X86_64_PC32 for
data (except for static data, which is never SHN_UNDEF).

So I guess your options probably are:
- use gcc -fPIC
- use tcc to compile SDL (its output is more like -fPIC, but not optimized)
- provide SDL as a shared library
- link SDL into the host application and provide its symbols via tcc_add_symbol
- work on tcc to fix the problem

-- grischka


This ends in disaster, because it appears TCC assumes that the relocation can 
safely become a GOT/PLT relocation in build_got_entries()--gotplt_entry_type() 
returns AUTO_GOTPLT_ENTRY for the R_X86_64_PC32 relocations.

Now, I'm not well-practiced at reading assembly so I may be wrong here, but 
here's an example of the code that is breaking (objdump --disassemble --reloc 
build/SDL.o):

0000000000000020 <SDL_InitSubSystem_REAL>:
     20:    f3 0f 1e fa              endbr64
     24:    48 8b 05 00 00 00 00     mov    0x0(%rip),%rax # 2b <SDL_InitSubSystem_REAL+0xb>
               27: R_X86_64_PC32    stderr-0x4
     2b:    48 89 05 00 00 00 00     mov    %rax,0x0(%rip) # 32 <SDL_InitSubSystem_REAL+0x12>
               2e: R_X86_64_PC32    MyStderr-0x4
     32:    48 8d 05 00 00 00 00     lea    0x0(%rip),%rax # 39 <SDL_InitSubSystem_REAL+0x19>
               35: R_X86_64_PC32    stderr-0x4
     39:    48 89 05 00 00 00 00     mov    %rax,0x0(%rip) # 40 <SDL_InitSubSystem_REAL+0x20>
               3c: R_X86_64_PC32    MyStderrAddr-0x4
     40:    b8 ff ff ff ff           mov    $0xffffffff,%eax
     45:    c3                       retq
     46:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
     4d:    00 00 00

And here's the working GOT version:

0000000000000020 <SDL_InitSubSystem_REAL>:
     20:    f3 0f 1e fa              endbr64
     24:    48 8b 05 00 00 00 00     mov    0x0(%rip),%rax # 2b <SDL_InitSubSystem_REAL+0xb>
               27: R_X86_64_REX_GOTPCRELX    stderr-0x4
     2b:    48 8b 15 00 00 00 00     mov    0x0(%rip),%rdx # 32 <SDL_InitSubSystem_REAL+0x12>
               2e: R_X86_64_REX_GOTPCRELX    MyStderr-0x4
     32:    48 8b 08                 mov    (%rax),%rcx
     35:    48 89 0a                 mov    %rcx,(%rdx)
     38:    48 8b 15 00 00 00 00     mov    0x0(%rip),%rdx # 3f <SDL_InitSubSystem_REAL+0x1f>
               3b: R_X86_64_REX_GOTPCRELX    MyStderrAddr-0x4
     3f:    48 89 02                 mov    %rax,(%rdx)
     42:    b8 ff ff ff ff           mov    $0xffffffff,%eax
     47:    c3                       retq
     48:    0f 1f 84 00 00 00 00     nopl   0x0(%rax,%rax,1)

The (modified) source is the following:

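void* MyStderr = NULL;      /* value of stderr as seen from inside SDL2 */
void* MyStderrAddr = NULL;  /* address of the stderr variable itself */

int
SDL_InitSubSystem(Uint32 flags)
{
     Uint32 flags_initialized = 0;

     MyStderr = stderr;
     MyStderrAddr = (void*)&stderr;
     return -1;  /* bail out early; this build is for debugging only */
}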

I use those My* variables outside SDL2 to print the address of stderr, for 
debugging. I am able to print in the other compilation unit because it uses the 
GOTPCRELX relocations to find stderr instead of the faulty PC32 ones.

It appears to me that the PC32 version could not safely use the GOT, because it 
doesn't do the extra dereference necessary due to the indirection.
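
To make the missing dereference concrete, here is a small C simulation. It is
not TCC code and involves no real relocations: "target" stands in for the
storage of stderr inside libc, "slot" stands in for a GOT entry holding its
address, and both helper functions are made-up names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static long  target = 42;       /* plays the role of the data symbol */
static long *slot   = &target;  /* plays the role of its GOT entry */

/* GOT-style access (what the GOTPCRELX code does): load the slot,
   then dereference the address found there. */
long read_via_got(void)
{
    return *slot;
}

/* PC32-style access: a single load from wherever the relocation was
   resolved to. The code has no second load, so it simply returns the
   bytes found at that location. */
uintptr_t read_direct_from(const void *where)
{
    uintptr_t raw;
    assert(where != NULL);
    memcpy(&raw, where, sizeof raw);
    return raw;
}
```

Calling read_direct_from(&target) yields the value, but read_direct_from(&slot)
yields the address of target instead. With the relocation redirected to
"stderr@plt", the single load would presumably pick up the bytes of the jump
stub, which is consistent with the bogus address I saw.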

I can fix this by compiling SDL2 files with -fPIC, which generates 
R_X86_64_REX_GOTPCRELX relocations, and TCC handles those perfectly.

I can also go the 100% statically linked route by compiling in a static musl 
libc or similar, so that all the symbols are defined at relocation time. This 
has implications for how many hoops the "user" needs to jump through to get 
their program working; switching libc is harder than adding -fPIC and 
recompiling. I am of the opinion that far more things should just be 100% 
statically linked, but I know there is a huge body of code that isn't, and 
that converting it takes a decent amount of tedious effort.

I'm fine with modifying my SDL2 build to work, but I really need something that 
can detect when any PC32 relocation gets turned into a GOT/PLT relocation. 
That way, I can at least raise an error and instruct the user to recompile the 
code as position-independent.
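
A minimal sketch of such a check in C, assuming the relocation and symbol
tables of an object file are already in memory. The struct layouts mirror
Elf64_Rela and Elf64_Sym but are declared locally so the sketch builds without
<elf.h>; find_unsafe_pc32() is a hypothetical helper, not an existing TCC
function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define R_X86_64_PC32 2   /* relocation type number from the x86-64 psABI */
#define SHN_UNDEF     0   /* section index of an undefined symbol */
#define ELF64_R_SYM(i)  ((uint32_t)((i) >> 32))
#define ELF64_R_TYPE(i) ((uint32_t)((i) & 0xffffffff))

typedef struct {            /* same layout as Elf64_Rela */
    uint64_t r_offset;
    uint64_t r_info;
    int64_t  r_addend;
} Rela64;

typedef struct {            /* same layout as Elf64_Sym */
    uint32_t      st_name;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t      st_shndx;
    uint64_t      st_value;
    uint64_t      st_size;
} Sym64;

/* Return the index of the first R_X86_64_PC32 relocation that refers to
   an undefined symbol, or -1 if there is none. */
long find_unsafe_pc32(const Rela64 *rel, size_t nrel,
                      const Sym64 *symtab, size_t nsym)
{
    assert(symtab != NULL && (rel != NULL || nrel == 0));
    for (size_t i = 0; i < nrel; i++) {
        uint32_t sym = ELF64_R_SYM(rel[i].r_info);
        if (ELF64_R_TYPE(rel[i].r_info) != R_X86_64_PC32)
            continue;
        /* index 0 is the reserved null symbol, so skip it */
        if (sym != 0 && sym < nsym && symtab[sym].st_shndx == SHN_UNDEF)
            return (long)i;
    }
    return -1;
}
```

One caveat I am not sure about: if a toolchain emits PC32 rather than PLT32
for calls to undefined functions, this would flag those calls too, so a real
check might also want to look at the symbol type.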

What I would like to get confirmation on is

A) R_X86_64_PC32 entries are unsafe to convert to GOT entries through 
"AUTO_GOTPLT_ENTRY" because, at the very least, they are seemingly 
unimplemented in TCC's -run/memory mode

and

B) Using COPY relocations wouldn't work in TCC's memory mode either, because the existing 
dynamic symbols provided by the TCC application itself have already been placed, so the 
copy operation cannot occur without "moving" the TCC application's symbols. 
They would have to be moved because the TCC application may have them in a far away place 
in memory, too far for a PC32 relocation to reference.

If I am misunderstanding, an explanation of how AUTO_GOTPLT_ENTRY works with 
output type = TCC_OUTPUT_MEMORY would be greatly appreciated.

With my current understanding, it appears that I can:

- Simply raise an error if, at the relocate stage, I detect an undefined 
symbol referenced by any PC-relative relocation that cannot span the full 
address space (e.g. PC64 should be fine, but PC32 would not work)

- Or do the slightly more sophisticated search of all the loaded dlls 
(RTLD_DEFAULT and others) for the symbol, then check if the DLL's already 
placed symbol is within the PC-relative relocation distance. If it is, then 
target it rather than @plt and use the PC32 relocation normally.
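
The distance check in the second option could look roughly like the C sketch
below. It relies on the psABI definition of R_X86_64_PC32, which stores
S + A - P in a signed 32-bit field; pc32_reachable() and pc32_selfcheck() are
hypothetical names and the addresses are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Return nonzero if sym_addr + addend - place fits in a signed 32-bit
   field, i.e. if a PC32 relocation at `place` can reach the symbol. */
int pc32_reachable(uint64_t sym_addr, int64_t addend, uint64_t place)
{
    /* Do the arithmetic modulo 2^64, then reinterpret the result as a
       signed displacement (two's complement, as on x86-64). */
    int64_t disp = (int64_t)(sym_addr + (uint64_t)addend - place);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Small self-check with made-up addresses. */
void pc32_selfcheck(void)
{
    /* nearby symbol, slightly behind the relocation site */
    assert(pc32_reachable(0x1000, -4, 0x2000));
    /* symbol up in a typical shared-library mapping, code down low */
    assert(!pc32_reachable(0x7f0000000000ULL, -4, 0x400000));
}
```

Only when this returns nonzero for the symbol's already-placed address would
it be safe to keep the PC32 relocation pointing at the symbol itself.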

Am I on track here? I greatly appreciate anyone who read through this. I've 
been banging my head against this for several days now.

Thanks,

Macoy Madson


