[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Tinycc-devel] Fallout from commit 01c041923474750a236da02561f0f8835
From: |
Thomas Preud'homme |
Subject: |
Re: [Tinycc-devel] Fallout from commit 01c041923474750a236da02561f0f8835445848b |
Date: |
Sat, 20 Sep 2014 20:13:54 +0800 |
User-agent: |
KMail/4.14 (Linux/3.14-2-amd64; KDE/4.14.0; x86_64; ; ) |
Le jeudi 18 septembre 2014, 17:46:36 Michael Matz a écrit :
> Hi,
>
> On Tue, 9 Sep 2014, Thomas Preud'homme wrote:
> > 1) You added the support for R_ARM_GLOB_DAT and R_ARM_JUMP_SLOT
> > relocations but the computation you added ignore the possible addend at
> > *ptr by doing a simple assignment. Is that normal? Did I miss something?
>
> This is per the ABI. The addend for both relocations (when using REL
> form) is (and must be) always zero. I didn't bother checking for this,
> but rather just ignored anything that was in there.
Mmmh ok, I didn't check sorry for the noise.
>
> > 2) When creating a GOT and PLT entry for a R_ARM_PC24, R_ARM_CALL or
> > R_ARM_JUMP24 you add the offset of the PLT entry to the place being
> > relocated. I'm not sure I got it right but it seems to me that the
> > relocation will be processed again in relocate_section and seems the
> > symbol referenced is still the target function (and not the PLT entry
> > created) as the index in the r_info field of the relocation has remained
> > unchanged. Also this put some relocation computation in
> > build_got_entries. Why not change the initial relocation to make it
> > resolve to the PLT entry.
>
> It might be that I got some of the logic wrong, though I obviously did
> check tcc on arm with itself (and IIRC gawk). I'll need to reproduce the
> problem somewhen. Are you positive that it was my commit that broke it?
> Hmm, or it might be that I also hit an error but somehow convinced myself
> that it was a pre-existing but possibly hidden problem. Man, it's long
> ago :)
I'm not 100% sure no. Here are the facts:
Two builds of tcc mob were done by Debian buildd. The source is slightly
modified to disable some tests, otherwise it's the same. The first build [1]
succedded but didn't contain 63376d7712081c739a3e8dfe5fba396721bbe10b.
The second build [2] does contain it and failed. I looked at the relocation
that failed (thank you for adding the relocation number in the error message)
and checked the code. With the changes you made as part of the removal of jmp
table the only condition that can fail now is being out of reach. I looked at
what changed in this code recently and I found your commits.
[1]
https://buildd.debian.org/status/fetch.php?pkg=tcc&arch=armhf&ver=0.9.27~git20140801.14745bd-1&stamp=1409978859
[2]
https://buildd.debian.org/status/fetch.php?pkg=tcc&arch=armhf&ver=0.9.27~git20140907.87d879a-1&stamp=1410110433
Now of course I did try on my Toshiba AC100 ARM laptop and the test does pass
and given one of the two Debian build passed this lead me to believe it's a
non deterministic error. It probably depends on where the libraries are
loaded.
However I did look at your code and this thing seems surprising. The *offset*
of the PLT entry is added in the place being relocated but the base of the PLT
is never added. Also, everything seems to be already in place to handle PLT
entries (how would it work before if it wasn't?):
1) put_got_entry
(i) create a PLT entry
(ii) create a new symbol in the .dynsym section whose value is the offset of
the PLT entry from the beginning of the PLT section.
(iii) add a relocation for the GOT (created in next step) entry to be
relocated to the callee function location
(iii) create a GOT entry
2) elf_output_file add the base of the PLT section to the value of the symbol
created in dynsym at step 1(ii): see at comment "relocate symbols in .dynsym
now that final addresses are known".
At this point the symbol in dynsym refer to the PLT entry.
3) relocate_section (called via final_sections_reloc) do the relocation of the
called function and writes there the offset between the destination (the PLT
entry) and the place being relocated.
By the way (I also write this to myself as I always forget how this whole mess
work), for a call to a library function put_got_entry is called from
bind_exe_dynsyms. This function call put_got_entry for each symbols that is
still undefined and is a function.
>
> > 3) I don't see any test for the type of output when deciding what type of
> > relocation to add. So even when the output is in memory reloc_type will be
> > JUMP_SLOT which will lead to a PLT entry being created. This seems to
> > contradict the comment near the end of put_got_entry. The comment seems
> > wrong as I don't see how a branch could be relocated without a PLT entry.
> I think what I wanted to convey in the comment is the following situation:
> normally with memory output (or static linking) no symbolic relocations
> whatsoever are required in the executable, because, well, everything is
> relocated from the beginning. But the means by which that is ensured is
> itself (at least in TCCs link editor) by creating (and later processing)
> relocations. Where it matters are indirect calls:
>
> extern void f(void);
> g() { void (*ptr)(void) = f; ptr(); }
>
> Here there will be a GOT slot allocated for 'f'. The initialization of
> 'ptr' will load from that GOT slot. So even though direct calls to 'f'
> can (and will be, when 'f' is defined) fully resolved without a PLT slot,
> this indirect access needs a GOT slot, and because of that also needs to
> initialize that slot with f's address, which is done by an relocation
> (which are processed for static and memory output right at the end of most
> processing).
Why not relocating the text section (the instruction that load the address of
f in a register) directly? Since it's static linking (somehow tcc -run is also
static linking, you know all the addresses) there's no problem with relocating
the text. I'm still confused at what each case in load means but I see a
couple of R_ARM_ABS32 and a call to stuff_const so it seems it leaves a blank
in the instruction sequence where the value will be written via a relocation
and then load from there.
Anyway, let's assume I'm wrong (that happens a lot) and we need a GOT entry
for that. Then I think it should be a R_ARM_GLOB_DAT since the assignment
would be done by a load from the GOT, like if you were reading an extern
variable. The fact that it's a function should not matter, you just load the
address and then call with the register in which you loaded it as parameter
instead of using the register to do some arithmetic if it were an integer
global variable. R_ARM_GLOB_DAT will add a GOT entry and a relocation for it
for the symbol you consider.
And here R_ARM_JUMP24 relocation suggest it's not an indirect call. An
indirect call first load the address in a register and then just jump/call to
it. So here what was being relocated was a plain call.
>
> > 4) the jump table that was removed in subsequent patch was only available
> > when outputing to memory. But now a PLT and GOT entry is created no
> > matter what type of output (see 3) above).
>
> Yes, PLT and GOT together (when unconditionally created) take over the
> role of the old jump table.
What I mean is that a PLT and GOT entry should only be created when outputing
to memory (as it was with the jump table in relocate_section). When outputing
to executable this will be handled by bind_exe_dynsyms as explained above.
>
> Now obviously I bungled something regarding all that on ARM and will have
> to look at the details. Let me first try to reproduce and I'll come back.
Good luck with that, I cannot reproduce on my ARM laptop. But still the code
look wrong. Now feel free to prove me wrong. I often have and I may have
missed something important.
>
> Ciao,
> Michael.
Cheers,
Thomas
signature.asc
Description: This is a digitally signed message part.