Re: [Tinycc-devel] [PATCH] TCC arm64 back end


From: Thomas Preud'homme
Subject: Re: [Tinycc-devel] [PATCH] TCC arm64 back end
Date: Mon, 13 Apr 2015 22:52:48 +0800
User-agent: KMail/4.14.1 (Linux/3.16.0-4-amd64; KDE/4.14.2; x86_64; ; )

On Monday, 13 April 2015, 15:12:52, Michael Matz wrote:
> Hello Thomas,
> 
> On Sun, 12 Apr 2015, Thomas Preud'homme wrote:
> > I was a bit puzzled because I saw symbols are resolved when a file is
> > loaded that define them (in tcc_load_object_file). The reason this
> > doesn't happen here is that the symbol is provided by libc.so.6 and the
> > function that loads dynamic libraries (tcc_load_dll) only looks for
> > undefined symbols in dynsymtab_section rather than symtab_section. There
> > might be an obvious reason but I'm not sure why symbols from object
> > files and libraries are handled differently in terms of name resolution.
> > Of course relocation happens differently but name resolution should be
> > the same, shouldn't it?
> 
> Object files have no dynsym section, only a symtab.  Conversely shared
> objects don't have a symtab (conceptually; in reality they might have one,
> but then only for debugging purposes); their interface (and that's what is
> wanted when loading or linking against a shared object) is described by
> dynsym.

Yes, as you noticed I tend to focus too much on the code and not think enough 
about how it all works. Every time I need to make a change in the linker I 
relearn how it all works, even though I already have most of the knowledge. 
Anyway, don't worry: I rediscovered it by myself today while doing some tests 
to understand how it all works.

> 
> It should be (and mostly is) like this in tcc:
> * symtab_section contains all symbols (defined or undefined) from either
>   .c files or regular .o files (ET_REL) contained on the cmdline
> * s1->dynsymtab_section contains all symbols (defined or undefined) from
>   all shared objects (ET_DYN) mentioned on the command line (these are
>   collected from their respective .dynsym sections)
> * s1->dynsym contains the resulting dynamic symbols of the to-be-produced
>   executable or shared library (i.e. that one is built up in several steps
>   during linking)

Indeed, I worked this out again today.

> 
> bind_exe_dynsyms is the one converting from symtab_section to .dynsym.
> After all, all undefined symbols (in symtab) must come from some shared
> lib (if it came from some .o it would not be undefined in symtab), hence
> must be recorded somewhere in dynsymtab.  But to actually create the
> import this symbol then must be listed in the executable's .dynsym section,
> and this is what's done in bind_exe_dynsyms (i.e. it resolves undefined
> symbols to shared libs).

Yes, and it should be the only place calling put_got_entry. Right now that 
happens partly in build_got_entries and partly in bind_exe_dynsyms, unless I 
missed something, which I'll find out before I make any change.
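
To make the roles of the three tables and of bind_exe_dynsyms concrete, here is 
a minimal toy model. It is only a sketch of the idea described above, not 
tcc's actual code: the ToySym and ToyTable types and the toy_* functions are 
hypothetical names invented for illustration, and real ELF symbols carry much 
more information (binding, visibility, section index, value).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SYMS 16

/* Hypothetical toy symbol: a name plus a defined/undefined flag. */
typedef struct {
    const char *name;
    int defined;            /* 1 = defined, 0 = undefined */
} ToySym;

/* Hypothetical toy table standing in for symtab, dynsymtab or dynsym. */
typedef struct {
    ToySym syms[TOY_MAX_SYMS];
    size_t count;
} ToyTable;

/* Append a symbol; returns 0 on success, -1 if the table is full. */
static int toy_add(ToyTable *t, const char *name, int defined)
{
    assert(t != NULL && name != NULL);
    if (t->count == TOY_MAX_SYMS)
        return -1;
    t->syms[t->count].name = name;
    t->syms[t->count].defined = defined;
    t->count++;
    return 0;
}

/* Returns 1 if the table holds a *defined* symbol with this name. */
static int toy_has_definition(const ToyTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (t->syms[i].defined && strcmp(t->syms[i].name, name) == 0)
            return 1;
    return 0;
}

/*
 * Toy version of the bind_exe_dynsyms idea: every symbol that is undefined
 * in symtab (from .c/.o files) and defined in dynsymtab (from shared
 * objects) is recorded as an import in the output dynsym.
 * Returns the number of undefined symbols that could not be bound.
 */
static int toy_bind_exe_dynsyms(const ToyTable *symtab,
                                const ToyTable *dynsymtab,
                                ToyTable *dynsym)
{
    int unresolved = 0;
    for (size_t i = 0; i < symtab->count; i++) {
        const ToySym *s = &symtab->syms[i];
        if (s->defined)
            continue;                       /* provided by a .c/.o file */
        if (toy_has_definition(dynsymtab, s->name) &&
            toy_add(dynsym, s->name, 0) == 0)
            continue;                       /* import recorded */
        unresolved++;                       /* no provider, or no room */
    }
    return unresolved;
}
```

In this model, a symtab holding a defined main and an undefined printf, 
together with a dynsymtab that defines printf, yields a dynsym with a single 
undefined printf entry (the import) and no unresolved symbols.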

> 
> Conversely shared libs may also contain undefined symbols.  If they are
> provided by other shared libs the dynamic linker will take care of it.
> But they may also be provided by the main executable.  In order to avoid
> exporting _all_ global symbols from an executable always, only symbols
> actually undefined in shared libs and provided by the executable are
> exported (included as defined in .dynsym).  That's the purpose of
> bind_libs_dynsyms.

It doesn't have to be undefined. I just did a test with ld and it exports a 
symbol from the program if it is present in the dynsym of any library. ld's man 
page confirms this in the description of the -E (aka --export-dynamic) option:

"If you do not use either of these options (or use the --no-export-dynamic 
option to restore the default behavior), the dynamic symbol table will 
normally contain only those symbols which are referenced by some dynamic 
object mentioned in the link."

> 
> For creating a shared lib, all global symbols (with default visibility)
> are exported (included in their .dynsym); that's done by
> export_global_syms.
> 
> > But when linking with tcc I get:
> > 
> > Hello, world!
> 
> Yes, that's a bug in tcc.  The problem is that all .dynsyms from shared
> libs are loaded and merged before the symtab of the executable is consulted.
> Therefore dynsymtab contains a definition for printf (from glibc) and
> hence bind_libs_dynsyms doesn't see anymore that it was once undefined in
> one library; it would probably need tweaking in add_elf_sym (if the
> to-be-added symbol is from shared libs, and should be added into
> dynsymtab, then UNDEF should prevail, not definedness; alternatively this
> only when also contained in symtab_section as global).

The good news is that, given the above, add_elf_sym doesn't need to be changed, 
only bind_libs_dynsyms. At least it should be as easy as moving the == UNDEF 
check so that it only guards the warning below, but unfortunately doing so 
segfaults because some relocation is not happening. Don't worry, I'll get to 
the bottom of it.

> 
> > I also found a possible speed improvement. Currently tcc_load_dll loads
> > DLLs recursively.
> 
> Yes, but that's not only a speed change, but a semantic one as well.
> Basically what tcc implements right now is something in between ld's
> --copy-dt-needed-entries and its negative; tcc will _search_ for symbols
> also in referenced DSOs, but won't record a DT_NEEDED tag (this is a
> useless behaviour).

Yes, I realized it later. I thought all global symbols of the executable were 
exported, but no, only those referenced by some library, which means all 
libraries do need to be opened. It seems that what you are describing (and 
thus what tcc does) is exactly the default behavior of ld: search for symbols 
but don't add DT_NEEDED. The only problem is how it searches for symbols.

> 
> > That should only be necessary for tcc_run but that requires changing
> > bind_libs_dynsyms (which requires changing anyway because of the bug
> > above).
> 
> I don't see that.  Also undefined references from shared libs should only
> be satisfied by the executable if they are mentioned directly on the
> command line I think.

Nope. You can add extra symbols to be exported on the command line, or 
specify an exact list of symbols to be exported, but the default behavior is 
to export anything used by a library, even if it's defined in that very same 
library.

Try it for yourself:

% cat main.c       
int foo (void);

int
baz (void)
{
  return 0;
}

int
foo_helper (void)
{
  return 42;
}

int
main (void)
{
  return foo ();
}


% cat foo.c
int bar (void);

int
foo_helper (void)
{
  return bar ();
}

int
foo (void)
{
  return foo_helper ();
}


% cat bar.c
int baz (void);

int
bar (void)
{
  return baz ();
}

Compile bar.c as libbar.so, foo.c as libfoo.so linked against libbar.so, and 
then main.c as main linked against libfoo.so (you need to set LD_LIBRARY_PATH 
for ld to find libbar). I know that libfoo alone, with foo_helper returning 0, 
would be enough to prove my point, but I also wanted to see whether ld opens 
and reads libbar.so (the experiment I mentioned above).
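
The export rule this experiment is meant to demonstrate can be modelled in a 
few lines. This is a minimal sketch, assuming the rule is exactly "export a 
global symbol of the executable if its name appears, defined or not, in the 
dynsym of some library". The function names name_in_list and select_exports 
are hypothetical, and the sketch is neither ld's nor tcc's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` occurs in `list` (n entries), 0 otherwise. */
static int name_in_list(const char *name, const char *const *list, size_t n)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Hypothetical model of the default export rule: from the executable's
 * global symbols, keep those whose name appears in the .dynsym of any
 * library, whether the library defines the symbol or only references it.
 * Writes at most out_cap names to `out` and returns how many were written.
 */
static size_t select_exports(const char *const *exe_globals, size_t n_exe,
                             const char *const *lib_dynsyms, size_t n_lib,
                             const char **out, size_t out_cap)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_exe && n_out < out_cap; i++)
        if (name_in_list(exe_globals[i], lib_dynsyms, n_lib))
            out[n_out++] = exe_globals[i];
    return n_out;
}
```

With the files above, the executable's globals are baz, foo_helper and main, 
and the libraries' dynsyms mention foo, foo_helper and bar (libfoo) plus bar 
and baz (libbar). Under the stated assumption the model selects baz and 
foo_helper for export and leaves main out. That matches the behavior 
described in this message; it is not a transcript of actual ld output.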

Thanks for teaching me, I really appreciate it. It seems that, despite what 
you said, you understand tcc's linker at least as well as I do. Anyway, if you 
don't mind I'd still like to be the one to improve its organisation. I'd 
appreciate any review of my changes though.

Best regards,

Thomas
