[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Tinycc-devel] Unify C and asm symbols
From: |
Michael Matz |
Subject: |
Re: [Tinycc-devel] Unify C and asm symbols |
Date: |
Sun, 10 Dec 2017 06:50:13 +0100 (CET) |
User-agent: |
Alpine 2.20 (LSU 67 2015-01-07) |
Hi,
On Wed, 6 Dec 2017, grischka wrote:
My tests go through also without this hunk, so it's either an optimization
(but then it should be tested before the strcmp) or it's a left-over from
during development of your patch, or it's for a situation I don't know yet
(perhaps one of the multi-file ones?)
f1.c:
asm("call x1; ret; x1: ...");
f2.c:
void x1() {}
MAY cause
"error: 'x1' defined twice"
Yeah, that's whay I meant with multi-file mode problems ...
3) instead of (2), call rebuild_hash() at the end of each file.
Where all of 1..3 still have this problem:
f1.c:
x1() { printf("x1(1)\n"); }
main() { x1(); main_2(); }
f2.c:
main_2() { asm("call x1"); }
static x1() { printf("x1(2)\n"); }
... and this.
4) create asm symbols as "static" first always. This however needs to
post-process asm symbol again (to convert any still undefined symbols
to globals), as well as to patch relocation entries if such global
already was defined in a previous file.
And yes, using static asm symbols from the beginning is the only
good solution (I see) to the above multi-file problems, with the
associated fallout problems from that, as you noticed: newly globalized
symbols need to be in the ELF hash table, the "old" local symbols might
already be used in emitted relocs and undefined-at-end must be global.
See attached (additional) patch. It definitely would add some lines
but then again I wasn't able to produce any problems w.r.t. multi-files
with that.
Comments ? ;)
I wasn't a big fan of the reloc rewriting, and when thinking about the
above multi-file problems I had a more linker centric approach in mind.
So what I came up with is below: doing a proper regular-symbol resolution
step: basically similar to what's done while reading in .o files.
Advantages: no rewriting of reloc entries (which is expensive in your
patch as you iterate all relocs for each into-global change), only needing
to rebuild the hash table once or twice per output file (or -run), the
undef globalization can be folded into it and it's overall a bit shorter
(30 lines added) :)
This is on top of your last two patches (i.e. without the static-asm-sym
patch). If you don't beat me to it I think I'll push your two and this
one somewhen tomorrow; together they (and even just your two) are a clear
progression (and all three together even source line number neutral! ;) )
What do you think?
Ciao,
Michael.
commit c47d42823da2bb2908ce6a60452a24ba58b1b273
Author: Michael Matz <address@hidden>
Date: Sun Dec 10 06:18:27 2017 +0100
Fix some multi-file corner cases with asm
for this we have to create also asm symbols as VT_STATIC initially
except if there's an indication that it should be global (.globl
or undefined at end of unit). For this to work we need to
be able to globalize symbols after they were local and enter them
into the ELF hash tables, and also adjust the symbols that were
potentially already used in relocs when they were still local.
The easiest is to do a proper symbol resolution step also in multi-file
mode, for regular symbols (the non-dynamic ones, i.e. not from shared
libs).
---
tcc.h | 4 +--
tccasm.c | 21 +++++-----------
tccelf.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++----------------
tccpe.c | 3 +--
tccrun.c | 3 +--
5 files changed, 74 insertions(+), 44 deletions(-)
diff --git a/tcc.h b/tcc.h
index 38bbda8..926ac5d 100644
--- a/tcc.h
+++ b/tcc.h
@@ -911,7 +911,6 @@ struct filespec {
/* symbol was created by tccasm.c first */
#define VT_ASM (VT_VOID | VT_UNSIGNED)
-#define VT_ASM_GLOBAL VT_DEFSIGN
#define IS_ASM_SYM(sym) (((sym)->type.t & (VT_BTYPE | VT_ASM)) == VT_ASM)
/* token values */
@@ -1426,11 +1425,10 @@ ST_FUNC void put_stabs_r(const char *str, int type, int
other, int desc, unsigne
ST_FUNC void put_stabn(int type, int other, int desc, int value);
ST_FUNC void put_stabd(int type, int other, int desc);
-ST_FUNC void relocate_common_syms(void);
+ST_FUNC void resolve_regular_syms(void);
ST_FUNC void relocate_syms(TCCState *s1, Section *symtab, int do_resolve);
ST_FUNC void relocate_section(TCCState *s1, Section *s);
-ST_FUNC void tcc_add_linker_symbols(TCCState *s1);
ST_FUNC int tcc_object_type(int fd, ElfW(Ehdr) *h);
ST_FUNC int tcc_load_object_file(TCCState *s1, int fd, unsigned long
file_offset);
ST_FUNC int tcc_load_archive(TCCState *s1, int fd);
diff --git a/tccasm.c b/tccasm.c
index 4fe3f32..355d158 100644
--- a/tccasm.c
+++ b/tccasm.c
@@ -42,13 +42,12 @@ static Sym *asm_label_find(int v)
return sym;
}
-static Sym *asm_label_push(int v, int t)
+static Sym *asm_label_push(int v)
{
- Sym *sym = global_identifier_push(v, t, 0);
/* We always add VT_EXTERN, for sym definition that's tentative
(for .set, removed for real defs), for mere references it's correct
as is. */
- sym->type.t |= VT_ASM | VT_EXTERN;
+ Sym *sym = global_identifier_push(v, VT_ASM | VT_EXTERN | VT_STATIC, 0);
sym->r = VT_CONST | VT_SYM;
return sym;
}
@@ -67,7 +66,7 @@ ST_FUNC Sym* get_asm_sym(int name, Sym *csym)
{
Sym *sym = asm_label_find(name);
if (!sym) {
- sym = asm_label_push(name, 0);
+ sym = asm_label_push(name);
if (csym)
sym->c = csym->c;
}
@@ -102,7 +101,7 @@ static void asm_expr_unary(TCCState *s1, ExprValue *pe)
/* forward */
if (!sym || (sym->c && elfsym(sym)->st_shndx != SHN_UNDEF)) {
/* if the last label is defined, then define a new one */
- sym = asm_label_push(label, VT_STATIC);
+ sym = asm_label_push(label);
}
}
pe->v = 0;
@@ -381,7 +380,7 @@ static Sym* asm_new_label1(TCCState *s1, int label, int
is_local,
}
} else {
new_label:
- sym = asm_label_push(label, is_local == 1 ? VT_STATIC : 0);
+ sym = asm_label_push(label);
}
if (!sym->c)
put_extern_sym2(sym, NULL, 0, 0, 0);
@@ -392,11 +391,6 @@ static Sym* asm_new_label1(TCCState *s1, int label, int
is_local,
if (is_local != 2)
sym->type.t &= ~VT_EXTERN;
- if (IS_ASM_SYM(sym) && !(sym->type.t & VT_ASM_GLOBAL)) {
- sym->type.t |= VT_STATIC;
- update_storage(sym);
- }
-
return sym;
}
@@ -679,11 +673,8 @@ static void asm_parse_directive(TCCState *s1, int global)
Sym *sym;
next();
sym = get_asm_sym(tok, NULL);
- if (tok1 != TOK_ASMDIR_hidden) {
+ if (tok1 != TOK_ASMDIR_hidden)
sym->type.t &= ~VT_STATIC;
- if (IS_ASM_SYM(sym))
- sym->type.t |= VT_ASM_GLOBAL;
- }
if (tok1 == TOK_ASMDIR_weak)
sym->a.weak = 1;
else if (tok1 == TOK_ASMDIR_hidden)
diff --git a/tccelf.c b/tccelf.c
index 657aa61..6b2e227 100644
--- a/tccelf.c
+++ b/tccelf.c
@@ -299,6 +299,9 @@ static void rebuild_hash(Section *s, unsigned int
nb_buckets)
strtab = s->link->data;
nb_syms = s->data_offset / sizeof(ElfW(Sym));
+ if (!nb_buckets)
+ nb_buckets = ((int*)s->hash->data)[0];
+
s->hash->data_offset = 0;
ptr = section_ptr_add(s->hash, (2 + nb_buckets + nb_syms) * sizeof(int));
ptr[0] = nb_buckets;
@@ -370,9 +373,7 @@ ST_FUNC int put_elf_sym(Section *s, addr_t value, unsigned
long size,
return sym_index;
}
-/* find global ELF symbol 'name' and return its index. Return 0 if not
- found. */
-ST_FUNC int find_elf_sym(Section *s, const char *name)
+static int find_elf_sym_1(Section *s, const char *name, int onlydef)
{
ElfW(Sym) *sym;
Section *hs;
@@ -388,13 +389,21 @@ ST_FUNC int find_elf_sym(Section *s, const char *name)
while (sym_index != 0) {
sym = &((ElfW(Sym) *)s->data)[sym_index];
name1 = (char *) s->link->data + sym->st_name;
- if (!strcmp(name, name1) && ELFW(ST_BIND)(sym->st_info) != STB_LOCAL)
+ if (!strcmp(name, name1) && ELFW(ST_BIND)(sym->st_info) != STB_LOCAL
+ && (!onlydef || sym->st_shndx != SHN_UNDEF))
return sym_index;
sym_index = ((int *)hs->data)[2 + nbuckets + sym_index];
}
return 0;
}
+/* find global ELF symbol 'name' and return its index. Return 0 if not
+ found. */
+ST_FUNC int find_elf_sym(Section *s, const char *name)
+{
+ return find_elf_sym_1(s, name, 0);
+}
+
/* return elf symbol value, signal error if 'err' is nonzero */
ST_FUNC addr_t get_elf_sym_addr(TCCState *s, const char *name, int err)
{
@@ -716,21 +725,6 @@ static void sort_syms(TCCState *s1, Section *s)
tcc_free(old_to_new_syms);
}
-/* relocate common symbols in the .bss section */
-ST_FUNC void relocate_common_syms(void)
-{
- ElfW(Sym) *sym;
-
- for_each_elem(symtab_section, 1, sym, ElfW(Sym)) {
- if (sym->st_shndx == SHN_COMMON) {
- /* symbol alignment is in st_value for SHN_COMMONs */
- sym->st_value = section_add(bss_section, sym->st_size,
- sym->st_value);
- sym->st_shndx = bss_section->sh_num;
- }
- }
-}
-
/* relocate symbol table, resolve undefined symbols if do_resolve is
true and output error if undefined symbol. */
ST_FUNC void relocate_syms(TCCState *s1, Section *symtab, int do_resolve)
@@ -1167,7 +1161,7 @@ ST_FUNC void tcc_add_runtime(TCCState *s1)
/* add various standard linker symbols (must be done after the
sections are filled (for example after allocating common
symbols)) */
-ST_FUNC void tcc_add_linker_symbols(TCCState *s1)
+static void tcc_add_linker_symbols(TCCState *s1)
{
char buf[1024];
int i;
@@ -1226,6 +1220,48 @@ ST_FUNC void tcc_add_linker_symbols(TCCState *s1)
}
}
+/* Do final regular symbol preparation (for those coming from .c/.o/.s files,
+ not from shared libs) */
+ST_FUNC void resolve_regular_syms(void)
+{
+ int rebuild = 0;
+ ElfW(Sym) *sym;
+
+ /* Allocate common symbols in BSS. */
+ for_each_elem(symtab_section, 1, sym, ElfW(Sym)) {
+ if (sym->st_shndx == SHN_COMMON) {
+ /* symbol alignment is in st_value for SHN_COMMONs */
+ sym->st_value = section_add(bss_section, sym->st_size,
+ sym->st_value);
+ sym->st_shndx = bss_section->sh_num;
+ }
+ }
+
+ /* Now assign linker provided symbols their value. */
+ tcc_add_linker_symbols(tcc_state);
+
+ /* And finally resolve still UNDEF symbols (for multi-file mode),
+ and globalize those that are still UNDEF. */
+ rebuild_hash(symtab_section, 0);
+ for_each_elem(symtab_section, 1, sym, ElfW(Sym)) {
+ if (sym->st_shndx == SHN_UNDEF) {
+ const char *name = (char *) symtab_section->link->data +
sym->st_name;
+ int symndx = find_elf_sym_1(symtab_section, name, 1);
+ if (symndx) {
+ *sym = ((ElfSym *)symtab_section->data)[symndx];
+ rebuild = 1;
+ } else if (ELFW(ST_BIND)(sym->st_info) == STB_LOCAL) {
+ sym->st_info
+ = ELFW(ST_INFO)(STB_GLOBAL, ELFW(ST_TYPE)(sym->st_info));
+ rebuild = 1;
+ }
+ }
+ }
+ if (rebuild)
+ rebuild_hash(symtab_section, 0);
+}
+
+
static void tcc_output_binary(TCCState *s1, FILE *f,
const int *sec_order)
{
@@ -2016,8 +2052,7 @@ static int elf_output_file(TCCState *s1, const char
*filename)
if (file_type != TCC_OUTPUT_OBJ) {
/* if linking, also link in runtime libraries (libc, libgcc, etc.) */
tcc_add_runtime(s1);
- relocate_common_syms();
- tcc_add_linker_symbols(s1);
+ resolve_regular_syms();
if (!s1->static_link) {
if (file_type == TCC_OUTPUT_EXE) {
@@ -2058,6 +2093,14 @@ static int elf_output_file(TCCState *s1, const char
*filename)
}
}
build_got_entries(s1);
+ } else {
+ for_each_elem(symtab_section, 1, sym, ElfW(Sym)) {
+ if (sym->st_shndx == SHN_UNDEF
+ && ELFW(ST_BIND)(sym->st_info) == STB_LOCAL) {
+ sym->st_info
+ = ELFW(ST_INFO)(STB_GLOBAL, ELFW(ST_TYPE)(sym->st_info));
+ }
+ }
}
/* we add a section for symbols */
diff --git a/tccpe.c b/tccpe.c
index a67023d..af4438d 100644
--- a/tccpe.c
+++ b/tccpe.c
@@ -1970,8 +1970,7 @@ ST_FUNC int pe_output_file(TCCState *s1, const char
*filename)
tcc_add_bcheck(s1);
pe_add_runtime(s1, &pe);
- relocate_common_syms(); /* assign bss addresses */
- tcc_add_linker_symbols(s1);
+ resolve_regular_syms();
pe_set_options(s1, &pe);
ret = pe_check_symbols(&pe);
diff --git a/tccrun.c b/tccrun.c
index d0501a0..72b3994 100644
--- a/tccrun.c
+++ b/tccrun.c
@@ -188,8 +188,7 @@ static int tcc_relocate_ex(TCCState *s1, void *ptr, addr_t
ptr_diff)
pe_output_file(s1, NULL);
#else
tcc_add_runtime(s1);
- relocate_common_syms();
- tcc_add_linker_symbols(s1);
+ resolve_regular_syms();
build_got_entries(s1);
#endif
if (s1->nb_errors)
- Re: [Tinycc-devel] Unify C and asm symbols, Michael Matz, 2017/12/03
- Re: [Tinycc-devel] Unify C and asm symbols, grischka, 2017/12/04
- Re: [Tinycc-devel] Unify C and asm symbols, Michael Matz, 2017/12/04
- Re: [Tinycc-devel] Unify C and asm symbols, Michael Matz, 2017/12/04
- Re: [Tinycc-devel] Unify C and asm symbols, grischka, 2017/12/06
- Re: [Tinycc-devel] Unify C and asm symbols,
Michael Matz <=
- Re: [Tinycc-devel] Unify C and asm symbols, grischka, 2017/12/10
- Re: [Tinycc-devel] Unify C and asm symbols, Michael Matz, 2017/12/10
- Re: [Tinycc-devel] Unify C and asm symbols, Christian Jullien, 2017/12/11
- Re: [Tinycc-devel] Unify C and asm symbols, grischka, 2017/12/12
- Re: [Tinycc-devel] Unify C and asm symbols, Christian Jullien, 2017/12/12
- Re: [Tinycc-devel] Unify C and asm symbols, Michael Matz, 2017/12/15