64-byte alignment adds about 10% to the allocated code size of the longer
example program I linked earlier (the one that prints the size of the
allocated code block). 10% isn't bad! For this reason, and because lots of
architectures use 64-byte cache lines, I suggest that we just use
64-byte alignment, independent of architecture.
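As a rough illustration of where that kind of overhead comes from, the C sketch below lays sections out back to back and pads the end of each one up to a given alignment. The helper names round_up() and padded_total() and the section sizes are made up for illustration; this is not TCC code and not a measurement from the program above.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align; align must be a power of two. */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Lay sections out back to back, padding the end of each one to `align`,
   and return the total number of bytes needed. */
static size_t padded_total(const size_t *sizes, size_t count, size_t align)
{
    size_t offset = 0;
    for (size_t i = 0; i < count; i++)
        offset = round_up(offset + sizes[i], align);
    return offset;
}
```

With hypothetical section sizes of 1000, 37, 200 and 5 bytes, padded_total() returns 1280 bytes at 16-byte alignment and 1408 bytes at 64-byte alignment. Small sections pay the most, since each one is padded out to a full 64-byte boundary.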
David
On Sun, Jan 8, 2017 at 7:40 AM, David Mertens wrote:
OK, done! And you were right, we only need to align on 64 bytes!
Follow-up question: since the alignment is only 64 bytes,
would it be sensible to have all architectures align to this,
including ARM?
David
On Sun, Jan 8, 2017 at 7:19 AM, David Mertens <address@hidden> wrote:
Thanks for the feedback, grischka.
On Sat, Jan 7, 2017 at 6:15 AM, grischka <address@hidden> wrote:
David Mertens wrote:
I just pushed a commit that sets up 512-byte alignment for x86-64
architectures. It only uses 512 bytes for x86-64; for all others it
sticks with the default of 16 bytes.
L1/L2 cache line size is 64 bytes on x86-like processors, no matter
whether they run in 32- or 64-bit mode.
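A small C sketch of the cache-line arithmetic behind this point (an illustration only, not TCC code; the function name is hypothetical): it counts how many lines of a given size a code range touches. Once a block starts on a line boundary it already touches the minimum number of lines, so aligning it more coarsely than the line size should not, in principle, reduce that count.

```c
#include <assert.h>
#include <stdint.h>

/* Count the cache lines of size line_size touched by [start, start + len).
   line_size is assumed to be a power of two, e.g. 64 on x86. */
static unsigned cache_lines_spanned(uintptr_t start, uintptr_t len,
                                    uintptr_t line_size)
{
    assert(len > 0 && line_size != 0 && (line_size & (line_size - 1)) == 0);
    uintptr_t first = start / line_size;            /* line holding the first byte */
    uintptr_t last = (start + len - 1) / line_size; /* line holding the last byte */
    return (unsigned)(last - first + 1);
}
```

For example, a 64-byte block starting at offset 32 touches two 64-byte lines, while the same block at offset 0 or 64 touches only one.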
Yes, theoretically we should not need to align on anything
more than 64 bytes. I chose 512 because I still got
slowdowns for smaller alignments, including 256. But you
mention...
However, to make it work reliably, the memory from malloc needs to be
aligned as well, like so:
offset = 0, mem = (addr_t)ptr;
+ mem += -(int)mem & SECTION_ALIGNMENT;
and the extra padding this may require needs to be requested in
advance:
if (0 == mem)
- return offset;
+ return offset + SECTION_ALIGNMENT;
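A self-contained C sketch of the same idea outside TCC: over-allocate by the mask, then round the pointer up. ALIGN_MASK and alloc_aligned_code() are hypothetical names. The mask is assumed to be the alignment minus one (63 for 64 bytes), which is how an expression like `-(int)mem & SECTION_ALIGNMENT` has to be read for it to yield the padding; with a plain 512 it would yield only 0 or 512. The sketch uses unsigned uintptr_t arithmetic instead of the (int) cast.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mask: alignment minus one (64-byte alignment). */
#define ALIGN_MASK ((uintptr_t)63)

/* Return `size` usable bytes starting on an (ALIGN_MASK + 1)-byte boundary.
   *raw receives the pointer from malloc(), which is what must be freed. */
static void *alloc_aligned_code(size_t size, void **raw)
{
    /* The mask must describe a power-of-two alignment. */
    assert(((ALIGN_MASK + 1) & ALIGN_MASK) == 0);

    /* Request the worst-case padding up front, like `offset + SECTION_ALIGNMENT`. */
    unsigned char *p = malloc(size + ALIGN_MASK);
    if (p == NULL)
        return NULL;
    *raw = p;

    /* -mem & mask is the distance from mem to the next aligned address. */
    uintptr_t mem = (uintptr_t)p;
    return p + (-mem & ALIGN_MASK);
}
```

The caller keeps the raw pointer for free() and places code at the returned, aligned address. Where available, C11 aligned_alloc() or POSIX posix_memalign() are alternatives to doing the arithmetic by hand.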
If I put this in place, then maybe the section alignment
can be lessened. I'll have to check. FWIW, I've been doing
this with my own TCC-calling code already and I've seen
performance benefits. I don't see how the math would work
to let me reduce SECTION_ALIGNMENT to 64 bytes, but I'll
experiment and see what happens.
All of this is a black box to me. From what I've read, I
don't think we'd need to worry about anything beyond 64
bytes, but I don't understand the underlying CPU behavior
well enough to predict. The numbers I actually use will be
based on real timing from testing on my machine or from
feedback from others.
I ran the tests on my BeagleBone Black with the original alignment
and saw no performance issues.
Obviously ARM CPUs don't automatically clear the instruction cache,
which is why we have the explicit __clear_cache() call for ARM further
down in set_pages_executable().
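A minimal POSIX/Linux sketch of that step in C, assuming GCC or Clang for __builtin___clear_cache(): after code bytes have been written, the covering pages are switched to read+execute and the instruction cache is synchronized. make_executable() is a hypothetical name; it is not TCC's set_pages_executable(), whose exact protection flags and cache handling are not shown in this thread.

```c
#define _DEFAULT_SOURCE /* expose sysconf(), mprotect() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: make the pages covering [ptr, ptr + len) readable and
   executable, then synchronize the instruction cache with the freshly
   written bytes. Returns 0 on success, -1 if mprotect() fails. */
static int make_executable(void *ptr, size_t len)
{
    assert(ptr != NULL && len > 0);
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)ptr & ~(page - 1);
    uintptr_t end = ((uintptr_t)ptr + len + page - 1) & ~(page - 1);

    if (mprotect((void *)start, end - start, PROT_READ | PROT_EXEC) != 0)
        return -1;

    /* GCC/Clang builtin: a no-op on x86, where instruction fetch stays
       coherent with data writes, but performs the needed cache maintenance
       on ARM. */
    __builtin___clear_cache((char *)ptr, (char *)ptr + len);
    return 0;
}
```

Dropping write permission (PROT_READ | PROT_EXEC rather than adding PROT_EXEC to a writable mapping) is a choice made for this sketch; it keeps the mapping W^X, which some hardened systems require.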
I am not sure if this quite follows the project practices. I define
SECTION_ALIGNMENT just prior to the function tcc_relocate_ex. If
anybody can think of a better place to put it, to keep useful things
in one place, please move it.
SECTION_ALIGNMENT seems too general as a name. tccelf.c is full of
section alignments of various kinds. I'd suggest something prefixed
with RUN_xxxx to indicate that it's used only in that specific place.
Can do! I may not have time today, but I should be able to
push a revised commit in the next couple of days.
David
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan
_______________________________________________
Tinycc-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/tinycc-devel