> On Thursday, January 5, 2017 3:25 PM, David Mertens <address@hidden> wrote:
> Hello everyone,
> I have now written a very simple C program which gives highly erratic timing behavior when run under tcc -run. I have added this file to the gist; look for cache-test-simple.c here:
> https://gist.github.com/run4flat/fcbb6480275b1b9dcaa7a8d3a8084638
> The simple program does not attempt to produce a shared object library, and so should be runnable on any operating system that supports tcc -run, including Windows and Mac in addition to Linux. Here are some sample outputs on my machine:
> $ time ./tcc -B. -DNOPS=0 -run cache-test-simple.c
> real 0m0.052s
> $ time ./tcc -B. -DNOPS=1 -run cache-test-simple.c ***
> real 0m1.413s
> $ time ./tcc -B. -DNOPS=2 -run cache-test-simple.c
> real 0m0.069s
> $ time ./tcc -B. -DNOPS=3 -run cache-test-simple.c
> real 0m0.076s
> $ time ./tcc -B. -DNOPS=4 -run cache-test-simple.c ***
> real 0m1.158s
> The starred results are over an order of magnitude slower than the unstarred results.
> 1) Do others see this on other operating systems with 64-bit Intel processors?
> 2) Do others see this on any operating system with 64-bit AMD processors?
> 3) Do others see this on any operating system with any other architecture?
> Thanks!
> David
> On Thu, Jan 5, 2017 at 12:59 AM, David Mertens <address@hidden> wrote:
> Update: I *can* get this slowdown with tcc. The main trigger is to have a global variable that gets modified by the function.
> I have updated the gist:
> https://gist.github.com/run4flat/fcbb6480275b1b9dcaa7a8d3a8084638
> This program generates a single function containing a configurable number of skipped operations (set via a command-line option) that ends by modifying a global variable. It compiles the function using tcc, then calls it a specified number of times (the repeat count is also a command-line option). It can either generate the code in memory or write a .so file and load it with dlopen. (When generating in memory, it prints the size of the generated code.)
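For readers without the gist at hand, the timing part of such a test can be sketched as below. This is not the code from the gist: `bump_counter` and `time_calls` are hypothetical names, the function here is compiled ahead of time rather than generated by tcc, and `clock()` (CPU time) is an assumed stand-in for whatever timer the real program uses. It only shows the shape of the measurement: call a function that writes a global, many times, through a pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Global that the timed function modifies on every call. */
volatile long counter = 0;

/* Hypothetical stand-in for the generated function. The real one is
 * padded with a configurable number of skipped operations before it
 * touches the global. */
void bump_counter(void)
{
    counter++;
}

/* Call fn `repeats` times through a pointer and return the CPU time
 * spent, in seconds, as measured by the standard clock() function. */
double time_calls(void (*fn)(void), long repeats)
{
    assert(repeats >= 0);
    clock_t start = clock();
    for (long i = 0; i < repeats; i++)
        fn();
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Print one result line in the spirit of the tables in this message. */
void report(const char *label, long repeats)
{
    double seconds = time_calls(bump_counter, repeats);
    printf("%s: %ld calls in %.3f s\n", label, repeats, seconds);
}
```

Calling `report("static", 10000000)` from a `main` gives a baseline for ahead-of-time compiled code; the interesting comparison in this thread is the same loop run against a function pointer obtained from tcc's in-memory compilation.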
> Here are the interesting results on my machine, all for 10,000,000 iterations, using compilation-in-memory:
> N   Code Size (Bytes)   Time (s)
> 0   128                 2.52
> 1   144                 2.54
> 2   176                 2.57
> 3   208                 0.035
> 4   224                 0.058
> 5   256                 2.57
> 6   272                 0.060
> Switching over to a shared object file, I get these results (code size is size of the .so file):
> N   Code Size (Bytes)   Time (s)
> 0   2960                0.057
> 1   2984                0.040
> 2   3016                0.058
> 3   3040                0.039
> 4   3064                0.040
> 5   3088                0.060
> 6   3112                0.063
> As you can see, the jit-compiled code shows erratic slowdowns of 30x or more, depending on... something. The shared object file, on the other hand, performs consistently well.
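To put a number on the slowdown, the ratios can be worked out directly from the in-memory timings quoted above. The sketch below only does that arithmetic; the function names are made up for illustration, and the times are copied from the first table.

```c
#include <stdio.h>

#define N_ROWS 7

/* Times (s) copied from the in-memory table above, indexed by N = 0..6. */
static const double jit_time[N_ROWS] = {
    2.52, 2.54, 2.57, 0.035, 0.058, 2.57, 0.060
};

/* Ratio of the timing for row n to the fastest timing in the table. */
double slowdown_vs_fastest(int n)
{
    double fastest = jit_time[0];
    for (int i = 1; i < N_ROWS; i++)
        if (jit_time[i] < fastest)
            fastest = jit_time[i];
    return jit_time[n] / fastest;
}

/* Print the slowdown factor for every row. */
void print_slowdowns(void)
{
    for (int n = 0; n < N_ROWS; n++)
        printf("N=%d: %.1fx the fastest run\n", n, slowdown_vs_fastest(n));
}
```

Measured against the fastest run (N=3), the slow rows come out at roughly 72x to 73x; measured against the slowest of the fast runs (N=6), they are still above 40x.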
> Two questions:
> 1) Can anybody reproduce these effects on their Linux machines, especially different architectures? (I can try an ARM tomorrow.)
> 2) Is there something special about how tcc builds a shared object file that is not happening with the jit-compiled code?
> Thanks!
> David
> --
> "Debugging is twice as hard as writing the code in the first place.
> Therefore, if you write the code as cleverly as possible, you are,
> by definition, not smart enough to debug it." -- Brian Kernighan
> _______________________________________________
> Tinycc-devel mailing list