Update: I *can* get this slowdown with tcc. The main trigger is having a global variable that the function modifies.
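Concretely, the generated function looks something like this for N = 2 (a simplified sketch; I'm using never-taken branches here to stand in for the skipped operations, and g/f as illustrative names):

    int g;                 /* global modified by the function */

    void f(void)
    {
        if (0) g += 1;     /* skipped operation 1 */
        if (0) g += 2;     /* skipped operation 2 */
        g++;               /* the final modification of the global */
    }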
The test program generates a single function of that shape: a run of skipped operations (the number N is a command-line option) finishing with a modification of a global variable. It compiles the function with tcc, then calls it a specified number of times (the repeat count is also given on the command line). It can either generate the code in memory, or it can write a .so file and load that with dlopen. (When generating in memory, it prints the size of the generated code.)
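The in-memory path boils down to the standard libtcc sequence; here is a condensed sketch (written against the tcc 0.9.27 API, where tcc_relocate takes a second argument; error handling trimmed, src is the generated source from above):

    #include <stdlib.h>
    #include <libtcc.h>

    /* Compile src in memory and return a pointer to f(); also report
       the generated code size, which is what the tables below show. */
    void (*jit_compile(const char *src, int *code_size))(void)
    {
        TCCState *s = tcc_new();
        tcc_set_output_type(s, TCC_OUTPUT_MEMORY);
        tcc_compile_string(s, src);

        *code_size = tcc_relocate(s, NULL);  /* first call: query required size */
        void *mem = malloc(*code_size);
        tcc_relocate(s, mem);                /* second call: emit code into mem */

        void (*f)(void) = (void (*)(void))tcc_get_symbol(s, "f");
        tcc_delete(s);                       /* the code in mem outlives the state */
        return f;
    }

The timing loop then simply calls the returned function pointer the requested number of times.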
Here are the interesting results on my machine, all for 10,000,000 iterations, using in-memory compilation:
    N   Code Size (bytes)   Time (s)
    0   128                 2.52
    1   144                 2.54
    2   176                 2.57
    3   208                 0.035
    4   224                 0.058
    5   256                 2.57
    6   272                 0.060
Switching over to a shared object file, I get these results (code size is now the size of the .so file):
    N   Code Size (bytes)   Time (s)
    0   2960                0.057
    1   2984                0.040
    2   3016                0.058
    3   3040                0.039
    4   3064                0.040
    5   3088                0.060
    6   3112                0.063
As you can see, the JIT-compiled code suffers odd slowdowns of roughly 40-70x depending on... something. The shared object, on the other hand, performs consistently well.
Two questions:
1) Can anybody reproduce these effects on their Linux machines, especially different architectures? (I can try an ARM tomorrow.)
2) Is there something special about how tcc builds a shared object file that is not happening with the JIT-compiled code?
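For reference, the shared-object path behind question 2 looks roughly like this (again a sketch; the output path is arbitrary and error handling is trimmed):

    #include <dlfcn.h>
    #include <libtcc.h>

    /* Have tcc write an ELF shared object, then load it through the
       normal dynamic loader instead of tcc's in-memory relocation. */
    void (*so_compile(const char *src))(void)
    {
        TCCState *s = tcc_new();
        tcc_set_output_type(s, TCC_OUTPUT_DLL);
        tcc_compile_string(s, src);
        tcc_output_file(s, "./gen.so");
        tcc_delete(s);

        void *h = dlopen("./gen.so", RTLD_NOW);
        return (void (*)(void))dlsym(h, "f");
    }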
Thanks!