From: Michael B. Smith
Subject: Re: [Tinycc-devel] Huge swings in cache performance
Date: Thu, 5 Jan 2017 23:09:08 +0000
How many times does foo overflow, requiring a cache flush?

From: Tinycc-devel [mailto:tinycc-devel-bounces+address@hidden] On Behalf Of David Mertens

Update: I *can* get this slowdown with tcc. The main trigger is to have a global variable that gets modified by the function. I have updated the gist:

https://gist.github.com/run4flat/fcbb6480275b1b9dcaa7a8d3a8084638

This program generates a single function filled with a collection of skipped operations (the number of operations is a command-line option), finished with a modification of a global variable. It compiles the function using tcc, then calls the function a specified number of times (the repeat count is also given on the command line). It can either generate code in memory, or it can generate a .so file and load that using dlopen. (If it generates in memory, it prints the size of the generated code.)

Here are the interesting results on my machine, all for 10,000,000 iterations, using compilation in memory:

  N    Code Size (Bytes)    Time (s)
  (table data lost in the archive)

Switching over to a shared object file, I get these results (the code size is the size of the .so file):

  N    Code Size (Bytes)    Time (s)
  (table data lost in the archive)

As you can see, the JIT-compiled code shows odd jumps of 30x speed drops depending on... something. The shared object file, on the other hand, performs consistently well. Two questions:

1) Can anybody reproduce these effects on their Linux machines, especially on different architectures? (I can try an ARM tomorrow.)

2) Is there something special about how tcc builds a shared object file that is not happening with the JIT-compiled code?

Thanks!
David

"Debugging is twice as hard as writing the code in the first place."