Re: [Tinycc-devel] Huge swings in cache performance

Bah, it looks like the situation is not as simple as I had hoped. Here is the code I wrote to try (unsuccessfully) to emulate the problem: https://gist.github.com/run4flat/fcbb6480275b1b9dcaa7a8d3a8084638

This executable uses libtcc to jit-compile a chunk of C code of varying length, then calls the generated function a variable number of times. The only way to go off the icache cliff is to produce a function with over 3000 increment/decrement operations. Going from 2000 to 3000 causes the icache miss rate to increase by a factor of 100. However, there is no measurable performance degradation: the computational time still increases as O(N).

This is good, I suppose. It means that any performance gains to be had are on my end, not on tcc. For my original problem, I suspect that there are icache conflicts between Perl's object code and the jit-compiled stuff, or (as Edmund suggests) false cache invalidations due to adjacent data updates.

David

On Wed, Dec 21, 2016 at 2:39 AM, Christian Jullien <address@hidden> wrote:

No!
M2 mode generates pseudo-assembler Lisp code like:

((fentry fib 1 0 0)
(param 0)
(jeq _l004 '1)
(jneq _l003 '2)
(move a1 '1)
(return)
_l003
(gsub1 a1)
...

Each instruction is translated into an encoded integer (OpenLisp specific)
and stored in a vector that holds the compiled code.
This vector is interpreted by a state machine that works much like a
processor, using a BIG switch.

JNEQ above is interpreted using:

for( ;; ) {
nextinst:
        inst = ((FIXPTR)opcode[ pc++ ]) >> 4;

        switch( olopcode( inst ) ) {
        case LAP_FENTRY:
                /*
                 * LEN = 3 : 00 | type | xxxx | yyyy | zzzz
                 *
                 * information only, should not be executed.
                 * x is the number of required arguments and
                 * y is flag used when function has &rest or
                 * :rest argument.
                 * z is the number of local parameters.
                 */
                OLLAPNEXT;
        ...
        case LAP_JNEQ:
                /*
                 * LEN = 1 : 5C xx xx obj
                 *
                 * where xx is the next address and obj an
                 * immediate.
                 */
                ollapinton();

                if( a1 != opcode[ pc++ ] ) {
                        pc = (int)olushortarg( inst );
                }
                OLLAPNEXT;

M3 takes M2 pseudo code and generates pure C code which is statically
compiled (as a standard .c file).

-----Original Message-----
From: Tinycc-devel [mailto:tinycc-devel-bounces+eligis=address@hidden]
On Behalf Of Edmund Grimley Evans
Sent: Wednesday, 21 December 2016 08:26
To: address@hidden
Subject: Re: [Tinycc-devel] Huge swings in cache performance

Are you dynamically generating code on Intel? But presumably the dynamically
generated code is not inside your loop? However, if your dynamically
generated code is adjacent in memory to some data that gets modified, then
it could be (I have no idea how this stuff works on Intel) that the
processor thinks that the code may have been modified, even though it hasn't
been modified, and invalidates the cache just in case. And this phenomenon
would be very sensitive to the precise layout. The solution might be to put
the dynamically generated code in a block of memory that is separately
allocated with mmap. On the other hand, if you're already doing that,
probably this isn't the explanation.

Edmund

_______________________________________________
Tinycc-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/tinycc-devel

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan

From:	David Mertens
Subject:	Re: [Tinycc-devel] Huge swings in cache performance
Date:	Wed, 21 Dec 2016 12:34:19 -0500