Tcc was never intended to be a real optimizing compiler, because it focuses on compilation speed and memory savings. However, it seems that some improvements to the generated code can be made without significantly slowing down the compiler. Since tcc generates machine code rather than assembly, searching the output bytes for peephole patterns might be quite fast.
I noticed that sequences like this one (Intel syntax) in the i386 code generated by tcc
MOV ECX,[memory location]
ADD EAX,ECX
appear quite often, and they can be replaced with
ADD EAX, [memory location]
which saves two bytes. Counting the variants that use AND, OR, XOR, ADC, SUB and SBB, the code tcc generates for tcc.c contains about 400 such sequences.
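For the global-variable case, you can check the two-byte saving by encoding both forms by hand. The sketch below is only an illustration. The helpers modrm, put32, encode_before and encode_after are hypothetical names and are not tcc functions. Only the [disp32] addressing form is covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack the three ModRM fields: mod (bits 7-6), reg (bits 5-3), r/m (bits 2-0). */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* Store a 32-bit value in little-endian order; returns the number of bytes written. */
static size_t put32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
    return 4;
}

enum { EAX = 0, ECX = 1 };  /* i386 register numbers used in ModRM fields */

/* MOV ECX,[addr] ; ADD EAX,ECX */
static size_t encode_before(uint8_t *out, uint32_t addr)
{
    size_t n = 0;
    out[n++] = 0x8B;                 /* MOV r32, r/m32 */
    out[n++] = modrm(0, ECX, 5);     /* mod=00, r/m=101: [disp32], reg=ECX */
    n += put32(out + n, addr);
    out[n++] = 0x01;                 /* ADD r/m32, r32 */
    out[n++] = modrm(3, ECX, EAX);   /* mod=11: registers only, r/m=EAX is the destination */
    return n;
}

/* ADD EAX,[addr] */
static size_t encode_after(uint8_t *out, uint32_t addr)
{
    size_t n = 0;
    out[n++] = 0x03;                 /* ADD r32, r/m32 */
    out[n++] = modrm(0, EAX, 5);     /* reg=EAX is now the destination */
    n += put32(out + n, addr);
    return n;
}
```

For any address, encode_before() produces 8B 0D disp32 01 C8 (8 bytes) and encode_after() produces 03 05 disp32 (6 bytes). Three things change. The MOV opcode 8B becomes the ADD r32, r/m32 opcode 03. The ModRM reg field switches from ECX to EAX. The 2-byte ADD EAX,ECX disappears.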
To do the replacement, I changed the function void gen_opi(int op) in i386-gen.c; the new code goes about 20 lines below the label gen_op8:
.....
} else {
    gv2(RC_INT, RC_INT);
    r = vtop[-1].r;
    fr = vtop[0].r;
    o((opc << 3) | 0x01);
    o(0xc0 + r + fr * 8);
    // Added code******
    {
        /* peep points just past the reg,reg op emitted above:
           peep[-2] is its opcode, peep[-1] its ModRM byte (reg field = fr) */
        unsigned char *peep = cur_text_section->data + ind;

        // op with global var: MOV fr,[disp32] is 6 bytes long
        if (ind >= 8 &&
            peep[-8] == 0x8B &&                        /* MOV r32, r/m32 */
            (peep[-7] & 0xC7) == 0x05 &&               /* ModRM = [disp32] */
            (peep[-7] & 0x38) == (peep[-1] & 0x38)) {  /* dest reg of MOV == src reg of op */
            peep[-8] = (opc << 3) | 0x03;  /* turn the MOV into "op r32, r/m32" */
            peep[-7] &= 0xC7;              /* clear the reg field of the ModRM byte */
            peep[-7] |= r * 8;             /* reg field = r, the destination of the op */
            ind -= 2;                      /* drop the reg,reg op: two bytes saved */
        }
        // op with local var: MOV fr,[EBP+disp8] is 3 bytes long
        else if (ind >= 5 &&
                 peep[-5] == 0x8B &&                       /* MOV r32, r/m32 */
                 (peep[-4] & 0xC7) == 0x45 &&              /* ModRM = [EBP+disp8] */
                 (peep[-4] & 0x38) == (peep[-1] & 0x38)) { /* dest reg of MOV == src reg of op */
            peep[-5] = (opc << 3) | 0x03;  /* turn the MOV into "op r32, r/m32" */
            peep[-4] &= 0xC7;              /* clear the reg field of the ModRM byte */
            peep[-4] |= r * 8;             /* reg field = r, the destination of the op */
            ind -= 2;                      /* drop the reg,reg op: two bytes saved */
        }
    }
    //*************************
}
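The same check can be tried outside tcc on a plain byte buffer. The function fold_load_op below is hypothetical and is not part of tcc. It follows the change to gen_opi: it looks for MOV fr,[disp32] or MOV fr,[EBP+disp8] directly in front of the register-register op and rewrites it in place. The sketch makes two assumptions that it does not check: fr is not used after the op, and no jump lands on the op instruction. The gen_opi change does not check these either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-alone version of the peephole check.
   code[0..*len) is the output buffer; its last two bytes are the op
   "(opc << 3) | 0x01, 0xC0 + r + fr * 8" that was just emitted.
   Assumes fr is dead after the op and nothing jumps to the op.
   Returns 1 and shrinks *len by 2 if the MOV was folded, else 0. */
static int fold_load_op(uint8_t *code, size_t *len, int opc, int r)
{
    assert(*len >= 2);
    uint8_t *peep = code + *len;
    uint8_t src = peep[-1] & 0x38;   /* reg field of the op's ModRM = fr << 3 */
    uint8_t *mov = NULL;

    if (*len >= 8 && peep[-8] == 0x8B && (peep[-7] & 0xC7) == 0x05 &&
        (peep[-7] & 0x38) == src)
        mov = peep - 8;              /* MOV fr,[disp32]: 6 bytes */
    else if (*len >= 5 && peep[-5] == 0x8B && (peep[-4] & 0xC7) == 0x45 &&
             (peep[-4] & 0x38) == src)
        mov = peep - 5;              /* MOV fr,[EBP+disp8]: 3 bytes */

    if (mov == NULL)
        return 0;
    mov[0] = (uint8_t)((opc << 3) | 0x03);           /* op r32, r/m32 */
    mov[1] = (uint8_t)((mov[1] & 0xC7) | (r << 3));  /* reg field = destination r */
    *len -= 2;                                       /* drop the reg,reg op */
    return 1;
}
```

Consider 8B 4D F8 29 C8 (MOV ECX,[EBP-8] ; SUB EAX,ECX, with opc = 5 and r = EAX). It should become 2B 45 F8 (SUB EAX,[EBP-8]), and the length should drop from 5 to 3. A MOV into a different register, such as MOV EDX,[EBP-8], is left alone because its reg field does not match the source register of the op.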