[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [avr-gcc-list] Small super-optimizer results
From: |
Georg-Johann Lay |
Subject: |
Re: [avr-gcc-list] Small super-optimizer results |
Date: |
Thu, 22 Sep 2011 19:06:01 +0200 |
User-agent: |
Thunderbird 2.0.0.24 (X11/20100302) |
Paulo Marques schrieb:
> Georg-Johann Lay wrote:
>> Paulo Marques schrieb:
>>> Hi, all
>>>
>>> Some time ago I toyed with trying to build a super-optimizer for the
>>> avr. The project didn't went very far, but it produced a few small bits
>>> of code that I think are worth sharing.
>>>
>>> Basically I was trying to produce functions to multiply an 8 bit value
>>> by 8 bit constants with a 16 bit result in avr families without a mul
>>> instruction.
>> Hi Paulo,
>
> Hi, Georg
>
>> can your optimizer produce results for other arithmetic than
>> multiplication?
>
> Yes, with a few tweaks.
>
>> For example to get a sequence for
>>
>> R >>>= 2
>>
>> where R is an 8-bit register and >>> stands for "rotate right".
>> There is a 5-insn sequence
>>
>> SWAP R
>> LSL R
>> ADC R,Z
>> LSL R
>> ADC R,Z
>>
>> where Z means "a register containing 0", i.e. >>> 2 is performed as
>> <<< 4
>> <<< 1
>> <<< 1
>>
>> With MUL instruction there is
>>
>> LDI U,64
>> MUL R,U
>> OR R0,R1
>> MOV R,R0
>>
>> but that needs an upper register U and for avr-gcc the zero-register R1
>> must be cleared:
>>
>> CLR Z
>>
>> so that this sequence is not shorter but only slower. The question is
>> if there exists a shorter sequence and maybe your tool knows some magic
>> to give the answer?
>
> The tool can not give a definite answer, because it only uses a subset
> of all available instructions (and it might have bugs).
>
> I tweaked it for the 2 bit rotation, and with the subset I usually use
> it couldn't find any sequence better than yours: the best sequences used
> 5 instructions and 2 registers.
>
> I then increased the subset of instructions to give it more options, but
> it still could not find a smaller sequence... :(
>
> It did found a lot of 5 instruction sequences. Here are a few examples:
>
> MOV R2, R
> ASR R2
> ROR R
> ASR R2
> ROR R
>
> This also uses an extra register, but doesn't require it to be
> initialized to zero. Since we have a zero register available most of the
> time, this is really no advantage over your version.
>
> ADD R, R
> ADC R, Z
> ADD R, R
> ADC R, Z
> SWAP R
>
> This is basically the same as yours, although the SWAP being done in the
> end makes the code harder to follow ;)
>
> I've attached the output of the optimizer. The R1 register is assumed to
> have zero when starting and R0 has the value to be shifted. As you can
> see many of the 81 sequences are equivalent.
Hi Paulo.
Thanks for the enthusiastic digging :-)
There are weird sequences like
SWAP R0
ADD R0 R0
ADC R0 R1
ADC R0 R0
ADC R0 R1
that use knowledge that C=0 in the second ADC. Thus it's just an obfuscated
version of the next alternative:
SWAP R0
ADD R0 R0
ADC R0 R1
ADD R0 R0
ADC R0 R1
There are quite some different ways of doing the rotate because rotates are
commutative.
Do you use BLD and BST?
> If you want me to try different problems, just let me know. I'm always
> curious about what kind of weird sequences the optimizer will produce :)
I am preparing a patch for avr-gcc that introduces rotate patterns.
For the 8-bit rotate I found the 5-insn sequence, but for some reason I though
there should be a shorter one. Like: "it cannot be that hard to rotate an
8-bit value right by 2..." and remembered your post here.
Johann