[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[avr-gcc-list] Missed optimisations
From: |
David Brown |
Subject: |
[avr-gcc-list] Missed optimisations |
Date: |
Mon, 14 Jan 2008 15:00:59 +0100 |
User-agent: |
Thunderbird 2.0.0.9 (Windows/20071031) |
I've just tried out the crc code I posted recently on avr-chat:
> static const uint8_t PROGMEM crcTable8n[16] = {
> 0x00, 0x8D, 0x97, 0x1A, 0xA3, 0x2E, 0x34, 0xB9,
> 0xCB, 0x46, 0x5C, 0xD1, 0x68, 0xE5, 0xFF, 0x72
> };
>
> uint8_t CrcByte8n(uint8_t crc, uint8_t data) {
> // Fold next byte into CRC
> crc = (crc << 4) ^ (pgm_read_byte(&crcTable8n[(crc >> 4) ^
(data >> 4)]));
> crc = (crc << 4) ^ (pgm_read_byte(&crcTable8n[(crc >> 4) ^
(data & 0x0f)]));
>
> return crc;
> }
>
The generated assemble from avr-gcc 4.2.2 (latest winavr release) with
-Os is:
40 CrcByte8n:
41 /* prologue: frame size=0 */
42 /* prologue end (size=0) */
43 0000 982F mov r25,r24 ; tmp48, crc
44 0002 9627 eor r25,r22 ; tmp48, data
45 0004 9295 swap r25 ; tmp48
46 0006 9F70 andi r25,lo8(15) ; tmp48,
47 0008 20E0 ldi r18,lo8(crcTable8n) ; tmp51,
48 000a 30E0 ldi r19,hi8(crcTable8n) ; tmp51,
49 000c F901 movw r30,r18 ; tmp52, tmp51
50 000e E90F add r30,r25 ; tmp52, tmp48
51 0010 F11D adc r31,__zero_reg__ ; tmp52
52 /* #APP */
53 0012 E491 lpm r30, Z ; __result
54
55 /* #NOAPP */
56 0014 8295 swap r24 ; crc.36
57 0016 807F andi r24,lo8(-16) ; crc.36,
58 0018 8E27 eor r24,r30 ; crc.36, __result
59 001a E82F mov r30,r24 ; tmp55, crc.36
60 001c E295 swap r30 ; tmp55
61 001e EF70 andi r30,lo8(15) ; tmp55,
62 0020 F0E0 ldi r31,lo8(0) ; ,
63 0022 70E0 ldi r23,lo8(0) ; data,
64 0024 6F70 andi r22,lo8(15) ; data,
65 0026 7070 andi r23,hi8(15) ; data,
66 0028 E627 eor r30,r22 ; tmp56, data
67 002a F727 eor r31,r23 ; tmp56, data
68 002c E20F add r30,r18 ; tmp56, tmp51
69 002e F31F adc r31,r19 ; tmp56, tmp51
70 /* #APP */
71 0030 E491 lpm r30, Z ; __result
72
73 /* #NOAPP */
74 0032 8295 swap r24 ; crc.36
75 0034 807F andi r24,lo8(-16) ; crc.36,
76 0036 8E27 eor r24,r30 ; crc.36, __result
77 0038 90E0 ldi r25,lo8(0) ; <result>,
78 /* epilogue: frame size=0 */
79 003a 0895 ret
The code is basically good (the "swap" instruction is used for the
shifts, which is very nice - a big improvement over the older 3.4 gcc),
but there are a few missed optimisations shown here that are probably
quite common in other code.
Why is the address of crcTable8n loaded into r18:r19 first, before being
copied into r30:r31 for the address calculation? It seems that this
happens when the address is reused - if it is not reused, then r30:r31
are loaded directly. However, the reuse does not benefit from having
the address in a register - the "add r30, r18" and "adc r31, r19" on
lines 68 and 69 could be replaced with subi and sbci instructions to
save space and time, and to free registers r18:r19. On most RISC cpus,
storing the address in a register for reuse would be a benefit, which is
probably why this code is generated - on the AVR, it is not helpful (at
least, not here).
Secondly, the "(data & 0x0f)" clause generates messy 16-bit code. I
realise C requires integer promotion in such cases, but it's important
to try to remove unnecessary code such as loading the high register with
zero, then anding it with zero, then eoring it. gcc version 3.4.6 was
sometimes marginally better at such code. It should be noted that the
quality of the generated code depends very much on the exact expression
- the original "[(crc >> 4) ^ (data & 0x0f)]" generates poor code, while
the equivalent "[((crc >> 4) ^ data) & 0x0f]" generates tight code.
Running the same code in c++ mode produces worse results - there are
more promotions to 16-bit which are not removed by the optimiser. I
realise there are no maintainers for the c++ port, and the c++ specific
libraries are not implemented, but is there a particular problem that is
causing this difference?
I've looked through the bug lists, and can't see any of these points in
existing bug reports, although there may be some connection with
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11259
Best regards,
David
- [avr-gcc-list] Missed optimisations,
David Brown <=