Re: Testing alternatives to functions from lib1funcs.S
From: Georg-Johann Lay
Subject: Re: Testing alternatives to functions from lib1funcs.S
Date: Sun, 21 Apr 2024 15:22:31 +0200
User-agent: Mozilla Thunderbird
On 21.04.24 at 10:08, Wolfgang Hospital wrote:
> Dear all,
>
> Is there a test scaffold for the functions from lib1funcs.S,
> correctness, size & speed over the variety of 8-bit AVR cores?
Size is the easiest one: Just determine the size difference between
a build with, say, -nodefaultlibs -nostartfiles and a respective
build with -Wl,-u,__divmodqi4
Benchmarking speed is not so easy. I am using the avrtest core
simulator because it is fast, simulating just the core is enough, and
it has some extra features, e.g. getting random values and getting
values out of the target, like LOG_FMT_DOUBLE ("double = %f\n", x);
https://github.com/sprintersb/atest
See the end of this mail for an example.
For correctness, most of the functions are tested outside the testsuite
by hand-written programs that compare new implementations against
existing ones, like in the code below. Such tests no longer make sense
once the new version is integrated. And performance tests /
comparisons are misplaced in the GCC testsuite anyway.
> Is there a more comprehensive statement of calling conventions than
> https://gcc.gnu.org/wiki/avr-gcc#Exceptions_to_the_Calling_Convention,
It is comprehensive, but likely not complete. For completeness, you'll
have to resort to avr.md and the files it includes. There is no
table that lists the non-ABI stuff though; you'll have to find the
transparent calls, usually of type "xcall". Notice however that
such functions may be ABI or non-ABI. Transparent calls are basically
used for two purposes:
* Non-ABI calls, like some mul stuff that gets a parameter in the
  X register.
* ABI calls that don't clobber all call-used regs, in order to
  model the smaller footprint.
> in particular explicitly stating which functions are guaranteed to have
> __zero_reg__ 0 on entry/where it suffices to have __zero_reg__ 0 on
> return as opposed to preserving its value?
When a function does /not/ have zero_reg=0 on entry, then the compiler
or libc (or application code) has a bug. Same when zero_reg!=0 on
exit.
> I've been tinkering around, the "ldi r_cnt, 9" / "rjmp entry point" in
> __udivmodqi4 instead of "ldi r_cnt, 8" / "lsl r_arg1" annoying me for
> years. (Biggest relative strict improvement I found, FWIW.)
I went ahead and applied it, see https://gcc.gnu.org/PR114794
In order to test it, I ran the following code with
avrtest_log -q -no-log ...
<CODE>
#include <stdint.h>
#include "avrtest.h"
volatile uint8_t q8, my_q8;
volatile uint8_t r8, my_r8;
extern void __udivmodqi4 (void);
extern void my_udivmodqi4 (void);
__asm("\n"
"r_rem = 25 /* remainder */" "\n"
"r_arg1 = 24 /* dividend, quotient */" "\n"
"r_arg2 = 22 /* divisor */" "\n"
"r_cnt = 23 /* loop count */" "\n"
".pushsection .text" "\n"
".global my_udivmodqi4" "\n"
"my_udivmodqi4:" "\n\t"
" sub r_rem,r_rem ; clear remainder and carry" "\n\t"
" ldi r_cnt,8 ; init loop counter" "\n\t"
" lsl r_arg1 ; shift dividend" "\n\t"
"__udivmodqi4_loop:" "\n\t"
" rol r_rem ; shift dividend into remainder" "\n\t"
" cp r_rem,r_arg2 ; compare remainder & divisor" "\n\t"
" brcs __udivmodqi4_ep ; remainder < divisor" "\n\t"
" sub r_rem,r_arg2 ; restore remainder" "\n\t"
"__udivmodqi4_ep:" "\n\t"
" rol r_arg1 ; shift dividend (with CARRY)" "\n\t"
" dec r_cnt ; decrement loop counter" "\n\t"
" brne __udivmodqi4_loop" "\n\t"
" com r_arg1 ; complement result" "\n\t"
" ; because C flag was complemented in loop" "\n\t"
" ret" "\n\t"
".popsection");
/* Call the candidate routine using the register interface of
   __udivmodqi4: dividend in r24, divisor in r22; quotient is
   returned in r24, remainder in r25. */
static inline __attribute__((__always_inline__))
void my_divmod8 (volatile uint8_t *pq, volatile uint8_t *prem,
                 uint8_t dividend, uint8_t divisor)
{
    register uint8_t rem asm("25");
    register uint8_t q   asm("24");
    register uint8_t r22 asm("22") = divisor;
    register uint8_t r24 asm("24") = dividend;
    asm ("%~call %x[func]"
         : "=r" (q), "=r" (rem)
         : "r" (r22), "r" (r24), [func] "i" (my_udivmodqi4)
         : "r23");   /* r23 is the loop counter */
    *pq = q;
    *prem = rem;
}
/* Same wrapper, but calling the existing __udivmodqi4 from libgcc. */
static inline __attribute__((__always_inline__))
void divmod8 (volatile uint8_t *pq, volatile uint8_t *prem,
              uint8_t dividend, uint8_t divisor)
{
    register uint8_t rem asm("25");
    register uint8_t q   asm("24");
    register uint8_t r22 asm("22") = divisor;
    register uint8_t r24 asm("24") = dividend;
    asm ("%~call %x[func]"
         : "=r" (q), "=r" (rem)
         : "r" (r22), "r" (r24), [func] "i" (__udivmodqi4)
         : "r23");   /* r23 is the loop counter */
    *pq = q;
    *prem = rem;
}
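/* Run both routines over all 256 * 255 input pairs (divisor 0 is
   excluded), time each call and abort on the first mismatch. */
void bench_divmod8 (void)
{
    uint8_t a = 0;
    do
    {
        uint8_t b = 1;
        do
        {
            PERF_START_CALL (1);
            divmod8 (&q8, &r8, a, b);
            PERF_STOP (1);
            PERF_START_CALL (2);
            my_divmod8 (&my_q8, &my_r8, a, b);
            PERF_STOP (2);
            if (q8 != my_q8 || r8 != my_r8)
                __builtin_abort();
        } while (++b);   /* b wraps to 0 after 255 */
    } while (++a);       /* a wraps to 0 after 255 */
}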
int main (void)
{
    bench_divmod8();
    PERF_DUMP_ALL;
    return 0;
}
</CODE>
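For readers who prefer C to AVR assembly, here is a host-side sketch
that mirrors the register-level steps of my_udivmodqi4 above, including
the complemented carry that makes the final "com" necessary. The names
divmod8_t, model_udivmodqi4 and model_mismatches are made up for this
illustration; they are not part of libgcc or avrtest. The model is only
meant for reasoning about the algorithm, it says nothing about timing.

```c
#include <stdint.h>

/* Hypothetical result type for the model: quotient (r24), remainder (r25). */
typedef struct { uint8_t quot, rem; } divmod8_t;

/* C model of the assembly routine. "carry" plays the role of the C flag. */
static divmod8_t model_udivmodqi4(uint8_t dividend, uint8_t divisor)
{
    uint8_t rem = 0;                               /* sub r_rem,r_rem */
    unsigned carry = dividend >> 7;                /* lsl r_arg1: C = bit 7 */
    uint8_t arg1 = (uint8_t)(dividend << 1);

    for (uint8_t cnt = 8; cnt != 0; cnt--) {       /* ldi / dec / brne */
        rem = (uint8_t)((rem << 1) | carry);       /* rol r_rem */
        if (rem < divisor) {                       /* cp + brcs */
            carry = 1;                             /* quotient bit 0, stored inverted */
        } else {
            rem = (uint8_t)(rem - divisor);        /* sub r_rem,r_arg2 */
            carry = 0;                             /* quotient bit 1, stored inverted */
        }
        unsigned out = arg1 >> 7;                  /* next dividend bit */
        arg1 = (uint8_t)((arg1 << 1) | carry);     /* rol r_arg1 */
        carry = out;
    }
    divmod8_t res = { (uint8_t)~arg1, rem };       /* com r_arg1 */
    return res;
}

/* Compare the model against C's / and % for every divisor != 0. */
static unsigned long model_mismatches(void)
{
    unsigned long bad = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 1; b < 256; b++) {
            divmod8_t r = model_udivmodqi4((uint8_t)a, (uint8_t)b);
            if (r.quot != a / b || r.rem != a % b)
                bad++;
        }
    return bad;
}
```

A host program would call model_mismatches() once and expect 0, which
covers the same 65280 input pairs as bench_divmod8.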
The input space is only 16 bits wide, so full coverage is possible.
With larger input spaces, one could use avrtest_[p]rand() or
similar means to randomize the input.
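A rough host-side sketch of such a randomized comparison for a 16-bit
division might look as follows. Everything in it is an assumption made
for illustration: xorshift32 is merely a stand-in for a random source
like avrtest_[p]rand(), and cand_udivmod16 is a plain shift-and-subtract
routine standing in for whatever new implementation is under test.

```c
#include <stdint.h>

/* Stand-in PRNG (xorshift32); the state must never be zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical candidate: restoring division, one quotient bit per step. */
static void cand_udivmod16(uint16_t n, uint16_t d, uint16_t *q, uint16_t *r)
{
    uint16_t quot = 0;
    uint32_t rem = 0;               /* wider than 16 bits so the shift cannot overflow */
    for (int i = 15; i >= 0; i--) {
        rem = (rem << 1) | ((uint32_t)(n >> i) & 1u);
        if (rem >= d) {
            rem -= d;
            quot = (uint16_t)(quot | (1u << i));
        }
    }
    *q = quot;
    *r = (uint16_t)rem;
}

/* Compare the candidate against C's / and % on random inputs.
   Returns the number of mismatches; divisor 0 is skipped. */
static unsigned long fuzz_udivmod16(uint32_t seed, unsigned long rounds)
{
    uint32_t state = seed ? seed : 1u;
    unsigned long bad = 0;
    for (unsigned long i = 0; i < rounds; i++) {
        uint32_t x = xorshift32(&state);
        uint16_t n = (uint16_t)x;           /* low half: dividend */
        uint16_t d = (uint16_t)(x >> 16);   /* high half: divisor */
        uint16_t q, r;
        if (d == 0)
            continue;
        cand_udivmod16(n, d, &q, &r);
        if (q != n / d || r != n % d)
            bad++;
    }
    return bad;
}
```

With a fixed seed the run is reproducible, so a failing input pair can
be found again by re-running with the same seed and round count.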
The output is as follows:
$ avrtest_log -mmcu=avr5 -no-log ben.elf -m 100000000 -q
--- Dump # 1:
Timer T1 "" (65280 rounds): 00ec--00fc
Instructions Ticks
Total: 3765820 5222400
Mean: 57 80
Stand.Dev: 0.9 0.0
Min: 57 80
Max: 65 80
Calls (abs) in [ 2, 3] was: 2 now: 2
Calls (rel) in [ 0, 1] was: 0 now: 0
Stack (abs) in [08fb,08f9] was:08fb now:08fb
Stack (rel) in [ 0, 2] was: 0 now: 0
Min round Max round Min tag / Max tag
Calls -all-same- /
Stack -all-same- /
Instr. 1 65026 -no-tag- / -no-tag-
Ticks -all-same- /
Timer T2 "" (65280 rounds): 0108--0116
Instructions Ticks
Total: 3569980 4896000
Mean: 54 75
Stand.Dev: 0.9 0.0
Min: 54 75
Max: 62 75
Calls (abs) in [ 2, 3] was: 2 now: 2
Calls (rel) in [ 0, 1] was: 0 now: 0
Stack (abs) in [08fb,08f9] was:08fb now:08fb
Stack (rel) in [ 0, 2] was: 0 now: 0
Min round Max round Min tag / Max tag
Calls -all-same- /
Stack -all-same- /
Instr. 1 65026 -no-tag- / -no-tag-
Ticks -all-same- /
So the new code requires 5 ticks fewer per call (down from 80 to 75).
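The figures can be cross-checked with a little arithmetic, sketched
below in C. The per-instruction cycle counts are those of classic cores
like avr5. The accounting assumes that the old entry sequence was the
"ldi r_cnt, 9" / "rjmp" pair quoted above, i.e. that the old code made
one extra pass through rol / dec / brne; the helper names are made up
for this example.

```c
/* Cycle counts on classic AVR cores such as avr5. */
enum {
    CYC_LSL = 1, CYC_ROL = 1, CYC_DEC = 1,
    CYC_RJMP = 2, CYC_BRNE_TAKEN = 2
};

/* Ticks per call from the dump totals (the division is exact here). */
static unsigned long ticks_per_call(unsigned long total_ticks,
                                    unsigned long rounds)
{
    return total_ticks / rounds;
}

/* Ticks saved per call under the assumption stated above. */
static int accounted_savings(void)
{
    int prologue   = CYC_RJMP - CYC_LSL;                   /* rjmp replaced by lsl */
    int extra_pass = CYC_ROL + CYC_DEC + CYC_BRNE_TAKEN;   /* 9th pass dropped */
    return prologue + extra_pass;
}
```

ticks_per_call (5222400, 65280) yields 80 and ticks_per_call (4896000,
65280) yields 75, matching the "Mean" column, and accounted_savings()
yields the 5 ticks of difference, a saving of 6.25% relative to 80.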
"Calls" is the (relative or absolute) call depth.
"Stack" is the (relative or absolute) stack usage.
Johann
> Recommendations for a platform to vent such ideas welcome (I know of
> stackoverflow.com).
> regards
> W. Hospital
> --
> Wolfgang Hospital