From: Wolfgang Hospital
Subject: sped-up functions from lib1funcs.S: what about using more instructions and/or stack?
Date: Wed, 15 May 2024 08:47:24 +0200
User-agent: Mozilla Thunderbird
Dear all,
G-J Lay has been kind enough to turn my whine
about __udivmodqi4 into a bug report and
to handle it; I have tried to follow suit by reporting further strict
improvements (NO resource used more, at least one used less).
While I think the bug keyword "missed-optimization" is meant for
opportunities missed during compilation, I have no problem regarding
strictly sub-optimal library code as a missed
optimization.
But what about speed improvements that take more instructions and/or stack, or that are slower for some argument values? To start with: a __mulqi3 of the same size that is faster for all multipliers except zero, for which it is slower; or a __mulhi3 that is about twice as fast in the worst case, but 3 instructions longer than the current code (both pointless for cores with MUL, obviously). Or division routines: a faster one that is no larger "without movw", but uses one more return address on the stack; one that is 2 instructions smaller and a wee bit faster on average, but slower in the worst case; one that is about 14 cycles faster, but 1 instruction longer?
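The proposed routines are not included in this message. As a hypothetical illustration of why the run time of a software multiply on a core without MUL can depend on the argument values, here is a C model of a generic 8-bit shift-and-add loop with early exit. It is not the lib1funcs.S code and not the replacement described above; the names mul8_shift_add and check_mul8 are invented for this sketch, and the loop-pass count is only a rough stand-in for cycle count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an 8-bit shift-and-add multiply with early exit.
 * Returns the low 8 bits of multiplier * multiplicand.
 * If iterations is non-NULL, it receives the number of loop passes,
 * which varies with the operand values. */
static uint8_t mul8_shift_add(uint8_t multiplier, uint8_t multiplicand,
                              unsigned *iterations)
{
    uint8_t result = 0;
    unsigned n = 0;

    /* Stop as soon as either operand has no bits left that can affect
     * the 8-bit result: all remaining partial products would be 0. */
    while (multiplier != 0 && multiplicand != 0) {
        n++;
        if (multiplier & 1u)  /* current multiplier bit set: add partial product */
            result = (uint8_t)(result + multiplicand);
        /* Shift the multiplicand left; bits moved past bit 7 are dropped. */
        multiplicand = (uint8_t)(multiplicand << 1);
        multiplier >>= 1;     /* move on to the next multiplier bit */
    }
    if (iterations != NULL)
        *iterations = n;
    return result;
}

/* Exhaustively compare the model with C's own truncated product for all
 * 65536 operand pairs, the kind of check a replacement routine needs. */
static void check_mul8(void)
{
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            uint8_t expected = (uint8_t)(a * b);
            assert(mul8_shift_add((uint8_t)a, (uint8_t)b, NULL) == expected);
        }
    }
}
```

In this model a zero multiplier costs no loop passes, while 0xFF * 1 costs eight, so "faster on average" and "faster in the worst case" really are different questions.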
How important is arithmetic for longer operands?
regards
W. Hospital
-- Wolfgang Hospital