[avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t
From: David Brown
Subject: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t
Date: Wed, 22 Apr 2009 12:48:52 +0200
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)
Dale Whitfield wrote:
Hi David >@2009.04.22_10:55:27_+0200
Dale Whitfield wrote:
Hi David >@2009.04.21_19:42:34_+0200
Dale Whitfield wrote:
Hi,
All this talk of super-optimisers, etc. brings this issue I've had
to the fore again.
Perhaps there are suggestions on how to encourage the compiler to do the
optimisations rather than having to hand-code to get the best result.
Take this code:
volatile uint32_t status;
static inline void set_status(uint32_t flag, uint8_t set)
__attribute__((always_inline));
void set_status(uint32_t flag, uint8_t set) {
if (set) status |= flag;
else status &= ~flag;
}
Which when used in an interrupt generates:
ISR (INT7_vect)
{
<snip>
That's way too much code... It's fairly obvious where the optimisations
should be, although I can see that some have been done already.
I can't see much possibility for improving the above code (except by
removing the push and zeroing of r1). You asked for "status" to be a
volatile 32-bit int, so that's what you got. The code below is
semantically different, and thus compiles differently.
I don't believe it is semantically different. Which is one reason I
raised this. I am using the version below in existing code and it
behaves correctly.
Loading 4 registers and then storing back 3 that are unchanged makes no
sense at all.
I'm afraid that's how "volatile" works. In a way, by using "volatile"
you are telling the compiler "I *know* this makes no sense to you, but
do it my way anyway".
It's how volatile works by *loading* the registers before making changes.
However, if it's possible to only store back changed bytes, then surely
that's all that needs to be done.
Nope - volatile loads and volatile stores are always done completely,
with the whole data element. That's fundamental to "volatile".
Where volatile comes in here is that the optimisation shouldn't use any
previous loads or modify register values more than once without
reloading/storing etc. Here, the value is loaded once, one byte is
changed and then all 4 are stored. That's wasteful.
That's what you get when you make a 32-bit item volatile - all accesses
to it use the whole 32-bit data, every time.
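As a rough sketch of the difference (the names are invented, and I haven't
checked what any particular compiler actually emits for this):
#include <stdint.h>
volatile uint32_t vstatus;   /* every access must touch all 4 bytes */
uint32_t nstatus;            /* the optimiser may narrow the access  */
void set_bit0_volatile(void)
{
    vstatus |= 1UL;   /* full 32-bit read-modify-write, as requested */
}
void set_bit0_plain(void)
{
    nstatus |= 1UL;   /* only the observable result matters, so a
                         single-byte OR is a legal implementation */
}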
:-) I guess we need to agree to disagree on this. I was hoping this
wouldn't become a philosophical issue around volatile data. Which is
fraught at the best of times.
You can disagree all you like - but I've got the C standards, every C
compiler, every C compiler developer, and almost every C programmer on
my side. It's not a matter of philosophy or disagreement any more than
a disagreement about whether "0.5" is an int or not.
Suppose you have a 16-bit hardware timer that must always be set low
byte first, then high byte. If you make a change that just affects the
low byte, you still want the compiler to generate the full 16-bit write
- you achieve that by declaring the timer as a 16-bit volatile variable.
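Sketched out (the register name and address are invented purely for
illustration):
#include <stdint.h>
/* hypothetical memory-mapped 16-bit timer register at a made-up address */
#define TIMER_COUNT (*(volatile uint16_t *)0x1024)
void reload_timer(uint16_t value)
{
    /* because TIMER_COUNT is a volatile 16-bit object, this is always a
       full 16-bit write, even if only the low byte ever changes */
    TIMER_COUNT = value;
}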
How is the compiler supposed to guess that you mean something different
in this case?
I don't think this is a good example, because processor register
accesses should be treated differently to plain RAM accesses. And, yes,
of course just updating one byte in this scenario would be wrong.
C has no way to differentiate these concepts - it's simply not part of
the language. With a perfect embedded systems programming language you
could get exactly the type of access control you are asking for - C is
not perfect, and does not have these concepts. It's all or nothing -
volatile or not volatile.
You highlight an interesting distinction though.
Maybe this is where the optimisation goes astray. For IO access - or
rather, access to processor registers, since they may be accessed via
lds/sts - the optimisation doesn't apply. However, for RAM access, it does.
There is no difference, there is no "optimisation gone astray". You ask
the compiler to use volatile accesses, and it uses volatile accesses.
In this example, the interrupt routine is half as long when, as I see
it, correctly optimised. In an interrupt particularly, this could be a
critical time/cycle saving.
I agree with your conclusion - the code is longer and slower than
needed. But you are totally wrong on your premise - this is not because
the compiler is failing in its optimisation. It is because *your* code
forces it to generate long and slow code.
There is no reason for using "volatile" in this situation - making
"status" non-volatile will give you exactly the code you want here. But
it may then cause trouble for other code, depending on how you use
"status" and "set_status".
There is every reason. The flags get updated in interrupt routines and
in non-interrupt code.
That's dependent on parts of your code that I haven't seen, so I can't
comment.
However, based on your other post in this thread, you are correct in
that volatile access is not what you need - it's atomic access (that
applies to 8-bit data as well as 32-bit data).
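One way to get that on the AVR - an untested sketch, assuming avr-libc's
<util/atomic.h> is available - is to keep "status" non-volatile and wrap
the read-modify-write in an interrupt-protected block:
#include <stdint.h>
#include <util/atomic.h>
uint32_t status;   /* plain, non-volatile */
static inline void set_status_atomic(uint32_t flag, uint8_t set)
{
    /* interrupts are disabled for the duration of the block, so an ISR
       can never see or clobber a half-updated 32-bit value */
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        if (set)
            status |= flag;
        else
            status &= ~flag;
    }
}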
Try the following version of set_status:
void set_status(uint32_t flag, uint8_t set) {
if (set) (*(uint32_t *)&status) |= flag;
else (*(uint32_t *)&status) &= ~flag;
asm("" : : : "memory");
}
First, we remove the "volatile" from status so that the optimiser can
give you a single byte access when the function is inlined. Secondly,
instead of using "volatile" to protect the access to "status", we use a
memory clobber that forces writes to memory variables to be completed
before going on. This will ensure that the change is written out to
status as soon as possible (otherwise changes could be cached for later
storage).
Which would be a problem, since the volatile nature of the variable is
important in this usage.
(I haven't tried compiling that code yet.)
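For what it's worth, a hypothetical call site for that version (the flag
value is made up) could look like this - with "status" non-volatile and the
flag a compile-time constant, the inlined function is then free to touch
only the byte that actually changes:
#include <avr/interrupt.h>
#define STATUS_INT7 0x00000001UL   /* invented flag bit, for illustration */
ISR (INT7_vect)
{
    set_status(STATUS_INT7, 1);
}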
_______________________________________________
AVR-GCC-list mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list