[avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t
From: David Brown
Subject: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t
Date: Wed, 22 Apr 2009 12:48:52 +0200
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)
Dale Whitfield wrote:
Hi David >@2009.04.22_10:55:27_+0200
Dale Whitfield wrote:
Hi David >@2009.04.21_19:42:34_+0200
Dale Whitfield wrote:
Hi,
All this talk of super-optimisers, etc. brings this issue I've had
to the fore again.
Perhaps there are suggestions on how to encourage the compiler to do the
optimisations rather than having to hand-code to get the best result.
Take this code:
volatile uint32_t status;
static inline void set_status(uint32_t flag, uint8_t set)
__attribute__((always_inline));
void set_status(uint32_t flag, uint8_t set) {
if (set) status |= flag;
else status &= ~flag;
}
Which when used in an interrupt generates:
ISR (INT7_vect)
{
<snip>
That's way too much code... It's fairly obvious where the optimisations
should be, although I can see that some have been done already.
I can't see much possibility for improving the above code (except by
removing the push and zeroing of r1). You asked for "status" to be a
volatile 32-bit int, so that's what you got. The code below is
semantically different, and thus compiles differently.
I don't believe it is semantically different. Which is one reason I
raised this. I am using the version below in existing code and it
behaves correctly.
Loading 4 registers and then storing back 3 that are unchanged makes no
sense at all.
I'm afraid that's how "volatile" works. In a way, by using "volatile"
you are telling the compiler "I *know* this makes no sense to you, but
do it my way anyway".
It's how volatile works by *loading* the registers before making changes.
However, if it's possible to only store back changed bytes, then surely
that's all that needs to be done.
Nope - volatile loads and volatile stores are always done completely,
with the whole data element. That's fundamental to "volatile".
Where volatile comes in here is that the optimisation shouldn't use any
previous loads or modify register values more than once without
reloading/storing etc. Here, the value is loaded once, one byte is
changed and then all 4 are stored. That's wasteful.
That's what you get when you make a 32-bit item volatile - all accesses
to it use the whole 32-bit data, every time.
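As a rough sketch of the difference (the names are invented, and I haven't
checked what any particular compiler actually emits for this):
#include <stdint.h>
volatile uint32_t vstatus;   /* every access must touch all 4 bytes */
uint32_t nstatus;            /* the optimiser may narrow the access  */
void set_bit0_volatile(void)
{
    vstatus |= 1UL;   /* full 32-bit read-modify-write, as requested */
}
void set_bit0_plain(void)
{
    nstatus |= 1UL;   /* only the observable result matters, so a
                         single-byte OR is a legal implementation */
}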
:-) I guess we need to agree to disagree on this. I was hoping this
wouldn't become a philosophical issue around volatile data. Which is
fraught at the best of times.
You can disagree all you like - but I've got the C standards, every C
compiler, every C compiler developer, and almost every C programmer on
my side. It's not a matter of philosophy or disagreement any more than
a disagreement about whether "0.5" is an int or not.
Suppose you have a 16-bit hardware timer that must always be set low
byte first, then high byte. If you make a change that just affects the
low byte, you still want the compiler to generate the full 16-bit write
- you achieve that by declaring the timer as a 16-bit volatile variable.
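Sketched out (the register name and address are invented purely for
illustration):
#include <stdint.h>
/* hypothetical memory-mapped 16-bit timer register at a made-up address */
#define TIMER_COUNT (*(volatile uint16_t *)0x1024)
void reload_timer(uint16_t value)
{
    /* because TIMER_COUNT is a volatile 16-bit object, this is always a
       full 16-bit write, even if only the low byte ever changes */
    TIMER_COUNT = value;
}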
How is the compiler supposed to guess that you mean something different
in this case?
I don't think this is a good example, because processor register
accesses should be treated differently to plain RAM accesses. And, yes,
of course just updating one byte in this scenario would be wrong.
C has no way to differentiate these concepts - it's simply not part of
the language. With a perfect embedded systems programming language you
could get exactly the type of access control you are asking for - C is
not perfect, and does not have these concepts. It's all or nothing -
volatile or not volatile.
You highlight an interesting distinction though.
Maybe this is where the optimisation goes astray. For IO access - or
rather, access to processor registers, since they may be accessed via
lds/sts - the optimisation doesn't apply. However, for RAM access, it does.
There is no difference, there is no "optimisation gone astray". You ask
the compiler to use volatile accesses, and it uses volatile accesses.
In this example, the interrupt routine is half as long when, as I see
it, correctly optimised. In an interrupt particularly, this could be a
critical time/cycle saving.
I agree with your conclusion - the code is longer and slower than
needed. But you are totally wrong on your premise - this is not because
the compiler is failing in its optimisation. It is because *your* code
forces it to generate long and slow code.
There is no reason for using "volatile" in this situation - making
"status" non-volatile will give you exactly the code you want here. But
it may then cause trouble for other code, depending on how you use
"status" and "set_status".
There is every reason. The flags get updated in interrupt routines and
in non-interrupt code.
That's dependent on parts of your code that I haven't seen, so I can't
comment.
However, based on your other post in this thread, you are correct in
that volatile access is not what you need - it's atomic access (that
applies to 8-bit data as well as 32-bit data).
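One way to get that on the AVR - an untested sketch, assuming avr-libc's
<util/atomic.h> is available - is to keep "status" non-volatile and wrap
the read-modify-write in an interrupt-protected block:
#include <stdint.h>
#include <util/atomic.h>
uint32_t status;   /* plain, non-volatile */
static inline void set_status_atomic(uint32_t flag, uint8_t set)
{
    /* interrupts are disabled for the duration of the block, so an ISR
       can never see or clobber a half-updated 32-bit value */
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        if (set)
            status |= flag;
        else
            status &= ~flag;
    }
}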
Try the following version of set_status:
void set_status(uint32_t flag, uint8_t set) {
if (set) (*(uint32_t *)&status) |= flag;
else (*(uint32_t *)&status) &= ~flag;
asm("" : : : "memory");
}
First, we remove the "volatile" from status so that the optimiser can
give you a single byte access when the function is inlined. Secondly,
instead of using "volatile" to protect the access to "status", we use a
memory clobber that forces writes to memory variables to be completed
before going on. This will ensure that the change is written out to
status as soon as possible (otherwise changes could be cached for later
storage).
Which would be a problem, since the volatile nature of the variable is
important in this usage.
(I haven't tried compiling that code yet.)
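For what it's worth, a hypothetical call site for that version (the flag
value is made up) could look like this - with "status" non-volatile and the
flag a compile-time constant, the inlined function is then free to touch
only the byte that actually changes:
#include <avr/interrupt.h>
#define STATUS_INT7 0x00000001UL   /* invented flag bit, for illustration */
ISR (INT7_vect)
{
    set_status(STATUS_INT7, 1);
}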
_______________________________________________
AVR-GCC-list mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list