Re: [PATCH] Optimise memset on i386
From: Colin Watson
Subject: Re: [PATCH] Optimise memset on i386
Date: Sat, 24 Jul 2010 23:40:31 +0100
User-agent: Mutt/1.5.18 (2008-05-17)

On Fri, Jul 23, 2010 at 10:56:24AM -0500, address@hidden wrote:
> [snip]
>
> > + unsigned long patternl = 0;
> > + grub_size_t i;
> > +
> > + for (i = 0; i < sizeof (unsigned long); i++)
> > + patternl |= ((unsigned long) pattern8) << (8 * i);
> > +
>
> might I suggest:
>
> unsigned long patternl = pattern8;
> patternl |= patternl << 8;
> patternl |= patternl << 16;
> patternl |= patternl << 32;
> patternl |= patternl << 64;
>
> O(lg N) instead of O(N), no loop, no branches, and the compiler should be
> smart enough to optimize away the last two lines on systems with narrower
> long.

I no longer have the system on which I benchmarked this. However, since
N is always either 4 or 8 on current targets, this can only amount to
micro-optimisation which I don't think can possibly matter much; we're
talking a handful of cycles at most. Do we really need to spend time
bikeshedding this? The important thing is taking only a cache stall per
long rather than a cache stall per byte; anything else is likely to be
noise.
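
For reference, here is a minimal sketch of the two ways of building the
pattern discussed above, followed by a fill loop that stores one long at
a time. It is an illustration only, not the patch itself: the names
pattern_loop, pattern_doubling and sketch_memset are hypothetical, and it
assumes 8-bit bytes, an unsigned long of 32 or 64 bits, and a build that
tolerates storing through an unsigned long pointer into the destination
buffer. In C, shifting by the full width of the type or more is
undefined, so the doubling variant guards the 32-bit shift with the
preprocessor and leaves out the 64-bit shift entirely.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Loop form, as in the quoted patch: one OR per byte of the long.  */
static unsigned long
pattern_loop (unsigned char pattern8)
{
  unsigned long patternl = 0;
  size_t i;

  for (i = 0; i < sizeof (unsigned long); i++)
    patternl |= ((unsigned long) pattern8) << (8 * i);
  return patternl;
}

/* Doubling form, as in the quoted suggestion.  The 32-bit shift is only
   compiled where long is wider than 32 bits, because a shift by the
   width of the type or more is undefined.  */
static unsigned long
pattern_doubling (unsigned char pattern8)
{
  unsigned long patternl = pattern8;

  patternl |= patternl << 8;
  patternl |= patternl << 16;
#if ULONG_MAX > 0xFFFFFFFFUL
  patternl |= patternl << 32;
#endif
  return patternl;
}

/* Hypothetical fill routine: byte stores until the pointer is aligned,
   then one store per long, then byte stores for the tail.  */
static void *
sketch_memset (void *s, int c, size_t len)
{
  unsigned char *p = s;
  unsigned char pattern8 = (unsigned char) c;
  unsigned long patternl = pattern_doubling (pattern8);

  /* Head: advance byte by byte until p is aligned for unsigned long.  */
  while (len > 0 && ((uintptr_t) p % sizeof (unsigned long)) != 0)
    {
      *p++ = pattern8;
      len--;
    }

  /* Body: one store per long.  Assumption: the build permits this
     cast (no strict-aliasing trouble for the caller's buffer).  */
  while (len >= sizeof (unsigned long))
    {
      *(unsigned long *) p = patternl;
      p += sizeof (unsigned long);
      len -= sizeof (unsigned long);
    }

  /* Tail: whatever is left over, byte by byte.  */
  while (len > 0)
    {
      *p++ = pattern8;
      len--;
    }

  return s;
}
```

Both pattern builders return the same value for any byte, and either one
runs once per call, whereas the body loop runs once per long of the
buffer.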
--
Colin Watson address@hidden