[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Help-gsl] C/C++ speed optimization bible/resources/pointers needed
From: |
Gordan Bobic |
Subject: |
Re: [Help-gsl] C/C++ speed optimization bible/resources/pointers needed, and about using GSL... |
Date: |
Fri, 27 Jul 2007 08:51:19 +0100 (BST) |
On Fri, 27 Jul 2007, Michael wrote:
I am in the middle of programming to solve an engineering problem
where the speed is huge concern. The project involving lots of
numerical integration and then there are several loops/levels of
optimization on top of the function evaluation engine.
A few general rules I've found to help a lot:
- Don't use unnecessary precision if you don't need it. Don't use a double
if a float will do. This is particularly important it code that the
compiler can vectorize. Even if your SIMD vector unit can handle doubles,
it can typically handle twice as many floats as doubles for the same
operation in the same amount of time.
- Use static variables in your functions wherever possible. This
especially goes for arrays, if you can get away with it. If you know the
maximum size you'll need for an array, static declare it once, and just
use that. malloc()/new is slow.
- Use a good compiler for your target platform. Sadly, GCC isn't great.
It's not even good. When it comes to tight vectorized loops and you don't
want to reach for the hand crafted assembler, I have seen performance
boosts of between 4.5x (on P3 / Core2) and 8.5x (P4) when using Intel's
ICC compiler instead of GCC. GSL certainly compiles and works fine with
ICC. IBM have a compiler for the PowerPC, and I believe Sun have their own
optimizing compiler for the SPARC. They are almost certainly worth looking
into.
As you probably
know, the key to a successful optimization is a fast underlying
objective function evaluator. The faster it is, the more promising the
optimization result(perhaps global optimal). However our project
requires many numerical integrations which prohibits us from making it
super fast. At the heart of the numerical integration is a smart
integrator and a super-fast integrand function evaluator. Even worse,
our function evaluation is in complex-domain. So the key point is how
to arrange our C/C++ code to make it highly efficient in every aspect.
Could anybody give some advice/pointers on how to improve the speed of
C/C++ program? How to arrange code? How to make it highly efficient
and super fast? What options do I have if I don't have luxury to use
multi-threaded, multi-core or distributed computing? But I do have a
P4 at least. Please recommend some good bibles and resources! Thank
you!
On a P4, ICC will utterly annihilate GCC in terms of performance of the
resulting code, especially when it comes to heavy maths. Get a copy and
try it. Enable -vec-report3 and see which of your loops don't vectorize.
Where possible, re-arrange them so that they do vectorize. The compiler
often needs a hand with assorted little hacks to help it vectorize the
code, but they are generally not ugly, and are most of the time limited
to:
1) Copy the object property into a local variable. This will help persuade
compiler that there is no vector dependance it needs to worry about.
2) If you have a loop where you are operating with the iterator on an
array, have a shadow iterator of a vectorizable type. Remember you cannot
use mixed type in vector operations. For example, you can do:
static unsigned int i;
static float ii;
static float baz[16];
for (i = 0, ii = 0; i < 16; i++)
baz[i] *= ii; // vectorizes
//baz[i] *= i; // doesn't vectorize
3) If your function parameters are changing partially, evaluate them
partially and cache the results for each part so you don't have to
re-calculate. For example, if your function is something like:
Y = a * (bX^2 + c) + d;
Arrange your loops so the outer-most one works on the inner-most
evaluation (in this case X*X, followed by multiplication by b, followed by
addition of c, followed my multiplication by a, followed by addition of d
in the innermost loop. You can then cache the values of X*X (which is,
incidentally, much faster than (pow(X,2)), b*X^2, b*X^2+c, etc, so when
your outer parameters change, you don't have to re-calculate the inner
terms. How you design your caches is also important. This can cause more
overhead than it saves, so you have to optimize it very carefully while
keeping your underlying algorithm structure in mind.
4) Keep your data sizes in mind. If your frequently used data doesn't fit
in the CPU caches, you are likely to start experiencing slow-downs on the
order of 20x or so due to latencies. Use a float when you don't need a
double, as they are half the size.
5) Write the optimized code yourself. GSL and similar libraries are great
for a rapid proof of concept prototype, but there is a price to pay in
terms of performance when using generic code vs. bespoke code specifically
optimized for a particular task.
6) Learn the compiler switches for your compiler. Test the accuracy of
your resulting numbers. When you start cutting corners (e.g. "-ffast-math
-mfpmath=sse,387" on GCC, "-fp-model fast=2 -rcd" on ICC) you may get more
speed, but sometimes the precision on your floats will reduce. This may or
may not be acceptable for your application.
7) Think about your algorithms. If you are doing a calculation that is a
modulo of a power of 2 on an int, use a bit-wise AND instead. It is an
order of magnitude faster. (e.g. instead of X % 4, do X &= 0x3).
There are hundreds of little hacks you can do to speed your code up. It is
impossible to simmarize them all, and they will differ from project to
project and they won't all be appropriate all the time. I hope this gets
you started on the right path, though. :-)
Good luck.
Gordan
- [Help-gsl] C/C++ speed optimization bible/resources/pointers needed, and about using GSL..., Michael, 2007/07/27
- Re: [Help-gsl] C/C++ speed optimization bible/resources/pointers needed, and about using GSL...,
Gordan Bobic <=
- [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL..., Lionel B, 2007/07/27
- Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL..., Gordan Bobic, 2007/07/27
- Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL..., Jochen Küpper, 2007/07/27
- Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL..., Gordan Bobic, 2007/07/27
- Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL..., Jochen Küpper, 2007/07/27
- Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL..., Gordan Bobic, 2007/07/27
- Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL..., Marc Baaden, 2007/07/27