OpenBLAS can be compiled as a single-threaded library, or as a multithreaded library with either a pthreads or an OpenMP interface.
A long time ago somebody told me that I should use the OpenMP interface in a multithreaded program (like Octave).
With the OpenMP interface you need to use the OMP_NUM_THREADS variable; OPENBLAS_NUM_THREADS is ignored.
I also noticed that, at least on Linux, you need to set this variable before starting Octave; setting it from within Octave
does not work. Here are some benchmarks on an i7-2600K (Fedora Linux):
[dima@i7 gcc_def]$ OMP_NUM_THREADS=1 LD_PRELOAD=/usr/lib64/libopenblaso.so ./run-octave -q -f
octave:1> a=randn(4000); tic; inv(a)*a; toc
Elapsed time is 10.6499 seconds.
octave:2>
[dima@i7 gcc_def]$ OMP_NUM_THREADS=2 LD_PRELOAD=/usr/lib64/libopenblaso.so ./run-octave -q -f
octave:1> a=randn(4000); tic; inv(a)*a; toc
Elapsed time is 5.75883 seconds.
octave:2>
[dima@i7 gcc_def]$ OMP_NUM_THREADS=4 LD_PRELOAD=/usr/lib64/libopenblaso.so ./run-octave -q -f
octave:1> a=randn(4000); tic; inv(a)*a; toc
Elapsed time is 3.8937 seconds.
octave:2>
[dima@i7 gcc_def]$ OMP_NUM_THREADS=6 LD_PRELOAD=/usr/lib64/libopenblaso.so ./run-octave -q -f
octave:1> a=randn(4000); tic; inv(a)*a; toc
Elapsed time is 4.884 seconds.
octave:2>
[dima@i7 gcc_def]$ OMP_NUM_THREADS=8 LD_PRELOAD=/usr/lib64/libopenblaso.so ./run-octave -q -f
octave:1> a=randn(4000); tic; inv(a)*a; toc
Elapsed time is 6.45016 seconds.
octave:2>
[dima@i7 gcc_def]$ LD_PRELOAD=/usr/lib64/libopenblaso.so ./run-octave -q -f
octave:1> a=randn(4000); tic; inv(a)*a; toc
Elapsed time is 7.12494 seconds.
octave:2> a=randn(4000); tic; inv(a)*a; toc
Elapsed time is 7.05646 seconds.
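
To make the scaling easier to read, here is a small Python sketch that turns the timings above into speedup and parallel efficiency relative to the 1-thread run. The numbers are copied from the transcript; the function and variable names are only illustrative, and these are single runs, so treat the result as a rough indication rather than a careful measurement.

```python
# Wall-clock times in seconds, copied from the runs above,
# keyed by the OMP_NUM_THREADS value used for each run.
timings = {1: 10.6499, 2: 5.75883, 4: 3.8937, 6: 4.884, 8: 6.45016}

# First run with OMP_NUM_THREADS unset; the transcript does not show
# how many threads OpenBLAS actually used in that case.
unset_seconds = 7.12494


def speedup_table(timings):
    """Return (threads, seconds, speedup, efficiency) rows.

    Speedup is relative to the 1-thread time; efficiency is speedup
    divided by the number of threads.
    """
    base = timings[1]
    rows = []
    for n in sorted(timings):
        speedup = base / timings[n]
        rows.append((n, timings[n], speedup, speedup / n))
    return rows


rows = speedup_table(timings)

print(f"{'threads':>7} {'seconds':>9} {'speedup':>8} {'efficiency':>10}")
for n, seconds, speedup, efficiency in rows:
    print(f"{n:>7d} {seconds:>9.3f} {speedup:>8.2f} {efficiency:>10.2f}")

# Thread count with the shortest elapsed time.
best = min(timings, key=timings.get)
print(f"fastest setting: OMP_NUM_THREADS={best}")
print(f"unset vs fastest: {unset_seconds / timings[best]:.2f}x slower")
```

With these numbers the fastest setting is OMP_NUM_THREADS=4, which matches the four physical cores of the i7-2600K (it has 8 hardware threads with hyper-threading). Reading it as "one thread per physical core works best" is my interpretation of these runs, not something they prove.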
I could not figure out how OpenBLAS is compiled for the Windows version of Octave; it could be that it is just a single-threaded library.
Sincerely,
Dmitri.