octave-maintainers

Re: FYI: optimizing certain matrix arithmetic


From: Michael Creel
Subject: Re: FYI: optimizing certain matrix arithmetic
Date: Tue, 29 Sep 2009 15:13:37 +0200

On Tue, Sep 29, 2009 at 2:15 PM, Jaroslav Hajek <address@hidden> wrote:
> On Tue, Sep 29, 2009 at 2:08 PM, Jaroslav Hajek <address@hidden> wrote:
>> On Tue, Sep 29, 2009 at 1:54 PM, Michael Creel <address@hidden> wrote:
>>> On Tue, Sep 29, 2009 at 1:44 PM, Jaroslav Hajek <address@hidden> wrote:
>>>> On Tue, Sep 29, 2009 at 1:34 PM, Michael Creel <address@hidden> wrote:
>>>>> On Tue, Sep 29, 2009 at 12:47 PM, Jaroslav Hajek <address@hidden> wrote:
>>>>>> On Tue, Sep 29, 2009 at 12:26 PM, Michael Creel <address@hidden> wrote:
>>>>>>> On Tue, Sep 29, 2009 at 10:44 AM, Jaroslav Hajek <address@hidden> wrote:
>>>>>>>> On Tue, Sep 29, 2009 at 10:38 AM, Michael Creel <address@hidden> wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> On an Apple Macbook Pro running Ubuntu Jaunty amd64, using the benchmark
>>>>>>>>>
>>>>>>>>> %%%%%%%%%%%%%%%%%%%%%%
>>>>>>>>> n = 500;
>>>>>>>>> R = triu (rand (n));
>>>>>>>>> u = rand (n, 1);
>>>>>>>>>
>>>>>>>>> tic; for i = 1:1000; R \ u; endfor; toc
>>>>>>>>> tic; for i = 1:1000; u' / R; endfor; toc
>>>>>>>>> tic; for i = 1:1000; R' \ u; endfor; toc
>>>>>>>>>
>>>>>>>>> R = tril (rand (n));
>>>>>>>>> u = rand (n, 1);
>>>>>>>>>
>>>>>>>>> tic; for i = 1:1000; R \ u; endfor; toc
>>>>>>>>> tic; for i = 1:1000; u' / R; endfor; toc
>>>>>>>>> tic; for i = 1:1000; R' \ u; endfor; toc
>>>>>>>>>
>>>>>>>>> u = u + I*rand (n, 1);
>>>>>>>>> tic; for i = 1:1000; R \ u; endfor; toc
>>>>>>>>> tic; for i = 1:1000; R' \ u; endfor; toc
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> n = 800;
>>>>>>>>> a = rand (n);
>>>>>>>>> b = rand (n) + i*rand (n);
>>>>>>>>> tic; a * b; toc
>>>>>>>>> tic; b * a; toc
>>>>>>>>> tic; a' * b; toc
>>>>>>>>> tic; b * a'; toc
>>>>>>>>> tic; a \ b; toc
>>>>>>>>> tic; b / a; toc
>>>>>>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>>>>>>
>>>>>>>>> With Octave 3.0.1, which comes with Ubuntu Jaunty amd64, I get
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> octave:4> bench
>>>>>>>>> Elapsed time is 0.20216 seconds.
>>>>>>>>> Elapsed time is 1.93894 seconds.
>>>>>>>>> Elapsed time is 2.33824 seconds.
>>>>>>>>> Elapsed time is 0.188448 seconds.
>>>>>>>>> Elapsed time is 1.95657 seconds.
>>>>>>>>> Elapsed time is 2.43552 seconds.
>>>>>>>>> Elapsed time is 4.08299 seconds.
>>>>>>>>> Elapsed time is 7.84752 seconds.
>>>>>>>>> Elapsed time is 0.213021 seconds.
>>>>>>>>> Elapsed time is 0.21117 seconds.
>>>>>>>>> Elapsed time is 0.218387 seconds.
>>>>>>>>> Elapsed time is 0.217174 seconds.
>>>>>>>>> Elapsed time is 0.452714 seconds.
>>>>>>>>> Elapsed time is 0.391383 seconds.
>>>>>>>>> octave:5>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Matlab 2008b gives
>>>>>>>>> >> bench
>>>>>>>>> Elapsed time is 0.289161 seconds.
>>>>>>>>> Elapsed time is 0.566446 seconds.
>>>>>>>>> Elapsed time is 0.562623 seconds.
>>>>>>>>> Elapsed time is 0.253456 seconds.
>>>>>>>>> Elapsed time is 0.574304 seconds.
>>>>>>>>> Elapsed time is 0.570281 seconds.
>>>>>>>>> Elapsed time is 0.253070 seconds.
>>>>>>>>> Elapsed time is 0.572601 seconds.
>>>>>>>>> Elapsed time is 0.102086 seconds.
>>>>>>>>> Elapsed time is 0.102677 seconds.
>>>>>>>>> Elapsed time is 0.103080 seconds.
>>>>>>>>> Elapsed time is 0.103759 seconds.
>>>>>>>>> Elapsed time is 0.165608 seconds.
>>>>>>>>> Elapsed time is 0.181704 seconds.
>>>>>>>>> >>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Octave 3.2.3+ from today, self compiled, gives
>>>>>>>>> octave:1> bench
>>>>>>>>> Elapsed time is 0.208794 seconds.
>>>>>>>>> Elapsed time is 0.189178 seconds.
>>>>>>>>> Elapsed time is 0.186724 seconds.
>>>>>>>>> Elapsed time is 0.188649 seconds.
>>>>>>>>> Elapsed time is 0.192915 seconds.
>>>>>>>>> Elapsed time is 0.19166 seconds.
>>>>>>>>> Elapsed time is 0.186277 seconds.
>>>>>>>>> Elapsed time is 0.19102 seconds.
>>>>>>>>> Elapsed time is 0.212707 seconds.
>>>>>>>>> Elapsed time is 0.211013 seconds.
>>>>>>>>> Elapsed time is 0.210491 seconds.
>>>>>>>>> Elapsed time is 0.210447 seconds.
>>>>>>>>> Elapsed time is 0.431791 seconds.
>>>>>>>>> Elapsed time is 0.367412 seconds.
>>>>>>>>> octave:2>
>>>>>>>>>
>>>>>>>>> Congratulations!
>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>
>>>>>>>> It's interesting you didn't get any speed-up in the second part of the
>>>>>>>> benchmark, compared to 3.0.1...
>>>>>>>> What BLAS and LAPACK are you using? What's your compiler configuration?
>>>>>>>> Also, what exactly is your tip? The "3.2.3+" is a bit unclear, did you
>>>>>>>> mean "3.3.50+", i.e. the development version?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> --
>>>>>>>> RNDr. Jaroslav Hajek
>>>>>>>> computing expert & GNU Octave developer
>>>>>>>> Aeronautical Research and Test Institute (VZLU)
>>>>>>>> Prague, Czech Republic
>>>>>>>> url: www.highegg.matfyz.cz
>>>>>>>>
>>>>>>>
>>>>>>> Oops, sorry, it's 3.3.50+, updated this morning.
>>>>>>>
>>>>>>> I make using
>>>>>>> make -j2 CFLAGS="-O3 -march=native -funroll-loops" FFLAGS="-O3
>>>>>>> -march=native -funroll-loops" XTRA_CFLAGS="-O3 -march=native
>>>>>>> -funroll-loops" XTRA_CXXFLAGS="-O3 -march=native -funroll-loops"
>>>>>>>
>>>>>>
>>>>>> In general, if you're using a newer gcc on a 64-bit architecture, I
>>>>>> advise against -funroll-loops. For me, it usually speeds up some
>>>>>> operations by about 1%, at the cost of increasing the binaries' size
>>>>>> by more than 50%. That seems like a bad tradeoff.
>>>>>>
>>>>>>> ./configure reports
>>>>>>>  BLAS libraries:       -llapack -lcblas -lf77blas -latlas
>>>>>>>
>>>>>>> so I assume that Octave is using Atlas (the atlas dev package that
>>>>>>> comes with Kubuntu Jaunty amd64).
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>
>>>>>> Apparently, yes. Hmm. It's really weird you got almost exactly the
>>>>>> same figures.
>>>>>> If you apply the attached patch, rebuild and re-run the benchmark,
>>>>>> what do you get?
>>>>>>
>>>>>> --
>>>>>> RNDr. Jaroslav Hajek
>>>>>> computing expert & GNU Octave developer
>>>>>> Aeronautical Research and Test Institute (VZLU)
>>>>>> Prague, Czech Republic
>>>>>> url: www.highegg.matfyz.cz
>>>>>>
>>>>>
>>>>> With that patch applied, I get
>>>>> octave:1> bench
>>>>> Elapsed time is 0.194493 seconds.
>>>>> Elapsed time is 0.192309 seconds.
>>>>> Elapsed time is 0.189026 seconds.
>>>>> Elapsed time is 0.188679 seconds.
>>>>> Elapsed time is 0.195958 seconds.
>>>>> Elapsed time is 0.193521 seconds.
>>>>> Elapsed time is 0.187596 seconds.
>>>>> Elapsed time is 0.193254 seconds.
>>>>> Elapsed time is 0.215135 seconds.
>>>>> Elapsed time is 0.213705 seconds.
>>>>> Elapsed time is 0.21341 seconds.
>>>>> Elapsed time is 0.212501 seconds.
>>>>> Elapsed time is 0.363992 seconds.
>>>>> Elapsed time is 0.368094 seconds.
>>>>>
>>>>> so there is an improvement in the second to last number.
>>>>>
>>>>> Cheers, M.
>>>>>
>>>>
>>>> OK, it's funny. I now understand where the problem is. Just change the line
>>>>
>>>> b = rand (n) + i*rand (n);
>>>>
>>>> to
>>>>
>>>> b = rand (n) + I*rand (n);
>>>>
>>>> (note the capital I), and run the benchmarks again. At this point, i is
>>>> still defined from the previous loops as a real numeric value (!).
>>>> I think this affects Matlab, too.
>>>> In any case, it is apparent that your Matlab is linked to something
>>>> faster than ATLAS; probably Intel's MKL.
>>>>
>>>> regards
>>>>
>>>> --
>>>> RNDr. Jaroslav Hajek
>>>> computing expert & GNU Octave developer
>>>> Aeronautical Research and Test Institute (VZLU)
>>>> Prague, Czech Republic
>>>> url: www.highegg.matfyz.cz
>>>>
>>>
>>> That's one of those bugs that causes rockets to go off course, I
>>> guess!
>>
>> Yes, definitely. Maybe we could do something about it...
>>
>>> OK, it makes sense now. Matlab has been available here for a
>>> while, but I haven't used it much. I don't know the details of what
>>> libraries it uses - it's v2009b.
>>
>> You can usually tell by inspecting the binaries using ldd. But it
>> doesn't matter much.
>> I will be glad if you post the results of the corrected benchmark.
>>
>
> Btw., I guess that you have two cores, right? If so, it is likely that
> your Matlab uses both of them. You can achieve a similar effect by
> linking Octave to a multi-threaded ATLAS.
>
> --
> RNDr. Jaroslav Hajek
> computing expert & GNU Octave developer
> Aeronautical Research and Test Institute (VZLU)
> Prague, Czech Republic
> url: www.highegg.matfyz.cz
>



The corrected benchmark, which runs in both Octave and Matlab (it uses
'j' for the imaginary unit and 'end' instead of 'endfor'), is

%%%%%%%%%%%%%%%%%%
n = 500;
R = triu (rand (n));
u = rand (n, 1);

tic; for i = 1:1000; R \ u; end; toc
tic; for i = 1:1000; u' / R; end; toc
tic; for i = 1:1000; R' \ u; end; toc

R = tril (rand (n));
u = rand (n, 1);

tic; for i = 1:1000; R \ u; end; toc
tic; for i = 1:1000; u' / R; end; toc
tic; for i = 1:1000; R' \ u; end; toc

u = u + j*rand (n, 1);
tic; for i = 1:1000; R \ u; end; toc
tic; for i = 1:1000; R' \ u; end; toc


n = 800;
a = rand (n);
b = rand (n) + j*rand (n);
tic; a * b; toc
tic; b * a; toc
tic; a' * b; toc
tic; b * a'; toc
tic; a \ b; toc
tic; b / a; toc
%%%%%%%%%%%%%%%%%%%%%
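
For anyone who wants to see the pitfall in isolation, here is a minimal
sketch that should behave the same in Octave and Matlab. The names b_real
and b_cplx are made up for illustration, and the 1i literal is an
alternative to the 'j' used in the benchmark above, not something taken
from the original script.

```octave
% After a loop over i, the variable i shadows the imaginary unit.
for i = 1:1000; end                  % i is left holding the real value 1000

b_real = rand (3) + i*rand (3);      % i is 1000 here, so the result is real
b_cplx = rand (3) + 1i*rand (3);     % the literal 1i cannot be shadowed

disp (isreal (b_real))               % prints 1
disp (isreal (b_cplx))               % prints 0

clear i                              % drop the variable; i is the imaginary unit again
```

Writing 1i (or 1j) in the benchmark would make the complex cases
independent of whichever loop variables happen to be defined earlier.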


Matlab gives
>> bench
Elapsed time is 0.321079 seconds.
Elapsed time is 0.617195 seconds.
Elapsed time is 0.576792 seconds.
Elapsed time is 0.253461 seconds.
Elapsed time is 0.599158 seconds.
Elapsed time is 0.658734 seconds.
Elapsed time is 0.314414 seconds.
Elapsed time is 0.703873 seconds.
Elapsed time is 0.206035 seconds.
Elapsed time is 0.206898 seconds.
Elapsed time is 0.207248 seconds.
Elapsed time is 0.208293 seconds.
Elapsed time is 0.274126 seconds.
Elapsed time is 0.344650 seconds.


Octave 3.3.50+ (with the patch you posted) gives
octave:1> bench
Elapsed time is 0.202895 seconds.
Elapsed time is 0.195078 seconds.
Elapsed time is 0.193112 seconds.
Elapsed time is 0.18874 seconds.
Elapsed time is 0.196512 seconds.
Elapsed time is 0.193809 seconds.
Elapsed time is 0.290697 seconds.
Elapsed time is 0.297939 seconds.
real * complex: split
Elapsed time is 0.624116 seconds.
complex * real: split
Elapsed time is 0.451244 seconds.
Elapsed time is 0.450199 seconds.
Elapsed time is 0.452107 seconds.
Elapsed time is 0.65004 seconds.
Elapsed time is 0.694705 seconds.
octave:2>


As far as I can tell, Matlab is using only one core. Perhaps it would
use both cores if it had to work with larger matrices.
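
One way to check that guess would be to repeat the real * complex product
at a few sizes while watching a CPU monitor such as top. The following is
only a sketch: the sizes are arbitrary choices of mine, and sizes, t and k
are illustrative names that are not part of the benchmark above.

```octave
% Time a real * complex matrix product at growing sizes.
sizes = [400 800 1600];              % arbitrary sizes, chosen for illustration
t = zeros (size (sizes));

for k = 1:numel (sizes)              % index k, so i and j are left untouched
  n = sizes(k);
  a = rand (n);
  b = rand (n) + 1i*rand (n);        % 1i literal, safe from shadowing
  tic; c = a * b; t(k) = toc;
  fprintf ('n = %4d: %g seconds\n', n, t(k));
end
```

If only one core is in use, the time should grow roughly eightfold each
time n doubles; a clearly smaller factor at the larger sizes, together
with more than one busy core in the monitor, would suggest that a
multi-threaded BLAS is kicking in.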

Michael


