octave-maintainers

Re: More efficient MEX or MEX-like interface


From: David Bateman
Subject: Re: More efficient MEX or MEX-like interface
Date: Tue, 30 Sep 2008 15:08:49 +0200
User-agent: Thunderbird 2.0.0.17 (X11/20080926)

Fredrik Lingvall wrote:
David Bateman wrote:
Fredrik Lingvall wrote:
Does this mean that BLAS/LAPACK expects complex data stored in the same
way as complex data is currently stored in Octave?
Yes, Octave calls LAPACK/BLAS functions such as [cz]GEMM for complex
arguments. Grepping the Matlab libraries with

nm -C --dynamic libmwnumerics.so | grep '[gG][eE][mM][mM]'

gives

                U dgemm_
                U sgemm_
                U zgemm_

So it appears that the double precision complex matrix multiply is
linked in but the single precision version isn't. I suspect ZGEMM is
only pulled in by the SuiteSparse code (which has no single precision
version), and that Matlab internally uses four calls to DGEMM on the
real and imaginary parts of the matrices to simulate ZGEMM, which
means it never has to explicitly form the interleaved complex matrix.
Given the speed differences (or lack thereof) between Octave and
Matlab for such operations, I believe that pretty much confirms what
they do.
How did you test this?

I did a quick test (Pentium-M laptop):

1) Octave 3.0.2

octave:1> n=2000; B = randn(n,n) + i*randn(n,n);
octave:2> n=2000; A = randn(n,n) + i*randn(n,n);
octave:3> tic, C=A*B;toc
Elapsed time is 37.4009 seconds.
octave:4> tic, C=A*B;toc
Elapsed time is 37.3448 seconds.
octave:5>

2) Matlab 7.3

n=2000; A = randn(n,n) + i*randn(n,n);
n=2000; B = randn(n,n) + i*randn(n,n);
tic, C=A*B;toc
Elapsed time is 36.725418 seconds.
tic, C=A*B;toc
Elapsed time is 36.786914 seconds.

Both Matlab and Octave use Goto BLAS v. 1.19

It looks like there is no speed penalty for storing the real and imaginary
parts separately (at least not for the problem size I tested).

/F



This is exactly my point. If Matlab had to form a C99 complex matrix and then call ZGEMM, rather than use four calls to DGEMM and do the additions, it would be slower, as creating and copying the data into a C99 complex array comes at a cost. In fact xGEMM computes the operation

 C = alpha * A * B + beta * C

and so you might, for example, call DGEMM once passing the two real parts as A and B with beta equal to zero, then make a second call with that result as C, alpha of -1, beta of 1, and A and B being the imaginary parts, thus giving you the real part of the complex matrix multiply with two calls to DGEMM on the real and imaginary parts of the matrices. The same goes for the imaginary part of the multiply (two more calls on the cross terms, with alpha and beta both equal to 1). The underlying code of ZGEMM has to do something similar in any case, it just does the four multiplications element by element instead, so it is no surprise it is much the same speed as what Matlab does.
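The two-call accumulation described above is easy to sketch. The following illustration uses numpy, with a hypothetical gemm() helper standing in for Fortran DGEMM's C = alpha*A*B + beta*C update; it shows the scheme itself, not MathWorks' actual code:

```python
import numpy as np

def gemm(alpha, A, B, beta, C):
    """Stand-in for DGEMM: return alpha*A*B + beta*C."""
    return alpha * (A @ B) + beta * C

rng = np.random.default_rng(0)
n = 4
Ar, Ai = rng.standard_normal((n, n)), rng.standard_normal((n, n))
Br, Bi = rng.standard_normal((n, n)), rng.standard_normal((n, n))

Z = np.zeros((n, n))

# Real part: first call on the real parts with beta = 0, then
# accumulate -Ai*Bi into that result with alpha = -1, beta = 1.
Cr = gemm(1.0, Ar, Br, 0.0, Z)
Cr = gemm(-1.0, Ai, Bi, 1.0, Cr)

# Imaginary part: same idiom on the cross terms, alpha = beta = 1.
Ci = gemm(1.0, Ar, Bi, 0.0, Z)
Ci = gemm(1.0, Ai, Br, 1.0, Ci)

# Agrees with forming the complex matrices and doing one
# ZGEMM-style multiply.
C = (Ar + 1j * Ai) @ (Br + 1j * Bi)
assert np.allclose(Cr, C.real) and np.allclose(Ci, C.imag)
```

Four real multiplies plus the beta accumulation reproduce the complex product without ever forming an interleaved complex array, which is the point of keeping the real and imaginary parts stored separately.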

The absence of CGEMM from the symbols of the numerics library is a pretty good indication that the above is exactly what MathWorks does, as I see no reason to handle matrix multiplies differently between double and single precision values.


D.

--
David Bateman                                address@hidden
Motorola Labs - Paris                        +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin    +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE                  +33 1 69 35 77 01 (Fax)


