From: David Bateman
Subject: Re: More efficient MEX or MEX-like interface
Date: Tue, 30 Sep 2008 15:08:49 +0200
User-agent: Thunderbird 2.0.0.17 (X11/20080926)
Fredrik Lingvall wrote:
> David Bateman wrote:
>> Fredrik Lingvall wrote:
>>> Does this mean that BLAS/LAPACK expects complex data stored in the
>>> same way as complex data is currently stored in Octave?
>>
>> Yes, Octave calls LAPACK/BLAS functions like [cz]GEMM for complex
>> arguments. Grepping the matlab libraries with
>>
>>   nm -C --dynamic libmwnumerics.so | grep [gG][eE][mM][mM]
>>
>> gives
>>
>>   U dgemm_
>>   U sgemm_
>>   U zgemm_
>>
>> So it appears that the double precision complex matrix multiply is
>> linked in but the single precision version isn't. I'd think this means
>> that ZGEMM is in fact used by the SuiteSparse code (which doesn't have
>> single precision), and that matlab internally uses four calls to DGEMM
>> on the real and imaginary parts of the matrix to simulate ZGEMM, since
>> that way it never has to explicitly form the complex matrix. Given the
>> speed differences (or lack thereof) for such operations between Octave
>> and Matlab, I believe that pretty much confirms what they do.
>
> How did you test this? I did a quick test (Pentium-M laptop):
>
> 1) Octave 3.0.2
>
> octave:1> n=2000; B = randn(n,n) + i*randn(n,n);
> octave:2> n=2000; A = randn(n,n) + i*randn(n,n);
> octave:3> tic, C=A*B; toc
> Elapsed time is 37.4009 seconds.
> octave:4> tic, C=A*B; toc
> Elapsed time is 37.3448 seconds.
>
> 2) Matlab 7.3
>
> >> n=2000; A = randn(n,n) + i*randn(n,n);
> >> n=2000; B = randn(n,n) + i*randn(n,n);
> >> tic, C=A*B; toc
> Elapsed time is 36.725418 seconds.
> >> tic, C=A*B; toc
> Elapsed time is 36.786914 seconds.
>
> Both Matlab and Octave use Goto BLAS v1.19. It looks like there is no
> speed penalty for storing the real and imaginary parts separately (at
> least not for the size of problem I tested).
>
> /F

This is exactly my point. If matlab had to form a C99 complex matrix and then call ZGEMM, rather than use four calls to DGEMM and do the additions, it would be slower, as creating a C99 complex array and copying the data into it comes at a cost. In fact xGEMM computes the operation

  C = alpha * A * B + beta * C

and so you might, for example, call DGEMM once passing the two imaginary parts as A and B with beta of zero, then have a second call with that result as C, beta of -1, and A and B being the two real parts, thus giving you the real part of the complex matrix multiply with two calls to DGEMM on the real and imaginary parts of the matrices. The same goes for the calculation of the imaginary part of the matrix multiply. The underlying code of ZGEMM has to do something similar in any case; it just does the four multiplications element by element instead, so it is no surprise that it runs at much the same speed as what matlab does.
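The accumulation scheme above can be sketched in NumPy (a hypothetical illustration of the technique, not matlab's actual code): a small `gemm` helper stands in for a BLAS xGEMM call, and each of the real and imaginary parts of the product is built with two such calls on the split real/imaginary storage.

```python
import numpy as np

def gemm(alpha, A, B, beta, C):
    """Emulate BLAS xGEMM: return alpha*A@B + beta*C (C may be None when beta == 0)."""
    out = alpha * (A @ B)
    if beta != 0.0:
        out += beta * C
    return out

rng = np.random.default_rng(0)
n = 4
Ar, Ai = rng.standard_normal((n, n)), rng.standard_normal((n, n))
Br, Bi = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# Real part, Cr = Ar*Br - Ai*Bi: first multiply the imaginary parts
# (beta = 0), then a second call with the real parts and beta = -1,
# which negates the stored intermediate while accumulating.
Cr = gemm(1.0, Ai, Bi, 0.0, None)
Cr = gemm(1.0, Ar, Br, -1.0, Cr)

# Imaginary part, Ci = Ar*Bi + Ai*Br: two more calls, accumulating with beta = 1.
Ci = gemm(1.0, Ar, Bi, 0.0, None)
Ci = gemm(1.0, Ai, Br, 1.0, Ci)

# Cross-check against forming the interleaved complex matrices
# and multiplying them directly (what a ZGEMM call would do).
C = (Ar + 1j * Ai) @ (Br + 1j * Bi)
assert np.allclose(Cr, C.real) and np.allclose(Ci, C.imag)
```

Four real matrix multiplies of size n give the same flop count as one complex multiply, so the only saving is in never allocating and filling the interleaved complex arrays.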
The absence of cGEMM from the symbols of the numerics library is a pretty good indication that the above is exactly what the Mathworks does, as I see no reason they would handle matrix multiplies differently between double and single precision values.
D.

--
David Bateman                               address@hidden
Motorola Labs - Paris                       +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin   +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE                 +33 1 69 35 77 01 (Fax)