Re: More efficient MEX or MEX-like interface
From: Jaroslav Hajek
Date: Tue, 30 Sep 2008 19:54:09 +0200
On Tue, Sep 30, 2008 at 4:39 PM, Fredrik Lingvall <address@hidden> wrote:
> David Bateman wrote:
>>
>> This is exactly my point... If Matlab had to repack its data into a C99
>> complex matrix and then call zGEMM, rather than use four calls to dGEMM and
>> do the additions, it would be slower, since creating the C99 complex array
>> and copying the data into it comes at a cost. In fact xGEMM computes
>>
>> C = alpha * A * B + beta * C
>>
>> so you might, for example, call dGEMM once passing the two imaginary
>> parts as A and B with beta of zero, then make a second call with that
>> result as C, beta of -1, and A and B being the real parts, thus giving
>> you the real part of the complex matrix multiply with two calls to dGEMM on
>> the real and imaginary parts of the matrices. The same goes for the
>> calculation of the imaginary part of the matrix multiply (with beta of +1
>> in the second call). The underlying code of zGEMM has to do something
>> similar in any case; it just does the four multiplications element by
>> element instead, so it is no surprise that it runs at much the same speed
>> as what Matlab does.
>>
>> The absence of cGEMM from the symbols of the numeric library is a pretty
>> good indication that the above is exactly what MathWorks does, as I see no
>> reason to handle matrix multiplies differently for double and single
>> precision values.
>>
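David's four-call scheme can be sketched in C. This is a minimal sketch under stated assumptions, not MATLAB's or Octave's code: `dgemm_naive` is a hypothetical, unoptimized stand-in for BLAS dGEMM (row-major, square, no transposes), and the other function names are likewise made up for illustration. The split-layout product uses four real GEMM calls with the additions folded into beta, and it is checked against a direct product on an interleaved C99 `double complex` array, which plays the role of ZGEMM.

```c
#include <complex.h>

/* Hypothetical, unoptimized stand-in for BLAS dGEMM:
   C = alpha*A*B + beta*C for row-major n x n matrices (no transposes).
   As in BLAS, C is not read when beta == 0. */
static void dgemm_naive(int n, double alpha, const double *A,
                        const double *B, double beta, double *C)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = alpha * s + (beta == 0.0 ? 0.0 : beta * C[i * n + j]);
        }
    }
}

/* Split layout (real and imaginary parts in separate arrays):
   Re(C) = Ar*Br - Ai*Bi and Im(C) = Ar*Bi + Ai*Br, with the
   additions folded into beta, so four real GEMM calls in total. */
static void split_complex_matmul(int n, const double *Ar, const double *Ai,
                                 const double *Br, const double *Bi,
                                 double *Cr, double *Ci)
{
    dgemm_naive(n, 1.0, Ai, Bi, 0.0, Cr);  /* Cr = Ai*Bi          */
    dgemm_naive(n, 1.0, Ar, Br, -1.0, Cr); /* Cr = Ar*Br - Ai*Bi  */
    dgemm_naive(n, 1.0, Ar, Bi, 0.0, Ci);  /* Ci = Ar*Bi          */
    dgemm_naive(n, 1.0, Ai, Br, 1.0, Ci);  /* Ci = Ar*Bi + Ai*Br  */
}

/* Interleaved C99 layout: one complex multiply-add per (i, j, k),
   as a ZGEMM-style kernel does element by element. */
static void interleaved_complex_matmul(int n, const double complex *A,
                                       const double complex *B,
                                       double complex *C)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double complex s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
    }
}

/* Small 2 x 2 check: returns the largest componentwise difference
   between the split and interleaved results. */
static double max_diff_split_vs_interleaved(void)
{
    enum { N = 2 };
    const double Ar[N * N] = {1, 3, 0, 2}, Ai[N * N] = {2, -1, 1, 0};
    const double Br[N * N] = {2, 1, 1, 0}, Bi[N * N] = {-1, 1, 0, -3};
    double complex A[N * N], B[N * N], C[N * N];
    double Cr[N * N], Ci[N * N];

    /* Repack into interleaved storage: the extra copy David mentions. */
    for (int k = 0; k < N * N; k++) {
        A[k] = Ar[k] + Ai[k] * I;
        B[k] = Br[k] + Bi[k] * I;
    }
    split_complex_matmul(N, Ar, Ai, Br, Bi, Cr, Ci);
    interleaved_complex_matmul(N, A, B, C);

    double worst = 0.0;
    for (int k = 0; k < N * N; k++) {
        double dr = creal(C[k]) - Cr[k], di = cimag(C[k]) - Ci[k];
        if (dr < 0) dr = -dr;
        if (di < 0) di = -di;
        if (dr > worst) worst = dr;
        if (di > worst) worst = di;
    }
    return worst;
}
```

With the small integer entries used here every intermediate value is exact, so `max_diff_split_vs_interleaved()` returns 0.0 when called from a `main`. A real build would call an optimized BLAS instead of `dgemm_naive`; the point of the sketch is only the call pattern, four real products with the sums absorbed by beta.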
> OK. I just guessed that it would be more efficient to call zGEMM once
> (where the real and imaginary parts are closer in memory) than to call
> dGEMM four times (which should generate more cache misses). This seems not
> to be the case, though (at least not for the matrix sizes that I used).
>
The number of FLOPs is the same, so it would only be true if ZGEMM achieved
a significantly better fraction of peak performance, which it normally doesn't.
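A rough count shows why the totals agree, assuming square n-by-n operands, counting each real multiply-add as two flops, and ignoring lower-order n^2 terms:

$$
\begin{aligned}
\operatorname{Re}(AB) &= A_r B_r - A_i B_i, \qquad \operatorname{Im}(AB) = A_r B_i + A_i B_r,\\
\text{four real GEMMs:}\quad & 4 \times 2n^3 = 8n^3 \text{ flops},\\
\text{one ZGEMM:}\quad & n^3 \text{ complex multiply-adds} \times (4\ \text{mul} + 4\ \text{add}) = 8n^3 \text{ flops}.
\end{aligned}
$$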
The Matlab layout is only problematic when you can't split the
operation at the highest level, i.e. for factorizations and
factorization updates. For instance, I guess that Matlab's complex
qrupdate is, for moderate matrix sizes, significantly slower than
Octave's.
I guess the main reason why MathWorks chose this layout was that C had no
complex type before C99.
> /F
--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz