Another changeset uploaded; the symmetric cases A'*A, A*A', A.'*A and
A*A.' are now mapped to xSYRK and ZHERK. With these, the benchmark
becomes more impressive:
n = 50; m = 505000; a = rand(m,n); tic; c = a'*a; toc; clear
with current Octave:
Elapsed time is 4.24332 seconds.
with the new changeset:
Elapsed time is 0.916971 seconds.
i.e. roughly a 4.6-fold speed-up (4.24332 / 0.916971 ≈ 4.63). This is, of
course, because DSYRK, just like DGEMM('T','N',...), avoids forming the
transpose explicitly and accesses memory in a more cache-friendly way, but
in addition computes only one triangle (half) of the symmetric result.
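The "only half of the matrix" point can be illustrated with a minimal pure-Python sketch. It is not Octave's code and not a BLAS call, and the function names are hypothetical. It computes C = A'*A in two ways. The first forms every entry. The second works the way xSYRK does conceptually: it fills only the upper triangle (i <= j) and mirrors it, since C(j,i) == C(i,j). For the n = 50 benchmark that is 1275 instead of 2500 column dot products. Symmetry alone therefore accounts for a factor of just under 2. The rest of the measured gain presumably comes from avoiding the transpose and from the better memory access pattern.

```python
def gram_full(a):
    """C = A'*A computed entry by entry: n*n dot products of columns of A.

    `a` is an m-by-n matrix stored as a list of m rows, so a[k][i] is A(k,i).
    """
    m, n = len(a), len(a[0])
    return [[sum(a[k][i] * a[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]


def gram_syrk_like(a):
    """C = A'*A computed SYRK-style: only the upper triangle, then mirrored."""
    m, n = len(a), len(a[0])
    c = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):          # only i <= j: n*(n+1)/2 dot products
            s = sum(a[k][i] * a[k][j] for k in range(m))
            c[i][j] = s
            c[j][i] = s                 # symmetry gives the lower triangle free
    return c


def dot_products(n):
    """Number of column dot products: (full product, one triangle only)."""
    return n * n, n * (n + 1) // 2


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]          # a 2-by-3 example matrix
    print(gram_syrk_like(a))            # [[17, 22, 27], [22, 29, 36], [27, 36, 45]]
    print(dot_products(50))             # (2500, 1275)
```

Both functions return the same matrix. The triangular version simply skips the work that symmetry makes redundant. A real xSYRK additionally blocks the computation for cache reuse, which this sketch does not attempt.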