FFT slowup in 2.1.55
From: John W. Eaton
Subject: FFT slowup in 2.1.55
Date: Fri, 27 Feb 2004 12:02:49 -0600
On 27-Feb-2004, David Bateman <address@hidden> wrote:
| Ok, due to the alignment issues in FFTW 3.x I've found I can get
| into the case where I'm calling exactly the same FFT, but the
| alignment changes each time I call the function. The planner in
| oct-fftw.cc then recreates the plan each time, which is slow.
| For small FFTs [I've tested fft(randn(64,45))], this results in
| a significant slowup.
|
| I've tried two lines of attack to fix this problem. The first is to
| try and force the 16-byte alignment in "class Array<T>" using
|
| #ifdef HAVE_ATTRIB_ALIGN
| typedef T alignedT __attribute__ ((aligned(16)));
| #else
| typedef T alignedT;
| #endif
|
| where HAVE_ATTRIB_ALIGN is a configure time option. I then replace the
| "new T" in ArrayRep with "new alignedT". This appears to only give me
| 8-byte alignment, but I've read in the gcc manual that ld might not be
| able to do better than 8-byte alignment, so it is not clear to me if
|
| 1) The code I've done is correct, but the linker won't give me 16-byte
| alignment, or
| 2) I'm missing somewhere which also needs 16-byte alignment.
|
| Does anyone have any ideas on this?
Doesn't the __attribute__ ((aligned(16))) qualifier apply to variables,
not types? It seems to me that attaching it to a typedef would have
no effect.
In any case, the variable you want to align in the Array classes
doesn't actually exist in the class. We only have a pointer to it,
and you don't really want the pointer to be aligned on a 16-byte
boundary, you want the object that it points to to be aligned. So I
think you have to write an allocator that will do that for you. So
instead of writing
  explicit ArrayRep (int n) :
    data (new alignedT [n]), len (n), count (1) { }
I think it might work to write
  explicit ArrayRep (int n) :
    data (make_aligned_16_double_array (n)), len (n), count (1) { }
where "make_aligned_16_double_array" is a function that returns a
pointer to double which has the elements of the allocated array
aligned on a 16-byte boundary. To do that, you will probably need to
allocate a char array of the appropriate size (might require up to 15
bytes of padding) then return a pointer to one of the first 16
elements with a cast to make it a pointer to double instead of a
pointer to char.
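
The over-allocate-and-offset idea described above could be sketched roughly as follows. This is only a sketch under stated assumptions: the helper name aligned_16_offset is hypothetical, and it assumes std::uintptr_t is available and that converting a pointer to an integer yields its address, which holds on common platforms. The original char pointer is not kept, so this version cannot release the buffer later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes (0..15) to skip so that
// p + offset lies on a 16-byte boundary.
std::size_t
aligned_16_offset (const char *p)
{
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t> (p);
  return static_cast<std::size_t> ((16 - addr % 16) % 16);
}

double *
make_aligned_16_double_array (int n)
{
  assert (n >= 0);

  // Over-allocate by 15 bytes so that an aligned start always fits.
  // Assumption: the buffer is never freed, because the pointer
  // returned by new is not remembered in this sketch.
  char *buf = new char [static_cast<std::size_t> (n) * sizeof (double) + 15];

  return reinterpret_cast<double *> (buf + aligned_16_offset (buf));
}
```

A real version would also have to store the pointer returned by new (or the offset) somewhere, so that the destructor can hand the original pointer to delete [].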
But the solution above won't quite work because ArrayRep is a template
class. So we need to do this only for some values of the template
type parameter T (double and maybe Complex). Some method of
specializing these functions is needed. Here is a simplified example
that I think should work (I leave it up to you to determine how to
compute the necessary offset so that the data pointed to by the pointer
returned from the make_aligned_16_double_array function is actually on
a 16-byte boundary).

#include <iostream>

template <class T>
class
Array
{
public:

  class ArrayRep
  {
  public:

    T *data;
    int len;

    // Generic version: plain new, no special alignment.
    explicit ArrayRep (int n)
      : data (new T [n]), len (n) { std::cerr << "generic" << std::endl; }
  };

  ArrayRep *rep;

  explicit Array (int n)
    : rep (new typename Array<T>::ArrayRep (n)) { }
};

double *
make_aligned_16_double_array (int n)
{
  // Do something magic to find the required offset (should be in the
  // range [0, 15]).
  int offset = 0;

  char *buf = new char [n * sizeof (double) + offset];

  return reinterpret_cast<double *> (&buf[offset]);
}

// Specialization of the constructor for T == double only.
template <>
Array<double>::ArrayRep::ArrayRep (int n)
  : data (make_aligned_16_double_array (n)), len (n)
{
  std::cerr << "double" << std::endl;
}

int
main (void)
{
  Array<int> int_ra (2);
  Array<double> double_ra (2);
}

Or am I missing a simpler solution?
jwe