...
My current interpretation of various benchmarks that Elias Mårtenson and myself did some years ago is that the bandwidth of the memory interface between the CPUs (or cores) and the memory is the limiting factor, and no matter how efficient the APL interpreter is, this bottleneck will dictate the speedup that can be achieved.
Makes sense. It is my understanding that CPUs are so much faster than main memory that memory can't keep even a single CPU fed. The only places we see speed improvements are small loops whose working set fits in cache. With long sequences, like a large array, the data has to stream in from memory, and memory can't keep up with even one CPU. I guess machine architecture will have to catch up.
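To put rough numbers on that reasoning, here is a minimal sketch of the bound as I understand it. The function name and the bandwidth figures are my own assumptions for illustration, not measurements from the benchmarks mentioned above. It models a streaming loop where each core could consume data at some rate and the memory interface is shared by all cores.

```python
def streaming_speedup(cores, core_demand, memory_bandwidth):
    """Estimate the speedup of `cores` cores over one core on a streaming loop.

    core_demand      -- bytes/s one core could consume if memory were infinitely fast
    memory_bandwidth -- bytes/s the shared memory interface can actually deliver
    Both figures are hypothetical inputs, not measured values.
    """
    if cores < 1 or core_demand <= 0 or memory_bandwidth <= 0:
        raise ValueError("cores must be >= 1 and rates must be positive")
    # Throughput is capped by whichever is smaller: what the cores ask for
    # or what the memory interface can supply.
    one_core = min(core_demand, memory_bandwidth)
    all_cores = min(cores * core_demand, memory_bandwidth)
    return all_cores / one_core


# Assumed figures: each core could stream 20 GB/s, memory delivers 40 GB/s.
print(streaming_speedup(8, 20e9, 40e9))   # 8 cores, but memory caps the gain
# If a single core already outruns memory, extra cores add nothing.
print(streaming_speedup(8, 50e9, 40e9))
```

Under these assumptions the interpreter's efficiency never enters the formula, which is the point: once the memory interface is saturated, adding cores does not help.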
Thanks.
Blake