On Mon, 2008-05-05 at 16:07 +0200, Christian Schoenebeck wrote:
Uhm, stupid question: have you already tried whether GCC's special "complex"
attribute type leads to a better result with auto-vectorization? At least
that could give the optimizer a better chance.
No, I did not, but that's an idea. This certainly looks nice and clean:
_Complex float cxA[N], cxB[N], cxD[N];
for (i = 0; i < N; ++i)
    cxD[i] += cxA[i] * cxB[i];
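For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same loop as a C99 function. The name cmac_complex and the size_t length parameter are hypothetical, not taken from the original benchmark, which used fixed-size arrays of length N.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: complex multiply-accumulate d[i] += a[i] * b[i]
 * over arrays of C99 complex floats. Whether gcc -O3 -ftree-vectorize
 * actually vectorizes this loop depends on the compiler version. */
static void cmac_complex(float _Complex *d, const float _Complex *a,
                         const float _Complex *b, size_t n)
{
    assert(d != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i)
        d[i] += a[i] * b[i];
}
```

One thing possibly worth checking: by default GCC implements complex multiplication with a NaN/infinity recovery path (following C99 Annex G), which can get in the way of vectorization. The -fcx-limited-range option (also implied by -ffast-math) drops that check, so it may be worth timing the _Complex version with it as well.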
Compared with the other two versions under gcc -O3 -msse -ftree-vectorize,
it shows a slight advantage over the original (non-vectorized)
two-dimensional array:
clock: 13920 ms (_Complex)
clock: 7040 ms (cvec_t)
clock: 14470 ms (original array of complex)
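The "original array of complex" is only described as a two-dimensional array. Assuming (this is a guess, the declaration is not shown) that it is a float[N][2] with the real part at index 0 and the imaginary part at index 1, the same multiply-accumulate written out by hand would look roughly like this. The function name cmac_array2d is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "original" layout: an interleaved N x 2 float array where
 * x[i][0] is the real part and x[i][1] the imaginary part of element i.
 * a and b are only read; they are left non-const to avoid C's
 * const-qualified array-pointer conversion issues. */
static void cmac_array2d(float (*d)[2], float (*a)[2], float (*b)[2], size_t n)
{
    assert(d != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        /* (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br) */
        const float re = a[i][0] * b[i][0] - a[i][1] * b[i][1];
        const float im = a[i][0] * b[i][1] + a[i][1] * b[i][0];
        d[i][0] += re;
        d[i][1] += im;
    }
}
```

With real and imaginary parts interleaved, a vectorizer has to shuffle lanes to form the cross terms ar*bi and ai*br, which is one plausible reason this layout gains little from -ftree-vectorize.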
With icc -O3 -msse the difference is even more pronounced:
clock: 3850 ms (_Complex)
clock: 1410 ms (cvec_t)
clock: 13290 ms (original array of complex)
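The cvec_t type is not defined in this excerpt, so the following is only a guess at the kind of layout that could explain its lead: a hypothetical cvec_t holding four complex values in split form (four real parts in one 128-bit vector, four imaginary parts in another), written with the GCC vector_size extension. The names v4sf, cvec_t and cmac_cvec here are illustrative assumptions, not the benchmark's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout, not the benchmark's real cvec_t: four complex
 * numbers stored split, real parts in one SSE-sized vector and imaginary
 * parts in another. Uses the GCC/Clang vector_size extension. */
typedef float v4sf __attribute__((vector_size(16)));
typedef struct {
    v4sf re;
    v4sf im;
} cvec_t;

/* d[k] += a[k] * b[k] for n blocks of four complex values each.
 * Every operation below is a plain element-wise vector op; no shuffles. */
static void cmac_cvec(cvec_t *d, const cvec_t *a, const cvec_t *b, size_t n)
{
    assert(d != NULL && a != NULL && b != NULL);
    for (size_t k = 0; k < n; ++k) {
        d[k].re += a[k].re * b[k].re - a[k].im * b[k].im;
        d[k].im += a[k].re * b[k].im + a[k].im * b[k].re;
    }
}
```

In this split form each term of the complex product maps directly onto packed multiplies and adds, which is the general reason structure-of-arrays layouts tend to vectorize well.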
Moving from 'gcc 4.2.2' to '4.3 20070713 (experimental)' is very
disappointing:
clock: 46180 ms (_Complex) <-- we have found a loser!
clock: 7030 ms (cvec_t)
clock: 14340 ms (original array of complex)
/j
CU
Christian
_______________________________________________
Linux-audio-dev mailing list
Linux-audio-dev(a)lists.linuxaudio.org
http://lists.linuxaudio.org/mailman/listinfo/linux-audio-dev