[LAD] vectorization

Jens M Andreasen jens.andreasen at comhem.se
Mon May 5 16:00:30 UTC 2008


On Mon, 2008-05-05 at 16:07 +0200, Christian Schoenebeck wrote:

> Uhm, stupid question: already tried if GCC's special "complex" attribute type 
> leads to a better result with auto vectorization? At least that could give 
> the optimizer a better chance.
> 

No, I did not, but that's an idea. This certainly looks nice and clean:

  _Complex float cxA[N], cxB[N], cxD[N];

  for (i = 0;i < N; ++i)
     cxD[i] += cxA[i] * cxB[i];
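
For anyone who wants to try this at home, here is a minimal sketch of that
loop as a standalone function, next to the same arithmetic written out on
separate real/imaginary arrays. The function names, the pointer arguments and
the split layout are assumptions made for illustration; this is not the
cvec_t version and not the code that was actually timed.

```c
#include <complex.h>
#include <stddef.h>

/* Complex multiply-accumulate: d[i] += a[i] * b[i], same as the loop above. */
void cmac_complex(float _Complex *d, const float _Complex *a,
                  const float _Complex *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        d[i] += a[i] * b[i];
}

/* The same operation on split real/imaginary arrays (hypothetical layout):
 * (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
 */
void cmac_split(float *dr, float *di,
                const float *ar, const float *ai,
                const float *br, const float *bi, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        dr[i] += ar[i] * br[i] - ai[i] * bi[i];
        di[i] += ar[i] * bi[i] + ai[i] * br[i];
    }
}
```

Note that C99 complex multiplication also has to deal with NaN/infinity
corner cases, which the hand-expanded version simply ignores. With gcc,
-fcx-limited-range (implied by -ffast-math) drops those checks; whether that
changes any of the timings in this post is untested here.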

Comparing against the other two versions with gcc -O3 -msse
-ftree-vectorize suggests a slight advantage over the original
(non-vectorized) two-dimensional array:

> clock: 13920 ms (_Complex)
> clock:  7040 ms (cvec_t)
> clock: 14470 ms (original array of complex)


With icc -O3 -msse the difference is even more pronounced:

> clock:  3850 ms (_Complex)
> clock:  1410 ms (cvec_t)
> clock: 13290 ms (original array of complex)

Moving from 'gcc 4.2.2' to '4.3 20070713 (experimental)' is very
disappointing:

> clock: 46180 (_Complex)                 <-- we have found a loser!
> clock:  7030 (cvec_t)
> clock: 14340 (original array of complex)

/j

> CU
> Christian