[LAD] vectorization

Jussi Laako jussi at sonarnerd.net
Wed May 7 06:17:29 UTC 2008


Jens M Andreasen wrote:
> PS: Your fastest calculation is when the data floods the cache:
> N=(1024*1024), n=1000, gcc, clock: 8410 ms (_Complex). Is that a typo?

Nope, that's the actual result. I just verified the settings, recompiled 
and re-ran it, and it's still:
 > clock: 8390 ms (_Complex)
 > clock: 9310 ms (cvec_t)
 > clock: 8480 ms (original float array[N][2])
 > clock: 10550 ms (asm on float array)

Fast memory bus + prefetch is a really good thing...

I also have a vectorized float array copy, and it's significantly faster 
than memcpy(). While memcpy() stays under 1 GB/s, the vectorized version 
can reach around 90% of the theoretical memory speed for large copies.
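
To give an idea of what such a copy can look like, here is a minimal 
sketch. It is not my actual routine: the name copy_floats_sse() is made 
up for illustration, and it assumes x86 with SSE, non-overlapping 
buffers, and 16-byte aligned pointers for the fast path. The use of 
non-temporal stores (_mm_stream_ps) is also an assumption about what 
helps for large copies, not something measured above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_load_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper: copy n floats from src to dst.
 * The buffers must not overlap (same contract as memcpy()). */
static void copy_floats_sse(float *dst, const float *src, size_t n)
{
    size_t i = 0;

    /* Take the vector path only when both pointers are 16-byte aligned,
     * because _mm_load_ps and _mm_stream_ps require aligned addresses. */
    if ((((uintptr_t)dst | (uintptr_t)src) & 15u) == 0) {
        /* 16 floats (64 bytes) per iteration. */
        for (; i + 16 <= n; i += 16) {
            __m128 a = _mm_load_ps(src + i);
            __m128 b = _mm_load_ps(src + i + 4);
            __m128 c = _mm_load_ps(src + i + 8);
            __m128 d = _mm_load_ps(src + i + 12);
            /* Non-temporal stores write around the cache. */
            _mm_stream_ps(dst + i, a);
            _mm_stream_ps(dst + i + 4, b);
            _mm_stream_ps(dst + i + 8, c);
            _mm_stream_ps(dst + i + 12, d);
        }
        /* Make the streaming stores visible before returning. */
        _mm_sfence();
    }

    /* Remaining tail, or the whole buffer if it was not aligned. */
    if (i < n)
        memcpy(dst + i, src + i, (n - i) * sizeof(float));
}
```

Whether this beats memcpy() depends on the libc, the CPU and the copy 
size. Streaming stores tend to pay off only when the data is much larger 
than the cache, so it is worth benchmarking on the target machine.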


	- Jussi


