jussi at sonarnerd.net
Wed May 7 06:17:29 UTC 2008
Jens M Andreasen wrote:
> PS: Your fastest calculation is when the data floods the cache:
> N=(1024*1024), n=1000, gcc, clock: 8410 ms (_Complex). Is that a typo?
Nope, that's the actual result. I just verified the settings, recompiled
and re-ran, and it's still:
> clock: 8390 ms (_Complex)
> clock: 9310 ms (cvec_t)
> clock: 8480 ms (original float array[N])
> clock: 10550 ms (asm on float array)
Fast memory bus + prefetch is a really good thing...
I also have a vectorized float array copy, and it's significantly faster
than memcpy(). While memcpy() stays under 1 GB/s, the vectorized version can
reach around 90% of the theoretical memory bandwidth for large copies.
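
For illustration, here is a minimal sketch of what such a vectorized copy
could look like. It is not the code measured above. It assumes an x86 CPU
with SSE, non-overlapping buffers, and a hypothetical function name
(copy_floats_sse). It uses non-temporal stores so a large copy does not
evict useful data from the cache, which is one common way such a loop can
outrun a generic memcpy(); whether it does depends on the libc and hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper (not from the original post): copy n floats from
 * src to dst. The two buffers must not overlap. */
void copy_floats_sse(float *dst, const float *src, size_t n)
{
    size_t i = 0;

    /* Head: scalar copy until dst + i sits on a 16-byte boundary,
     * which _mm_stream_ps requires. */
    while (i < n && ((uintptr_t)(dst + i) & 15u) != 0) {
        dst[i] = src[i];
        i++;
    }

    /* Body: 4 floats per iteration. The load tolerates an unaligned
     * source; the store is non-temporal and bypasses the cache. */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);
        _mm_stream_ps(dst + i, v);
    }

    /* Make the non-temporal stores visible before any later stores. */
    _mm_sfence();

    /* Tail: the remaining 0 to 3 floats. */
    for (; i < n; i++) {
        dst[i] = src[i];
    }
}
```

With gcc on 32-bit x86 this needs -msse; on x86-64 SSE is enabled by
default. Unrolling the body loop or adding software prefetch is a further
tuning step that this sketch leaves out.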