On Mon, Jul 25, 2011 at 5:04 AM, Maurizio De Cecco <jmax(a)dececco.name> wrote:
Short summary of my initial post: I found that using the gcc vector
extensions induced a 2x slowdown with gcc and a 4x speedup with clang.
[snip]
I include the code, results and scripts to run the tests in a small zip
in case anybody wants to run other tests; the test code performs an arbitrary
vector computation (essentially 100 million multiply-adds), starting from a
seed given as an argument.
I'm getting SIMD instructions when I compile. However, you have two
things slowing you down:
- The loop-control work for the for(;;) loop (counter update, compare,
branch) is slowing you down on every iteration.
- You're only using one xmm register, so you're getting some memory slowdowns.
Both of these can be solved by having gcc unroll your loops for you
(recompile with -funroll-loops).
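
To illustrate what unrolling buys you, here is a rough hand-written sketch
of the transformation. It is not the code from the zip: it assumes plain
float buffers, and the function names mul_plain and mul_unrolled4 are made
up for this example. What gcc actually emits with -funroll-loops will
differ in the details.

```c
#include <stddef.h>

/* Plain loop: one multiply, plus one counter update, compare and branch,
 * per element. */
void mul_plain(float *c, const float *a, const float *b, size_t n)
{
    for (size_t k = 0; k < n; ++k)
        c[k] = a[k] * b[k];
}

/* Unrolled by four (hypothetical illustration): the loop-control work is
 * paid once per four elements, and the four independent multiplies give
 * the compiler room to use more than one register. */
void mul_unrolled4(float *c, const float *a, const float *b, size_t n)
{
    size_t k = 0;

    for (; k + 4 <= n; k += 4) {
        c[k]     = a[k]     * b[k];
        c[k + 1] = a[k + 1] * b[k + 1];
        c[k + 2] = a[k + 2] * b[k + 2];
        c[k + 3] = a[k + 3] * b[k + 3];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; k < n; ++k)
        c[k] = a[k] * b[k];
}
```

Both functions compute the same result; only the amount of loop overhead
per element changes.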
In addition, you're handling 3 buffers at a time: bufc[k] = bufa[k] *
bufb[k]. You might be able to speed it up a little by converting the
code to:
memcpy(bufc, bufa, N*sizeof(float));
for(k=0; k<N ; ++k) bufc[k] *= bufb[k];
This way you are only handling 2 buffers at a time (which an x86 CPU
generally does better with). But YMMV on this piece of advice.
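
Spelled out as a complete function, the two-buffer version might look
like the sketch below. It assumes plain float buffers and that bufc does
not overlap bufa (memcpy requires that); the function name
mul_two_buffers is made up for this example. Whether it beats the
three-buffer loop is something to measure, not assume.

```c
#include <stddef.h>
#include <string.h>

/* Copy bufa into bufc, then multiply in place by bufb, so the multiply
 * loop only touches two buffers. bufc must not overlap bufa. */
void mul_two_buffers(float *bufc, const float *bufa, const float *bufb,
                     size_t n)
{
    memcpy(bufc, bufa, n * sizeof(float));

    for (size_t k = 0; k < n; ++k)
        bufc[k] *= bufb[k];
}
```

Note that the copy is an extra pass over the data, so any gain in the
multiply loop has to outweigh it.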
-gabriel