Short summary of my initial post: I found that using the GCC vector
extensions induced a 2x slowdown with gcc and a 4x speedup with clang.
I ran more tests on Mac OS and Ubuntu, isolating a small code example,
and I found the origin of the problem, even if I do not know exactly
what is happening.
My original test used float vectors of size 8; the GCC vector
extension documentation says that if the vector size does not match the
hardware vector size, the code is synthesized in some way.
With a vector size of 8 I found the above results under Mac OS X, using
clang and gcc 4.2, and under Ubuntu 11.04, using clang and gcc 4.5.2.
When I move to a vector size of 4, things get better: clang slows down
by about 2x with respect to the size-8 version, and gcc obtains the
same result. The interesting point is that gcc gets essentially the
same speed with and without vector extensions, which probably means the
compiler is good enough at vectorizing the code, at least in the test
cases I used.
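For comparison, the size-4 type fits exactly in one 128-bit SSE
register, so no synthesis should be needed (again only a sketch under
that assumption, not the actual code from the zip):

typedef float vec4 __attribute__ ((vector_size (16))); /* 4 floats, one SSE register */

vec4 madd4 (vec4 a, vec4 b, vec4 c)
{
  return a * b + c; /* should map to a single mulps/addps pair on SSE */
}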
I include the code, results, and scripts to run the tests in a small
zip, in case anybody wants to run other tests; the test code performs
an arbitrary vector computation (essentially 100 million multiply-adds),
starting from a seed given as an argument.
The code is modeled on the way jmax computes, i.e. one vector operation
at a time on vectors passed by pointers, and it is not designed to be
the fastest possible code to implement this computation.
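To give an idea of that shape, here is a hypothetical sketch of the
style (type and function names are made up, not the contents of
tests.zip):

typedef float vec4 __attribute__ ((vector_size (16)));

/* one vector operation at a time, operands passed by pointer */
static void vec_mul (const vec4 *a, const vec4 *b, vec4 *out, int n)
{
  int i;
  for (i = 0; i < n; i++)
    out[i] = a[i] * b[i];
}

static void vec_add (const vec4 *a, const vec4 *b, vec4 *out, int n)
{
  int i;
  for (i = 0; i < n; i++)
    out[i] = a[i] + b[i];
}

/* the benchmark chains many such calls, so the multiply-add is split
   across separate passes over memory rather than fused in one loop */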
Thanks for the help,
Maurizio
Attachments:
- tests.zip
(application/x-zip-compressed — 8.2 KB)