Ruben van Royen wrote:
First of all, I was not yet talking about vectorizing your code, which is
often hard, especially for a compiler. But SSE can be used on scalars as
well (as you probably know).
The fact is that the Intel Pentium 4 optimization guide says that SSE code is
generally as fast as or faster than regular FP code. Truncation to integer
in particular is faster. Also, denormals (which started all of this) can be
handled faster by SSE math by turning on a mode flag that makes input
denormals behave as zeros. This is of course not IEEE compliant, but it is
exactly what you were doing in your code.
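
A rough sketch of those two points (scalar truncation and the
denormals-as-zero flag), assuming a compiler that ships the <xmmintrin.h>
intrinsics and compiles scalar float math to SSE. The function names and
the USE_SSE macro are made up for illustration, and the bit values are the
MXCSR DAZ and FTZ bits as I understand them from the Intel manuals:

```c
#include <float.h>

/* Only meaningful when scalar float math is compiled to SSE instructions. */
#if defined(__SSE_MATH__) || defined(_M_X64)
#include <xmmintrin.h>
#define USE_SSE 1
#else
#define USE_SSE 0
#endif

#define MXCSR_DAZ 0x0040u  /* denormals-are-zero: denormal inputs read as 0 */
#define MXCSR_FTZ 0x8000u  /* flush-to-zero: denormal results become 0 */

/* Truncates toward zero; with SSE this is a single cvttss2si. */
int truncate_to_int(float x) {
#if USE_SSE
    return _mm_cvttss_si32(_mm_set_ss(x));
#else
    return (int)x;
#endif
}

/* Sets DAZ and FTZ for the calling thread. Returns 1 if the flags were
 * set, 0 if this build does not use SSE math. Assumption: the CPU
 * supports DAZ; the earliest SSE CPUs did not, and setting the bit
 * there faults. */
int enable_denormals_as_zero(void) {
#if USE_SSE
    _mm_setcsr(_mm_getcsr() | MXCSR_DAZ | MXCSR_FTZ);
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if a denormal operand currently behaves as zero. */
int denormal_input_reads_as_zero(void) {
    volatile float tiny = FLT_MIN / 4.0f;  /* a denormal value */
    volatile float one = 1.0f;             /* volatile blocks constant folding */
    float r = tiny * one;
    return r == 0.0f;
}
```

The flag lives in a per-thread control register, so an audio thread would
have to call enable_denormals_as_zero() itself before entering its
processing loop.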
I agree, in theory slowdowns should not occur, but what I found strange
is that even Intel's own compiler, icc, produced bad performance
when compiling the resampling code with SSE/SSE2 math and vectorization on.
If the compiler were smart, it would not have used SSE/SSE2 in that
section of code, but apparently icc is still not good at spotting
those problems.
The problem for a C programmer is that, since he assumes the compiler
does a good job of optimizing, he will not easily be able to figure out
why the SSE optimizations slowed down certain routines.
Then a dilemma can arise: say 50% of the CPU time is spent in function1()
and 50% in function2(), but if you activate SSE, function1() speeds up
40% while function2() slows down 30%.
If it were possible to tell the compiler not to use SSE in function2(),
the app would benefit from SSE, but with SSE enabled everywhere, as in
the case above, the slowdown in function2() cancels most or all of
that gain.
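
To put numbers on that example, here is a small sketch. The helper names
are hypothetical, and the result depends on how the percentages are read:
as changes in run time (40% less time, 30% more time) or as changes in
speed (1.4x and 0.7x throughput).

```c
#include <assert.h>
#include <stddef.h>

/* share[i]: fraction of the original run time spent in function i
 * (the shares sum to 1).
 * time_factor[i]: new time / old time for function i, e.g. 0.6 means
 * the function now needs 40% less time.
 * Returns the new total run time relative to the original (1.0). */
double relative_runtime(const double *share, const double *time_factor,
                        size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        assert(time_factor[i] > 0.0);
        total += share[i] * time_factor[i];
    }
    return total;
}

/* Converts a throughput change (1.4 = 40% faster, 0.7 = 30% slower)
 * into the time factor expected by relative_runtime(). */
double time_factor_from_speed(double speed_factor) {
    assert(speed_factor > 0.0);
    return 1.0 / speed_factor;
}
```

With shares {0.5, 0.5}, time factors {0.6, 1.3} give 0.95, so only a 5%
gain remains. Speed factors 1.4 and 0.7 give about 1.07, a net slowdown.
Leaving function2() untouched (factor 1.0) would give 0.80 or about 0.86
respectively.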
Usually optimal C code can only be generated if the programmer knows both
the CPU and the compiler well, and often this requires long, painful
trial-and-error sessions, analysis of the asm code generated by the
compiler, etc.
OK, there are profilers available, but they don't automagically solve all
the optimization problems.
cheers,
Benno
http://www.linuxsampler.org
The reasons for SSE code being slower than FP code could be:
The addition is pipelined in the FP unit, but not in the SSE unit.
Incorrect alignment might incur a higher penalty for SSE.
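
The alignment point mostly concerns packed loads and stores. A minimal
sketch, assuming the <xmmintrin.h> intrinsics are available; apply_gain()
and is_aligned16() are made-up names, not code from the resampler. The
aligned forms require a 16-byte boundary and fault otherwise, while the
unaligned forms accept any address but were typically slower on CPUs of
this generation.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Returns nonzero if p sits on a 16-byte boundary. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Multiplies n floats in place by gain, four at a time where possible. */
void apply_gain(float *buf, size_t n, float gain) {
    size_t i = 0;
#if HAVE_SSE
    __m128 g = _mm_set1_ps(gain);
    if (is_aligned16(buf)) {
        /* Aligned path: movaps loads and stores. */
        for (; i + 4 <= n; i += 4)
            _mm_store_ps(buf + i, _mm_mul_ps(_mm_load_ps(buf + i), g));
    } else {
        /* Unaligned path: movups works at any address. */
        for (; i + 4 <= n; i += 4)
            _mm_storeu_ps(buf + i, _mm_mul_ps(_mm_loadu_ps(buf + i), g));
    }
#endif
    /* Scalar tail, and the whole buffer when SSE is not available. */
    for (; i < n; ++i)
        buf[i] *= gain;
}
```

Timing both paths on the same data would show whether alignment explains
the slowdown on a given machine.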
Ruben