[linux-audio-dev] Traps in floating point code

Benno Senoner sbenno at gardena.net
Thu Jul 1 20:38:03 UTC 2004


Ruben van Royen wrote:

>First of all, I was not yet talking about vectorizing your code, which is often
>hard, especially for a compiler. But SSE can be used on scalars as well (as
>you probably know).
>The fact is that the Intel Pentium 4 optimization guide says that SSE code is
>generally as fast as or faster than regular FP code. And especially the
>truncation to integer is faster. Also, denormals (which started all of this)
>can be handled faster by SSE math by turning on a mode flag that makes input
>denormals behave as zeros. This is of course not IEEE compliant, but exactly
>what you were doing in your code.
>  
>
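
For reference, the mode flag mentioned above lives in the SSE control
register (MXCSR). What follows is only a minimal sketch of switching it on
from C, not code from the resampler. The helper name enable_daz_ftz() is
made up for illustration; _mm_getcsr()/_mm_setcsr() come from
<xmmintrin.h>, and the two bit masks are written out by hand from the
MXCSR layout. Note that the DAZ bit is not implemented on the earliest SSE
CPUs, where setting it may fault, so real code should check for support
first.

```c
#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>      /* _mm_getcsr(), _mm_setcsr() */
#define HAVE_MXCSR 1
#else
#define HAVE_MXCSR 0
#endif

#define MXCSR_DAZ 0x0040u   /* bit 6:  treat denormal inputs as zero  */
#define MXCSR_FTZ 0x8000u   /* bit 15: flush denormal results to zero */

/* Hypothetical helper: turn on both flags for the calling thread.
 * Returns 1 if the flags were set, 0 if this is not an SSE build.
 * Assumes the CPU implements DAZ; this sketch does not check for it. */
static int enable_daz_ftz(void)
{
#if HAVE_MXCSR
    _mm_setcsr(_mm_getcsr() | MXCSR_DAZ | MXCSR_FTZ);
    return 1;
#else
    return 0;
#endif
}
```

The flags only affect SSE/SSE2 math, not x87 code, and they are per
thread, so an audio thread would have to call the helper itself before
entering its processing loop.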

I agree that in theory slowdowns should not occur, but what I found
strange is that even Intel's own compiler, icc, produced slower code
when I compiled the resampling code with SSE/SSE2 math and
vectorization turned on.
If the compiler were smart, it would not have used SSE/SSE2 in that
section of code, but apparently icc is still not good at spotting
those problems.
The problem for a C programmer is that he assumes the compiler does a
good job of optimizing, so most will not easily be able to figure out
why the SSE optimizations slowed down certain routines.
Then a dilemma can occur: say 50% of the CPU time is spent in
function1() and 50% in function2(), but when you activate SSE,
function1() speeds up by 40% while function2() slows down by 30%.
If it were possible to tell the compiler not to use SSE in function2(),
the app would benefit from SSE, but with SSE applied to both functions,
as in the above case, it would not.
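
To put numbers on that, here is a small sketch in C. It assumes that
"speeds up 40%" means function1() gets through 1.4 times as much work per
unit of time and that "slows down 30%" means function2() manages only 0.7
times as much; other readings of the percentages give different totals.
The helper relative_runtime() is made up for this illustration.

```c
/* Run time after a change, relative to the run time before it (1.0 means
 * unchanged, below 1.0 is faster), for a program made of two parts.
 * frac1 and frac2 are each part's share of the original run time and
 * should add up to 1.0. speed1 and speed2 are throughput factors:
 * 1.4 stands for "40% faster", 0.7 for "30% slower", 1.0 for "unchanged".
 * Hypothetical helper, for illustration only. */
static double relative_runtime(double frac1, double speed1,
                               double frac2, double speed2)
{
    return frac1 / speed1 + frac2 / speed2;
}
```

Under those assumptions relative_runtime(0.5, 1.4, 0.5, 0.7) comes out at
about 1.07, so the app as a whole gets roughly 7% slower with SSE
everywhere, while relative_runtime(0.5, 1.4, 0.5, 1.0) is about 0.86, which
is the gain you would get if function2() could be left on the regular FP
unit.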
Usually optimal C code can only be generated if the programmer knows
both the CPU and the compiler well, and often this requires long,
painful trial-and-error sessions, analysis of the asm code generated
by the compiler, etc.
OK, there are profilers available, but they don't automagically solve
all the optimization problems.

cheers,
Benno
http://www.linuxsampler.org

>The reasons for SSE code being slower than FP code could be:
>	The addition is pipelined in the FP, but not in the SSE unit.
>	Incorrect alignment might incur a higher penalty for SSE.
>
>Ruben 
>
>  
>
