[linux-audio-dev] Traps in floating point code

Erik de Castro Lopo erikd-lad at mega-nerd.com
Thu Jul 1 10:21:30 UTC 2004


On Thu, 1 Jul 2004 10:25:51 +0200
Ruben van Royen <rvroyen at guidedbees.com> wrote:

> Hi all, 
> 
> please note that SSE2 has support for 64-bit floats (doubles) and contains an 
> instruction that truncates to int, regardless of the control word. A new enough 
> gcc with (-march=pentium4 or -msse2) and -mfpmath=sse will use sse instead of 
> the old fp unit. This has more advantages, since sse math uses normal 
> registers instead of the stack in the old fp unit.
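
For illustration, a minimal sketch of the truncating conversion mentioned
above. The function name trunc_to_int is made up for this example. It
assumes the SSE2 intrinsic _mm_cvttsd_si32 from <emmintrin.h>, which
corresponds to the CVTTSD2SI instruction, and falls back to a plain C cast
when the compiler is not targeting SSE2.

```c
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: truncate a double toward zero.
   With SSE2 enabled this uses CVTTSD2SI, which always truncates,
   whatever rounding mode is currently set. */
static int trunc_to_int(double x)
{
#if defined(__SSE2__)
    return _mm_cvttsd_si32(_mm_set_sd(x));
#else
    return (int) x;   /* a C cast also truncates toward zero */
#endif
}
```

With gcc on 32-bit x86 this would typically be built with -msse2 (and
-mfpmath=sse if the rest of the floating point math should use SSE too).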

SSE and SSE2 are a huge advantage for some algorithms and nearly useless for
others.

I recently spent a good deal of time trying to implement the innermost
loop of Secret Rabbit Code in SSE where single precision fp was
sufficient. The best the compiler could do by compiling the existing C
code with -msse -mfpmath=sse was half the speed of the same code compiled
for the standard FPU. I then turned to the <xmmintrin.h> header file
and pretty much hand-coded asm. My best effort was still about 20% slower
than the C compiled for the standard FPU.

The problem with SRC is that I am calculating coefficients for the filter
on the fly by looking up a large table and interpolating between 
coefficients. There is simply no way to vectorize it.
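
To make the access pattern concrete, here is a rough sketch of that kind of
loop. It is not the actual SRC code: the function names, the table layout
and the use of linear interpolation are all assumptions for illustration.
The point is that the table index is derived from a fractional position
that changes for every tap, so each coefficient needs its own scattered
pair of loads, and SSE has no gather instruction for that.

```c
#include <stddef.h>

/* Hypothetical helper: linearly interpolate between adjacent table
   entries. Assumes pos >= 0 and table_len >= 1. */
static float interp_coeff(const float *table, size_t table_len, double pos)
{
    size_t idx = (size_t) pos;                  /* integer part picks the entry */
    float frac = (float) (pos - (double) idx);  /* fractional part is the weight */

    if (idx + 1 >= table_len)
        return table[table_len - 1];            /* clamp at the end of the table */

    return table[idx] + frac * (table[idx + 1] - table[idx]);
}

/* Hypothetical inner loop: every tap looks up a different,
   data-dependent table position. */
static float filter_one_sample(const float *input, size_t taps,
                               const float *table, size_t table_len,
                               double start, double step)
{
    float acc = 0.0f;

    for (size_t k = 0; k < taps; k++)
        acc += input[k] * interp_coeff(table, table_len, start + (double) k * step);

    return acc;
}
```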

However, I'm working on another project where I expect to obtain close
to the full 4 times speed improvement because the algorithm fits the
SSE 4 samples-at-a-time processing model very, very well.
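
As a generic illustration of the 4 samples-at-a-time model (not the project
referred to above), here is a sketch that scales a buffer of floats. The
function name apply_gain is hypothetical. It uses the SSE intrinsics from
<xmmintrin.h> when available and a scalar loop otherwise, and the scalar
loop also handles any leftover samples when the length is not a multiple
of four.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics */
#endif

/* Hypothetical example: multiply n samples in place by gain. */
static void apply_gain(float *buf, size_t n, float gain)
{
    size_t i = 0;

#if defined(__SSE__)
    const __m128 g = _mm_set1_ps(gain);        /* gain in all four lanes */

    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(buf + i);      /* unaligned load of 4 floats */
        _mm_storeu_ps(buf + i, _mm_mul_ps(v, g));
    }
#endif

    for (; i < n; i++)                         /* remaining samples, or all of
                                                  them without SSE */
        buf[i] *= gain;
}
```

Each sample is processed independently and the loads are contiguous, which
is what lets one instruction do the work of four.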

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo  nospam at mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+
Spammer:     Any of you guys looking for a permanent position in Scotland?
Kaz Kylheku: No, I'm looking for a thug in Scotland who might be interested
             in beating up off-topic Usenet spammers, on a pro bono basis.


