On Fri, 2004-07-02 at 00:40, Erik de Castro Lopo wrote:
> Eric, what do you think? Can something like that be coded
> efficiently using SSE/SSE2?
Probably not. There are some algorithms which simply can't be vectorized.
SSE2 is usually significantly faster for non-vectorized (scalar) code
too, at least on P4 and AMD64. I usually profile the code generated by
the compiler and then hand-code the SSE2 parts where the compiler is the
bottleneck. IIR filters were one good example where compilers did badly.
--
Jussi Laako <jussi.laako(a)pp.inet.fi>