[Tim Blechmann]
>> A similar approach works fine here, too (flipping the sign of the
>> addition constant at every sample or every block, depending on the
>> algorithm in question).
>>
>> Inaudible (in fact barely measurable), code is branchless and simple.
>> Perfect solution for a stupid little problem.
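A minimal sketch of the sign-flipping trick quoted above. It assumes a simple one-pole lowpass as "the algorithm in question" and an offset of 1e-18f. Both are illustrative choices, not taken from the thread, and the names `onepole`, `onepole_process` and `is_denormal` are hypothetical. The tiny offset keeps the feedback state from decaying into the denormal range once the input goes silent. Flipping its sign every sample makes the offsets cancel on average, so no DC builds up in the loop.

```c
#include <math.h>
#include <stddef.h>

/* Illustrative offset (assumption): a tiny normal float, far above
   FLT_MIN (about 1.2e-38) yet far below anything audible. */
#define ANTI_DENORMAL 1e-18f

/* Hypothetical one-pole lowpass: y[n] = a*y[n-1] + (1-a)*x[n]. */
typedef struct {
    float a;      /* feedback coefficient, 0 <= a < 1 */
    float y;      /* filter state, y[n-1] */
    float offset; /* anti-denormal constant, sign flips every sample */
} onepole;

static void onepole_init(onepole *f, float a)
{
    f->a = a;
    f->y = 0.0f;
    f->offset = ANTI_DENORMAL;
}

static void onepole_process(onepole *f, const float *in, float *out, size_t n)
{
    float y = f->y;
    float off = f->offset;
    for (size_t i = 0; i < n; ++i) {
        /* The offset keeps y away from the denormal range on silence. */
        y = f->a * y + (1.0f - f->a) * in[i] + off;
        /* Branchless sign flip: offsets cancel on average, no DC build-up. */
        off = -off;
        out[i] = y;
    }
    f->y = y;
    f->offset = off;
}

/* Nonzero if x is a denormal (subnormal) float. */
static int is_denormal(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}
```

For the per-block variant mentioned in the quote, `off = -off;` would move out of the loop, so the sign flips once per call instead of once per sample.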
> The denormal handling on the SSE unit is far better than on the FPU ...
> It's also possible to enable hardware FTZ/DAZ ...
> Better a hardware solution than a software solution ;-)
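For the hardware route, a minimal sketch using the MXCSR macros from the Intel intrinsics headers: `_MM_SET_FLUSH_ZERO_MODE` from `<xmmintrin.h>` and `_MM_SET_DENORMALS_ZERO_MODE` from `<pmmintrin.h>`. The function name `enable_ftz_daz` is hypothetical. The sketch assumes an x86 CPU that supports DAZ (some very early SSE chips do not). It only affects SSE arithmetic, and on other targets it simply reports failure so a software scheme can take over.

```c
/* SSE targets only (assumption: GCC/Clang define __SSE__, MSVC x64 _M_X64). */
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h> /* _MM_SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON */
#include <pmmintrin.h> /* _MM_SET_DENORMALS_ZERO_MODE, _MM_DENORMALS_ZERO_ON */
#define HAVE_SSE_CSR 1
#else
#define HAVE_SSE_CSR 0
#endif

/* Sets FTZ (denormal results become 0) and DAZ (denormal inputs are
   read as 0) in MXCSR for the calling thread. Returns 1 if the bits
   were set, 0 if this target has no SSE control register. */
static int enable_ftz_daz(void)
{
#if HAVE_SSE_CSR
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    return 1;
#else
    return 0; /* caller falls back to a software scheme */
#endif
}
```

MXCSR is per-thread state, so the call belongs in the thread that runs the DSP code rather than wherever the program happens to start.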
Hard to disagree :)
However, I find SSE useful only from SSE2 on; it's not that much faster
than an AMD FPU, and it's a pain to code with gcc. It's limited to x86,
so you'll need portable C code to go with it. Likewise, h/w FTZ for the
FPU is nice, but not available everywhere.
In the end, I'd rather write and optimize quick, portable C/C++
than endure the coding, compilation and maintenance inferno that is
SIMD.
Cheers, Tim