On Wed, 2005-11-16 at 00:05 +0100, Florian Schmidt wrote:
> I think libDSP does prefetch and cache alignment,
> SIMD, yadayada :)
> I don't know though to which degree each one of the functions is
> optimized. Best to ask Jussi himself (CC'ed) :)
Most of the time prefetching is left to the compiler (which works OK
most of the time), though it's done manually for some functions.
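
To illustrate what manual prefetching can look like, here is a minimal
sketch in C. It is not taken from libDSP: the function name
sum_with_prefetch and the PREFETCH_AHEAD distance are hypothetical, and
it assumes GCC or Clang, which provide __builtin_prefetch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning value: how many elements ahead to prefetch. */
#define PREFETCH_AHEAD 64

/* Sum n floats while hinting the cache about data needed a bit later. */
float sum_with_prefetch(const float *x, size_t n)
{
    assert(n == 0 || x != NULL);

    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* Arguments: address, 0 = read access, 3 = high temporal locality. */
            __builtin_prefetch(&x[i + PREFETCH_AHEAD], 0, 3);
        }
        acc += x[i];
    }
    return acc;
}
```

For a simple linear walk like this one the hardware prefetcher usually
does the job already, so a hint of this kind tends to pay off mainly for
less regular access patterns. The right distance has to be measured.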
For x86 there are handwritten SIMD (E3DNow! and SSE2/SSE3) versions of
these operations, selected automatically at runtime depending on the CPU:
- Copy
- Add
- Multiply
- Complex multiply
- Multiply-add
- Complex multiply-add
- Add-multiply
- Multiply-accumulate
- Min-max
- Normalized cross-correlation
- i16 -> float, i32 -> float conversion
- FIR filter
- IIR filter
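
As a rough illustration of runtime selection between a plain C version
and a SIMD version of one operation, here is a sketch in C. It is not
libDSP's code or API: the names mul_add_scalar, mul_add_sse and
select_mul_add are hypothetical, the definition of multiply-add used here
(dst[i] = src[i] * scale + offset) is an assumption, and it relies on the
GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports and on
SSE intrinsics rather than handwritten assembly.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#define HAVE_SSE_BUILD 1
#else
#define HAVE_SSE_BUILD 0
#endif

typedef void (*mul_add_fn)(float *dst, const float *src,
                           float scale, float offset, size_t n);

/* Portable fallback: dst[i] = src[i] * scale + offset. */
void mul_add_scalar(float *dst, const float *src,
                    float scale, float offset, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * scale + offset;
}

#if HAVE_SSE_BUILD
/* SSE variant: four floats per iteration, unaligned loads and stores. */
void mul_add_sse(float *dst, const float *src,
                 float scale, float offset, size_t n)
{
    const __m128 vs = _mm_set1_ps(scale);
    const __m128 vo = _mm_set1_ps(offset);
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);
        v = _mm_add_ps(_mm_mul_ps(v, vs), vo);
        _mm_storeu_ps(dst + i, v);
    }
    /* Remaining 0..3 elements go through the scalar code. */
    mul_add_scalar(dst + i, src + i, scale, offset, n - i);
}
#endif

/* Pick an implementation based on what the running CPU reports. */
mul_add_fn select_mul_add(void)
{
#if HAVE_SSE_BUILD && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse"))
        return mul_add_sse;
#endif
    return mul_add_scalar;
}
```

A caller would fetch the pointer once, e.g. mul_add_fn f =
select_mul_add(), and then call f(dst, src, 2.0f, 1.0f, n). Note that in
this single-file sketch the SSE path only exists when the whole file is
built with SSE enabled, so the runtime check is mostly for show. A real
library would typically build each SIMD variant separately with its own
flags, or write it in assembly.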
Optimized functions are also used as building blocks for more complex
functions. All allocations are aligned to the required boundary. There
are C, C++ and C++ template APIs. Btw, Intel's compiler can vectorize
most of the remaining functions and can even parallelize some of them.
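
For anyone wondering what aligned allocation means in practice, here is
a small sketch in C. It does not show libDSP's own allocator: the names
alloc_aligned_floats and is_aligned are hypothetical, the 16-byte
boundary is an assumption (it is what aligned SSE loads need), and it
uses C11 aligned_alloc, which is not available on every platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 16 bytes, the boundary required by aligned SSE loads. */
#define SIMD_ALIGN 16

/* Allocate room for count floats on a SIMD_ALIGN boundary; free() it later. */
float *alloc_aligned_floats(size_t count)
{
    /* Refuse sizes that would overflow the padding arithmetic below. */
    if (count > (SIZE_MAX - SIMD_ALIGN) / sizeof(float))
        return NULL;

    size_t bytes = count * sizeof(float);
    /* aligned_alloc expects the size to be a multiple of the alignment. */
    size_t padded = (bytes + SIMD_ALIGN - 1) / SIMD_ALIGN * SIMD_ALIGN;
    if (padded == 0)
        padded = SIMD_ALIGN;

    return aligned_alloc(SIMD_ALIGN, padded);
}

/* Return nonzero if p sits on the given alignment boundary. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}
```

With buffers allocated like this, a SIMD routine can use aligned loads
and stores from the start of the buffer without a special-case prologue.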
Currently the lib is missing autotools stuff, so it's
makefile-configured...
--
Jussi Laako <jussi.laako(a)pp.inet.fi>