Yeah, I'm reviving this topic ...
because I was curious how the GCC vector extension situation has changed in the
meantime, so I wrote a small benchmark for mixing signals with and without gain.
You can find it here:
http://download.linuxsampler.org/dev/mixdown.tar.gz
I compared a pure C++ implementation, the hand-crafted SSE assembly code
(by Sampo Savolainen, Ardour) and of course an implementation utilizing GCC's
vector extensions. On my very weak, but environmentally friendly ;-) VIA box the
GCC vector implementation outperforms the other two solutions (using GCC
4.2.3 BTW):
Benchmarking mixdown (no coeff):
    pure C++              : 670 ms
    ASM SSE               : 200 ms
    GCC vector extensions : 180 ms
Benchmarking mixdown (WITH coeff):
    pure C++              : 890 ms
    ASM SSE               : 300 ms
    GCC vector extensions : 230 ms
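
For anyone who hasn't played with the vector extensions yet, here is roughly
what a "mix with coeff" loop looks like with them. This is only a minimal
sketch, not the code from the tarball: the names v4sf, mix_with_gain_vec and
mix_with_gain_scalar are made up for illustration, and I assume the frame count
is a multiple of 4. The memcpy calls are just an alignment-safe way to load and
store the packed floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Four packed floats, declared with GCC's vector_size attribute.
typedef float v4sf __attribute__((vector_size(16)));

// Hypothetical helper: dst[i] += src[i] * gain, four samples per step.
// Assumption: frames is a multiple of 4.
void mix_with_gain_vec(float* dst, const float* src, float gain,
                       std::size_t frames) {
    assert(frames % 4 == 0);
    const v4sf g = { gain, gain, gain, gain };
    for (std::size_t i = 0; i < frames; i += 4) {
        v4sf d, s;
        std::memcpy(&d, dst + i, sizeof d);  // load without alignment assumptions
        std::memcpy(&s, src + i, sizeof s);
        d += s * g;                          // element-wise multiply and add
        std::memcpy(dst + i, &d, sizeof d);  // store the mixed samples
    }
}

// Scalar reference doing the same thing one sample at a time.
void mix_with_gain_scalar(float* dst, const float* src, float gain,
                          std::size_t frames) {
    for (std::size_t i = 0; i < frames; ++i)
        dst[i] += src[i] * gain;
}
```

Whether the compiler turns the vector arithmetic into actual SSE instructions
depends on the flags you compile with.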
And this time, the signal output of the vector implementation is even
correct. ;-) BUT, and this is important, you have to supply the correct
C(XX)FLAGS. For the results above I used:
CXXFLAGS="-O3 -march=i686 -mmmx -msse -ffast-math -funroll-loops -fomit-frame-pointer -fpermissive -mfpmath=sse"
If you don't tell GCC to emit SSE code, the vector implementation performs
horribly, even a lot worse than the pure C++ solution. Here are my results
with just CXXFLAGS="-O3":
Benchmarking mixdown (no coeff):
    pure C++              : 680 ms
    ASM SSE               : 200 ms
    GCC vector extensions : 1970 ms
Benchmarking mixdown (WITH coeff):
    pure C++              : 1280 ms
    ASM SSE               : 310 ms
    GCC vector extensions : 2540 ms
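
One cheap way to avoid this trap at compile time would be to check whether SSE
code generation is enabled at all. GCC predefines __SSE__ when it is (e.g. with
-msse or a -march that implies it). This is just a sketch of that idea, not
something the benchmark tarball does; the macro MIX_USE_GCC_VECTORS and the
function mix_backend_name are hypothetical names.

```cpp
// __SSE__ is predefined by GCC when SSE code generation is enabled.
// MIX_USE_GCC_VECTORS is a hypothetical macro name for this sketch.
#if defined(__GNUC__) && defined(__SSE__)
#  define MIX_USE_GCC_VECTORS 1
#else
#  define MIX_USE_GCC_VECTORS 0
#endif

// Hypothetical helper: report which implementation was compiled in.
const char* mix_backend_name() {
    return MIX_USE_GCC_VECTORS ? "GCC vector extensions" : "pure C++";
}
```

Such a check only says that SSE is available, not that the vector version is
actually faster on a given box.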
So I would say it's finally time to get our hands on GCC's vector toys and
waste less time on hairy assembly tasks. I think for simple algorithms like
mixing it's completely sufficient to keep a pure C++ implementation and a GCC
vector extension implementation side by side, and to let a small benchmark in
the configure script (or whatever) automatically determine which of the two
to pick for compilation, depending on what the user supplied as CXXFLAGS.
At least that's what I'm going to do ... I think ...
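
To make the "small benchmark" idea a bit more concrete, here is a rough sketch
of how two candidates could be timed against each other. It is not the
benchmark from the tarball; mix_fn, mix_scalar, time_mix_ms and pick_faster are
hypothetical names, and a real configure check would compile this with the
user's CXXFLAGS and probably print the winner instead of returning a pointer.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Common signature for all mixing candidates (hypothetical).
typedef void (*mix_fn)(float* dst, const float* src, float gain,
                       std::size_t frames);

// Plain scalar loop standing in for the "pure C++" candidate.
void mix_scalar(float* dst, const float* src, float gain, std::size_t frames) {
    for (std::size_t i = 0; i < frames; ++i)
        dst[i] += src[i] * gain;
}

// Time `runs` calls of fn on scratch buffers and return the elapsed
// wall-clock time in milliseconds.
double time_mix_ms(mix_fn fn, std::size_t frames, int runs) {
    assert(fn != nullptr && frames > 0 && runs > 0);
    std::vector<float> dst(frames, 0.0f), src(frames, 1.0f);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < runs; ++r)
        fn(dst.data(), src.data(), 0.5f, frames);
    const auto stop = std::chrono::steady_clock::now();
    volatile float sink = dst[0];  // read the result so the work is observable
    (void)sink;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Return whichever candidate ran faster (a wins ties).
mix_fn pick_faster(mix_fn a, mix_fn b, std::size_t frames, int runs) {
    return time_mix_ms(a, frames, runs) <= time_mix_ms(b, frames, runs) ? a : b;
}
```

A single short run like this is noisy, so a real check should use enough
frames and runs that the difference between the candidates is clear.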
CU
Christian