[LAD] vectorization

Christian Schoenebeck cuse at users.sourceforge.net
Tue Apr 15 17:45:43 UTC 2008


Yeah, I'm reviving this topic ...

because I was curious how the GCC vector extension situation has changed in 
the meantime, so I wrote a small benchmark for mixing signals with and without 
gain. You can find it here:

	http://download.linuxsampler.org/dev/mixdown.tar.gz

I compared a pure C++ implementation, the hand-crafted SSE assembly code 
(by Sampo Savolainen, Ardour) and of course an implementation utilizing GCC's 
vector extensions. On my very weak, but environmentally friendly ;-) VIA box 
the GCC vector implementation outperforms the other two solutions (using GCC 
4.2.3 BTW):

Benchmarking mixdown (no coeff):
pure C++                : 670 ms
ASM SSE                 : 200 ms
GCC vector extensions   : 180 ms

Benchmarking mixdown (WITH coeff):
pure C++                : 890 ms
ASM SSE                 : 300 ms
GCC vector extensions   : 230 ms
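
For anyone who doesn't want to unpack the tarball: a mixdown with gain written 
against GCC's vector extensions could look roughly like this. It is only a 
minimal sketch, not the code from the benchmark; the names v4sf and 
mix_with_coeff are made up for illustration, and memcpy is used for the loads 
and stores so that no buffer alignment has to be assumed.

```cpp
#include <cstddef>
#include <cstring>

// Four packed floats; vector_size is GCC's vector extension attribute.
// The type name v4sf is an illustrative choice.
typedef float v4sf __attribute__((vector_size(16)));

// dst[i] += src[i] * coeff for n floats.
// Hypothetical function, not taken from the benchmark tarball.
void mix_with_coeff(float* dst, const float* src, float coeff, std::size_t n) {
    const v4sf c = { coeff, coeff, coeff, coeff };
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf d, s;
        std::memcpy(&d, dst + i, sizeof d);  // load without assuming alignment
        std::memcpy(&s, src + i, sizeof s);
        d += s * c;                          // element-wise on all four lanes
        std::memcpy(dst + i, &d, sizeof d);  // store the mixed result
    }
    for (; i < n; ++i)                       // scalar tail for leftover samples
        dst[i] += src[i] * coeff;
}
```

The "no coeff" case is the same loop with `d += s`.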

And this time, the signal output of the vector implementation is even 
correct. ;-) BUT, and this is important, you have to supply the correct 
C(XX)FLAGS. For the results above I used:

CXXFLAGS="-O3 -march=i686 -mmmx -msse -ffast-math -funroll-loops -fomit-frame-pointer -fpermissive -mfpmath=sse"

If you don't tell GCC to emit SSE code, the vector implementation performs 
horribly, even a lot worse than the pure C++ solution. Here's my result with 
just CXXFLAGS="-O3":

Benchmarking mixdown (no coeff):
pure C++                : 680 ms
ASM SSE                 : 200 ms
GCC vector extensions   : 1970 ms

Benchmarking mixdown (WITH coeff):
pure C++                : 1280 ms
ASM SSE                 : 310 ms
GCC vector extensions   : 2540 ms
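
A cheap way to at least notice this case at compile time is GCC's predefined 
__SSE__ macro, which is defined when SSE code generation is enabled (for 
example by -msse). This is a different technique from benchmarking and only a 
sketch; the helper name sse_codegen_enabled is made up for illustration.

```cpp
// Reports whether this translation unit is being compiled with SSE code
// generation enabled. Hypothetical helper, shown only to illustrate the macro.
inline bool sse_codegen_enabled() {
#if defined(__SSE__)
    return true;   // e.g. -msse given, or a -march that implies SSE
#else
    return false;  // vector code would be lowered to non-SSE instructions
#endif
}
```

It says nothing about how fast the resulting code actually is on a given box, 
so it doesn't replace measuring.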

So I would say it's finally time to get our hands on GCC's vector toys and 
waste less time on hairy assembly tasks. I think for simple algorithms like 
mixing it's completely sufficient to keep a pure C++ implementation and a GCC 
vector extension implementation side by side, and let a small benchmark in the 
configure script (or whatever) automatically determine which of the two 
solutions to pick for compilation, depending on what the user supplied as 
CXXFLAGS. At least that's what I'm going to do ... I think ...
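
The kind of check a configure script could compile with the user's CXXFLAGS 
and run might be sketched like this. Everything here is an assumption for 
illustration (the function names, buffer size and round count are made up, 
and it uses std::chrono, so it needs a C++11 compiler); it is not the 
benchmark from the tarball.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

typedef float v4sf __attribute__((vector_size(16)));

// Plain scalar mixdown: dst[i] += src[i].
static void mix_scalar(float* dst, const float* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] += src[i];
}

// Same operation using GCC vector extensions, four floats at a time.
static void mix_vector(float* dst, const float* src, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf d, s;
        std::memcpy(&d, dst + i, sizeof d);
        std::memcpy(&s, src + i, sizeof s);
        d += s;
        std::memcpy(dst + i, &d, sizeof d);
    }
    for (; i < n; ++i) dst[i] += src[i];  // scalar tail
}

typedef void (*mix_fn)(float*, const float*, std::size_t);

// Wall-clock time in milliseconds for `rounds` calls of fn on n floats.
static double time_ms(mix_fn fn, std::size_t n, int rounds) {
    std::vector<float> dst(n, 0.0f), src(n, 0.0f);
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) fn(dst.data(), src.data(), n);
    auto t1 = std::chrono::steady_clock::now();
    volatile float sink = dst[0];  // read a result so the work is not dead code
    (void)sink;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// True if the vector version was at least as fast under the current flags.
bool vector_mix_is_faster(std::size_t n, int rounds) {
    return time_ms(mix_vector, n, rounds) <= time_ms(mix_scalar, n, rounds);
}
```

A configure test could wrap vector_mix_is_faster() in a main() that returns 0 
or 1 and define a macro accordingly. Timings on a loaded build machine are 
noisy, so a real check would probably want several runs.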

CU
Christian


