[LAD] vectorization

Jens M Andreasen jens.andreasen at comhem.se
Mon Apr 28 22:24:16 UTC 2008

On Sat, 2008-04-19 at 00:30 +0300, Jussi Laako wrote:
> For simple operations, compilers are rather good on vectorization. Even 
> though I don't know if there's any support for multi-arch targets on 
> gcc, so that the SSE2/SSE3 optimized binary would run on hardware 
> without SSE (dynamic code selection)? I haven't got time to follow the 
> latest gcc developments.

I tried slightly rewriting a "Moog filter" to compute 4 voices in
parallel instead of one. I turned every scalar into a 4-element array and
then looped over the arrays, as in:

/* float r1,r2,r3,r4; */
float r1[4], r2[4], r3[4], r4[4]; // per-voice temporaries

for (int i = 0; i < 4; ++i) {     // braces: both statements belong to the voice loop
   r1[i] = b[1][i];
   b[1][i] = r3[i] = p[i] * (r4[i] + r2[i]) - r1[i] * f[i];
}
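For anyone who wants to try the same trick, here is a minimal self-contained sketch of the scalar-to-array rewrite. The actual Moog filter code isn't in this post, so a plain one-pole lowpass stands in for it. The names `NVOICES`, `onepole_scalar` and `onepole_4voice` are made up for illustration. Whether a particular compiler really vectorizes the loop is not guaranteed; its vectorizer diagnostics will tell you.

```c
#define NVOICES 4  /* hypothetical: four voices processed side by side */

/* Scalar reference: one voice, one state variable.
 * y[n] = y[n-1] + g * (x[n] - y[n-1])  (simple one-pole lowpass) */
static float onepole_scalar(float *state, float in, float g)
{
    *state += g * (in - *state);
    return *state;
}

/* Four-voice version: each scalar became a 4-element array and the body
 * became a loop over the voice index. The iterations don't depend on each
 * other, and restrict promises the arrays don't overlap, so a vectorizing
 * compiler is free to turn the loop into 4-wide SIMD code. */
static void onepole_4voice(float *restrict state, const float *restrict in,
                           const float *restrict g, float *restrict out)
{
    for (int i = 0; i < NVOICES; ++i) {
        state[i] += g[i] * (in[i] - state[i]);
        out[i] = state[i];
    }
}
```

Each voice keeps its own state and coefficient, so the four-voice call should track four independent calls to the scalar version sample for sample.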

This strategy fails to auto-vectorize with gcc 4.3 but works with icc
10.1, where it almost quadruples throughput. Breaking the filter up into
separate, smaller functions helped clear up the confusion about which
loops should be the inner ones and which the outer ones. The functions
get inlined anyway.
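A rough sketch of that structure follows. It assumes a simplified four-stage cascade without the resonance feedback or nonlinearity of a real Moog ladder. Each small helper contains only the loop over the four voices, and the loop over samples lives in the caller, so there is no doubt about which loop is inner. All names (`Cascade4`, `stage`, `cascade_tick`, `cascade_process`) and the interleaved buffer layout are hypothetical.

```c
#define NV 4       /* voices processed together (hypothetical) */
#define NSTAGES 4  /* four one-pole stages; simplified: no feedback */

typedef struct {
    float s[NSTAGES][NV];  /* filter state, indexed [stage][voice] */
    float g[NV];           /* per-voice cutoff coefficient, 0 < g < 1 */
} Cascade4;

/* One stage for all voices: the voice loop is the only loop in here. */
static inline void stage(float *restrict s, float *restrict x,
                         const float *restrict g)
{
    for (int v = 0; v < NV; ++v) {
        s[v] += g[v] * (x[v] - s[v]);
        x[v] = s[v];  /* this stage's output feeds the next stage */
    }
}

/* One sample for all voices: run the stages in order. */
static inline void cascade_tick(Cascade4 *f, float x[NV])
{
    for (int k = 0; k < NSTAGES; ++k)
        stage(f->s[k], x, f->g);
}

/* Outer loop over samples. in[n*NV + v] is sample n of voice v. */
static void cascade_process(Cascade4 *f, const float *in, float *out,
                            int nframes)
{
    for (int n = 0; n < nframes; ++n) {
        float x[NV];
        for (int v = 0; v < NV; ++v) x[v] = in[n * NV + v];
        cascade_tick(f, x);
        for (int v = 0; v < NV; ++v) out[n * NV + v] = x[v];
    }
}
```

Starting from zero state and feeding a unit step, the first output of each voice is g^4 (each stage moves a fraction g of the way), and the output then rises toward 1.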

For applications that look like a bunch of identical channel strips,
this should be pretty useful. "Buy one and get three for free!" :-D

So it is not all science-fiction, but gcc is not quite there yet.
