Re: [LAD] vectorization

28 Apr 2008

On Sat, 2008-04-19 at 00:30 +0300, Jussi Laako wrote:
...
 For simple operations, compilers are rather good on vectorization. Even
 though I don't know if there's any support for multi-arch targets on
 gcc, so that the SSE2/SSE3 optimized binary would run on hardware
 without SSE (dynamic code selection)? I haven't got time to follow the
 latest gcc developments.

I tried rewriting a "moog filter" slightly to calculate 4 voices in
parallel instead of one by declaring all scalars to be arrays and then
looping through them as in:
/* float r1,r2,r3,r4; */
float r1[4],r2[4],r3[4],r4[4]; // tmp
...
for(int i = 0; i < 4; ++i)
{
   ...
   r1[i] = b[1][i];
   b[1][i] = r3[i] = p[i] * (r4[i] + r2[i]) - r1[i] * f[i];
   ...
}
This strategy fails to auto-vectorize with gcc 4.3 but works with icc
10.1 and almost quadruples throughput. Breaking the filter up into
separate, smaller functions helped get rid of the confusion about which
loops should be the inner and outer ones. The functions are inlined anyway.
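
A minimal sketch of the same layout, for illustration only. It is not the
moog filter above: a one-pole lowpass stands in for it, and the names
onepole4, onepole4_tick and onepole4_process are made up for this example.
What it shows is the pattern: every scalar of the one-voice version becomes
an array of 4, the per-sample work sits in a small inlinable function whose
only loop runs over the voices, and the loop over time stays outside.

```c
#include <stddef.h>

#define VOICES 4

/* Hypothetical state for four voices of a one-pole lowpass.
 * This is a stand-in, NOT the moog filter from the post. */
typedef struct {
    float y[VOICES];  /* previous output of each voice */
    float k[VOICES];  /* smoothing coefficient of each voice, 0..1 */
} onepole4;

/* One time step for all four voices. The loop over i has a fixed trip
 * count and no dependency between iterations, which is the shape an
 * auto-vectorizer can map onto one 4-wide SIMD operation. */
static inline void onepole4_tick(onepole4 *restrict s,
                                 const float *restrict in,
                                 float *restrict out)
{
    for (int i = 0; i < VOICES; ++i) {
        s->y[i] += s->k[i] * (in[i] - s->y[i]);
        out[i] = s->y[i];
    }
}

/* Outer loop over time. in and out each hold nframes * VOICES samples,
 * interleaved by voice. Each step depends on the previous one through
 * s->y, so this loop is the one that should stay scalar. */
void onepole4_process(onepole4 *s, const float *in, float *out,
                      size_t nframes)
{
    for (size_t n = 0; n < nframes; ++n)
        onepole4_tick(s, in + n * VOICES, out + n * VOICES);
}
```

Whether a given compiler actually vectorizes the inner loop has to be
checked in its vectorization report or in the generated assembly; the
sketch only arranges the data so that it can.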
For applications that look like a bunch of identical channel strips,
this should be pretty useful. "Buy one and get three for free!" :-D
So it is not all science fiction, but gcc is not quite there yet.
