I went through the source, trying to figure out what makes this
algorithm tick, looking for possible ways to parallelize all or parts of
it.
The innermost loop, which does the bulk of the work, is in filter.c:
    // omitting update of coefficients
    // ...
    while (k--)
    {
        s1 += d1;
        s2 += d2;
        a += da;
        x = *sig;
        y = x - s2 * sect_ptr->z2;
        *sig++ -= a * (sect_ptr->z2 + s2 * y - x);
        y -= s1 * sect_ptr->z1;
        sect_ptr->z2 = sect_ptr->z1 + s1 * y;
        sect_ptr->z1 = y + 1e-10f;
    }
This is called once for each of the four filters, like so:
    for (j = 0; j < bands_count; j++)
    {
        param_sect_proc(filter_ptr->sect + j, k,
                        sig, sfreq[j], sband[j], sgain[j]);
    }
The parameters are unique to each filter, but the input signal of each
call is the output of the previous one, which creates an unfortunate
serial dependency.
If sig had instead been spread across the 4 elements of an SSE vector,
all 4 filters could be calculated in parallel, and the loop would quite
possibly be auto-vectorizable:
    param_sect_proc(filter_ptr->sect + j, k,
                    sig[j], sfreq[j], sband[j], sgain[j]);
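To make the lane idea concrete, here is a rough sketch, not code from
lv2fil. The struct sect4, the function sect4_proc and the interleaved
sig[frame][band] layout are all my own assumptions, and the coefficients
are held constant (no s1 += d1 ramps) to keep it short. Note that giving
each band its own lane is not the same thing as the current cascade,
where each band filters the previous band's output.

```c
#include <assert.h>

#define NBANDS 4   /* four bands, one per SSE lane */

/* Hypothetical structure-of-arrays state: slot j belongs to band j. */
struct sect4 {
    float z1[NBANDS];
    float z2[NBANDS];
};

/*
 * Run all four sections over k frames. sig is interleaved so that the
 * four lanes of one frame are contiguous in memory. s1, s2 and a are
 * constant here; the per-sample ramps from filter.c are left out.
 */
static void sect4_proc(struct sect4 *st, int k, float sig[][NBANDS],
                       const float s1[NBANDS], const float s2[NBANDS],
                       const float a[NBANDS])
{
    assert(k >= 0);
    for (int i = 0; i < k; i++) {
        /* No lane reads another lane's data, so each statement below
           is a candidate for a single 4-wide operation. */
        for (int j = 0; j < NBANDS; j++) {
            float x = sig[i][j];
            float y = x - s2[j] * st->z2[j];
            sig[i][j] = x - a[j] * (st->z2[j] + s2[j] * y - x);
            y -= s1[j] * st->z1[j];
            st->z2[j] = st->z1[j] + s1[j] * y;
            st->z1[j] = y + 1e-10f;
        }
    }
}
```

The recurrence over i is still serial (z1 and z2 carry over from frame
to frame), so the only parallelism on offer is across the four bands.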
To ease the burden on gcc and make sure it actually notices the
obvious, it might be an advantage to move the outer loop (for j = ...)
into param_sect_proc as two loops: one for the coefficients (which will
not vectorize) and one for the actual work (which will).
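Roughly what I have in mind for the two-loop split, as an untested
sketch. The names bands_state and bands_proc are made up, and I am
assuming the target coefficients t1, t2, ta have already been derived
from sfreq/sband/sgain by the code I left out above, and that the
per-sample deltas are a plain linear ramp towards them over k frames.

```c
#include <assert.h>

#define NBANDS 4

/* Hypothetical per-band state, one array slot per band. */
struct bands_state {
    float s1[NBANDS], s2[NBANDS], a[NBANDS];  /* current coefficients */
    float z1[NBANDS], z2[NBANDS];             /* filter memory */
};

static void bands_proc(struct bands_state *st, int k, float sig[][NBANDS],
                       const float t1[NBANDS], const float t2[NBANDS],
                       const float ta[NBANDS])
{
    float d1[NBANDS], d2[NBANDS], da[NBANDS];

    assert(k > 0);

    /* Loop 1: per-band setup, runs once per call. The linear ramp is
       an assumption; the real coefficient update is not shown here. */
    for (int j = 0; j < NBANDS; j++) {
        d1[j] = (t1[j] - st->s1[j]) / k;
        d2[j] = (t2[j] - st->s2[j]) / k;
        da[j] = (ta[j] - st->a[j]) / k;
    }

    /* Loop 2: the work loop. The inner j loop touches each lane
       independently, so it is the part gcc may vectorize. */
    for (int i = 0; i < k; i++) {
        for (int j = 0; j < NBANDS; j++) {
            st->s1[j] += d1[j];
            st->s2[j] += d2[j];
            st->a[j] += da[j];
            float x = sig[i][j];
            float y = x - st->s2[j] * st->z2[j];
            sig[i][j] = x - st->a[j] * (st->z2[j] + st->s2[j] * y - x);
            y -= st->s1[j] * st->z1[j];
            st->z2[j] = st->z1[j] + st->s1[j] * y;
            st->z1[j] = y + 1e-10f;
        }
    }
}
```

Whether gcc actually vectorizes the inner loop would need checking
against its vectorizer report; I have not verified that.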
/j
On Sat, 2009-06-13 at 17:40 +0300, Nedko Arnaudov wrote:
Four-band parametric equaliser LV2 plugin. DSP code by
Fons Adriaensen.
Homepage:
http://nedko.arnaudov.name/soft/lv2fil/