Re: [LAD] vectorization

14 Feb 2008

On Thu, 2008-02-07 at 18:51 +0100, Malte Steiner wrote:
...
  Hello,
 I am trying to squeeze as much performance as possible out of my upcoming
 Linux synthesizer and am trying manual vectorization with the following
 construct in C, mainly to vectorize away multiplications:
Have you checked the SIMD code in Ardour? We have SIMD code for crucial
DSP routines. The functions we use are:
Pure ASM (defined in libs/ardour/sse_functions.s or _64bit.s):
  mix_buffers_with_gain(float *dst, float *src, long nframes, float gain);
  mix_buffers_no_gain(float *dst, float *src, long nframes);
  apply_gain_to_buffer(float *buf, long nframes, float gain);
  float compute_peak(float *buf, long nframes, float current);
xmmintrin (defined in libs/ardour/sse_functions_xmm.cc):
  find_peaks(float *buf, nframes_t nframes, float *min, float *max);
When I wrote the code, I was unable to get better results from the gcc
vectorizer. From what I've heard, it is supposed to be getting better,
but until then we are using the code above.
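For comparison, this is roughly the generic C that the hand-written versions stand in for. It is a sketch inferred from the function names and from the compute_peak() description below, not Ardour's actual fallback code; the ref_ prefix is hypothetical, and long stands in for Ardour's nframes_t.

```c
/* Hypothetical scalar reference versions (names prefixed ref_).
 * Semantics are assumed from the function names, not copied from Ardour. */

/* dst[i] += src[i] * gain */
static void ref_mix_buffers_with_gain(float *dst, const float *src,
                                      long nframes, float gain)
{
    for (long i = 0; i < nframes; ++i)
        dst[i] += src[i] * gain;
}

/* dst[i] += src[i] */
static void ref_mix_buffers_no_gain(float *dst, const float *src, long nframes)
{
    for (long i = 0; i < nframes; ++i)
        dst[i] += src[i];
}

/* buf[i] *= gain */
static void ref_apply_gain_to_buffer(float *buf, long nframes, float gain)
{
    for (long i = 0; i < nframes; ++i)
        buf[i] *= gain;
}

/* returns max(max(|buf[i]|), current) */
static float ref_compute_peak(const float *buf, long nframes, float current)
{
    for (long i = 0; i < nframes; ++i) {
        float a = buf[i] < 0.0f ? -buf[i] : buf[i]; /* true absolute value */
        if (a > current)
            current = a;
    }
    return current;
}

/* widens *min / *max so they cover every sample in buf */
static void ref_find_peaks(const float *buf, long nframes,
                           float *min, float *max)
{
    for (long i = 0; i < nframes; ++i) {
        if (buf[i] < *min) *min = buf[i];
        if (buf[i] > *max) *max = buf[i];
    }
}
```

Loops this simple are exactly what the gcc vectorizer is given to work with, and they also make a handy baseline for checking the output of the SIMD versions.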
Note especially the xmmintrin syntax. It's a brilliant way of writing
pseudo-assembler: it gives you direct access to the XMM (SIMD) registers
and direct calls to the SIMD instructions.
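To illustrate the intrinsics style, here is a minimal find_peaks-like sketch written against <xmmintrin.h>. It is not the code in sse_functions_xmm.cc: the name find_peaks_sse is hypothetical, it takes long instead of nframes_t, and it uses unaligned loads throughout for simplicity. Each _mm_* call corresponds roughly to one SSE instruction (minps, maxps, movups), while the compiler still handles register allocation.

```c
#include <xmmintrin.h>

/* Hypothetical sketch: widen *min / *max to cover buf[0..nframes). */
static void find_peaks_sse(const float *buf, long nframes,
                           float *min, float *max)
{
    __m128 vmin = _mm_set1_ps(*min); /* broadcast current min to 4 lanes */
    __m128 vmax = _mm_set1_ps(*max); /* broadcast current max to 4 lanes */
    long i = 0;

    /* Four samples per iteration. */
    for (; i + 4 <= nframes; i += 4) {
        __m128 x = _mm_loadu_ps(buf + i); /* unaligned load of 4 floats */
        vmin = _mm_min_ps(vmin, x);
        vmax = _mm_max_ps(vmax, x);
    }

    /* Horizontal reduction: fold the 4 lanes into one scalar each. */
    float lo[4], hi[4];
    _mm_storeu_ps(lo, vmin);
    _mm_storeu_ps(hi, vmax);
    float mn = lo[0], mx = hi[0];
    for (int k = 1; k < 4; ++k) {
        if (lo[k] < mn) mn = lo[k];
        if (hi[k] > mx) mx = hi[k];
    }

    /* Scalar tail for the last nframes % 4 samples. */
    for (; i < nframes; ++i) {
        if (buf[i] < mn) mn = buf[i];
        if (buf[i] > mx) mx = buf[i];
    }

    *min = mn;
    *max = mx;
}
```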
compute_peak() returns the largest absolute sample value in buf, or
current if that is larger (i.e. return max(max(abs(buf)), current)). Our
version is several orders of magnitude faster than anything GCC can
produce from generic C code. This is partly because we use 16-byte
aligned buffers, and mostly because we can cheat: instead of a true ABS
function we use a bit-masking operation, which works for audio data
because it contains no infinities or NaNs.
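The sign-bit trick can be sketched as follows. Clearing bit 31 of an IEEE-754 float yields its magnitude, and _mm_andnot_ps with a vector of -0.0f (only the sign bit set) does that for four samples at once. This is not Ardour's assembly: compute_peak_sse is a hypothetical name, and the sketch uses unaligned loads rather than relying on 16-byte aligned buffers.

```c
#include <xmmintrin.h>

/* Hypothetical sketch: returns max(max(|buf[i]|), current). */
static float compute_peak_sse(const float *buf, long nframes, float current)
{
    const __m128 sign_bit = _mm_set1_ps(-0.0f); /* only bit 31 set per lane */
    __m128 vpeak = _mm_set1_ps(current);
    long i = 0;

    for (; i + 4 <= nframes; i += 4) {
        __m128 x = _mm_loadu_ps(buf + i);
        /* (~sign_bit) & x clears the sign bit: |x| without a real ABS */
        vpeak = _mm_max_ps(vpeak, _mm_andnot_ps(sign_bit, x));
    }

    /* Reduce the four lanes to a single peak value. */
    float lanes[4];
    _mm_storeu_ps(lanes, vpeak);
    float peak = lanes[0];
    for (int k = 1; k < 4; ++k)
        if (lanes[k] > peak) peak = lanes[k];

    /* Scalar tail for the remaining samples. */
    for (; i < nframes; ++i) {
        float a = buf[i] < 0.0f ? -buf[i] : buf[i];
        if (a > peak) peak = a;
    }
    return peak;
}
```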
All functions work with both aligned and non-aligned data. With
non-aligned data, they process one sample at a time until they reach
alignment and then continue four samples at a time.
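The alignment handling described above might look like this for a mix_buffers_with_gain-style function (dst[i] += src[i] * gain). Assumptions: the name mix_buffers_with_gain_sse is hypothetical, only dst is brought to 16-byte alignment, and src is read with unaligned loads because its alignment may differ from dst's. Ardour's real assembly may handle this differently.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical sketch: dst[i] += src[i] * gain with an alignment prologue. */
static void mix_buffers_with_gain_sse(float *dst, const float *src,
                                      long nframes, float gain)
{
    long i = 0;

    /* Prologue: one sample at a time until dst + i is 16-byte aligned. */
    while (i < nframes && ((uintptr_t)(dst + i) & 15u) != 0) {
        dst[i] += src[i] * gain;
        ++i;
    }

    /* Main loop: four samples per iteration, aligned load/store on dst. */
    const __m128 vgain = _mm_set1_ps(gain);
    for (; i + 4 <= nframes; i += 4) {
        __m128 d = _mm_load_ps(dst + i);  /* requires 16-byte alignment */
        __m128 s = _mm_loadu_ps(src + i); /* src may be misaligned */
        _mm_store_ps(dst + i, _mm_add_ps(d, _mm_mul_ps(s, vgain)));
    }

    /* Epilogue: whatever is left over (fewer than four samples). */
    for (; i < nframes; ++i)
        dst[i] += src[i] * gain;
}
```

On x86, where floats are 4-byte aligned, the prologue runs at most three times, so almost all of a typical audio buffer goes through the four-wide loop.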
  Sampo
