[linux-audio-dev] Arbitrary bufsizes in plugins requiring power of 2 bufsizes, Was: jack_convolve-0.0.10, libconvolve-0.0.3 released

Benno Senoner sbenno at gardena.net
Wed Jun 29 11:20:31 UTC 2005


My suggestion is to handle buffering in the convolution plugin and 
accept any buffer size from the host.

I'd do it without threading to ensure the lowest possible latency.

For example:

Assume we run the convolution at a block size of 512 frames.

Use a ringbuffer structure (e.g. like RingBuffer.h in LinuxSampler):
http://cvs.linuxsampler.org/cgi-bin/viewcvs.cgi/*checkout*/linuxsampler/src/common/RingBuffer.h?rev=1.6&content-type=text/plain

(The process() in the example below is just a pseudo plugin API; input 
and output are mono.)

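process(float *input, float *output, int numframes) {

  // Fast path: the host block already matches the convolver block.
  // Only safe if the host does not mix this with other block sizes.
  if (numframes == 512) {
    convolve(input, output, 512);
    return;
  }

  // out_ring is assumed to be pre-filled with 512 frames of silence at
  // init time, so it always holds at least numframes frames when read.
  in_ring->write(input, numframes);

  while (in_ring->read_space() >= 512) {
    in_ring->read(temp_buf, 512);
    convolve(temp_buf, temp_out, 512);  // convolves one block into temp_out
    out_ring->write(temp_out, 512);
  }

  out_ring->read(output, numframes);  // the first frames returned are silence
}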


This approach has the advantage that if the host supplies the 
convolution plugin with 512 frames then
the added latency due to buffering is zero since
if(numframes == 512)  then it calls convolve() and returns without 
messing with ringbuffers.
 
Otherwise, with the above approach both the number of frames used in the 
convolver and number of frames supplied by the host can
be completely arbitrary.
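
The buffering scheme described above can be sketched as a small self-contained C++ class. This is only an illustration under some assumptions: the class name Rebuffer and its BlockFn callback are hypothetical, std::deque stands in for a real preallocated ringbuffer (it allocates, so it is not RT-safe), and the output queue is pre-filled with one block of silence so that every call can return exactly numframes frames.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical adapter: drives a fixed-block processor (e.g. a convolver)
// from a host that may call process() with any number of frames.
class Rebuffer {
public:
    using BlockFn = std::function<void(const float*, float*, std::size_t)>;

    Rebuffer(std::size_t block, BlockFn fn)
        : block_(block), fn_(std::move(fn)),
          in_blk_(block), out_blk_(block),
          out_fifo_(block, 0.0f)  // assumption: prime with one block of silence
    {
        assert(block_ > 0);
    }

    void process(const float* input, float* output, std::size_t numframes) {
        // Queue the host's input, whatever its size.
        in_fifo_.insert(in_fifo_.end(), input, input + numframes);

        // Run the block processor once per complete block that is available.
        const auto b = static_cast<std::ptrdiff_t>(block_);
        while (in_fifo_.size() >= block_) {
            std::copy(in_fifo_.begin(), in_fifo_.begin() + b, in_blk_.begin());
            in_fifo_.erase(in_fifo_.begin(), in_fifo_.begin() + b);
            fn_(in_blk_.data(), out_blk_.data(), block_);
            out_fifo_.insert(out_fifo_.end(), out_blk_.begin(), out_blk_.end());
        }

        // The priming block guarantees enough queued output at this point.
        assert(out_fifo_.size() >= numframes);
        const auto n = static_cast<std::ptrdiff_t>(numframes);
        std::copy(out_fifo_.begin(), out_fifo_.begin() + n, output);
        out_fifo_.erase(out_fifo_.begin(), out_fifo_.begin() + n);
    }

    // Latency added by the priming block, in frames.
    std::size_t latency() const { return block_; }

private:
    std::size_t block_;
    BlockFn fn_;
    std::vector<float> in_blk_, out_blk_;  // scratch blocks for fn_
    std::deque<float> in_fifo_, out_fifo_; // stand-ins for ringbuffers
};
```

With this sketch the output is simply the processed input delayed by one block, regardless of how the host slices its calls. A real plugin would keep the numframes == 512 fast path in front of it and replace the deques with preallocated ringbuffers.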

Drawbacks of the approach:
Especially with CPU-hungry plugins (and convolution IS CPU hungry, 
especially at low buffer sizes), CPU spikes can cause xruns and other 
bad stuff, since the host runs the process() callbacks in RT mode.
Assume a 512-frame convolution takes 80% of the CPU on a certain 
machine.
At 44.1kHz, 512 frames = 11.6msec. 80% of 11.6msec = 9.3msec.
If we run the above code in a host environment that uses e.g. 256 frames 
(5.8msec buffers),
the first time process() is called the >=512 condition is not satisfied 
and thus a zero-filled buffer is returned (silence).
At the second process() call, the >=512 condition is satisfied (there 
are exactly 512 frames in the buffer),
and the convolve() function is called, eating 9.3msec of CPU.
Since 9.3msec > 5.8msec ... sh*t happens ... XRUN.

If numframes supplied by the host is a multiple of 512 then there are no 
CPU spike problems.
For example, if the host supplies 1024 frames, the above code would call
convolve() 2 times, outputting 1024 frames (eating 2x9.3msec out of the 
23.2msec available).
It would be a bit inefficient, because if the plugin knows that the host 
supplies at least 1024 frames
then it could run the convolution at 1024, achieving greater efficiency.

If the host guarantees that it always supplies the same number of frames 
then the convolver could adjust
its internal framesize to achieve optimal CPU usage.

If not then a scheme like the above one is unavoidable.
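
The XRUN arithmetic above can be checked with a few lines of C++. This is a rough sketch, not a measurement: the function names are made up for illustration, and it assumes that one convolve() call costs a fixed fraction of the convolver block's duration.

```cpp
#include <cassert>

// Duration of one block of audio in milliseconds.
double block_ms(int frames, double sample_rate) {
    assert(frames > 0 && sample_rate > 0.0);
    return 1000.0 * frames / sample_rate;
}

// Time one convolve() call takes, assuming it uses cpu_fraction of the
// CPU when called exactly once per convolver block.
double convolve_ms(int conv_frames, double sample_rate, double cpu_fraction) {
    return cpu_fraction * block_ms(conv_frames, sample_rate);
}

// True if the worst-case work in a single host callback fits into the
// callback period. At most ceil(host/conv) convolve() calls can fall
// into one callback.
bool fits_in_callback(int conv_frames, int host_frames,
                      double sample_rate, double cpu_fraction) {
    int worst_calls = (host_frames + conv_frames - 1) / conv_frames;
    double work = worst_calls * convolve_ms(conv_frames, sample_rate, cpu_fraction);
    return work < block_ms(host_frames, sample_rate);
}
```

For the 80% example, fits_in_callback(512, 256, 44100.0, 0.8) is false (about 9.3msec of work in a 5.8msec period), while fits_in_callback(512, 1024, 44100.0, 0.8) is true.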

Just out of curiosity, does anyone know what the current status is of the 
variable/fixed buffer size scenarios
supplied to plugins by hosts on various plugin platforms like VST, AU etc.?

The above code does some memory bouncing (only when numframes supplied 
by the host does not match
the number of frames used in the convolver):
 it first copies the input to the ringbuffer's own buffer and then back
to temp_buf. So some memory bandwidth is wasted but I think as long as 
you don't run hundreds of convolution plugins
(impossible on today's machines) the added overhead is negligible since 
convolution is so CPU heavy.

I think with an approach like the above you achieve the best of both 
worlds, no added latency if the host calls
the plugin with numframes = power of 2 (matching the internal 
convolver's buffer size), and some added latency
if the host does not use powers of 2.
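
How much latency is "some"? The following is my own back-of-the-envelope estimate, not something stated in this thread: if the host block size is constant and the output is pre-filled with silence so that it never underruns, the smallest pre-fill that works is the convolver block size minus the greatest common divisor of the two sizes. The function name is hypothetical.

```cpp
#include <cassert>

// Greatest common divisor (Euclid), to avoid depending on C++17 std::gcd.
int gcd_frames(int a, int b) {
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Estimated minimum added latency in frames for a constant host block
// size. Assumption: frames left over in the input buffer after k calls
// number (k * host) mod conv, whose maximum is conv - gcd(host, conv).
int min_added_latency(int host_frames, int conv_frames) {
    assert(host_frames > 0 && conv_frames > 0);
    return conv_frames - gcd_frames(host_frames, conv_frames);
}
```

Under that assumption a host running at 512 or 1024 frames adds nothing to a 512-frame convolver, 256 frames adds 256, and an odd size such as 441 adds 511 frames.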

Regarding the CPU spikes, if the convolver uses less than 50% of the CPU 
then you can run the host at half the
convolver's numframes without getting XRUNs.
E.g. if the convolver uses 40% CPU at 512 frames, then in a host running 
at 256 frames the convolver will
still use an average of 40% CPU but it will experience 80% CPU spikes
(e.g. 80% 0% 80% 0% etc ...).
This is not so good because if we want to add another plugin that has a 
constant CPU usage of 40%
which would lead to an average 80% CPU usage we can't because  during 
the 80% CPU spike we have
only 20% of CPU headroom left.
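
The spike pattern can be reproduced with a small simulation. Again only a sketch with hypothetical names: it assumes each convolve() call costs avg_load times the duration of one convolver block, and reports the load of each host callback as a fraction of the callback period.

```cpp
#include <cassert>
#include <vector>

// Per-callback CPU load (1.0 = the whole callback period) when a
// conv_frames convolver is fed host_frames at a time.
std::vector<double> callback_loads(int conv_frames, int host_frames,
                                   double avg_load, int callbacks) {
    assert(conv_frames > 0 && host_frames > 0);
    std::vector<double> loads;
    int buffered = 0;  // frames waiting in the input ringbuffer
    for (int i = 0; i < callbacks; ++i) {
        buffered += host_frames;
        int calls = buffered / conv_frames;  // convolve() calls this callback
        buffered %= conv_frames;
        loads.push_back(calls * avg_load * conv_frames / host_frames);
    }
    return loads;
}

// Highest per-callback load, plus a second plugin with constant load.
double peak_total_load(const std::vector<double>& loads, double other_plugin) {
    double peak = 0.0;
    for (double l : loads) {
        if (l > peak) peak = l;
    }
    return peak + other_plugin;
}
```

For callback_loads(512, 256, 0.4, 4) this gives alternating loads of 0 and about 0.8, and adding a constant 40% plugin pushes the peak to about 1.2, i.e. over 100% even though the average is only 80%.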


Florian, since we would like to add convolution to LinuxSampler over 
time, it would be cool if you could add the above
ideas to libconvolve so that one can use the lib without worrying about 
supplying the right buffer sizes etc. It would be handy
in plugin host environments too, since we don't always 
know what the host will do.

cheers,
Benno
http://www.linuxsampler.org




Florian Schmidt wrote

>
>Or should the plugin do this internally and simply report to the host
>that it needs a fixed buffer size (which then corresponds to the audio
>system's buffer size).. Are dssi/ladspa's allowed to do threading?
>Without i wouldn't know how to do it. And even if it were allowed to do
>threading, how would the dssi know which priorities to use, etc (on a RP
>kernel it should have prio higher than i.e. hd and net irq's, but lower
>than the jack audio thread).
>
>Plus i wonder whether the (then fixed) buffer size should be user
>configurable in any way or would the plugin simply report "16k frames is
>what i want" :) Sometimes it does make sense to use it in realtime mode
>(with the same buffer size as the audio system), if you have the cpu
>power or the responses are short enough.
>
>Regards,
>Flo
>
>  
>



