On Sat, Oct 20, 2012 at 07:02:16PM -0400, David Robillard wrote:
> The goal of the plugin is to do synchronous (i.e. processing in the
> run() context) convolution in a *strictly* hard-real-time fashion with
> no latency. Doing the processing in another thread(s) does not meet
> this requirement essentially by definition.
This is again *simply not true*. It's actually possible to prove that
the scheme used by zita-convolver will not fail unless the synchronous
solution would fail as well. This goes back to research done in the
1980s; you'll find the proof somewhere in the IEEE transactions on
parallel processing of around that time.
One reason (particular to this application and not even required for
the proof) is that using multiple partition sizes reduces the work
to be done, not just per CPU, but the total amount. Let's take one
example, the 'greathall' reverb that comes with jconvolver. Let's
assume a Jack period of 256 frames.
If done entirely synchronously, it uses 440 partitions of 256 frames.
When using multiple partition sizes, zita-convolver will (in this case)
use 3 partitions of 256 frames, 6 of 512 frames, 6 of 2048 frames, and
13 of 8192 frames, for a total of 28 partitions.
It's easy to show that to a first approximation the total CPU load
for FFT based convolution is proportional to the number of partitions
(exercise for the reader).
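A rough sketch of that exercise in Python. The cost model is an assumption, not a measurement: a partition of n frames is taken to use a real FFT of size 2n (so n + 1 complex bins) and to be processed once every n frames. The FFTs themselves are ignored, since their cost grows only with the log of the size and is paid once per size, not once per partition. The function name is made up for illustration.

```python
# Assumed cost model: one complex multiply-accumulate (MAC) per frequency bin
# per partition, executed once every `partition_size` frames.

def macs_per_frame(partition_size: int, num_partitions: int) -> float:
    """Complex MACs per output frame for one partition size."""
    bins = partition_size + 1  # bins of a real FFT of size 2 * partition_size
    return num_partitions * bins / partition_size


for size in (256, 512, 2048, 8192):
    # Each value is very close to 1: the cost per partition per frame
    # barely depends on the partition size.
    print(size, macs_per_frame(size, 1))
```

Under these assumptions the multiply-accumulate work per frame is set by the partition count alone, whatever the partition sizes are.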
That means that the second solution takes 28/440 or around 7% of the
CPU load required for the first. Even if we allow for some overhead,
let's say it will be 15% instead of 7%, it's rather clear that in a
highly loaded system the synchronous solution will fail to deliver in
real time *way before* the other one. And what's worse, it will take
the rest of the Jack graph with it. In the other case, if the system
is fully loaded, the lower priority threads will fail, but your Jack
graph will still work.
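The arithmetic behind those figures, as a small Python sketch. The layout dictionary only restates the partition counts given above; the variable names are mine and nothing here is taken from the zita-convolver code.

```python
FRAMES_PER_PERIOD = 256
SYNC_PARTITIONS = 440  # all partitions of 256 frames

# partition size in frames -> number of partitions of that size
MULTI_LAYOUT = {256: 3, 512: 6, 2048: 6, 8192: 13}

multi_partitions = sum(MULTI_LAYOUT.values())
covered_frames = sum(size * count for size, count in MULTI_LAYOUT.items())
sync_frames = SYNC_PARTITIONS * FRAMES_PER_PERIOD
load_ratio = multi_partitions / SYNC_PARTITIONS

print(multi_partitions)       # 28
print(covered_frames)         # 122624, enough to cover the 112640 frames below
print(sync_frames)            # 112640
print(round(load_ratio, 3))   # 0.064
```

So the raw ratio is a bit over 6%, before allowing for any overhead.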
> If the processing is happening in another thread, then e.g. at the very
> first block, run() must either:
> (1) Wait (i.e. block) for the other thread to process the data
> (2) Add latency
> (3) Busy wait and drop out if data is not available "in time"
> (4) Attempt to split the work such that when run() finishes its part
>     the others will be done
Zita-convolver does none of (1), (2), or (3), except in sync mode, and
then it isn't supposed to work in real time.
> (4) is admittedly clever, if you know there's idle cores to make it
> beneficial and make some optimistic assumptions about scheduling.
It does not depend on idle cores; it works perfectly on a single CPU
system as well.
Going back to the example above, the thread that does the 512 frames
partition is supposed to run every 512 frames of course. But its output
is not required until 768 frames (3 * 256) later. So it can be up to
a full Jack period late without any ill consequences. Does your approach
allow that? For the lower priority threads there's even more headroom,
for example the last one runs every 8192 frames but its output is
required only after 16128 frames. So it can be 31 periods late without
affecting the output.
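The same headroom calculation for every partition size, as a Python sketch. It assumes, as in the example above, that a thread's output is first needed once all smaller partitions have been used up, and that a thread cannot start before one full block of its own size has arrived. The function name is hypothetical and the value for the 2048 frame level is not stated above; it just follows from the same arithmetic.

```python
PERIOD = 256
# (partition size in frames, number of partitions), smallest first
LAYOUT = [(256, 3), (512, 6), (2048, 6), (8192, 13)]


def slack_in_periods(layout, period):
    """Return, per partition size, how many Jack periods its thread may be late."""
    slack = {}
    offset = 0  # frames of the impulse response covered by smaller partitions
    for size, count in layout:
        if offset == 0:
            # The first level runs synchronously in the process callback.
            slack[size] = 0
        else:
            # Output is needed `offset` frames after the block starts, but the
            # block is only complete after `size` frames.
            slack[size] = (offset - size) // period
        offset += size * count
    return slack


print(slack_in_periods(LAYOUT, PERIOD))
```

For this layout it gives 1 period for the 512 frame level, 7 for 2048 and 31 for 8192.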
Note that the per-thread 'late' status flags provided by Convproc just
indicate that a thread had not finished its previous period when a new
one was to start. This does *not* mean that the thread is actually too
late delivering its contribution. Even in the worst case it still has
one Jack period of time, and for most of the threads, a lot more.
> convoLV2 aims to not do any of these things by doing the processing
> synchronously, which is much simpler and more reliable.
As shown above, in real life it will be much *less* reliable.
> The cost is block length restrictions.
Again, that is not true. You can continue to repeat this, but that
will still not make it true.
> Plugins that launch a bunch of processing threads can be problematic for
> other reasons, e.g. if you have many of them instantiated, and they are
> already spread across all available cores by the host (as Ardour can).
> No amount of priority tweaking will make hundreds of threads thrashing
> in a situation like this work well, it's bloated and will fall apart
> much sooner than synchronous plugins.
No, the opposite is true.
> Other issues include memory consumption,
Nonsense. The bulk of memory used by a convolution algo is taken by signal
data in various forms. The multi-partition solution does not require any
more. And the overhead for the thread objects etc. is absolutely trivial
compared to what's required anyway for a typical reverb.
> complexity
Ever taken a look at the FFTW code (which you use anyway)? It's an order
of magnitude more complicated than zita-convolver. Complexity is not
an issue.
> and non-portability.
Zita-convolver works on Linux and OSX, and that does not depend on *how*
you use it.
> (1) Block length is arbitrary, in which case threads are necessary
> (2) Block length restrictions can be guaranteed, in which case threads
>     are pointless bloat
Both (1) and (2) are simply untrue: there is *no* relation at all between
using threads and allowable block sizes. You are really showing that you
haven't even started to understand how partitioned convolution works.
--
FA