[LAD] Fw: Re: Some questions about the Jack callback

Mark D. McCurry mark.d.mccurry at gmail.com
Sat Sep 20 21:01:32 UTC 2014

On 09-20, Fons Adriaensen wrote:
> On Sat, Sep 20, 2014 at 04:10:13PM -0400, Mark D. McCurry wrote:
> > On 09-20, Fons Adriaensen wrote:
> > > Having to do 256 1024-point FFTs just to start a note is insane.
> > > It almost certainly means there is something fundamentally wrong
> > > with the synthesis algorithm used.
> > 
> > I agree with that notion.
> > In typical patches something between 2-10 IFFTs is expected and even this
> > cost strikes me as too high (zero IFFTs for pure PAD/SUB synth based).
> > In terms of worst case scenarios ZynAddSubFX can have some rather insane
> > characteristics given multiple parts, kits, voices, etc.
> > For instance if a user decided to use all padsynth instances at max quality,
> > they would need 12GB of memory just to store the resulting wavetables.
> > 
> > Such extremes are not really seen in practice, but things are slowly getting
> > optimized to avoid them when possible.
> You should really look at this from an information theory POV,
> combined with some psycho-acoustics.
> Suppose you have to deliver 256 samples in a period when a note
> starts. That amounts to around 5.3 ms at 48 kHz. That time limits
> the amount of spectral detail that can be detected given the
> output from the first period. Which means that there is no point
> in generating more detail in the first period of a note.
> Even on sustained notes the amount of spectral detail that can be
> detected by a human listener is limited by the critical bandwidth
> of human hearing (which increases with frequency). That means that
> any set of harmonics that fall within a critical bandwidth can be
> replaced by a single one with the same energy and nobody would be
> able to hear the difference. All this means that you *never* need
> 256 harmonics, not even on bass notes below Fs / (2 * 256). 
> And if the final output is a weighted sum of those IFFT outputs
> you can as well compute the weighted sum of the inputs and then
> do a single IFFT - it's a linear transform after all.

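For what it's worth, the linearity point and the numbers quoted above are easy to check. Below is a minimal sketch in Python, standard library only, with a naive inverse DFT standing in for a real IFFT. The transform size, the weights, and all names are made up for illustration; none of it is taken from the ZynAddSubFX source.

```python
import cmath
import random

def idft(spectrum):
    """Naive O(N^2) inverse DFT; slow, but enough for a small demonstration."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def weighted_sum(vectors, weights):
    """Element-wise weighted sum of equal-length vectors."""
    length = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(length)]

random.seed(0)
N = 16  # illustrative size only
spectra = [
    [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(N)]
    for _ in range(4)
]
weights = [0.5, 0.25, 0.15, 0.10]  # hypothetical mixing weights

# Four inverse transforms followed by a mix ...
many = weighted_sum([idft(s) for s in spectra], weights)
# ... versus mixing the spectra first and transforming once.
one = idft(weighted_sum(spectra, weights))
max_error = max(abs(a - b) for a, b in zip(many, one))

# Arithmetic from the quoted text: one 256-sample period at 48 kHz,
# and the frequency Fs / (2 * 256).
period_ms = 256 / 48000 * 1000
bass_limit_hz = 48000 / (2 * 256)
```

The two results differ only by floating point rounding, which is the linearity argument in code form. This only covers the case where the final output really is a plain weighted sum of the IFFT outputs.
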
If you are proposing that 256 harmonics are not needed, then is there a
transformation that yields an equivalent psycho-acoustic output in less time
than the FFT would have taken, given any possible spectral input?
(The user has full control over the full spectrum in terms of phase/magnitude.)
If so, I'd be interested in reading some papers on that topic, though I'm
skeptical, as the work on k-sparse FFTs indicates that this FFT size is much too
small to gain any measurable advantage (above k ≈ 2).
As it stands, the source for a per-note voice wavetable is a spectral
representation which is combined with some frequency-dependent manipulation
(e.g. removing harmonics which would alias, and the aforementioned adaptive
harmonics) and then thrown into an IFFT.
The resulting wavetable is fairly large, to keep the error of linear interpolation
small (so as to minimize the normal running cost).
Additionally, the output from traversing the wavetable can be the source for a
number of nonlinear functions (FM/PM source function and distortions).
If there weren't any nonlinear functions later in the chain, then there might be
some additional flexibility, but I don't perceive too much wiggle room without
precalculating the possible wavetables.
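
To make the pipeline described above concrete, here is a rough sketch in Python (standard library only). It is not the ZynAddSubFX code. The sample rate, table size, function names, and the 1/k test spectrum are all assumptions, and a direct sum of sines stands in for the IFFT to keep it short. The nonlinear stages (FM/PM, distortion) are not modelled.

```python
import math

SAMPLE_RATE = 48000.0  # assumed
TABLE_SIZE = 2048      # assumed; not the actual engine's table size

def band_limit(magnitudes, note_hz, sample_rate=SAMPLE_RATE):
    """Zero every harmonic whose frequency would reach or exceed Nyquist."""
    nyquist = sample_rate / 2.0
    return [m if (k + 1) * note_hz < nyquist else 0.0
            for k, m in enumerate(magnitudes)]

def harmonic_sum(magnitudes, phases, angle):
    """Evaluate the harmonic series at one phase angle (radians)."""
    return sum(m * math.sin((k + 1) * angle + p)
               for k, (m, p) in enumerate(zip(magnitudes, phases)) if m != 0.0)

def build_wavetable(magnitudes, phases, size=TABLE_SIZE):
    """One cycle of the waveform; a direct sum standing in for the IFFT."""
    return [harmonic_sum(magnitudes, phases, 2.0 * math.pi * n / size)
            for n in range(size)]

def read_linear(table, position):
    """Read the table at a fractional index with linear interpolation."""
    size = len(table)
    base = math.floor(position)
    frac = position - base
    i = int(base) % size
    return table[i] + frac * (table[(i + 1) % size] - table[i])

# Hypothetical user spectrum: 256 harmonics with 1/k magnitudes, zero phase.
mags = [1.0 / (k + 1) for k in range(256)]
phases = [0.0] * 256

limited = band_limit(mags, 440.0)              # drop harmonics that would alias
kept = sum(1 for m in limited if m != 0.0)     # harmonics surviving at 440 Hz
table = build_wavetable(limited, phases)

# Interpolation error halfway between two table entries.
pos = 100.5
exact = harmonic_sum(limited, phases, 2.0 * math.pi * pos / TABLE_SIZE)
interp_error = abs(read_linear(table, pos) - exact)
```

The larger the table relative to the highest surviving harmonic, the smaller the interpolation error. That is the trade-off mentioned above: more work at note start in exchange for a cheap per-sample read.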

Also, the idea of a set critical bandwidth is broken here due to the ability to
modulate wildly without recalculating the base wavetable.
(This is the largest correctness issue from a signal processing perspective in
that synthesis engine ATM.)
