[linux-audio-dev] Plugin APIs (again)

David Olofson david at olofson.net
Wed Dec 4 18:41:00 UTC 2002


On Wednesday 04 December 2002 22.33, Tim Hockin wrote:
> > > I disagree with that - this is a waste of DSP cycles processing
> > > to be sent nowhere.
> >
> > So, why would you ask the plugin to set up outputs that you won't
> > connect, and then force the plugin to have another conditional to
> > check whether the output is connected or not?
>
> This confuses me.   A plugin says it can handle 1-6 channels.  The
> host only connects 2 channels.  The plugin loops for i = 0 to i =
> me->nchannels. There isn't any checking.

If you have a "group" of channels, and just want to skip one in the 
middle, that won't work.


>  If the plugin says it can
> handle 2-6 channels and the host only connects 1, it is an error. 
> Connect at least the minimum, up to the maximum.  In typing this,
> I've seen that discontiguous connections do, in fact, require
> conditionals.

Yes, that's exactly what I'm thinking about.


> Maybe it is safe to say you have to connect ports in
> order?

Safe, but it wouldn't be sufficient. Consider my mono->5.1 example.


> > I would propose that the pre-instantiation host/plugin
> > "negotiations" including:
> >
> > 	* A way for the host to tell the plugin how many ports of
> > 	  each type it wants for a particular instance of the plugin.
>
> This is exactly what I'm talking about with the connect methods. 
> Before we go into PLAY mode, we ask for a certain number of
> channels.
>
> > 	* A way for the host to *ask* the plugin to disable certain
> > 	  ports if possible, so they can be left disconnected.
>
> hmm, this is interesting, but now we're adding the conditional

Well, the point is that the conditional doesn't have to end up in the 
inner loop of the plugin. The host could throw in a silent or garbage 
buffer, if the plugin coder decides it's too hairy to implement the 
plugin in a different way.

Then again, the plugin could use a private array of buffer pointers, 
and throw in silence/garbage buffers itself. The buffers should still 
be supplied by the host, though, to reduce memory use and cache 
thrashing. (This is hopefully just a "rare" special case, but 
anyway...)
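
For illustration, a minimal host-side sketch (hypothetical types and 
names, assuming a fixed maximum block size): point any disconnected 
port at a shared, zeroed buffer, so the plugin's inner loop never has 
to test pointers:

	#define MAX_FRAMES 4096

	typedef struct { float *ports[8]; } plugin_ports;

	/* One shared buffer; static storage is zeroed by the C runtime. */
	static float silence[MAX_FRAMES];

	static void connect_or_silence(plugin_ports *p, int port, float *buf)
	{
		/* Disconnected port? Hand the plugin silence instead of NULL. */
		p->ports[port] = buf ? buf : silence;
	}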


> > plugin with two 1D, contiguous arrays (although possibly with
> > some ports disabled, if the plugin supports it); one for inputs
> > and one for outputs. That will simplify the low level/DSP code,
> > and I think
>
> Yes, I've come around to this.  The question in my mind is now
> about disabling (or just not connecting) some ports.
>
> > Now, if the plugin didn't support DisableSingle on the output
> > ports of type Out;5.1, you'd have to accept getting all 6 outs,
> > and just route the bass and center channels to "/dev/null". It
> > should be easy enough for the host, and it could simplify and/or
> > speed up the average case (all outputs used, assumed) of the
> > plugin a bit, since there's no need for conditionals in the inner
> > loop, mixing one buffer for each output at a time, or having 63
> > (!) different versions of the mixing loop.
>
> ok, I see now.  If the plugin supports disabling, the host can use
> it.  If the plugin is faster to assume all ports connected, it does
> that instead.

Yes, that's the idea. And for the host, these are decisions made when 
building the net, so it doesn't matter performance-wise. It's code 
that needs to be there, indeed - but it's simple and generic enough 
that it could go in the host SDK. (Like that state changing function 
of Audiality, for example; it comes with the plugin API.)


> I think I rather like that.

Good - then it might not be total nonsense. :-)


> > think it's a bad idea to *require* that plugins support it.
>
> This is key, again, you've convinced me.
>
> > strongly prefer working with individual mono waveforms, each on a
> > voice of their own, as this offers much more flexibility. (And
> > it's also a helluva' lot easier to implement a sampler that way!
> > :-)
>
> just so we're clear, 'voice' in your terminology == 'channel' in
> mine?

Well... If your definition of channel is like in the (classic) 
tracker days, yes. What I call a voice is what plays a single 
waveform in a synth or sampler. Depending on the design, it may only 
have waveform and pitch controls - or it may include filters, 
envelope generators, LFOs, distortion, panning and whatnot.

In fact, a voice could theoretically even combine multiple waveforms, 
but that borders on something I'd call a "voice structure" - and that 
should probably be published as multiple voices in a voice oriented 
API.

In Audiality, however, a 'channel' doesn't have a fixed relation to 
audio processing. It's basically like a channel in MIDI speak, and 
when dealing with notes, you're really dealing with *notes* - not 
voices. (Remember  the CoffeeBrewer patch? ;-)


Anyway, my original comment was really about synth/sampler 
programming, where I prefer to construct stereo sounds from multiple 
mono samples (each sample on its own voice), as opposed to working 
with voices that are capable of playing stereo waveforms.

That said, Audiality supports stereo voices. Don't know if I'll keep 
that feature, though. It's a performance hack for sound effects in 
games, mostly, and at some point, I'll probably have to sacrifice 
some low end scalability for the high end.


> > ...provided there is a quarantee that there is a buffer for the
> > port. Or you'll segfault unless you check every port before
> > messing with it. :-)
>
> Do we need to provide a buffer for ports that are disabled?

Yes, if the plugin says it can't deal with disconnected ports. If it 
says it can, it's supposed to check the pointers at some point during 
the process() call - preferably once, before the event/DSP loop.

As to ports outside the number of ports requested by the host - 
well, those are outside the loops, and simply don't exist.
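
As a sketch of the plugin side (hypothetical names; assumes the host 
passes NULL for disabled ports and a maximum block size is known): 
check the pointers once at the top of process(), redirect missing 
outputs to a private scratch buffer, and keep the inner loop 
unconditional:

	#define MAX_FRAMES 4096

	typedef struct {
		float		*out_l, *out_r;	/* NULL if disabled */
		unsigned	frames;
	} stereo_out;

	static float scratch[MAX_FRAMES];	/* output nobody listens to */

	static void process(stereo_out *p)
	{
		/* Check once, before the event/DSP loop... */
		float *l = p->out_l ? p->out_l : scratch;
		float *r = p->out_r ? p->out_r : scratch;

		/* ...so the loop itself needs no conditionals. */
		for (unsigned s = 0; s < p->frames; ++s)
			l[s] = r[s] = 0.0f;	/* whatever the DSP produces */
	}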


[...]
> > > 		{ "left(4):mono(4)" }, { "right(4)" },
> >
> > Does this mean the plugin is supposed to understand that you want
> > a "mono mix" if you only connect the left output?
>
> If the host connects this pad to a mono effect, it knows that the
> 'left' channel is also named 'mono'.  I do not expect the plugin to
> mono-ize a stereo sample

Ok.


> (though it can if it feels clever).

This makes me nervous. :-)

Ok; it's rather handy that some synths automatically transform the 
left line output into a mono output if the right output is not 
connected (the JV-1080 does that, IIRC) - but when it comes to 
software, it's not like you need to drag in a mixer and cables to 
implement that outside the machine. I'd really rather not have 
plugins do all sorts of "smart" things without explicitly being asked 
to.


[...]
> > > * note_on returns an int voice-id
> > > * that voice-id is used by the host for note_off() or
> > > note_ctrl()
> >
> > That's the way I do it in Audiality - but it doesn't mix well
> > with timestamped events, not even within the context of the RT
> > engine core.
>
> how so - it seems if you want to send a voice-specific event, you'd
> need this

No, there are other ways. All you really need is a unique ID for each 
voice, for addressing per-voice.

The simple and obvious way is to just get the voice number from the 
voice allocator. The problem with that is that you need to keep track 
of whether or not that voice still belongs to you before trying to 
talk to it. In Audiality, I do that by allowing patch plugins (the 
units that drive one or more voices based on input from a 'channel') 
to "mark" voices they allocate with an ID - which is usually just the 
channel number.

That's not perfect, though: If you change the patch on that channel 
while you have old notes hanging, you either have to kill those notes 
first (JV-1080 style - not good), or you have to keep the old patch 
around, so it can control its voices until they die. The latter is 
what I'm trying to do in Audiality, but it requires that patches can 
recognize their own voices *even* if there are other patches still 
working on the same channel.

I could mark voices with *both* channel and patch numbers, but I have 
a feeling that would only work until I discover *another* way that 
patches could lose track of their voices. A cleaner and more generic 
solution is needed. (Especially for a similar system for use in 
a public plugin API!)

...And then there's still this roundtrip issue, of course.

So, what I'm considering for Audiality is 'Virtual Voice Handles'. 
When a patch is initialized, it asks the host for a number of these 
(contiguous range), which will then function as the "virtual voice 
reserve" for the patch. When you want to allocate a voice, you send a 
VVH with the "request", and just assume that you got one. From then 
on, when you do *anything* with a voice, you reference it through the 
VVH, just as if it was a raw voice index.

If you don't get a voice, or it's stolen, it's no big deal. The synth 
will just ignore anything you say about that VVH. If you ask 
(roundtrip...), the host will tell you whether or not you have a 
voice - but the point is you don't *have* to wait for the reply. 
Worst thing that can happen is that you talk to an object that no 
longer exists, and (for a change!) that's totally ok, since there's 
no chance of anything else listening to that VVH.

Besides, a bonus with a VVH system is that the synth may implement 
voice "unstealing" and other fancy stuff. If a VVH loses it's voice, 
it can just get another voice later on. The synth would indeed have 
to keep track of phase and stuff for all active virtual voices, but 
that's most certainly a lot cheaper than actually *mixing* them.
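
A minimal sketch of the VVH idea (all names hypothetical): the synth 
keeps a VVH-to-voice map, and any event addressing a handle that 
currently has no voice behind it is simply ignored:

	#define MAX_VVH 1024

	/* Synth side: -1 means "no real voice behind this handle". */
	static int vvh_to_voice[MAX_VVH];	/* init all entries to -1 */

	/* Patch side: a contiguous range reserved at init time. */
	typedef struct {
		int	first_vvh;
		int	nvvh;
	} vvh_reserve;

	static void vvh_control(int vvh, int ctl, float value)
	{
		int v = vvh_to_voice[vvh];
		if (v < 0)
			return;	/* voice never allocated, or stolen: ignore */
		/* ...apply 'ctl'/'value' to real voice 'v' here... */
		(void)ctl; (void)value;
	}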


[...]
> > Besides, VSTi has it. DXi has it. I bet TDM has it. I'm sure all
> > major digital audio editing systems (s/w or h/w) have it. Sample
> > accurate timing. I guess there is a reason. (Or: It's not just
> > me! :-)
>
> yeah, VSTi also has MIDI - need I say more?

Well, yes - and VST 2.0 also added quite a few API features that 
overlap with VST 1.0 features, which only confuses developers. The 
current 
version is far from clean and simple when you look into the details.

I'm not saying that VST is the perfect API (in fact, I don't like it 
very much at all), but looking at the *feature set*, I think it's a 
very good model for what is needed for serious audio synthesis and 
processing these days.

Also note that VST 3.0 seems to be in development, which means we 
shouldn't consider VST 2.0 the perfect do-it-all design. It has 
flaws, and it lacks features that a significant number of users want 
or need. (I would be very interested in a complete list, BTW! :-)


> I'm becoming
> convinced, though.

Well, if you could find Benno (author of EVO; the Linux 
direct-from-disk sampler), he could probably demonstrate the 
performance advantages of timestamped event systems as well. :-)

As to Audiality, switching to timestamped events didn't slow anything 
down, but it did provide sample accurate timing. As a result, it also 
decouples timing accuracy from system buffer size, which means that 
envelopes, fades and stuff sound *exactly* the same regardless of 
engine settings.

BTW, another hint as to why sample accurate timing is critical: the 
envelope generators drive their voices through the event system. This 
simply does not work without consistent and accurate timing - and my 
experiences with some h/w synths have convinced me that sample 
accurate timing is the *minimum* for serious sound programming. If 
you don't have it, you'll have to mess with destructive waveform 
editing instead. (Consider attacks of percussion instruments and the 
like. You can't even program an analog style bass drum on a synth 
with flaky control timing, let alone higher pitched and faster 
sounds.)
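
For illustration, a rough sketch (hypothetical structures) of how a 
timestamped event queue gives sample accurate timing: the plugin 
splits each block at the event timestamps, so control changes take 
effect on exact sample frames regardless of buffer size:

	typedef struct event {
		unsigned	frame;	/* offset within this block, sorted */
		int		ctl;
		float		value;
		struct event	*next;
	} event;

	static void process_block(event *ev, float *out, unsigned frames)
	{
		unsigned pos = 0;
		while (pos < frames) {
			/* Run with the current settings up to the next event. */
			unsigned end = ev ? ev->frame : frames;
			if (end > frames)
				end = frames;
			for (unsigned s = pos; s < end; ++s)
				out[s] = 0.0f;	/* ...actual DSP here... */
			pos = end;
			if (ev && ev->frame <= pos) {
				/* ...apply ev->ctl / ev->value here... */
				ev = ev->next;
			}
		}
	}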


> > What kind of knobs need to be ints? And what range/resolution
> > should they have...? You don't have to decide if you use floats.
>
> They should have the same range as floats - whatever their control
> struct dictates.

Of course - but as a designer, how do you decide on a "suitable" 
resolution for this fixed point control, which is actually what it 
is? (Unless it's *really* an integer control, of course.)


> > > I'd assume a violin modeller would have a BOWSPEED control. 
> > > The note_on() would tell it what the eventual pitch would be. 
> > > The plugin would use BOWSPEED to model the attack.
> >
> > Then how do you control pitch continuously? ;-)
>
> with a per-voice pitchbend

Yeah - but why not just ditch note pitch and bend, and go for 
per-voice continuous pitch? There's no need for sending the arbitrary 
parameter "pitch" with every note_on event, especially since some 
instruments may not care about it at all.
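
A tiny sketch of what that could look like (hypothetical event 
structure): pitch is just another continuous per-voice control, and 
a note start carries no pitch at all:

	typedef enum { VC_PITCH, VC_VELOCITY, VC_BOWSPEED } voice_ctl;

	typedef struct {
		unsigned	timestamp;	/* frames into the block */
		int		voice;		/* voice or virtual voice handle */
		voice_ctl	ctl;
		float		value;		/* e.g. pitch in linear octaves */
	} voice_event;

	/* A violin model would then get VC_PITCH and VC_BOWSPEED events
	 * whenever those change - before, at, or after the note start.
	 */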


> > Some controls may not be possible to change in real time context
> > - but I still think it makes sense to use the control API for
> > things like that.
>
> I don't know if I like the idea of controls being flagged RT vs
> NONRT, but maybe it is necessary.

I'm afraid it is. If you look at delays, reverbs and things like 
that, you basically have three options:

	1) realloc() the buffers when certain parameters
	   change, or

	2) decide on an absolute maximum buffer size, and always
	   allocate that during instantiation, or

	3) realloc() the buffers when a NONRT "max_delay" or
	   similar parameter is changed.


1 is out, since it can't work in real time systems. (Well, not 
without a real time memory manager, at least.) 2 "works", but is very 
restrictive, and not very friendly.

3 works and is relatively clean and simple - but it requires an 
"extra" interface for setting NONRT parameters. I would definitely 
prefer this to be basically the same as the normal control interface, 
as the only significant difference is in which context the control 
changes are executed.
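
A minimal sketch of option 3 (hypothetical names): the RT "delay" 
control only clamps against what is already allocated, while the 
NONRT "max_delay" control is allowed to realloc() because it is never 
executed in the audio thread:

	#include <stdlib.h>

	typedef struct {
		float		*buf;
		unsigned	buf_frames;	/* sized from max_delay */
		unsigned	delay_frames;	/* current RT delay setting */
	} delay_plugin;

	/* NONRT: may block, may fail; never called from the audio thread. */
	static int set_max_delay(delay_plugin *p, unsigned frames)
	{
		float *nb = realloc(p->buf, frames * sizeof *nb);
		if (!nb)
			return -1;
		p->buf = nb;
		p->buf_frames = frames;
		return 0;
	}

	/* RT: just clamps to whatever was allocated; no memory management. */
	static void set_delay(delay_plugin *p, unsigned frames)
	{
		p->delay_frames = frames < p->buf_frames ? frames : p->buf_frames;
	}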


> Or maybe it's not, and a user
> who changes a sample in real time can expect a glitch.

So you're not supposed to be able to implement delays that can take 
modulation of the delay parameters, and still cope with arbitrary 
delay lengths? (Just an example; this has been discussed before, so I 
guess some people have more - and real - examples.)


> > An Algorithmically Generated Waveform script...?
>
> But what I don't get is: who loads the data into the control?

Nor do I. Haven't decided yet. :-)


> a) host will call deserialize() with a string or other standard
> format
> b) plugin will load it from a file, in which case host
> passes the filename to the control
> c) host loads a chunk of arbitrary data which it read from the
> plugin before saving/restoring - in which case how did it get there
> in the first place? (see a or b)

The problem you mention in c) is always present, actually, and this 
has been discussed before: It's about presets and defaults.

Do plugins have their own hardcoded defaults, or is the default just 
another preset? And where are presets stored?

I'd say that the most sensible thing is that the host worries about 
presets. Plugins already have an interface for getting/setting 
controls, so why should they have another? Implement it in the hosts 
- or even once and for all in the host SDK - and be done with it.

As to where the elusive "arbitrary data" goes, I've said I'm leaning 
towards external files via paths in string controls marked as 
FILE_<something>. I still think that makes sense, especially 
considering that these files of "arbitrary data" might be room 
impulse responses for convolution, or even bigger things.
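
Something like this, as a sketch (hypothetical descriptor layout): a 
string control carrying a hint that tells the host it is really a 
path to external data:

	typedef enum {
		CTL_FLOAT,
		CTL_STRING,
		CTL_FILE_AUDIO	/* string control holding a file path */
	} ctl_type;

	typedef struct {
		const char	*name;
		ctl_type	type;
	} ctl_desc;

	static const ctl_desc convolver_ctls[] = {
		{ "dry/wet",		CTL_FLOAT },
		{ "impulse_response",	CTL_FILE_AUDIO }
	};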


> > Well, then I guess you'll need the "raw data block" type after
> > all, since advanced synth plugins will have a lot of input data
> > that cannot be expressed as one or more "normal" controls in any
> > sane way.
>
> Such as?

Impulse responses, compressed audio data, raw audio data, scripts,...


> Where does this data come from in the first place?

From wherever we decide to keep defaults and presets. I would suggest 
that plugins should have "safe defaults" built-in (to prevent them 
from crashing if the default preset is missing, at least), and that 
presets are stored on disk, in some sort of database managed by the 
host SDK or something. When you install a plugin, its presets would 
be added to this database.


> > Just as with callback models, that depends entirely on the API
> > and the plugin implementation. AFAIK, DXi has "ramp events". The
> > Audiality synth has linear ramp events for output/send levels.
>
> So does Apple Audio Units.  I am starting to like the idea..

Well, I had my doubts at first, but it seems to work pretty well with 
just "supposedly linear ramping". The big advantage (apart from not 
having to send one change event per sample, or using arbitrary 
control filters in all plugins) is that plugins can implement the 
ramping in any way they like internally. That is, if it takes heavy 
calculations to transform the control value into the internal 
coefficients you need for the DSP code, you may interpolate the 
coefficients instead. How is up to you - the host just expects you to 
do something that sounds reasonably linear to the user.
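
As a sketch of the plugin side of a ramp event (hypothetical names; 
here the raw control value itself is interpolated linearly, but a 
plugin could just as well interpolate its internal coefficients 
instead):

	typedef struct {
		float		value;		/* current control value */
		float		target;
		float		step;		/* per-sample increment */
		unsigned	frames_left;
	} ramped_ctl;

	static void ramp_to(ramped_ctl *c, float target, unsigned frames)
	{
		c->target = target;
		c->frames_left = frames;
		c->step = frames ? (target - c->value) / (float)frames : 0.0f;
		if (!frames)
			c->value = target;
	}

	static float ramp_tick(ramped_ctl *c)
	{
		if (c->frames_left) {
			c->value += c->step;
			if (!--c->frames_left)
				c->value = c->target;	/* kill rounding drift */
		}
		return c->value;
	}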


> > > Audiality, but if we're designing the same thing, why aren't we
> > > working on the same project?
> >
> > Well, that's the problem with Free/Open Source in general, I
> > think. The ones who care want to roll their own, and the ones
> > that don't care... well, they don't care, unless someone throws
> > something nice and ready to use at them.
> >
> > As to Audiality, that basically came to be "by accident". It
> > started
>
> Interesting how it came about, but why are you helping me turn my
> API into yours, instead of letting me work on yours?  Just curious.
>  I do like to roll my own, but I don't want to waste time..

Well, part of the answer to that is that I'm still interested in 
ideas - but of course, having a working implementation of some of my 
ideas probably won't hurt the discussion!

It seems that I got stuck in this thread instead of releasing the 
code... :-)

I have some space trouble on the site. I'll try to deal with it and 
put the whole package on-line tonight.


//David Olofson - Programmer, Composer, Open Source Advocate

.- Coming soon from VaporWare Inc...------------------------.
| The Return of Audiality! Real, working software. Really!  |
| Real time and off-line synthesis, scripting, MIDI, LGPL...|
`-----------------------------------> (Public Release RSN) -'
.- M A I A -------------------------------------------------.
|    The Multimedia Application Integration Architecture    |
`----------------------------> http://www.linuxdj.com/maia -'
   --- http://olofson.net --- http://www.reologica.se ---


