[linux-audio-dev] Plugin APIs (again)

David Olofson david at olofson.net
Sat Dec 7 18:00:01 UTC 2002


On Saturday 07 December 2002 20.12, Tim Hockin wrote:
> Combining three replies:
>
> Steve Harris <S.W.Harris at ecs.soton.ac.uk> wrote:
> > On Sat, Dec 07, 2002 at 12:25:09 -0800, Tim Hockin wrote:
> > > This is what I am thinking:  One of the flags on each control
> > > is whether it is a CTRL_MASTER, or a CTRL_VOICE (or both). 
> > > This allows the plugin to define a ctrl labelled (for example)
> > > VELOCITY and the host can connect it to any MIDI input, if it
> > > wants.
> >
> > I'm not sure that making one control both is meaningful. Per
> > voice and per instrument controls should probably be kept
> > separate.
>
> Whether they are defined together or separately, there are
> certainly controls that apply to each voice and to each
> (instrument, channel (midi), plugin).  For example volume.
>
> What advantage do you see to defining them separately?  Certainly
> the structures are analogous, if not identical.
>
> So no one has offered me any suggestions on how we handle the clash
> between Master and per-voice controls.  To re-ask:

I think we're actually talking about *three* kinds of controls, while 
everyone is still talking about their own set of two kinds of 
controls. What I'm thinking is:

	Master Controls:
		Whether you have multipart/multichannel
		instruments/plugins or not, these always
		address the instrument/plugin instance
		as a whole.

		Ex:	Master output volume of a
			multipart sampler.

		MIDI:	System commands, some SysEx.

	Channel/Part Controls:
		These address a specific Channel/Part of
		an instrument/plugin. If the instrument/
		plugin has only one Channel/Part, these
		messages can be considered equivalent to
		Master Controls.

		Ex:	Channel dry send volume.

		MIDI:	All events with a channel field.

	Voice/Note Controls:
		These address specific Notes or Voices,
		to control them individually while they
		are playing. Whether or not a Note or
		Voice is a single oscillator or a network
		of objects is irrelevant to the API -
		this is an interface to an abstract object.

		Ex:	Note Aftertouch.

		MIDI:	NoteOn, NoteOff, Poly Pressure,
			various Universal SysEx messages.
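
To make the distinction concrete, here's a rough sketch in C of how 
a control event could address the three scopes. (All type and field 
names here are invented for illustration only; this is not a 
proposal for actual API names.)

/* Rough sketch only; names are invented for illustration. */
typedef enum {
	SCOPE_MASTER,   /* addresses the whole instance         */
	SCOPE_CHANNEL,  /* addresses one Channel/Part           */
	SCOPE_VOICE     /* addresses one playing Note/Voice     */
} ControlScope;

typedef struct {
	unsigned     timestamp; /* sample frame within this buffer    */
	ControlScope scope;
	int          channel;   /* used for CHANNEL and VOICE scope   */
	int          vvid;      /* used for VOICE scope only          */
	int          control;   /* which control (e.g. a VOLUME hint) */
	float        value;
} ControlEvent;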


> <RE-ASK rephrased="true">
> Let's assume a plugin provides a control named VOLUME which is both
> VOICE and MASTER.  Let's assume the current state of VOLUME is the
> value 5.0.  Let's assume the sequencer triggers a voice and sets
> VOLUME for that voice to 8.0. The user then turns the master VOLUME
> down to 0.  What happens to the value of the voice's VOLUME.
> 	a) goes to 0
> 	b) ends up at 3.0
> 	c) stays at 8.0
>
> Maybe controls that are both MASTER and VOICE should be absolute
> values for MASTER and scalars against the MASTER value per-voice?
> </RE-ASK>

I don't think this has much to do with the API - and I don't see the 
possibility of a clash. If you take any two controls, they may have 
any relation whatsoever, *defined by the synth*, and the host should 
assume nothing about that relation. Whether the two controls are both 
VOICE, both MASTER or one of each, should not matter - the synth 
still decides what their physical relation is.

So, in your example, you simply have two controls; MASTER::VOLUME and 
VOICE::VOLUME. The synth may multiply them internally, or apply 
MASTER::VOLUME in the output mixing loop, or whatever. Two controls, 
two values - the two controls just happen to have the same hint: 
VOLUME.
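
For example (illustration only - the actual relation is still 
entirely up to the synth), the mixing loop might just multiply the 
two:

/* Illustration only; the relation between the two controls is
 * entirely up to the synth.
 */
#define MAX_VOICES 64

typedef struct {
	float master_volume;             /* MASTER::VOLUME           */
	float voice_volume[MAX_VOICES];  /* VOICE::VOLUME, per voice */
} Synth;

/* One possibility: multiply the two in the voice mixing loop. */
static void mix_voice(Synth *s, int v, const float *in, float *out,
                      unsigned frames)
{
	unsigned i;
	float gain = s->master_volume * s->voice_volume[v];
	for(i = 0; i < frames; ++i)
		out[i] += gain * in[i];
}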


> > > 	1) * Host calls plug->note_on(timestamp, note, velocity), and
> > > 	     gets a voice-id.
> >
> > I don't think I like the firstclass-ness of note and velocity, as
> > we've
>
> I agree, mostly.  I threw this in as a bit of a red-herring, to see
> what people were thinking.
>
> > > 	   * Host sends n VOICE_PARAM events to set up any params it
> > > wants
> >
> > You could just send pitch and velocity this way?
>
> Absolutely.  HOWEVER, I have one design issue with it:  Velocity is
> not a continuous control.  You can't adjust the velocity halfway
> through a long note.  You can adjust the pitch.  You can adjust
> portamento time.  Velocity relates SPECIFICALLY to the attack and
> release force of the musician.

You're thinking entirely in keyboardist terms here. :-) You're 
forgetting that some instruments really don't have anything like 
attack and release in real life, but rather "bow speed", "air speed" 
and that kind of stuff.

I think I've suggested before that you may replace the velocity 
"control" with a continuous control like "bow speed" (what was that 
generic name I came up with?) - which, if you think in keyboardist 
terms, would correspond to key "position". Differentiate that at the 
point of contact, and you have your velocity. You could then indeed 
think of velocity as a continuous control - it's just that your 
average keyboard controller only changes it when you press or 
release a key.

So, given that controllers and instruments are different, it might be 
a good idea to support *both* "velocity" and "position" (pick one for 
each patch/sound!) - and both would be continuous, of course.
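
Something like this (a minimal sketch; all names are invented), just 
to show what I mean by getting velocity out of a continuous 
"position" control:

/* Sketch only; names invented. "position" is a continuous control,
 * and "velocity" falls out as its derivative at the point of contact.
 */
typedef struct {
	float    last_position;
	unsigned last_timestamp;   /* in sample frames */
} PositionState;

/* Returns velocity in (position units) per sample frame. */
static float position_to_velocity(PositionState *ps,
                                  float position, unsigned timestamp)
{
	float velocity = 0.0f;
	if(timestamp > ps->last_timestamp)
		velocity = (position - ps->last_position) /
		           (float)(timestamp - ps->last_timestamp);
	ps->last_position = position;
	ps->last_timestamp = timestamp;
	return velocity;
}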


> Unless we all agree that velocity ==
> loudness, which will be tough, since I think _I_ disagree.

Well, I agree with you - velocity is *not* loudness. (There are other 
controls for that kind of stuff; just look through the "standard" 
MIDI CC list.)


> Could I get you to accept voice_on(velocity); and pass all the rest
> as events?

Well, you won't get *me* to accept it that easy. ;-)


[...]
> > a Channel, and Plugins may or may not be allowed to have multiple
> > Channels. As to the control protocol, if you want to do away with
> > the "channel" field, you could just allow synths to have more
> > than one event input port. One might think that that means more
> > event decoding
>
> So I haven't come to this note in my TODO list yet.  So we want to
> support the idea of (MIDI word) multi-timbral plugins?

I think so, but I'm not 100% determined. (Audiality is still mostly a 
monolith, and I'm not completely sure how to best split it up into 
real plugins.)


> Given that
> it is software, we can say 'just load a new instance for the new
> channel'.

Yeah, but what if you want non-trivial interaction between the 
parts/channels? (Drum kits or something...)

And either way, you may be able to optimize things better if you're 
able to run the loops over parts/channels where appropriate, which is 
something you cannot do if the parts/channels have to be separate 
instances.


>  It prevents anyone from doing an exact software mockup
> of a bit of hardware, but I'm inclined not to care that much..

If that is prevented, I think VSTi and DXi will kill us before we 
even get off the ground. Many users *like* plugins that act and look 
like real hardware devices, and I don't think there's much point in 
trying to reeducate them. Remember that lots of people actually come 
from the hardware world.

This is not to say we should hold back innovation and proper software 
design in favor of emulating hardware. I just believe that preventing 
plugins from being multipart doesn't make much sense. Just reserve 
one control/event port for the whole instrument/plugin, and then 
allow the instrument/plugin to have as many other control/event ports 
as it likes - just like audio in and out ports.
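
For example (just a sketch; the descriptor fields are invented for 
illustration, not a concrete proposal), a 16 part sampler could 
simply describe itself like this:

/* Sketch of a plugin descriptor; field names are invented. */
typedef struct {
	const char *name;
	unsigned   audio_inputs;
	unsigned   audio_outputs;
	unsigned   channels;      /* Parts/Channels implemented      */
	unsigned   event_ports;   /* 1 master port + one per channel */
} InstrumentDescriptor;

static const InstrumentDescriptor example_sampler = {
	"Multipart Sampler",  /* name                              */
	0,                    /* audio_inputs                      */
	2,                    /* audio_outputs (stereo mix)        */
	16,                   /* channels                          */
	1 + 16                /* event_ports: master + 16 parts    */
};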


> > ports. As of now, there is one event port for each Channel, and
> > one event port for each physical Voice. "Virtual Voices" (ie
> > Voices as
>
> Are you using multiple event ports, then, or just having one port
> per instrument-plugin and sending EV_VOICE_FOO and EV_MASTER_FOO
> (modulo naming) events?

Well, there actually *are* things very similar to "instrument 
plugins" in Audiality (although they don't process audio; only 
events): Patch Plugins. Each Patch Plugin instance has an input event 
port.

One argument of all events sent to Patch Plugins (or rather, to 
Channels, which is where Patch Plugins are plugged in) is used as a 
"tag" field, which lets you reference individual voices in 
polyphonic Patch Plugins. The MIDI->Audiality mapper just throws the 
MIDI pitch into this field, and -1 ("all") when there is no pitch.

As to master events, there are none of those at this point, since the 
FX insert plugins - for some reason - don't use the event system for 
control yet.


> > bandwidth, and it works well for most keyboard controlled
> > instruments, but that's about it.
>
> Other than Organs, which you've mentioned, what kinds of
> instruments don't have some concept of velocity (whether they
> ignore it or not..).

The organ doesn't have velocity at all - but the violin, and all wind 
instruments I can think of, would be examples of instruments that 
indeed *have* "some concept of velocity" - but one that is very far 
from on/off with a velocity argument.


> > As to the "gets a voice ID" thing, no, I think this is a bad
> > idea. The host should *pick* a voice ID from a previously
> > allocated pool of Virtual Voice IDs. That way, you eliminate the
> > host-plugin or plugin-plugin (this is where it starts getting
> > interesting) roundtrip, which means you can, among other things:
>
> I'm still not clear on this.  What plugin would trigger another
> plugin?

Any plugin that processes events rather than (or in addition to) 
audio. They're usually called "MIDI plugins".


>  Do you envision that both the host and a plugin would be
> controlling this plugin?

In some cases, that might make sense, yes.


> If so, how do you reconcile that they
> will each have a pool of VVIDs - I suppose they can get VVIDs from
> the host, but we're adding a fair bit of complexity now.

Where? They have to get the VVIDs from somewhere anyway. It's 
probably not a great idea to just assume that plugins can accept any 
integer number as a valid VVID, since that complicates/slows down 
voice lookup.

In fact, Audiality *does* take whatever "voice tags" you give it - 
but as a result, it has to search all voices owned by the channel to 
find the voices. *heh* Could be optimized, but why not just use an 
array of "Virtual Voice Info" structs; one for each valid VVID?

Anyway, dividing a range of integer numbers seems pretty trivial to 
me - but it should still go into the host SDK lib, of course. :-)
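
Something like this (a minimal sketch; names are invented), so an 
incoming VVID indexes straight into a table instead of triggering a 
search through all voices owned by the channel:

/* Sketch only; names invented. */
#define NO_VOICE (-1)

typedef struct {
	int voice;   /* index of the physical voice, or NO_VOICE */
	/* ...whatever else the synth wants to keep per VVID...  */
} VirtualVoiceInfo;

typedef struct {
	VirtualVoiceInfo *vvids;   /* one entry per valid VVID  */
	unsigned         nvvids;   /* allocated for this sender */
} VVIDTable;

/* O(1) lookup, no searching. */
static int vvid_to_voice(const VVIDTable *t, unsigned vvid)
{
	if(vvid >= t->nvvids)
		return NO_VOICE;
	return t->vvids[vvid].voice;
}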


> > 	* send controls *before* you start a voice, if you like
>
> you can do this already - any voice-control event with a timestamp
> before (or equal to) the note_on timestamp can be assumed to be a
> startup param.

So you want to start sorting *everything*, just so people can write 
*slightly* simpler event sending code in some special cases?

I'd much rather require that events sent to a port are sent in 
timestamp order, so the host can just merge sorted event lists. In 
cases where you have multiple connections to the same event port, you 
just use "shadow ports", so the host gets a number of ordered event 
lists that it merges into one just before running the plugin that 
owns the real input port.
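
A sketch of the host side (types invented for illustration): each 
sender writes to its own shadow port in timestamp order, and the 
host just merges the already sorted lists:

/* Sketch only; invented types. */
typedef struct Event {
	unsigned     timestamp;
	struct Event *next;
	/* ...event type and arguments... */
} Event;

/* Merge two timestamp-ordered lists into one, preserving order. */
static Event *merge_events(Event *a, Event *b)
{
	Event head, *tail = &head;
	head.next = NULL;
	while(a && b) {
		if(a->timestamp <= b->timestamp) {
			tail->next = a;
			a = a->next;
		} else {
			tail->next = b;
			b = b->next;
		}
		tail = tail->next;
	}
	tail->next = a ? a : b;
	return head.next;
}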


> > 	* implement smarter and tighter voice allocation
>
> I don't see what you mean by this, or how it matters.  I see voices
> as being the sole property of the instrument.  All the host knows
> about them is that they have some (int) id that is unique
> per-instrument.

Problem is that if that ID is the actual number of a real voice, you 
can't steal a voice in the middle of a buffer. You have to wait until 
the next buffer (or longer, if you're running in different threads, or 
over wire), so the event sender has a chance of hearing that "hey, that 
voice now belongs to some other note!"

If the ID is decoupled from physical voices, you (as a sender) can 
just go on with your business, and you don't *have* to know whether 
or not your IDs are still valid before sending events. The synth will 
keep track of the *whole* voice allocation matter; not just part of 
it.


> > 	* use a timestamped event system
>
> umm, I think this is not solely dependent on vvids - I think
> timestamps will work just fine as proposed.

Yes, but see above... This problem is real, and I still have it in 
the lowest levels of Audiality, where it - fortunately - doesn't seem 
to matter all that much. (Internal buffer size is restricted, so the 
voice allocation granularity isn't all that bad.)

But consider what happens if you have voice related IDs on the 
instrument API level, and try to play a fast series of short notes. 
Since the plugin can't know for sure how long these notes will need 
their voices, it simply has to give you a new voice for each note. 
Whereas, if you use VVIDs, you'd just pick a new VVID for each note, 
and let the synth manage the allocation of actual voices. If the 
notes are short enough, voices can be reused within a single buffer 
or roundtrip cycle.
 

>  It's a matter of
> taste, unity, and presentation we're discussing.

No. It's a matter of whether or not to depend on roundtrips and 
buffer splitting to avoid waste of resources.


> > 	* pipe events over cables, somewhat like MIDI
>
> Ahh, now HERE is something interesting.  I'd always assumed the
> plugin would return something, self-allocating a voice.  This is
> what you've called the round-trip problem, I guess. But it seems to
> me that even if you were piping something over a wire, you have
> some sort of local plugin handling it.  THAT plugin allocates the
> voice-id (arguing my model).

Why complicate things? With a protocol that doesn't entirely depend 
on short round-trip latency for proper operation, you can just tunnel 
the events directly to the receiver, possibly translating the 
timestamps if you're not in sample sync. (Which you should be if 
you're serious. Get proper hardware. :-)


> The question I was asking was: is
> voice-id allocation synchronous (call plug->voice_on(timestamp) and
> get a valid voice-id right now), or is it async (send a note-on
> event and wait for it to come back).

Allocation is just a matter of preparing the plugin for receiving and 
handling a specific number of VVIDs. You could just say that there 
are always 65536 VVIDs, but that sounds both wasteful and restrictive 
at the same time to me.

As to the implementation; if your plugin does what Audiality does 
right now, the plugin may ignore the VVID allocation requests 
entirely and just say "ok, whatever" - since anything that fits in 
the VVID field is ok. (Audiality just throws the VVID into the voice 
struct of the voice it actually allocates, and then looks for it when 
it receives events talking about it.)


> This raises another question for me - the host sends events to the
> plugins. Do the plugins send events back?  It seems useful.

Definitely. In fact, applications that don't support plugins that 
*send* events are frequently referred to as broken these days.


> The
> host has to handle syncronization issues (one recv event-queue per
> thread, or it has to be plugin -> host callback, or something), but
> that's OK.

Well, if your threads are not in sample sync, you're in trouble... 
But indeed, the host can deal with that as well, if for example, 
you're crazy enough to want to use dual, unsynchronized or 
"incompatible" sample rates. :-)


> Do timestamps matter to the host, except as
> bookkeeping?

Timestamps matter a lot if you're going to do anything much at all 
with the events. (Record them or convert them to MIDI, for example.)

That said, replies to requests (such as VVID allocation) should 
"bypass" the normal events. (Not that it would make much of a 
difference if you process them right after the plugin has returned 
from process() or at the end of the engine cycle. It may matter if 
you're piping events back and forth over wire, though.)


> > and all hosts... Also note that event size (which has to be
> > fixed) increases if you must squeeze in multiple parameters in
> > some events.
>
> Why does it have to be fixed-size?  It doesn't strictly HAVE to be.

It doesn't *have* to be - but if you look at the MAIA or Audiality 
event systems, you'll realize why I strongly prefer fixed size 
events. It's all about performance and complexity.

Note that you still *can* pass around data blocks that won't fit in 
an event. Just pass data blocks by reference. (I've designed 
protocols for that for MAIA, if you're interested in the details.)
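
That is (a sketch only - this is *not* the actual MAIA protocol), 
the event stays fixed size and just carries a reference to the 
block:

/* Sketch only; not the actual MAIA protocol. */
#include <stddef.h>

typedef enum {
	EV_CONTROL,     /* 'value' is used                        */
	EV_DATA_BLOCK   /* 'block' references an external blob    */
} EventType;

typedef struct {
	unsigned  timestamp;
	EventType type;
	union {
		float value;           /* small payloads fit inline  */
		struct {
			void   *data;  /* the block itself           */
			size_t size;   /* ownership/lifetime rules   */
		} block;               /* defined by the protocol    */
	} arg;
} FixedEvent;   /* sizeof(FixedEvent) is the same for every type */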


>  On one hand I HATE when an API says 'you have three available
> params' foo->param1, foo->param2, foo->param3.  If you need more,
> too bad.

What do you need them for?

I have multiple parameters in Audiality events, but it seems like 
I'll need *fewer* parameters once I've cleaned out some silly 
MIDIisms. (Such as having pitch and velocity in "start" events.)


>  On the other hand, there are performance advantages to
> having events pre-allocated, and thereby, fixed sized, or at least
> bounded.

Exactly. Especially considering that you'll send *multiple* events 
for every "voice_start", and that there may be a lot of control 
changes while playing real music, performance matters. We can 
probably forget about anything like traditional dynamic memory 
allocation.

You *could* have a few different event sizes - but then you'd have 
to specify the size when allocating them, and the deallocation code 
would have to check some hidden field or something to see how big 
each event is, in order to put it back in the right pool.
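
With a single fixed size, the pool can be a plain preallocated 
freelist (sketch only; names invented), and freeing an event never 
has to inspect anything:

/* Sketch only; invented names. */
#include <stdlib.h>

typedef struct PoolEvent {
	struct PoolEvent *next;  /* freelist / queue link */
	unsigned timestamp;
	int      type;
	int      arg1, arg2;
	float    value;
} PoolEvent;

typedef struct {
	PoolEvent *free_list;
	PoolEvent *block;        /* one big allocation at init time */
} EventPool;

static int pool_init(EventPool *p, unsigned count)
{
	unsigned i;
	p->block = malloc(count * sizeof(PoolEvent));
	if(!p->block)
		return -1;
	p->free_list = NULL;
	for(i = 0; i < count; ++i) {
		p->block[i].next = p->free_list;
		p->free_list = &p->block[i];
	}
	return 0;
}

static PoolEvent *event_alloc(EventPool *p)
{
	PoolEvent *e = p->free_list;
	if(e)
		p->free_list = e->next;
	return e;                /* NULL if the pool is exhausted */
}

static void event_free(EventPool *p, PoolEvent *e)
{
	e->next = p->free_list;  /* no size checks needed */
	p->free_list = e;
}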


(This was actually written by Steve Harris, AFAIK. :-)
> > The intention is that these things would (on the whole) be sound
> > generators, right? To me plugin implies inline processing.
>
> This API is not purely instrumental.  It can certainly be for
> effects and sinks, too.  That said, we're spending a lot of time on
> the instrumental part because it's the new ground wrt LADSPA.

Yes. Let's just not forget that although we may need to design 
something new from scratch to get a nice and clean API, we do *not* 
automatically have to throw away the feature set of LADSPA.

We should basically have LADSPA functionality, but without LADSPA 
style control ports, and with event ports + "abstract controls" 
instead. Some of the "negotiation" calls may be replaced by events. 
(MAIA philosophy again.)


//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`---------------------------> http://olofson.net/audiality -'
.- M A I A -------------------------------------------------.
|    The Multimedia Application Integration Architecture    |
`----------------------------> http://www.linuxdj.com/maia -'
   --- http://olofson.net --- http://www.reologica.se ---


