[linux-audio-dev] Plugin APIs (again)

Tim Hockin thockin at hockin.org
Wed Dec 4 04:15:01 UTC 2002


> Well, a guaranteed unique ID is really rather handy when you want to 
> load up a project on another system and be *sure* that you're using 
> the right plugins... That's about the only strong motivation I can 
> think of right now, but it's strong enough for me.

Ok, I see your motivation for this.  I hate the idea of 'centrally assigned'
anything for something as open as this.  I'll think more on it...

> IMHO, plugins should not worry about whether or not their outputs are 
> connected. (In fact, there are reasons why you'd want to always 
> guarantee that all ports are connected before you let a plugin run. 
> Open inputs would be connected to a dummy silent output, and open 
> outputs would be connected to a /dev/null equivalent.)

I disagree with that - it's a waste of DSP cycles to process audio that gets
sent nowhere.

> as some kind of interface to the plugin, but to me, it seems way too 
> limited to be of any real use. You still need a serious interface 

If it has no other use than 'ignore this signal and spare the CPU time', it
is good enough for me.

> just like you program a studio sampler to output stuff to the outputs 
> you want. This interface may be standardized or not - or there may be 
> both variants - but either way, it has to be more sophisticated than 
> one bit per output.

Ehh, again, I think it is simpler.  Let's assume a simple sampler.  It has a
single output with 0 or more channels (in my terminology).  If you load a
stereo sample, it has 2 channels.  A 5.1 sample has 6 channels.  Let's
consider an 8-pad drum machine.  It has 8 outputs, each with 0-2 channels.
Load a stereo sample, and that output has 2 channels.  Now, as I said, maybe
this is a bad idea.  Maybe it should be assumed that all outputs have 2
channels and mono gets duplicated to both, or (simpler) LEFT is MONO.

What gets confusing is what we're really debating.  If I want to do a simple
stereo-only host, can I just connect the first pair of outs and the plugin
will route automatically?  Or do I need to connect all 8 to the same buffer
in order to get all the output?  In the process of writing this I have
convinced myself you are right :)  If the host does not connect pad #2, pad
#2 is silent.

> I think there should be as little policy as possible in an API. As 
> in; if a plugin can assume that all ins and outs will be connected, 
> there are no special cases to worry about, and thus, no need for a 
> policy.

Slight change - a plugin only needs to handle connected inouts.  If an inout
is not connected, the plugin can skip it or do whatever it likes.

> Strongest resason *not* to use multichannel ports: They don't mix 
> well with how you work in a studio. If something gives you multiple 

I considered that.  At some point I made a conscious decision to trade off
that ability for the simplicity of knowing that all my stereo channels are
bonded together.  I guess I am rethinking that.

> Strongest reason *for*: When implementing it as interleaved data, it 

bleh - I always assumed an inout was n mono channels.  The only reason for
grouping them into inouts was to 'bond' them.

> Like on a studio mixing desk; little notes saying things like 
> "bdrum", "snare upper", "snare lower", "overhead left", "overhead 
> right" etc.

Should I use a different word than 'port'?  Is it too overloaded with
LADSPA?

Hrrm, so how does something like this sound?

(metacode)

struct port_desc {
	char *names;	/* colon-separated alternative labels */
};

simple sampler descriptor {
	...
	int n_out_ports = 6;
	struct port_desc out_ports[] = {
		{ "mono:left" },
		{ "right" },
		{ "rear:center" },
		{ "rear-left" },
		{ "rear-right" },
		{ "sub:lfe" },
	};
	...
};

So the host would know that if it connects 1 output, the name is "mono", and
if it connects 2 outputs, the names are "left", "right", etc.  Then it can
connect "left" to "left" on the next plugin automatically.  And if you want
to hook it up to a mono output, the user could be asked, or assumptions can
be made.  This has the advantage(?) of not specifying a range of acceptable
configs, but a list.  It can have 1, 2, or 6 channels.
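
To make that concrete, here's a rough sketch of how a host might pick the
label for a port, assuming the colon-separated alternatives run from the
smallest config to the largest and that a port with fewer alternatives just
keeps its last one (the function and its exact semantics are only my guess):

#include <stdio.h>
#include <string.h>

static void port_label(const char *names, int config, char *buf, size_t len)
{
	const char *p = names;
	const char *colon;

	/* skip 'config' alternatives, stopping early if we run out */
	while (config-- > 0 && (colon = strchr(p, ':')) != NULL)
		p = colon + 1;

	/* copy up to the next ':' (or the end of the string) */
	colon = strchr(p, ':');
	snprintf(buf, len, "%.*s", colon ? (int)(colon - p) : (int)strlen(p), p);
}

/* port_label("mono:left", 0, ...) -> "mono"
 * port_label("mono:left", 1, ...) -> "left"
 * port_label("right",     1, ...) -> "right"  (only one alternative) */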

another example:

drum machine descriptor {
	...
	int n_out_ports = 16;
	struct port_desc out_ports[] = {
		{ "left:left(0):mono(0)" }, { "right:right(0)" },
		{ "left(1):mono(1)" }, { "right(1)" },
		{ "left(2):mono(2)" }, { "right(2)" },
		{ "left(3):mono(3)" }, { "right(3)" },
		{ "left(4):mono(4)" }, { "right(4)" },
		{ "left(5):mono(5)" }, { "right(5)" },
		{ "left(6):mono(6)" }, { "right(6)" },
		{ "left(7):mono(7)" }, { "right(7)" },
	};
	...
};

and finally:

mixer descriptor {
	...
	int n_in_ports = -1;	/* open-ended; host decides how many */
	struct port_desc in_ports[] = {
		{ "in(%d)" },
	};
	int n_out_ports = 2;
	struct port_desc out_ports[] = {
		{ "left:mono" },
		{ "right" },
	};
};

Or something similar.  It seems that this basic code would be duplicated in
almost every plugin.  Can we make assumptions and let the plugin leave it
blank if the assumptions are correct?
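
(Side thought: the mixer's open-ended input list assumes the host expands the
"%d" itself; something like this, purely illustrative:)

#include <stdio.h>

/* Turn the "in(%d)" pattern from the mixer example into a concrete name for
 * the i-th input the host decides to create. */
static void make_port_name(char *buf, size_t len, const char *pattern, int i)
{
	snprintf(buf, len, pattern, i);	/* "in(%d)", 3 -> "in(3)" */
}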

In thinking about this I realized a potential problem with not having bonded
channels.  A mixer strip is now a mono strip.  It seems really nice to be
able to say "Input 0 is 2-channels" and load a stereo mixer slot, "Input 1 is 
1-channel" and load a mono mixer slot, "Input 2 is 6-channel" and load a 5.1
mixer slot.

I'm back to being in a quandary.  Someone convince me!

> Point being that if the host understands the labels, it can figure 
> out what belongs together and thus may bundle mono ports together 
> into "multichannel cables" on the user interface level.

This is what the "inout is a bundle of mono channels" idea does.

> Well, I don't quite understand the voice_ison() call. I think voice 
> allocation best handled internally by each synth, as it's highly 
> implementation dependent.

My ideas wrt polyphony (a rough sketch of the calls follows the list):

* note_on returns an int voice-id
* that voice-id is used by the host for note_off() or note_ctrl()
* you can limit polyphony in the host
	- when I trigger the 3rd voice on an instrument set for 2-voices, I
	can note_off() one of them
* you can limit polyphony in the instrument
	- host has triggered a 3rd voice, but I only support 2, so I
	internally note_off() one of them and return that voice_id again.
	The host can recognise that and account for polyphony accurately
	(even if it is nothing more than a counter).
* note_off identifies to the host if a voice has already ended (e.g. a sample)
* note_ison can be called by the host periodically for each voice to see
if it is still alive (think of step-sequenced samples).  If a sample ends,
the host would want to decrement its voice counter.  The other option is a
callback to the host.  Not sure which is less ugly.
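
In metacode, something like this is what I picture; the names and exact
signatures are only my guesses at how the list above would look in code:

typedef int oapi_voice_id;

struct oapi_instrument {
	/* Start a note; returns the voice-id the host uses afterwards.  If
	 * the plugin had to steal a voice to stay within its own polyphony
	 * limit, it may return an id it handed out earlier, so the host can
	 * keep its voice counter accurate. */
	oapi_voice_id (*note_on)(void *plugin, int pitch, int velocity);

	/* Stop a note.  Returns nonzero if the voice had already ended on
	 * its own (e.g. a one-shot sample ran out). */
	int (*note_off)(void *plugin, oapi_voice_id voice);

	/* Per-voice control change. */
	void (*note_ctrl)(void *plugin, oapi_voice_id voice, int ctrl, float value);

	/* Polled periodically by the host to see if a voice is still alive. */
	int (*note_ison)(void *plugin, oapi_voice_id voice);
};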

I am NOT trying to account for cross-app or cross-lan voices, though a JACK
instrument which reads from a JACK port would be neat.

Side note:  is there a mechanism in JACK for me to pass (for example) a
'START NOW' message or a 'GO BACK 1000 samples' message to a JACK port?

>     a real problem so far, but I don't like it. I want *everything*
>     sample accurate! ;-)

Actually, our focus is slightly different.  I'm FAR less concerned with
sample-accurate control.  Small enough buffers make tick-accurate control
viable in my mind.  But I could be convinced.  It sure is SIMPLER. :)

> > quite convenient for things like strings and pads.   FL does
> > Velocity, Pan, Filter Cut, Filter Res, and Pitch Bend.  Not sure
> > which of those I want to support, but I like the idea.
> 
> "None of those, but instead, anything" would be my suggestion. I 
> think it's a bad idea to "hardcode" a small number of controls into 
> the API. Some kind of loose "standard" such as the MIDI CC allocation, 
> could be handy, but the point is that control ports should just be 
> control ports; their function is supposed to be decided by the plugin 
> author.

I've contemplated an array of params that are configurable per-note.  Not
everything is.  What if we had something like

struct int_voice_param {
	int id;
	char *name;
	int low;
	int high;
};

and specify an array of them.  The host can use this array to build a list
of per-note params to display to the user.  This starts to get messy with
type-specific controls.  Perhaps this info belongs as part of the control
structure.  Yes, I think so.
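
For example (the values here are invented, just to show the shape):

/* A sampler declaring which controls make sense per-note.  The host walks
 * this array to build its per-note automation UI. */
static const struct int_voice_param sampler_voice_params[] = {
	{ 0, "velocity",      0,   127 },
	{ 1, "pan",        -100,   100 },
	{ 2, "pitchbend", -8192,  8191 },
};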

> Should be handled on the UI level, IMHO. (See above.) Doing it down 
> here only complicates the connection management for no real gain.

I want to ignore as much of it as possible in the UI.  I want to keep it
simple at the highest level so a musician spends his time making music, not
dragging virtual wires.  Ideally if there is a stereo instrument and I want
to add a stereo reverb, I'd just drop it in place, all connections made
automatically.  If I have a mono instrument and I want a stereo reverb, I'd
drop the reverb in place and it would automatically insert a mono-stereo 
panner plugin between them.

> Yeah... But this is one subject where I think you'll have to search 
> for a long time to find even two audio hackers that agree on the same 
> set of data types. ;-)

I think INT, FLOAT, and STRING suffice pretty well.  And I MAY be convinced
that INT is not needed.  Really, I prefer int (maps well to MIDI).  What
kinds of knobs need to be floats?

> Just a note here: Most real instruments don't have an absolute start 
> or end of each note. For example, a violin has its pitch defined as 
> soon as you put your finger on the string - but when is the note-on, 
> and *what* is it? I would say "bow speed" would be much more 
> appropriate than on/off events.

I'd assume a violin modeller would have a BOWSPEED control.  The note_on()
would tell it what the eventual pitch would be.  The plugin would use
BOWSPEED to model the attack.

> Well, yes. There *has* to be a set of basic types that cover 
> "anything we can think of". (Very small set; probably just float and 
> raw data blocks.) I'm thinking that one might be able to have some 
> "conveniency types" implemented on top of the others, rather than a 
> larger number of actual types.

I agree - Bool is a flag on INT.  File is a flag on String.
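
For example, something like this (all names invented):

enum oapi_ctrl_type { OAPI_CTRL_INT, OAPI_CTRL_FLOAT, OAPI_CTRL_STRING };

#define OAPI_CTRL_HINT_BOOL	(1 << 0)	/* INT restricted to 0/1 */
#define OAPI_CTRL_HINT_FILE	(1 << 1)	/* STRING names a file */

struct oapi_control {
	const char *name;
	enum oapi_ctrl_type type;
	unsigned int hints;
	/* range, default, per-note flag, etc. */
};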

> Dunno if this makes a lot of sense - I just have a feeling that 
> keeping the number of different objects in a system to a functional 
> minimum is generally a good idea. What the "functional minimum" is 
> here remains to see...

With this I agree.  One of the reasons I HATE so many APIs is that they are
grossly over-normalized.  I don't need a pad_factory object and a pad object
and a plugin_factory object and a parameter object and an
automatable_parameter object and a scope object...  I want there to be as
FEW structs/objects as possible.

That said, one thing I am considering adding is a struct oapi_host.  This would
have callbacks for things like malloc, free, and mem_failure (the HOST
should decide how to handle memory allocation failures, not the plugin) as
well as higher level stuff like get_buffer, free_buffer, and who knows what
else.  Minimal, but it puts control for error handling back in the hands of
the host.
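
In metacode, something like this (just the callbacks I listed above; nothing
settled about the exact signatures):

#include <stddef.h>

struct oapi_host {
	/* memory - the host, not the plugin, decides what an allocation
	 * failure means */
	void *(*malloc)(struct oapi_host *host, size_t size);
	void  (*free)(struct oapi_host *host, void *ptr);
	void  (*mem_failure)(struct oapi_host *host);

	/* audio buffers */
	float *(*get_buffer)(struct oapi_host *host, unsigned int frames);
	void   (*free_buffer)(struct oapi_host *host, float *buf);
};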

> Yeah, I know. It's just that I get nervous when something tries to do 
> "everything", but leaves out the "custom format" fallback for cases 
> that cannot be foreseen. :-)

We're speaking of controls here.  In my mind controls have three
characteristics.  1) They have to specify enough information that the host can
draw a nice UI automatically.  2) They are automatable (whether automating a
given control is sane is a different question!).  3) They alone compose a
preset.  What would a raw_data_block be?

> Well, you can put stuff in external files, but that seems a bit risky 
> to me, in some situations. Hosts should provide per-project space for 
> files that should always go with the project, and some rock solid way 
> of ensuring that

I don't really want the plugins writing files.  I'd rather see the host
write a preset file by reading all the control information, or by calling a
new char *oapi_serialize() method to store and a new
oapi_deserialize(char *data) method to load.
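
In metacode, roughly (exact signatures and the ownership of the returned
string are open questions):

#include <stdio.h>

char *oapi_serialize(void *plugin);			/* plugin state -> string */
int oapi_deserialize(void *plugin, const char *data);	/* string -> plugin state */

/* How a host might use the first hook to write a preset file: */
static int save_preset(void *plugin, const char *path)
{
	char *blob = oapi_serialize(plugin);
	FILE *f;

	if (!blob || !(f = fopen(path, "w")))
		return -1;
	fputs(blob, f);
	fclose(f);
	return 0;
}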

> should be used in any way in the API. (Although people on the VST 
> list seem to believe MIDI is great as a control protocol for plugins, 
> it's never going into another plugin API, if I can help it... What a 

:)  good good 

> So do I. I'm just trying to base my examples on well known equipment 
> and terminology.

Honestly, I don't know all the terminology.  I have never worked with much
studio gear.  Most of what I have done is in the software space.  So I may
be making mistakes by that, but I may also be tossing obsolete-but-accepted
notions for the same reason :)

[ state-machine header ... ]

Very interesting.  I actually like it very much.  I am going to have a think
on that.  It may be a better paradigm.

> Well, it's basically about sending structured data around, with 
> timestamps telling the receiver when to process the data. As an 
> example, instead of calling
> 
> 	voice_start(v, ev->arg1);
> 
> directly, at exactly the right time (which would mean you have to 
> split buffers for sample accurate timing), I do this:
> 
> 	aev_send1(&v->port, 0, VE_START, wave);
> 
> where aev_send1() is an inline function that grabs an event struct 
> from the pool, fills it in and sends it to the voice's event port. 
> The sender does nothing more about it for now; it just keeps 
> processing it's entire buffer and then returns. Just as if it had 
> been processing only audio buffers. In fact, the timestamped events 
> are very similar to audio data in that they contain both actual data 
> and timing information - it's just that the timing info is explicit 
> in events.

Interesting.  How important is this REALLY, though?  Let me break it into
two parts: note control and parameter control.  Note control can be tick
accurate as far as I am concerned :)  As for param control, it seems to me
that a host that will automate params will PROBABLY have small ticks.  If
the ticks are small (10-50 samples), is there a REAL drawback to
tick-accurate control?  I know that philosophically there is, but REALLY.

In the event model, if I want a smooth ramp for a control between 0 and 100
across 10 ticks of 10 samples, do I need to send 10 'control += 1' events 
before each tick?

> Seriously, it's probably time to move on to the VSTi/DXi level now. 
> LADSPA and JACK rule, but the integration is still "only" on the 
> audio processing/routing level. We can't build a complete, seriously 
> useful virtual studio, until the execution and control of synths is 
> as rock solid as the audio.

Well, I really want to do it, so let's go.  You keep talking about
Audiality, but if we're designing the same thing, why aren't we working on
the same project?

Lots of ideas to noodle on and lose sleep over.  Looking forward to more
discussion.

Thanks
Tim


