[linux-audio-dev] Catching up with XAP

David Olofson david at olofson.net
Tue Jan 14 20:57:01 UTC 2003


[Lost touch with the list, so I'm trying to catch up here... I did 
notice that gardena.net is gone - but I forgot that I was using 
david at gardena.net for this list! *heh*]

> Subject: Re: [linux-audio-dev] more on XAP Virtual Voice ID system
> From: Tim Hockin (thockin_AT_hockin.org)
> Date: Fri Jan 10 2003 - 00:49:07 EET 
> > > The plugin CAN use the VVID table to store flags about the
> > > voice,
> > > as you suggested. I just want to point out that this is
> > > essentially the same as the plugin communicating to the host
> > > about
> > > voices, just more passively.
> >
> > Only the host can't really make any sense of the data.
> 
> If flags are standardized, it can. Int32: 0 = unused, +ve = plugin 
> owned, -ve = special meaning.

Sure. I just don't see why it would be useful, or why the VVID 
subsystem should be turned into some kind of synth status API.


> > > It seems useful.
> >
> > Not really, because of the latency, the polling requirement and
> > the coarse timing.
> 
> When does the host allocate from the VVID list? Between blocks. As
> long as a synth flags or releases a VVID during its block, the
> host benefits from it. The host has to keep a list of which VVIDs
> it still is working with, right?

No. Only the one who *allocated* a VVID can free it - and that means 
the sender. If *you* allocate a VVID, you don't want the host to 
steal it back whenever the *synth* decides it doesn't need the VVID 
any more. You'd just have to double-check "your" VVIDs whenever you 
send events - and all that just to support something that's really a 
synth implementation detail which happens to take advantage of a 
host service.


> > > If the plugin can flag VVID table entries as released, the host
> > > can have a better idea of which VVIDs it can reuse.
> >
> > Why would this matter? Again, the host does *not* do physical
> > voice management.
> >
> > You can reuse a VVID at any time, because *you* know whether or
> > not you'll need it again. The synth just doesn't care, as all it
> > will
> 
> right, but if you hit the end of the list and loop back to the
> start, you need to find the next VVID that is not in use by the
> HOST.

No, you just need to find the next VVID that *you're* not using, and 
reassign that to a new context. (ALLOC_VVID or whatever.) You don't 
really care whether or not the synth's version of a context keeps 
running for some time after you stop sending events for that 
context; only 
the synth does - and if you're not going to send any more events for 
a context, there's no need to keep the VVID.


> That can include VVIDs that have ended spontaneously (again,
> hihat sample or whatever).

VVIDs can't end spontaneously. Only synth voices can, and VVIDs are 
only temporary references to voices. A voice may detach itself from 
"it's" VVID, but the VVID is still owned by the sender, and it's 
still effectively bound to the same context.

BTW, this means that synths should actually keep the voice control 
state for a VVID until it actually knows the context has ended. 
Normally, this just means that voices don't really detach themselves 
from VVIDs, but rather just go to sleep, until stolen or woken up 
again.

That is, synths with "virtual voices" might actually have use for a 
"DETACH_VVID" event. Without it, they basically have to keep both 
real and virtual voices indefinitely. Not sure it actually matters 
much, though. Performance-wise, it just means you have to deal with 
voice controls and their ramping (if supported). And since a ramp is 
actually two events (ramp event with "aim point" + terminator event 
or new ramp event), and given that ramping across blocks (*) is not 
allowed, it still means "no events, no processing."

(*) I think I've said this before, but anyway: I don't think making
    ramping across block boundaries illegal is a good idea. Starting
    a ramp is actually setting up a *state*. (The receiver transforms
    the event into a delta value that's applied to the value every
    sample frame.) It doesn't make sense to me to force senders to
    explicitly set a new state at the start of each block.

    Indeed, the fact that ramping events have target and duration
    arguments looks confusing, but really, it *is* an aim point; not
    a description of a ramp with a fixed duration. If this were a
    perfect world (without rounding errors...), you would have sent
    the delta value directly, but that just won't work in real life.

    If someone can come up with a better aim point format than
    <target, duration>, I'm all ears, because it really *is*
    confusing. It suggests that RAMP events don't behave like SET
    events, but that's just not the case. The only difference is
    that RAMP events set the internal "dvalue", while SET events
    set the "value", and zero "dvalue".
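
To make that concrete, here's a minimal sketch in C of what a 
receiver might do. The names are mine (nothing here is from an agreed 
XAP header); the point is just that RAMP only changes the delta, 
while SET writes the value and zeroes the delta:

	typedef struct {
		float value;	/* current control value */
		float dvalue;	/* per-sample-frame delta */
	} RAMP_ctrl;

	/* SET: jump to 'v' and stop ramping. */
	static void ctrl_set(RAMP_ctrl *c, float v)
	{
		c->value = v;
		c->dvalue = 0.0f;
	}

	/* RAMP: aim at 'target', reaching it 'duration' frames
	 * from now. <target, duration> is just a rounding-safe
	 * encoding of the delta; duration is assumed >= 1. */
	static void ctrl_ramp(RAMP_ctrl *c, float target,
			unsigned duration)
	{
		c->dvalue = (target - c->value) / (float)duration;
	}

	/* Run once per sample frame. */
	static void ctrl_run(RAMP_ctrl *c)
	{
		c->value += c->dvalue;
	}

Note that ctrl_set() is effectively a ramp with a zeroed delta - 
which is the whole point: state-wise, RAMP events behave just like 
SET events.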


> The host just needs to discard any
> currently queued events for that (expired) VVID. The plugin is
> already ignoring them.

The plugin is *not* ignoring them. It may route them to a "null 
voice", but that's basically an emergency action taken only when 
running out of voices. Normally, a synth would keep tracking events 
per VVID until the VVID is explicitly detached (see above), or the 
synth steals whatever object is used for the tracking.

Again, DETACH_VVID might still be a good idea. Synths won't be able 
to steal the right voices if they can't tell passive contexts from 
dead contexts...
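
As a sketch of what the receiving side might keep per VVID (all 
field and function names are made up, purely for illustration):

	#define XAP_NCTRLS 8		/* arbitrary, for the sketch */

	typedef struct Voice Voice;	/* real voice; details don't
					 * matter here */
	void voice_control(Voice *v, int ctrl, float value);

	typedef struct {
		int	active;			/* not detached yet */
		float	ctrl[XAP_NCTRLS];	/* tracked controls  */
		Voice	*voice;			/* NULL while asleep */
	} XAP_vvid;

	/* Controls are tracked even with no real voice attached,
	 * so a sleeping context can be woken up - or stolen -
	 * with its control state intact. */
	static void track_control(XAP_vvid *vv, int ctrl, float value)
	{
		vv->ctrl[ctrl] = value;
		if (vv->voice)
			voice_control(vv->voice, ctrl, value);
	}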


[...]
> > A bowed string instrument is "triggered" by the bow pressure and
> > speed exceeding certain levels; not directly by the player
> > thinking
> 
> Disagree. SOUND is triggered by pressure/velocity. The instrument
> is ready as soon as bow contacts the string. 

Well, you don't need a *real* voice until you need to play sound, do 
you?

Either way, the distinction is a matter of synth implementation, 
which is why I think it should be "ignored" by the API. The API 
should not enforce the distinction, nor prevent synths from making 
use of it.


> > No, I see a host sending continuous control data to an init-latched
> > synth. This is nothing that an API can fix automatically.
> 
> Ok, let me make it more clear. Again, same example. The host wants
> to send 7 parameters to the Note-on. It sends 3 then VELOCITY. But
> as soon as VELOCITY is received 'init-time' is over. This is bad.

Yes, the event ordering is messed up. This will never happen unless the 
events are *created* out of order, or mixed up by some event 
processor. (Though I can't see how an event processor could reorder 
incoming events while doing something useful. Remember: we're talking 
about real-time events here, not events in a sequencer database.)


> The host has to know which control ends init time.

Why? So it can "automatically" reorder events at some point?


> Thus the
> NOTE/VOICE control we seem to be agreeing on.

Indeed, users, senders and some event processors might need to know 
which events are "triggers", and which events are latched rather than 
continuous.

For example, my Wacom tablet would need to know which of X, Y, 
X-tilt, Y-tilt, Pressure and Distance to send last, and as what voice 
control. It may not always be obvious - unless it's always the 
NOTE/VOICE/GATE control.

The easiest way is to just make one event the "trigger", but I'm not 
sure it's the right thing to do. What if you have more than one 
control of this sort, and the "trigger" is actually a product of 
both? Maybe just assume that synths will use the standardized 
NOTE/VOICE/GATE control for one of these, and act as if that was the 
single trigger? (It would have to latch initializers based on that 
control only, even if it doesn't do anything else that way.)
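
In code, "act as if that was the single trigger" could be as simple 
as this sketch (reusing the hypothetical XAP_vvid above; CTRL_GATE 
and voice_start() are made up):

	#define CTRL_GATE 0	/* index of NOTE/VOICE/GATE control */

	void voice_start(XAP_vvid *vv);	/* latches initializers */

	static void handle_voice_control(XAP_vvid *vv, int ctrl,
			float value)
	{
		float old = vv->ctrl[ctrl];
		vv->ctrl[ctrl] = value;
		/* Latch init-time controls only on the 0->1 edge
		 * of the one standardized control, whatever the
		 * synth actually uses internally as "the" trigger. */
		if (ctrl == CTRL_GATE && old <= 0.0f && value > 0.0f)
			voice_start(vv);
	}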


[...]
> > If it has no voice controls, there will be no VVIDs. You can still
> > allocate and use one if you don't want to special case this,
> > though. Sending voice control events to channel control inputs is
> > safe, since the receiver will just ignore the 'vvid' field of
> > events.
> 
> I think that if it wants to be a synth, it understands VVIDS. It
> doesn't have to DO anything with them, but it needs to be aware.

Right, but I'm not even sure there is a reason why they should be 
aware of VVIDs. What would a mono synth do with VVIDs that anyone 
would care about?


> And the NOTE/VOICE starter is a voice-control, so any Instrument
> MUST have that. 

This is very "anti modular synth". NOTE/VOICE/GATE is a control type 
hint. I see no reason to imply that it can only be used for a certain 
kind of control, since it's really just a "name" used by users 
and/or hosts to match ins and outs.

Why make Channel and Voice controls more different than they have to 
be?

	* Channel->Channel:
	* Voice->Voice:
		Just make the connection. These are obviously
		100% compatible.
	* Voice->Channel:
		Make the connection, and assume the user knows
		what he/she is doing, and won't send polyphonic
		data this way. (The Channel controls obviously
		ignore the extra indexing info in the VVIDs.)
	* Channel->Voice:
		This works IFF the synth ignores VVIDs.


You could have channel/voice control "mappers" and stuff, but I don't 
see why they should be made more complicated than necessary, when in 
most cases that make sense, they can just be NOPs.
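
In fact, the whole table above boils down to very little code. A 
sketch (the enum and the "ignores VVIDs" flag are mine):

	typedef enum { XAP_CHANNEL, XAP_VOICE } XAP_ctrl_kind;

	static int can_connect(XAP_ctrl_kind out, XAP_ctrl_kind in,
			int in_ignores_vvids)
	{
		if (out == in)
			return 1;   /* 100% compatible */
		if (out == XAP_VOICE)
			return 1;   /* Voice->Channel: 'vvid' is
			             * simply ignored */
		/* Channel->Voice: only if the synth ignores VVIDs. */
		return in_ignores_vvids;
	}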

About VVID management:
	Since mono synths won't need VVIDs, hosts shouldn't have to
	allocate any for them. (That would be a waste of resources.)
	The last case also indicates a handy shortcut you can take
	if you *know* that VVIDs won't be considered. Thus, I'd
	suggest that plugins can indicate that they won't use VVIDs.
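
For example (hypothetical flag name - nothing like this is 
specified anywhere yet), a single descriptor bit would do:

	#define XAP_PLUG_NO_VVIDS 0x0001 /* never reads 'vvid' */

	struct plugin_info {
		unsigned flags;
		/* ... */
	};

	void host_alloc_vvids(struct plugin_info *p);

	static void host_setup(struct plugin_info *p)
	{
		/* Don't waste a VVID table on a mono synth. */
		if (!(p->flags & XAP_PLUG_NO_VVIDS))
			host_alloc_vvids(p);
	}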


[...]
> > Why? What does "end a voice" actually mean?
> 
> It means that the host wants this voice to stop. If there is a
> release phase, go to it. If not, end this voice (in a
> plugin-specific way).
> Without it, how do you enter the release phase?

Right, then we agree on that as well. What I mean is just that "end a 
voice" doesn't *explicitly* kill the voice instantly.

What might be confusing things is that I don't consider "voice" and 
"context" equivalent - and VVIDs refer to *contexts* rather than 
voices. There will generally be either zero or one voice connected to 
a context, but the same context may be used to play several notes.


> > >From the sender POV:
> > I'm done with this context, and won't send any more events
> > referring to its VVID.
> 
> No. It means I want the sound on this voice to stop. It implies the
> above, too. After a VOICE_OFF, no more events will be sent for this
> VVID.

That just won't work. You don't want continuous pitch and stuff to 
work except when the note is on?

Stopping a note is *not* equivalent to releasing the context in which 
it was played.

Another example that demonstrates why this distinction matters would 
be a polyphonic synth with automatic glissando. (Something you can 
hardly get right with MIDI, BTW. You have to use multiple monophonic 
channels, or trust the synth to be smart enough to do the right 
thing.)

Starting a new note on a VVID when a previous note is still in the 
release phase would cause a glissando, while if the VVID has no 
playing voice, one would be activated and started as needed to play a 
new note. The sender can't reliably know which action will be taken 
for each new note, so it really *has* to be left to the synth to 
decide. And for this, the lifetime of a VVID/context needs to span 
zero or more notes, with no upper limit.
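
A sketch of that synth-side decision, with all names made up 
(XAP_vvid as in the earlier sketch):

	int voice_in_release(Voice *v);
	void voice_glide(Voice *v, float pitch);
	Voice *voice_alloc(void);
	void voice_trigger(Voice *v, float pitch);

	/* New note in an existing context. The sender can't know
	 * which branch the synth will take. */
	static void context_note_on(XAP_vvid *vv, float pitch)
	{
		if (vv->voice && voice_in_release(vv->voice)) {
			/* Still sounding: glide instead of
			 * retriggering. */
			voice_glide(vv->voice, pitch);
		} else {
			vv->voice = voice_alloc();
			voice_trigger(vv->voice, pitch);
		}
	}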


> > >From the synth POV:
> > The voice assigned to this VVID is now silent and passive,
> 
> More, the VVID is done. No more events for this VVID.

Nope, not unless *both* the synth and the sender have released the 
VVID.


> The reason
> that VVID_ALLOC is needed at voice_start is because the host might
> never have sent a VOICE_OFF. Or maybe we can make it simpler:

If the host/sender doesn't send VOICE_OFF when needed, it's broken, 
just like a MIDI sequencer that forgets to stop playing notes when 
you hit the stop button.

And yes, this is another reason to somehow mark the VOICE/NOTE/GATE 
control as special.


> Host turns the NOTE/VOICE on.
> It can either turn the NOTE/VOICE off or DETACH it. Here your
> detach name makes more sense.

VOICE_OFF and DETACH *have* to be separate concepts. (See above.)


> A step sequencer would turn a note
> on, then immediately
> detach.

It would have to send note off as well, I think, or we'd have another 
special case to make all senders compatible with all synths. (And if 
"note on" is actually a change of the VOICE/GATE control from 0 to 1, 
you *have* to send an "off" event as well, or the synth won't detect 
any further "note on" events in that context.)
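
That is, note detection becomes edge triggered, as in this sketch 
(the helper names are mine; CTRL_GATE as above):

	void note_on(XAP_vvid *vv);
	void note_off(XAP_vvid *vv);	/* enters release phase */

	/* Without a 0 in between, a second 1 is not an edge,
	 * and no new note starts. */
	static void gate_control(XAP_vvid *vv, float value)
	{
		float old = vv->ctrl[CTRL_GATE];
		vv->ctrl[CTRL_GATE] = value;
		if (old <= 0.0f && value > 0.0f)
			note_on(vv);
		else if (old > 0.0f && value <= 0.0f)
			note_off(vv);
	}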


> > assumed to be more special than it really is.
> > NOTE/VOICE_ON/VOICE_OFF
> > is a gate control. What more do you need to say about it?
> 
> Only if you assume a voice lives forever, which is wasteful.

It may be wasteful to use real, active voices just to track control 
changes, but voice control tracking cannot be avoided.


> Besides that, a gate that gets turned off and on and off and on
> does not restart a voice, just mutes it temporarily. Not pause, not
> restart - mute.

Who says? I think that sounds *very* much like a synth implementation 
thing - but point taken; "GATE" is probably not a good name.


[...]
> Well, you CAN change Program any time you like - it is not a
> per-voice control.

In fact, on some MIDI synths, you have to assume it is, sort of. 
Sending a PC to a Roland JV-1080 has it instantly kill any notes 
playing on that channel, go to sleep for a few hundredths of a second, 
and then process any events that might have arrived for the channel 
during the "nap". This really sucks, but that's the way it works, and 
it's probably not the only synth that does this.

(The technical reason is most probably that spare "patch slots" would 
have been required to do it in any other way - and as I've discovered 
with Audiality, that's not as trivial to get right as it might seem 
at first. You have to let the old patch see *some* of the new events 
for the channel, until the old patch decides to die.)

AWE, Live! and Audigy cards don't do it this way - but PC is *still* 
not an ordinary control. Playing notes remain controlled by the old 
patch until they receive their NoteOffs. PC always has to occur 
*before* new notes.


Either way, MIDI doesn't have many voice controls at all, and our 
channel controls are more similar to MIDI (Channel) CCs in some ways. 
(Not addressed by note pitch, most importantly.)

That is, they can't be compared directly - but the concept that some 
controls must be sent before they're latched to have the desired 
effect is still relevant.


[...]
> Idea 2: similar to idea 1, but less explicit.
>  -- INIT:
>       send SET(new_vvid, ctrl) /* implicitly creates a voice */
>       send VOICE_ON(new_vvid) /* start the vvid */
>  -- RELEASE:
>       send SET(new_vvid, ctrl) /* send with time X */
>       send VOICE_OFF(vvid) /* also time X - plug 'knows' it was for 
> release */

I see why you don't like this. You're forgetting that it's the 
*value* that is the "initializer" for the VOICE_OFF action; not the 
SET event that brings it. Of course the plugin "knows" - the last SET 
put a new value into the control that the VOICE_OFF action code looks 
at! :-)

A synth is a state machine, and the events are just what provides it 
with data and - directly or indirectly - triggers state changes.
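
In sketch form (CTRL_REL_VEL is a made-up latched "release 
velocity" control; XAP_vvid as before):

	#define CTRL_REL_VEL 1

	void enter_release(XAP_vvid *vv, float velocity);

	/* The VOICE_OFF action just reads whatever value the last
	 * SET left in the control; no special "release
	 * initializer" event is needed. */
	static void voice_off(XAP_vvid *vv)
	{
		enter_release(vv, vv->ctrl[CTRL_REL_VEL]);
	}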


We have two issues to deal with, basically:

	1. Tracking of voice controls.

	2. Allocation and control of physical voices.


The easy way is to assume that you use a physical voice whenever you 
need to track voice controls, but that's just an assumption a 
synth author might make to simplify the implementation. It 
doesn't *have* to be done that way.

If 1 and 2 are handled as separate things by a synth, 2 becomes an 
implementation issue *entirely*. Senders and hosts don't really have 
a right to know anything about this - mostly because there are so 
many ways of doing it that it just doesn't make sense to pretend that 
anyone cares.

As to 1, that's what we're really talking about here. When do you 
start and stop tracking voice controls?

Simple: When you get the first control for a "new" VVID, start 
tracking. When you know there will be no more data for that VVID, or 
that you just don't care anymore (voice and/or context stolen), stop 
tracking.

So, this is what I'm suggesting ( {X} means loop X, 0+ times ):

* Context allocation:
	// Prepare the synth to receive events for 'my_vvid'
	send(ALLOC_VVID, my_vvid)
	// (Control tracking starts here.)

{
	* Starting a note:
		// Set up any latched controls here
		send(CONTROL, <whatever>, my_vvid, <value>)
		...
		// (Synth updates control values.)

		// Start the note!
		send(CONTROL, VOICE, my_vvid, 1)
		// (Synth latches "on" controls and (re)starts
		//  voice. If control tracking is not done by
		//  real voices, this is when a real voice would
		//  be allocated.)

	* Stopping a note:
		send(CONTROL, <whatever>, my_vvid, <value>)
		...
		// (Synth updates control values.)

		// Stop the note!
		send(CONTROL, VOICE, my_vvid, 0)
		// (Synth latches "off" controls and enters the
		//  release phase.)

	* Controlling a note (even in release phase!):
		send(CONTROL, <whatever>, my_vvid, <value>)
		// (Synth updates control value.)
}

* Context deallocation:
	// Tell the synth we won't talk any more about 'my_vvid'
	send(DETACH_VVID, my_vvid)
	// (Control tracking stops here.)
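
On the receiving end, this maps onto a small event dispatcher. A 
sketch, with made-up event and field names matching the earlier 
fragments:

	enum { ALLOC_VVID, CONTROL, DETACH_VVID };

	typedef struct {
		int	type;
		int	vvid;
		int	ctrl;
		float	value;
	} XAP_event;

	extern XAP_vvid vvids[];   /* the VVID table for this plugin */

	static void process_event(const XAP_event *ev)
	{
		XAP_vvid *vv = &vvids[ev->vvid];
		switch (ev->type) {
		case ALLOC_VVID:    /* control tracking starts */
			vv->active = 1;
			vv->voice = 0;  /* no real voice yet */
			break;
		case CONTROL:       /* track, and trigger on GATE */
			handle_voice_control(vv, ev->ctrl, ev->value);
			break;
		case DETACH_VVID:   /* control tracking stops; a
		                     * playing voice may still finish
		                     * its release on its own */
			vv->active = 0;
			break;
		}
	}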


This still contains a logic flaw, though. Continuous control synths 
won't necessarily trigger on the VOICE control changes. Does it make 
sense to assume that they'll latch latched controls at VOICE control 
changes anyway? It seems illogical to me, but I can see why it might 
seem to make sense in some cases...


//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`---------------------------> http://olofson.net/audiality -'
   --- http://olofson.net --- http://www.reologica.se ---


