[linux-audio-dev] Catching up with XAP

David Olofson david at olofson.net
Wed Jan 15 07:14:01 UTC 2003


On Wednesday 15 January 2003 10.42, Tim Hockin wrote:
> > [Lost touch with the list, so I'm trying to catch up here... I
> > did notice that gardena.net is gone - but I forgot that I was
> > using david at gardena.net for this list! *heh*]
>
> Woops!  Welcome back!

Well, thanks. :-)


[...]
> > The easiest way is to just make one event the "trigger", but I'm
> > not sure it's the right thing to do. What if you have more than
> > one control of this sort, and the "trigger" is actually a product
> > of both? Maybe just assume that synths will use the standardized
>
> The trigger is a virtual control which really just says whether the
> voice is on or not.  You set up all your init-latched controls in
> the init window, THEN you set the voice on.
>
> It is conceptually simple, similar to what people know and it fits
> well enough.  And I can't find any problems with it technically.

The only problem I have with it is that it's completely irrelevant to 
continuous control synths - but they can just ignore it, or not have 
the control at all.


> > > And the NOTE/VOICE starter is a voice-control, so any
> > > Instrument MUST have that.
> >
> > This is very "anti modular synth". NOTE/VOICE/GATE is a control
> > type hint. I see no reason to imply that it can only be used for
> > a certain kind of controls, since it's really just a "name" used
> > by users and/or hosts to match ins and outs.
>
> This is not at all what I see as intuitive.  VOICE is a separate
> control used ONLY for voice control.  Instruments have it.  Effects
> do not.

There's this distinct FX vs instrument separation again. What is the 
actual motivation for enforcing that these are kept totally separate?

I don't see the separation as very intuitive at all. The only 
differences are that voices are (sort of) dynamically allocated, and 
that they have an extra dimension of addressing - and that applies 
*only* to polyphonic synths. For mono synths, a Channel is equivalent 
to a Voice for all practical purposes.


> > About VVID management:
> > 	Since mono synths won't need VVIDs, host shouldn't have to
> > 	allocate any for them. (That would be a waste of resources.)
> > 	The last case also indicates a handy shortcut you can take
> > 	if you *know* that VVIDs won't be considered. Thus, I'd
> > 	suggest that plugins can indicate that they won't use VVIDs.
>
> This is a possible optimization.  I'll add it to my notes.  It may
> really not be worth it at all.

It's also totally optional. If you don't care to check the hint, just 
always use real VVIDs with Voice Controls, and never connect Channel 
Control outs to Voice Control ins, and everything will work fine.
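
Just to illustrate the kind of shortcut I mean (rough pseudocode in 
the same style as my earlier sketches; the hint and function names 
are made up on the spot - nothing like them is specified anywhere 
yet):

	// Sender side, when hooking up to a synth:
	if (plugin_flags(synth) & HINT_NO_VVIDS)
		vvid = VVID_NONE	// mono/Channel style synth;
					// don't waste VVIDs on it
	else
		vvid = alloc_vvid()	// normal case; this context
					// will be tracked by the synth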


[...]
> > What might be confusing things is that I don't consider "voice"
> > and "context" equivalent - and VVIDs refer to *contexts* rather
> > than voices. There will generally be either zero or one voice
> > connected to a context, but the same context may be used to play
> > several notes.
>
> I disagree - a VVID refers to a voice at some point in time.  A
> context can not be re-used.  Once a voice is stopped and the
> release has ended, that VVID has expired.

Why? Is there a good reason why a synth must not be allowed to 
function like the good old SID envelope generator, which can be 
switched on and off as desired?

Also, remember that there is nothing binding two notes at the same 
pitch together with our protocol, since (unlike MIDI) VVID != pitch. 
This means that a synth cannot reliably handle a new note starting 
before the release phase of a previous note has ended. It'll just 
have to allocate a new voice, completely independent of the old 
voice, and that's generally *not* what you want if you're trying to 
emulate real instruments.

For example, if you're playing the piano with the sustain pedal down, 
hitting the same key repeatedly doesn't really add new strings for 
that note, does it...?

With MIDI, this is obvious, since VVID == note pitch. It's not that 
easy with our protocol, and I don't think it's a good idea to turn a 
vital feature like this into something that synths will have to 
implement through arbitrary hacks, based on the PITCH control. (Hacks 
that may not work at all, unless the synth is aware of which scale 
you're using, BTW.)
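
To make this concrete, this is the kind of event stream I'd like to 
be legal (same pseudocode style as before; how the synth reacts - 
retrigger, reuse the "string", whatever - is entirely its own 
business):

	send(CONTROL, PITCH, my_vvid, <some pitch>)
	send(CONTROL, VOICE, my_vvid, 1)	// first note
	...
	send(CONTROL, VOICE, my_vvid, 0)	// key released; the
						// sustain pedal keeps
						// the voice sounding
	...
	send(CONTROL, VOICE, my_vvid, 1)	// same key, *same* VVID -
						// the synth knows the two
						// notes belong together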


> > > No. It means I want the sound on this voice to stop. It implies
> > > the above, too. After a VOICE_OFF, no more events will be sent
> > > for this VVID.
> >
> > That just won't work. You don't want continuous pitch and stuff to
> > work except when the note is on?
>
> More or less, yes!  If you want sound, you should tell the synth
> that by allocating a VVID for it, and turning it on.

And when you enter the release phase? I have yet to see a MIDI synth 
where voices stop responding to pitch bend and other controls after 
NoteOff, and although we're talking about *voice* controls here, I 
think the same logic applies entirely.

Synths *have* to be able to receive control changes for as long as a 
voice could possibly be producing sound, or there is a serious 
usability issue.
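
That is, something like this has to keep working right through the 
release phase (pseudocode as before; CUTOFF is just an example 
control name):

	send(CONTROL, VOICE, my_vvid, 0)	// enter the release phase
	// The voice may still be sounding, so these must still be heard:
	send(CONTROL, PITCH, my_vvid, <bend down>)
	send(CONTROL, CUTOFF, my_vvid, <close the filter a bit>)
	// Only when the sender is completely done with the context:
	send(DETACH_VVID, my_vvid)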


[...]
> > Another example that demonstrates why this distinction matters
> > would be a polyphonic synth with automatic glissando. (Something
> > you can
> >
> > Starting a new note on a VVID when a previous note is still in
> > the release phase would cause a glissando, while if the VVID has
> > no playing voice, one would be activated and started as needed to
> > play a new note. The sender can't reliably know which action will
> > be taken for each new note, so it really *has* to be left to the
> > synth to decide. And for this, the lifetime of VVIDs/contexts
> > need to span zero or more notes, with no upper limit.
>
> I don't follow you at all - a new note is a new note.

Sure - but where does it belong, logically? The controller or user 
might know, but the synth generally doesn't. I'm just suggesting that 
senders be able to provide useful information when it's there.


>  If your
> instrument has a glissando control, use it.  It does the right
> thing.

How? It's obvious for monophonic synths, but then, so many other 
things are. Polyphonic synths are more complicated, and I'm rather 
certain that the player and/or controller knows better which note 
should slide to which when you switch from one chord to another. 
Anything else will result in "random glissandos in all directions", 
since the synth just doesn't have enough information.
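
A quick sketch of what I mean, for a C major -> F major change (same 
pseudocode as before; the actual sliding is still entirely up to the 
synth and the patch):

	// C major is sounding on three contexts:
	//	vvid1 = C, vvid2 = E, vvid3 = G
	send(CONTROL, VOICE, vvid1, 0)		// release the old chord
	send(CONTROL, VOICE, vvid2, 0)
	send(CONTROL, VOICE, vvid3, 0)
	// New chord, with the note->note pairing made explicit simply
	// by reusing the VVIDs:
	send(CONTROL, PITCH, vvid1, <F>)	// this voice slides C -> F
	send(CONTROL, VOICE, vvid1, 1)
	send(CONTROL, PITCH, vvid2, <A>)	// this one slides E -> A
	send(CONTROL, VOICE, vvid2, 1)
	send(CONTROL, PITCH, vvid3, <C>)	// and this one G -> C
	send(CONTROL, VOICE, vvid3, 1)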


> Each new note gets a new VVID.
>
> Reusing a VVID seems insane to me.  It just doesn't jive with
> anything I can comprehend as approaching reality.

MIDI sequencers are reusing "IDs" all the time, since they just don't 
have a choice, the way the MIDI protocol is designed. Now, all of a 
sudden, this should no longer be *possible*, at all...?

Either way, considering the polyphonic glissando example, VVIDs 
provide a dimension of addressing that is not available in MIDI, and 
that seems generally useful. Why throw it away for no technical (or 
IMHO, logical) reason?


> > > The reason
> > > that VVID_ALLOC is needed at voice_start is because the host
> > > might never have sent a VOICE_OFF. Or maybe we can make it
> > > simpler:
> >
> > If the host/sender doesn't send VOICE_OFF when needed, it's
> > broken, just like a MIDI sequencer that forgets to stop playing
> > notes when you hit the stop button.
>
> Stop button is different than not sending a note-off.  Stop should
> automatically send a note-off to any VVIDs.  Or perhaps more
> accurately, it should send a stop-all sound event.

Whatever. A sender should still never leave hanging notes, no matter 
what it's doing or what protocol is used.

Anyway, the real point is that you may need to talk to the voice 
during the release phase; not just until you decide to switch to the 
release phase.


[...]
> I'm proposing a very simple model for VVID and voice management. 
> One that I think is easy to understand, explain, document, and
> implement.

Sure, that's the goal here.


> It jives with reality and with what users of
> soft-studios expect.

I disagree. I think most people expect controls to respond during the 
full duration of a note; not just during an arbitrary part of it. 
This is the way MIDI CCs work, and I would think most MIDI synths 
handle Poly Pressure that way as well, even though most controllers 
cannot generate PP events after a note is stopped, for mechanical 
reasons. (It's a bit hard to press a key after releasing it, without 
causing a new NoteOn. :-)


> Every active voice is represented by one VVID and vice-versa.
> There are two lifecycles for a voice.

I don't see a good reason to special-case this - and there is no 
need to, as long as a VVID remains valid for as long as you *need* 
it, rather than until the "formal end of the note".


>   1) The piano-rolled note:
>     a) host sends a VOICE(vvid, VOICE_ON) event
>        - synth allocates a voice (real or virtual) or fails
>        - synth begins processing the voice
>     b) time elapses as per the sequencer
>        - host may send multiple voice events for 'vvid'
>     c) host sends a VOICE(vvid, VOICE_OFF)
>        - synth puts voice in release phase and detaches from 'vvid'
>        - host will not send any more events for 'vvid'
>        - host may now re-use 'vvid'

c) is what I have a problem with here. Why is the VOICE control 
becoming so special again, implying destructive things about the VVID 
passed with it and stuff?


>   2) The step-sequenced note:
>     a) host sends a VOICE(vvid, VOICE_ON) event
>        - synth allocates a voice (real or virtual) or fails
>        - synth begins processing the voice
>     b) host sends a VOICE(vvid, VOICE_DETACH) event
>        - synth handles the voice as normal, but detaches from 'vvid'
>        - host will not send any more events for 'vvid'
>        - host may now re-use 'vvid'

This completely eliminates the use of voice controls together with 
step sequencers. What's the logical reasoning behind that?

My other major problem with this is that it makes step sequencers and 
their synths a special class, in that they're using a different 
protocol for "voice off". It *might* still make sense to require that 
all synths implement a special "unarticulated note off", but I'm not 
sure... Sounds like a different discussion, in some way, and the 
scary part is that it's still a special case that means senders will 
have to treat synths differently.

Given that I generally program my drum kits to respond to NoteOff 
anyway, I'm not very motivated to accept step sequencers as something 
special enough to justify special cases in the API, but I can see 
why it would be handy for step sequencers not having to worry about 
note durations. (The "zero duration note" hack typically used by MIDI 
sequencers won't work if the synths/patches use "voice off" to stop 
sounds quicker than normal.)
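
For reference, that hack would look something like this in our terms 
(pseudocode again):

	// "Duration-less" step sequencer note, MIDI style:
	send(CONTROL, VOICE, my_vvid, 1)	// note on...
	send(CONTROL, VOICE, my_vvid, 0)	// ...and off, same timestamp
	// Works for patches that ignore "voice off", but a patch that
	// uses it to choke the sound would cut the note short.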


> These are very straightforward and handle all cases I can think
> up.  The actual voice allocation is left to the synth.  A
> mono-synth will always use the same physical voice.  A poly-synth
> will normally allocate a voice from its pool.  A poly-synth under
> voice pressure can either steal a real voice for 'vvid' (and swap
> out the old VVID to a virtual voice), or allocate a virtual voice
> for 'vvid', or fail altogether.  A sampler which is playing short
> notes (my favorite hihat example) can EOL a voice when the sample
> is done playing (and ignore further events for the VVID).
>
> It's cute.  I like it a lot.

It's just that it can't do things that people expect from every MIDI 
synth, just because VVID allocation is integrated with the VOICE 
control.


[...]
> > A synth is a state machine, and the events are just what provides
> > it with data and - directly or indirectly - triggers state
> > changes.
>
> And I am advocating that voice on/off state changes be EXPLICITLY
> handled via a VOICE control,

Sure, but how do you suggest we force this upon continuous control 
synths, without breaking them entirely?


> as well as init and release-latched
> controls be EXPLICITLY handled.

Explicitly telling a synth how to do something that's basically for 
the synth author to decide seems a lot more confusing to me than just 
not assuming anything about it at all.


> Yeah, it makes for some extra events.  I think that the benefit of
> clarity in the model is worth it.  We can also optimize the extra
> events away in the case they are not needed.

But when are these extra events needed at all? I still don't see what 
information they bring, and what any synth would use it for.


> > As to 1, that's what we're really talking about here. When do you
> > start and stop tracking voice controls?
>
> And how do you identify control events that are intended to be
> init-latched from continuous events?

I'm not sure what you mean, exactly. On the protocol level, it's just 
a matter of having the values in place before the "trigger" condition 
occurs (normally "note on" or "note off" directly caused by the VOICE 
control).

Controllers and other senders will have to know which controls are 
init-latched and what triggers the latching of them. There's no way 
to avoid that, and it should be covered by the API. The VOICE control 
can be standardized, and then we can have hints for "voice on" 
latched controls and "voice off" latched controls.
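
Something like this is all I'm thinking of (just a sketch; none of 
these hint names exist anywhere yet, and the exact spelling is 
obviously up for grabs):

	// Per-control hints, as seen by hosts and other senders:
	HINT_CONTINUOUS		// tracked for the lifetime of the context
	HINT_LATCH_VOICE_ON	// value latched when VOICE goes 0 -> 1
	HINT_LATCH_VOICE_OFF	// value latched when VOICE goes 1 -> 0

	// e.g. VELOCITY would be LATCH_VOICE_ON, DAMPING would be
	// LATCH_VOICE_OFF, and PITCH would normally be CONTINUOUS.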


> > Simple: When you get the first control for a "new" VVID, start
> > tracking. When you know there will be no more data for that VVID,
> > or that you just don't care anymore (voice and/or context
> > stolen), stop tracking.
>
> Exactly what I want, but I want it to be more explicit

Sure - that's why I'm suggesting explicit VVID allocation and 
detachment events.


> > * Context allocation:
> > 	// Prepare the synth to receive events for 'my_vvid'
> > 	send(ALLOC_VVID, my_vvid)
> > 	// (Control tracking starts here.)
>
> yes - only I am calling it voice allocation - the host is
> allocating a voice in the synth (real or not) and will eventually
> turn it on.  I'd bet 99.999% of the time the ALLOC_VVID and
> VOICE_ON are on the same timestamp.

Quite possible, unless it's legal to use VOICE as a continuous 
control. If it isn't, continuous control synths simply won't have a 
use for VOICE control input, but will rely entirely on the values of 
other controls.

Also, as soon as you want to indicate that a new note is to be played 
on the same string as a previous note, or directly take over and 
"slide" the voice of a previous note to a new note, you'll need a way 
of expressing that. I can't think of a more obvious way of doing that 
than just using the same VVID.


> > {
> > 	* Starting a note:
> > 		// Set up any latched controls here
> > 		send(CONTROL, <whatever>, my_vvid, <value>)
> > 		...
> > 		// (Synth updates control values.)
> >
> > 		// Start the note!
> > 		send(CONTROL, VOICE, my_vvid, 1)
> > 		// (Synth latches "on" controls and (re)starts
> > 		//  voice. If control tracking is not done by
> > 		//  real voices, this is when a real voice would
> > 		//  be allocated.)
>
> This jives EXACTLY with what I have been saying, though I
> characterized it as:
>
> VOICE_INIT(vvid)   -> synth gets a virtual voice, start init-latch window
> VOICE_SETs         -> init-latched events
> VOICE_ON(vvid)     -> synth (optionally) makes it a real voice (end init-window)

Well, then that conflict is resolved - provided synths are not 
*required* to take VOICE_ON if they care only about "real" controls. 
:-)


> > 	* Stopping a note:
> > 		send(CONTROL, <whatever>, my_vvid, <value>)
> > 		...
> > 		// (Synth updates control values.)
> >
> > 		// Stop the note!
> > 		send(CONTROL, VOICE, my_vvid, 0)
> > 		// (Synth latches "off" controls and enters the
> > 		//  release phase.)
>
> Except how does the synth know that the controls you send are meant
> to be release-latched?

It's hardcoded, or programmed into the patch, depending on synth 
implementation. I can't see how this could ever be something that the 
sender can decide at run time. There's no need to "tell" the synth 
something it already knows, and cannot change.

For example, your average synth will have a VELOCITY and a DAMPING 
(or something) control pair, corresponding to MIDI NoteOn and NoteOff 
velocity, respectively. You could basically set both right after 
allocating a voice/VVID, as the only requirement is that the right 
values are in place when they should be latched.
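
That is, this would be a perfectly valid way of driving such a synth 
(pseudocode as before; the values are just examples):

	send(ALLOC_VVID, my_vvid)
	send(CONTROL, VELOCITY, my_vvid, 0.9)	// latched at "voice on"
	send(CONTROL, DAMPING, my_vvid, 0.2)	// latched at "voice off" -
						// possibly much later
	send(CONTROL, VOICE, my_vvid, 1)	// VELOCITY latched here
	...
	send(CONTROL, VOICE, my_vvid, 0)	// DAMPING latched here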


[...]
> > * Context deallocation:
> > 	// Tell the synth we won't talk any more about 'my_vvid'
> > 	send(DETACH_VVID, my_vvid)
> > 	// (Control tracking stops here.)
>
> THIS is what I disagree with.  I think VOICE_OFF implicitly does
> this.  What does it mean to send controls after a voice is stopped?

It means the voice doesn't hang at a fixed pitch, with the filter 
wide open and whatnot, just because you decided to start the release 
cycle.


>  The ONLY things I can see this for are mono-synths (who can purely
> IGNORE vvid or flag themselves as non-VVID)

I've often found myself missing the advantages monophonic patches 
have WRT note->note interaction, when using polyphonic patches. I 
actually think the ability to eliminate this distinction is a major 
feature that comes with VVIDs. If you can reuse VVIDs, a poly synth 
effectively becomes N monosynths playing the same patch - if you want 
it to. If not, just reassign a VVID for each new note.
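
For example, a guitar-ish patch could be driven with six fixed VVIDs, 
one per "string" (pseudocode; the six string setup is just an 
illustration):

	// Allocate one persistent context per "string", once:
	send(ALLOC_VVID, string_vvid[0])
	...
	send(ALLOC_VVID, string_vvid[5])

	// Every note played on a given string reuses that string's
	// VVID, so the synth can let it interact with - retrigger,
	// slide from, damp - whatever that string was doing before:
	send(CONTROL, PITCH, string_vvid[2], <new pitch>)
	send(CONTROL, VOICE, string_vvid[2], 1)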


> and MIDI where you want
> one VVID for each note (so send a VOICE_OFF before you alloc the
> VVID again).

That's just a shortcut, and not really a motivation to be able to 
reuse VVIDs.


> > This still contains a logic flaw, though. Continuous control
> > synths won't necessarily trigger on the VOICE control changes.
> > Does it make sense to assume that they'll latch latched controls
> > at VOICE control changes anyway? It seems illogical to me, but I
> > can see why it might seem to make sense in some cases...
>
> It makes *enough* sense that the consistency pays off, IM(ns)HO.

Yes, and more importantly, this simplifies the handling of latched 
voice controls quite a bit.

Further, is there *really* any sense in using latched controls with 
continous control synths? Considering that such controls are usually 
for velocity mapping and the like, the cases where it would be of any 
use at all in a continous control synth are probably very few, if 
there are any at all.

That is, continuous control synths can just ignore the VOICE control, 
and everything will just work as expected anyway. (Just connect your 
VOICE output to the VELOCITY input of the synth, and it'll play, at 
least.)
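
In connection terms, something like this (pseudocode; connect() is 
just a stand-in for whatever the real connection call ends up being):

	// A continuous control synth with no real use for VOICE input:
	connect(sender, OUT_VOICE, synth, IN_VELOCITY)
	// VOICE going 0 -> 1 now just looks like a velocity change, so
	// the synth starts making sound because VELOCITY went nonzero.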


> Welcome back!  As I indicated, I am moving this week, so my
> response times may be laggy.  I am also trying to shape up some
> (admittedly SIMPLE) docs on the few subjects we've reached
> agreement on so far.

Yeah, I "heard" - I'm looking at your post right now. (No risk I'm 
going to comment on it or anything! ;-)


//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`---------------------------> http://olofson.net/audiality -'
   --- http://olofson.net --- http://www.reologica.se ---


