On Wednesday 15 January 2003 10.42, Tim Hockin wrote:
[Lost touch with the list, so I'm trying to catch up here... I did
notice that gardena.net is gone - but I forgot that I was using
david(a)gardena.net for this list! *heh*]
Woops! Welcome back!
Well, thanks. :-)
[...]
The easiest way is to just make one event the "trigger", but I'm not
sure it's the right thing to do. What if you have more than one
control of this sort, and the "trigger" is actually a product of
both? Maybe just assume that synths will use the standardized
The trigger is a virtual control which really just says whether the
voice is on or not. You set up all your init-latched controls in
the init window, THEN you set the voice on.
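In pseudocode, that ordering would look something like this (a rough
sketch only; ALLOC_VVID, VELOCITY and PITCH are just example names in
the style of the send() sketches further down in this mail):

    // Init window: set up the init-latched controls first...
    send(ALLOC_VVID, my_vvid)
    send(CONTROL, VELOCITY, my_vvid, 0.8)
    send(CONTROL, PITCH, my_vvid, 60.0)
    // ...THEN pull the trigger; the latched values are read now
    send(CONTROL, VOICE, my_vvid, 1)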
It is conceptually simple, similar to what people know and it fits
well enough. And I can't find any problems with it technically.
The only problem I have with it is that it's completely irrelevant to
continuous control synths - but they can just ignore it, or not have
the control at all.
And the NOTE/VOICE starter is a voice-control, so any
Instrument MUST have that.
This is very "anti modular synth". NOTE/VOICE/GATE is a control
type hint. I see no reason to imply that it can only be used for
a certain kind of control, since it's really just a "name" used
by users and/or hosts to match ins and outs.
This is not at all what I see as intuitive. VOICE is a separate
control used ONLY for voice control. Instruments have it. Effects
do not.
There's this distinct FX vs instrument separation again. What is the
actual motivation for enforcing that these are kept totally separate?
I don't see the separation as very intuitive at all. The only
differences are that voices are (sort of) dynamically allocated, and
that they have an extra dimension of addressing - and that applies
*only* to polyphonic synths. For mono synths, a Channel is equivalent
to a Voice for all practical purposes.
About VVID management: Since mono synths won't need VVIDs, the host
shouldn't have to allocate any for them. (That would be a waste of
resources.) The last case also indicates a handy shortcut you can
take if you *know* that VVIDs won't be considered. Thus, I'd suggest
that plugins can indicate that they won't use VVIDs.
This is a possible optimization. I'll add it to my notes. It may
really not be worth it at all.
It's also totally optional. If you don't care to check the hint, just
always use real VVIDs with Voice Controls, and never connect Channel
Control outs to Voice Control ins, and everything will work fine.
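For illustration, the host side of that shortcut might look roughly
like this in C (the XAP_NO_VVIDS flag, the structs and MAX_VVIDS are
all invented for this sketch; nothing here is agreed API):

    /* Skip VVID table allocation for plugins that flag
     * themselves as not using VVIDs (e.g. mono synths). */
    if (plugin->hints & XAP_NO_VVIDS)
        channel->vvids = NULL;    /* nothing to allocate or track */
    else
        channel->vvids = calloc(MAX_VVIDS, sizeof(VVIDEntry));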
[...]
What might be confusing things is that I don't consider "voice" and
"context" equivalent - and VVIDs refer to *contexts* rather than
voices. There will generally be either zero or one voice connected
to a context, but the same context may be used to play several notes.
I disagree - a VVID refers to a voice at some point in time. A
context can not be re-used. Once a voice is stopped and the
release has ended, that VVID has expired.
Why? Is there a good reason why a synth must not be allowed to
function like the good old SID envelope generator, which can be
switched on and off as desired?
Also, remember that there is nothing binding two notes at the same
pitch together with our protocol, since (unlike MIDI) VVID != pitch.
This means that a synth cannot reliably handle a new note starting
before the release phase of a previous pitch has ended. It'll just
have to allocate a new voice, completely independent of the old
voice, and that's generally *not* what you want if you're trying to
emulate real instruments.
For example, if you're playing the piano with the sustain pedal down,
hitting the same key repeatedly doesn't really add new strings for
that note, does it...?
With MIDI, this is obvious, since VVID == note pitch. It's not that
easy with our protocol, and I don't think it's a good idea to turn a
vital feature like this into something that synths will have to
implement through arbitrary hacks, based on the PITCH control. (Hacks
that may not work at all, unless the synth is aware of which scale
you're using, BTW.)
No. It means I want the sound on this voice to stop.
It implies the above, too. After a VOICE_OFF, no more events will be
sent for this VVID.
That just won't work. You don't want continuous pitch and stuff to
work except when the note is on?
More or less, yes! If you want sound, you should tell the synth
that by allocating a VVID for it, and turning it on.
And when you enter the release phase? I have yet to see a MIDI synth
where voices stop responding to pitch bend and other controls after
NoteOff, and although we're talking about *voice* controls here, I
think the same logic applies entirely.
Synths *have* to be able to receive control changes for as long as a
voice could possibly be producing sound, or there is a serious
usability issue.
[...]
Another example that demonstrates why this distinction matters would
be a polyphonic synth with automatic glissando. (Something you can
Starting a new note on a VVID when a previous note is still in
the release phase would cause a glissando, while if the VVID has
no playing voice, one would be activated and started as needed to
play a new note. The sender can't reliably know which action will
be taken for each new note, so it really *has* to be left to the
synth to decide. And for this, the lifetime of VVIDs/contexts
needs to span zero or more notes, with no upper limit.
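In C, the synth-side decision might look roughly like this (all type
and helper names are invented; the point is only that the *synth*
picks the action, based on whether the context still has a sounding
voice):

    void context_note_on(Synth *s, int vvid, float pitch)
    {
        Context *c = &s->contexts[vvid];
        if (c->voice && !voice_done(c->voice))
            voice_glide_to(c->voice, pitch); /* sounding: glissando */
        else {
            c->voice = voice_alloc(s);       /* idle: new voice */
            voice_start(c->voice, pitch);
        }
    }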
I don't follow you at all - a new note is a new note.
Sure - but where does it belong, logically? The controller or user
might know, but the synth generally doesn't. I'm just suggesting that
senders be able to provide useful information when it's there.
If your instrument has a glissando control, use it. It does the
right thing.
How? It's obvious for monophonic synths, but then, so many other
things are. Polyphonic synths are more complicated, and I'm rather
certain that the player and/or controller knows better which note
should slide to which when you switch from one chord to another.
Anything else will result in "random glissandos in all directions",
since the synth just doesn't have enough information.
Each new note gets a new VVID.
Reusing a VVID seems insane to me. It just doesn't jive with
anything I can comprehend as approaching reality.
MIDI sequencers are reusing "IDs" all the time, since they just don't
have a choice, the way the MIDI protocol is designed. Now, all of a
sudden, this should no longer be *possible*, at all...?
Either way, considering the polyphonic glissando example, VVIDs
provide a dimension of addressing that is not available in MIDI, and
that seems generally useful. Why throw it away for no technical (or
IMHO, logical) reason?
The reason that VVID_ALLOC is needed at voice_start is because the
host might never have sent a VOICE_OFF. Or maybe we can make it
simpler:
If the host/sender doesn't sent VOICE_OFF when needed, it's
broken, just like a MIDI sequencer that forgets to stop playing
notes when you hit the stop button.
Stop button is different than not sending a note-off. Stop should
automatically send a note-off to any VVIDs. Or perhaps more
accurately, it should send a stop-all sound event.
Whatever. A sender should still never leave hanging notes, whatever
it's doing, or whatever protocol is used.
Anyway, the real point is that you may need to talk to the voice
during the release phase; not just until you decide to switch to the
release phase.
[...]
I'm proposing a very simple model for VVID and voice management.
One that I think is easy to understand, explain, document, and
implement.
Sure, that's the goal here.
It jives with reality and with what users of soft-studios expect.
I disagree. I think most people expect controls to respond during the
full duration of a note; not just during an arbitrary part of it.
This is the way MIDI CCs work, and I would think most MIDI synths
handle Poly Pressure that way as well, even though most controllers
cannot generate PP events after a note is stopped, for mechanical
reasons. (It's a bit hard to press a key after releasing it, without
causing a new NoteOn. :-)
Every active voice is represented by one VVID and vice-versa.
There are two lifecycles for a voice.
I don't see a good reason to special-case this - and also, there is
no reason to do so as long as a VVID remains valid for as long as you
*need* it, rather than until the "formal end of the note".
1) The piano-rolled note:
   a) host sends a VOICE(vvid, VOICE_ON) event
      - synth allocates a voice (real or virtual) or fails
      - synth begins processing the voice
   b) time elapses as per the sequencer
      - host may send multiple voice events for 'vvid'
   c) host sends a VOICE(vvid, VOICE_OFF)
      - synth puts voice in release phase and detaches from 'vvid'
      - host will not send any more events for 'vvid'
      - host may now re-use 'vvid'
c) is what I have a problem with here. Why is the VOICE control
becoming so special again, implying destructive things about the VVID
passed with it and stuff?
2) The step-sequenced note:
   a) host sends a VOICE(vvid, VOICE_ON) event
      - synth allocates a voice (real or virtual) or fails
      - synth begins processing the voice
   b) host sends a VOICE(vvid, VOICE_DETACH) event
      - synth handles the voice as normal, but detaches from 'vvid'
      - host will not send any more events for 'vvid'
      - host may now re-use 'vvid'
This completely eliminates the use of voice controls together with
step sequencers. What's the logical reasoning behind that?
My other major problem with this is that it makes step sequencers and
their synths a special class, in that they're using a different
protocol for "voice off". It *might* still make sense to require that
all synths implement a special "unarticulated note off", but I'm not
sure... Sounds like a different discussion, in some way, and the
scary part is that it's still a special case that means senders will
have to treat synths differently.
Given that I generally program my drum kits to respond to NoteOff
anyway, I'm not very motivated to accept step sequencers as something
special enough to motivate special cases in the API, but I can see
why it would be handy for step sequencers not having to worry about
note durations. (The "zero duration note" hack typically used by MIDI
sequencers won't work if the synths/patches use "voice off" to stop
sounds quicker than normal.)
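(For reference, the hack amounts to something like this, with a
hypothetical midi_send() call; both events share one timestamp,
leaving the actual duration to the patch:

    midi_send(t, NOTE_ON, ch, key, velocity)
    midi_send(t, NOTE_OFF, ch, key, 0)    // same tick 't'

which obviously breaks down as soon as NoteOff actually means
something to the synth.)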
These are very straightforward and handle all cases I can think up.
The actual voice allocation is left to the synth. A mono-synth will
always use the same physical voice. A poly-synth will normally
allocate a voice from its pool. A poly-synth under voice pressure
can either steal a real voice for 'vvid' (and swap out the old VVID
to a virtual voice), or allocate a virtual voice for 'vvid', or fail
altogether. A sampler which is playing short notes (my favorite
hihat example) can EOL a voice when the sample is done playing (and
ignore further events for the VVID).
It's cute. I like it a lot.
It's just that it can't do things that people expect from every MIDI
synth, just because VVID allocation is integrated with the VOICE
control.
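Just to spell the poly case out, here's a sketch in C of the
allocation policy described above (types and helpers invented for
the example):

    /* Get a real voice for 'vvid', stealing or failing under
     * voice pressure. */
    Voice *get_real_voice(Synth *s, int vvid)
    {
        Voice *v = take_free_voice(s);
        if (!v) {
            v = steal_quietest_voice(s);
            if (!v)
                return NULL;           /* fail altogether */
            make_virtual(s, v->vvid);  /* old VVID keeps only
                                          control tracking */
        }
        v->vvid = vvid;
        s->contexts[vvid].voice = v;
        return v;
    }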
[...]
A synth is a state machine, and the events are just what provides it
with data and - directly or indirectly - triggers state changes.
And I am advocating that voice on/off state changes be EXPLICITLY
handled via a VOICE control,
Sure, but how do you suggest we force this upon continuous control
synths, without breaking them entirely?
as well as init and release-latched
controls be EXPLICITLY handled.
Explicitly telling a synth how to do something that's basically for
the synth author to decide seems a lot more confusing to me than just
not assuming anything about it at all.
Yeah, it makes for some extra events. I think that the benefit of
clarity in the model is worth it. We can also optimize the extra
events away in cases where they are not needed.
But when are these extra events needed at all? I still don't see what
information they bring, and what any synth would use it for.
As to 1, that's what we're really talking about here. When do you
start and stop tracking voice controls?
And how do you distinguish control events that are intended to be
init-latched from continuous events?
I'm not sure what you mean, exactly. On the protocol level, it's just
a matter of having the values in place before the "trigger" condition
occurs (normally "note on" or "note off" directly caused by the VOICE
control).
Controllers and other senders will have to know which controls are
init-latched and what triggers the latching of them. There's no way
to avoid that, and it should be covered by the API. The VOICE control
can be standardized, and then we can have hints for "voice on"
latched controls and "voice off" latched controls.
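Something as simple as this could do; the names below are only a
suggestion for discussion, not agreed API:

    /* Per-control latching hints: */
    #define XAP_CTRL_CONTINUOUS 0  /* tracked all the time */
    #define XAP_CTRL_LATCH_ON   1  /* read when VOICE goes on */
    #define XAP_CTRL_LATCH_OFF  2  /* read when VOICE goes off */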
Simple: When you get the first control for a "new" VVID, start
tracking. When you know there will be no more data for that VVID,
or that you just don't care anymore (voice and/or context stolen),
stop tracking.
Exactly what I want, but I want it to be more explicit.
Sure - that's why I'm suggesting explicit VVID allocation and
detachment events.
* Context allocation:
    // Prepare the synth to receive events for 'my_vvid'
    send(ALLOC_VVID, my_vvid)
    // (Control tracking starts here.)
yes - only I am calling it voice allocation - the host is
allocating a voice in the synth (real or not) and will eventually
turn it on. I'd bet 99.999% of the time the ALLOC_VVID and
VOICE_ON are on the same timestamp.
Quite possible, unless it's legal to use VOICE as a continuous
control. If it isn't, continuous control synths simply won't have a
use for VOICE control input, but will rely entirely on the values of
other controls.
Also, as soon as you want to indicate that a new note is to be played
on the same string as a previous note, or directly take over and
"slide" the voice of a previous note to a new note, you'll need a way
of expressing that. I can't think of a more obvious way of doing that
than just using the same VVID.
* Starting a note:
    // Set up any latched controls here
    send(CONTROL, <whatever>, my_vvid, <value>)
    ...
    // (Synth updates control values.)

    // Start the note!
    send(CONTROL, VOICE, my_vvid, 1)
    // (Synth latches "on" controls and (re)starts
    //  voice. If control tracking is not done by
    //  real voices, this is when a real voice would
    //  be allocated.)
This jives EXACTLY with what I have been saying, though I
characterized it as:

    VOICE_INIT(vvid) -> synth gets a virtual voice, start
                        init-latch window
    VOICE_SETs       -> init-latched events
    VOICE_ON(vvid)   -> synth (optionally) makes it a real voice
                        (end init-window)
Well, then that conflict is resolved - provided synths are not
*required* to take VOICE_ON if they care only about "real" controls.
:-)
* Stopping a note:
    send(CONTROL, <whatever>, my_vvid, <value>)
    ...
    // (Synth updates control values.)

    // Stop the note!
    send(CONTROL, VOICE, my_vvid, 0)
    // (Synth latches "off" controls and enters the
    //  release phase.)
Except how does the synth know that the controls you send are meant
to be release-latched?
It's hardcoded, or programmed into the patch, depending on synth
implementation. I can't see how this could ever be something that the
sender can decide at run time. There's no need to "tell" the synth
something it already knows, and cannot change.
For example, your average synth will have a VELOCITY and a DAMPING
(or something) control pair, corresponding to MIDI NoteOn and NoteOff
velocity, respectively. You could basically set both right after
allocating a voice/VVID, as the only requirement is that the right
values are in place when they should be latched.
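That is, something like this (same pseudocode as above; VELOCITY and
DAMPING are example controls, not agreed names):

    send(ALLOC_VVID, my_vvid)
    send(CONTROL, VELOCITY, my_vvid, 0.9) // latched at "voice on"
    send(CONTROL, DAMPING, my_vvid, 0.2)  // latched at "voice off"
    send(CONTROL, VOICE, my_vvid, 1)      // VELOCITY latches here
    // ...note plays; DAMPING may still be changed...
    send(CONTROL, VOICE, my_vvid, 0)      // DAMPING latches here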
[...]
* Context deallocation:
    // Tell the synth we won't talk any more about 'my_vvid'
    send(DETACH_VVID, my_vvid)
    // (Control tracking stops here.)
THIS is what I disagree with. I think VOICE_OFF implicitly does
this. What does it mean to send controls after a voice is stopped?
It means the voice doesn't hang at a fixed pitch, with the filter
wide open and whatnot, just because you decided to start the release
cycle.
The ONLY things I can see this for are mono-synths (who can purely
IGNORE vvid or flag themselves as non-VVID)
I've often found myself missing the advantages monophonic patches
have WRT note->note interaction, when using polyphonic patches. I
actually think the ability to eliminate this distinction is a major
feature that comes with VVIDs. If you can reuse VVIDs, a poly synth
effectively becomes N monosynths playing the same patch - if you want
it to. If not, just reassign a VVID for each new note.
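For example (host-side pseudocode, invented names): a host modelling
a six-string instrument could allocate one VVID per string and reuse
it for every note on that string, leaving the note-to-note
interaction to the synth/patch:

    for (s = 0; s < 6; s++)
        string_vvid[s] = alloc_vvid()
    // every note played "on string 2" is sent with string_vvid[2]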
and MIDI where you want one VVID for each note (so send a VOICE_OFF
before you alloc the VVID again).
That's just a shortcut, and not really a motivation to be able to
reuse VVIDs.
This still contains a logic flaw, though. Continuous control synths
won't necessarily trigger on the VOICE control changes. Does it make
sense to assume that they'll latch latched controls at VOICE control
changes anyway? It seems illogical to me, but I can see why it might
seem to make sense in some cases...
It makes *enough* sense that the consistency pays off, IM(ns)HO.
Yes, and more importantly, this simplifies the handling of latched
voice controls quite a bit.
Further, is there *really* any sense in using latched controls with
continuous control synths? Considering that such controls are usually
for velocity mapping and the like, the cases where it would be of any
use at all in a continuous control synth are probably very few, if
there are any at all.
That is, continuous control synths can just ignore the VOICE control,
and everything will just work as expected anyway. (Just connect your
VOICE output to the VELOCITY input of the synth, and it'll play, at
least.)
Welcome back! As I indicated, I am moving this week, so my response
times may be laggy. I am also trying to shape up some (admittedly
SIMPLE) docs on the few subjects we've reached agreement on so far.
Yeah, I "heard" - I'm looking at your post right now. (No risk I'm
going to comment on it or anything! ;-)
//David Olofson - Programmer, Composer, Open Source Advocate
.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`---------------------------> http://olofson.net/audiality -'
   --- http://olofson.net --- http://www.reologica.se ---