All - here are some early scribbles of a XAP spec. I took the doc that I
wrote as an overview and started formalizing it. Please read it over and
point out the places where it is missing stuff. The Table of Contents
needs a lot more meat. Once I have that, I can write a lot or a little on
each subject.
Again - EARLY SCRIBBLES :) That said, the ToC, content, organization,
structure, and spelling are all up for abuse.
Have at it
The XAP Audio Plugin API
Specification
$Id: xap_overview.txt,v 1.1 2003/01/15 10:55:50 thockin Exp $
0.0 Meta
0.1 Guilty Parties
0.2 Goals
0.3 Terminology
0.4 Conventions
1.0 Overview (introduce ideas, no details)
1.1 Controls
1.2 Events
1.3 Channels
1.4 Ports
2.0 The Descriptor
2.1 Meta-Data
2.2 State Changes
2.2.1 Create
2.2.2 Destroy
2.2.3 Activate
2.2.4 Deactivate
2.3 Channels
2.4 Port Setup
2.5 Controls and EventQueues
2.6 Processing Audio
2.7 Errorcodes
Host struct
- Threading
Control struct
- RT
- we have SYS controls
Port struct
Events
- ramping
- EVQ and Events
Channels
Ports
Instruments and Voices
We have Voice Controls
Special controls
- sys controls
- MIDI controls
Tempo and Meter
Sequencer Controls
Macros
0.1 Guilty Parties
XAP is a combined effort of many people on the linux-audio-dev
(linux-audio-dev(a)music.columbia.edu) mailing list. The discussion is
open, and anyone interested is welcome to join in.
0.2 Goals
The main goal of this project is to provide an API that is full-featured
enough to be the primary plugin system for audio creation and playback
applications, while remaining as simple, lightweight, and self-contained
as possible. The focus of this API is flexibility tempered with
simplicity.
0.3 Terminology
In order to read this document without having your head spin, you should
probably understand the following terms, first.
* Plugin:
A chunk of code, loaded or not, that implements this API (e.g. a .so
file or a running instance).
* Host:
The program responsible for loading and controlling Plugins.
* Instrument/Source:
An instance of a Plugin that supports the instrument API and is used
to generate audio signals. Many Instruments will implement audio
output but not input, though they may support both and be used as an
Effect, too.
* Effect:
An instance of a Plugin that supports both audio input and output.
* Output/Sink:
An instance of a Plugin that can act as a terminator for a chain of
Plugins. Many Outputs will support audio input but not output,
though they may support both and be used as an Effect, too.
* Voice:
A playing sound within an Instrument. Instruments may have multiple
Voices, or only one Voice. A Voice may be silent but still active.
* Event:
A time-stamped notification of some change of something.
* Control:
A knob, button, slider, or virtual thing that modifies behavior of
the Plugin. Controls can be master (e.g. master volume),
per-Channel (e.g. channel pressure) or per-Voice (e.g. aftertouch).
* Port:
An audio input or output.
* EventQueue:
A control input or output. Plugins may internally have as many
EventQueues as they deem necessary. The Host will ask the
Plugin for the EventQueue for each Control.
FIXME: what is the full list of things that have a queue?
Controls, Plugin(master), each Channel?
* VVID:
Virtual Voice ID. Part of a system that allows sequencers and
the like to control synth Voices without having detailed
knowledge of how the synth manages Voices.
* Channel:
A grouping of Controls and Ports, similar to MIDI Channels.
* Tick:
The unit of musical time, used with the tempo and
meter interfaces. The unit is decided by the
maintainer of the timeline, in order to keep musical
time calculations exact as far as possible.
* Cue Point:
A virtual marker on the musical time line, marking a
position to which the plugin should be able to jump
at any time, without delay. This is used by hard disk
recorders, and other plugins that may need to perform
time consuming and/or nondeterministic processing as
a result of timeline jumps.
0.4 Conventions
//FIXME: datatypes (XAP_foo) and documentation style
1.0 Overview
XAP Plugins live in shared object files. A shared object file holds
one or more plugin descriptors, accessed by index. Each descriptor holds
all the information about a single plugin - its identification,
meta-data, capabilities, controls, and access methods. The Plugin
descriptors are retrieved by an exported function in each shared object.
XAP plugins are always in one of two states - ACTIVE or IDLE. IDLE
Plugins are not capable of processing audio, and must be activated.
Plugins will spend most of their time in the ACTIVE state. After loading
Plugins, the Host instantiates them, establishes Port and EventQueue
connections, and activates them. Once ACTIVE, a Plugin expects to be run
repeatedly on small blocks of data. When audio processing is done, the
Host can deactivate the Plugin.
XAP is designed to be used in realtime scenarios. XAP plugins specify
their realtime capabilities, and Hosts can allow or disallow operations
based on that information.
All XAP audio data is processed in 32-bit floating point form. Values are
normalized between -1.0 and 1.0, with 0.0 being silence.
1.1 Controls
XAP uses the idea of Controls as abstract carriers of Plugin parameters
and other information. Controls can represent things like knobs and
buttons, but they can also represent things like filenames, MIDI
aftertouch or channel pressure. Not only do they represent audio
parameters, but they can represent chunks of system information, such as
tempo or transport position.
Like audio hardware, knobs and other controls can be global to a Plugin.
However, XAP also allows Instrument Plugins to provide per-Voice controls.
Controls come in a few datatype flavors, and can have min/max limits and
default values, as well as hints to the host about what they are,
semantically. Hosts can use the hints to automatically connect things,
where appropriate.
Controls get their data via Events. Events can immediately set the value
of a Control, or they can establish a ramp - a target and duration for the
Plugin to change the control more smoothly.
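To make the ramp idea a bit more concrete, here is a rough sketch of how a
Plugin might track one Control internally. Nothing here is XAP API: the
ramp_ctl struct and the ramp_set(), ramp_start() and ramp_tick() helpers
are names made up for this example, and a linear ramp is just one
possible choice.

```c
#include <stdint.h>

/* Hypothetical per-Control ramp state; not part of the XAP API. */
typedef struct {
    float    value;      /* current Control value */
    float    target;     /* value to land on when the ramp ends */
    float    delta;      /* per-sample-frame increment */
    uint32_t remaining;  /* sample-frames left in the ramp */
} ramp_ctl;

/* Immediate set: jump to the value and cancel any running ramp. */
static void ramp_set(ramp_ctl *c, float value)
{
    c->value = value;
    c->target = value;
    c->delta = 0.0f;
    c->remaining = 0;
}

/* Ramp: reach 'target' after 'frames' sample-frames (linear). */
static void ramp_start(ramp_ctl *c, float target, uint32_t frames)
{
    if (frames == 0) {
        ramp_set(c, target);
        return;
    }
    c->target = target;
    c->delta = (target - c->value) / (float)frames;
    c->remaining = frames;
}

/* Advance one sample-frame and return the value to use for it. */
static float ramp_tick(ramp_ctl *c)
{
    if (c->remaining > 0) {
        c->value += c->delta;
        if (--c->remaining == 0)
            c->value = c->target;  /* snap, so float error cannot pile up */
    }
    return c->value;
}
```

A set Event would map to ramp_set(), a ramp Event to ramp_start(), and the
audio loop would call ramp_tick() once per sample-frame.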
1.2 Events
Almost everything that happens during the ACTIVE state is communicated via
Events. The Host can send Events to Plugins, Plugins can send Events to
the Host, and Plugins can send Events to other Plugins (if they are so
connected).
All Events are timestamped. That means that any Control change, or any
other Event is sample-accurate. XAP hosts have a running timer which
counts sample-frames - this is what the timestamp is based on.
Events are passed to Plugins on EventQueues. This allows a Plugin to
receive any number of Events with a minimal per-Event overhead.
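As an illustration only, an EventQueue could be as simple as a linked list
kept sorted by timestamp. The ev and evq layouts and both helper functions
below are assumptions made for this sketch, since the real structs are not
defined yet. Timestamp wraparound is ignored to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Event and EventQueue layouts; not the real XAP structs. */
typedef struct ev {
    struct ev *next;
    uint32_t   when;    /* timestamp in sample-frames */
    int        type;    /* what kind of Event this is */
    float      value;   /* payload */
} ev;

typedef struct {
    ev *head;           /* earliest Event first */
} evq;

/* Insert while keeping the queue sorted by timestamp. Events with equal
 * timestamps keep their arrival order. */
static void evq_push(evq *q, ev *e)
{
    ev **pp = &q->head;
    while (*pp && (*pp)->when <= e->when)
        pp = &(*pp)->next;
    e->next = *pp;
    *pp = e;
}

/* Remove and return the next Event stamped before 'limit', or NULL if
 * nothing is due yet. */
static ev *evq_pop_before(evq *q, uint32_t limit)
{
    ev *e = q->head;
    if (!e || e->when >= limit)
        return NULL;
    q->head = e->next;
    e->next = NULL;
    return e;
}
```

The receiver only ever looks at the head of the list, so the per-Event
cost on the reading side stays small no matter how many Events are queued.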
1.3 Channels
//FIXME:
1.4 Ports
This is an audio API, so it wouldn't be complete without some mechanism to
transport audio data. This is a Port. Each Port carries a single stream
of mono audio data.
Plugins may allow the Host to disable Ports. If a Port is disabled, the
Plugin will not read from or write to it. If a Port is not disabled, it
must be connected by the host to a valid buffer.
2.0 The Plugin Descriptor
The Plugin descriptor is a static(*) data structure provided by the Plugin
to describe what it can do. Descriptors are retrieved by calling the
xap_descriptor() function of a XAP shared object. This function is called
repeatedly with an index parameter, starting at 0 and incrementing by one
on each call. The function returns the Plugin descriptor for each index,
up to the number of Plugins in the shared object file, at which time it
returns NULL. Plugin indices are always sequential, and once a NULL is
returned, the host can assume there are no more Plugin descriptors to be
queried.
(*) The descriptor for wrapper Plugins may change upon loading of a
wrapped Plugin. This is a special case, and the Host must be aware of it.
//FIXME: how?
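The enumeration described at the top of this section might look roughly
like the sketch below on the Host side. Only the name xap_descriptor()
comes from this document. The XAP_descriptor layout, the function
signature, and the demo_* stand-ins for a loaded shared object are all
assumptions. A real Host would get the function pointer from the shared
object (e.g. via dlsym()) instead of using demo_xap_descriptor().

```c
#include <stddef.h>

/* Assumed, minimal descriptor; the real one carries far more. */
typedef struct {
    const char *label;
} XAP_descriptor;

/* Assumed signature of the exported lookup function. */
typedef const XAP_descriptor *(*xap_descriptor_fn)(unsigned long index);

/* Stand-in for a shared object that holds two Plugins. */
static const XAP_descriptor demo_plugins[] = {
    { "demo_gain" },
    { "demo_synth" },
};

static const XAP_descriptor *demo_xap_descriptor(unsigned long index)
{
    if (index >= sizeof(demo_plugins) / sizeof(demo_plugins[0]))
        return NULL;
    return &demo_plugins[index];
}

/* Host side: walk indices from 0 upward and stop at the first NULL. */
static unsigned long count_plugins(xap_descriptor_fn get)
{
    unsigned long n = 0;
    while (get(n) != NULL)
        n++;
    return n;
}
```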
2.1 Meta-Data
The Plugin descriptor provides several fields for Plugin meta-data. The
fields are as follows:
id_code: the vendor and product encoding
api_code: the XAP API version code of this Plugin
ver_code: the Plugin version code - used to identify Plugin data
flags: Plugin-global flags
label: a short, unique Plugin identifier
name: the user-friendly Plugin name string
version: the version string
author: the author string
copyright: the copyright string
license: the license string
url: the URL string
notes: notes about the Plugin
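Written out as C, the meta-data part of the descriptor could look
something like the following. The field names are the ones listed above.
The C types, the struct name meta_sketch, and every value in the example
instance are guesses for illustration, since the encodings of the *_code
fields are not specified yet.

```c
#include <stdint.h>

/* Assumed C types for the meta-data fields; only the names are from
 * the list above. */
typedef struct {
    uint32_t    id_code;    /* vendor and product encoding */
    uint32_t    api_code;   /* XAP API version code */
    uint32_t    ver_code;   /* Plugin version code */
    uint32_t    flags;      /* Plugin-global flags */
    const char *label;      /* short, unique identifier */
    const char *name;       /* user-friendly name */
    const char *version;
    const char *author;
    const char *copyright;
    const char *license;
    const char *url;
    const char *notes;
} meta_sketch;

/* Example instance with made-up values. */
static const meta_sketch demo_meta = {
    .id_code   = 0x00010001u,
    .api_code  = 1,
    .ver_code  = 3,
    .flags     = 0,
    .label     = "demo_gain",
    .name      = "Demo Gain",
    .version   = "0.3",
    .author    = "Nobody in particular",
    .copyright = "(none)",
    .license   = "(unspecified)",
    .url       = "",
    .notes     = "Illustration only",
};
```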
2.2 State Changes
As mentioned above, Plugins exist in one of two states - ACTIVE and
IDLE (well, three if you count non-existence as a state). The Plugin
descriptor holds the methods to create and destroy instances of Plugins,
and to change their state.
2.2.1 Create
A Plugin is instantiated via its descriptor's create() method. This
method receives two key pieces of information, which the Plugin will use
throughout its lifetime. The first is a pointer to the Host structure
(see below), and the second is the Host sample rate. If the Host wants to
change either of these, all Plugins must be re-created. This is where the
Plugin's internal structures can be allocated and initialized. There is
no required set of supported sample rates, but Plugins should support the
common sample rates (44100, 48000, 96000) to be generally useful. If the
Plugin does not support the specified sample rate, this method should
fail. Hosts should always check that all Plugins support the desired
sample rate.
Once created, the Plugin instance is in the IDLE state.
2.2.2 Destroy
Plugins are destructed via the descriptor's destroy() method. Plugins
can only be destroyed when in the IDLE state. All Plugin-allocated
resources must be released during this method. After this method is
invoked, the Plugin handle is no longer valid. This function cannot
fail.
2.2.3 Activate
From the IDLE state, a Plugin can be changed to the ACTIVE state via the
activate() method. Passed to this method are two arguments which are
valid for the duration of the ACTIVE state - quality level and
realtime state. The quality level is an integer between 1 and 10, with 1
being lowest quality (fastest) and 10 being highest quality (slowest).
Plugins may ignore this value, or may provide fewer than 10 discrete
quality levels. The realtime state is a boolean value, which is TRUE if
the Plugin is in a realtime processing net or FALSE if it is not realtime
(offline).
This method can only be called from the IDLE state. Once ACTIVE, a Plugin
may process audio.
2.2.4 Deactivate
From the ACTIVE state, a Plugin can be changed to the IDLE state via the
deactivate() method. This method can only be called from the ACTIVE
state.
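The state rules in 2.2.1 through 2.2.4 boil down to a small state machine.
The sketch below is the kind of check a Host might keep around for
debugging. The enum and function names are invented for this example, and
"non-existence" is modelled as a third state, ST_NONE.

```c
/* Hypothetical lifecycle tracker; none of these names are XAP API. */
typedef enum { ST_NONE, ST_IDLE, ST_ACTIVE } plug_state;

typedef enum {
    OP_CREATE, OP_DESTROY, OP_ACTIVATE, OP_DEACTIVATE, OP_RUN
} plug_op;

/* Returns 1 and updates *st if 'op' is legal in the current state.
 * Returns 0 and leaves *st alone otherwise. */
static int plug_transition(plug_state *st, plug_op op)
{
    switch (op) {
    case OP_CREATE:                 /* nothing -> IDLE */
        if (*st != ST_NONE) return 0;
        *st = ST_IDLE;
        return 1;
    case OP_DESTROY:                /* only from IDLE */
        if (*st != ST_IDLE) return 0;
        *st = ST_NONE;
        return 1;
    case OP_ACTIVATE:               /* IDLE -> ACTIVE */
        if (*st != ST_IDLE) return 0;
        *st = ST_ACTIVE;
        return 1;
    case OP_DEACTIVATE:             /* ACTIVE -> IDLE */
        if (*st != ST_ACTIVE) return 0;
        *st = ST_IDLE;
        return 1;
    case OP_RUN:                    /* audio only while ACTIVE */
        return *st == ST_ACTIVE;
    }
    return 0;
}
```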
2.3 Channels
//FIXME:
2.4 Port Setup
//FIXME: kinda needs Channels
Once a Plugin is loaded, the Host must connect the audio Ports. All Ports
in a Plugin must be connected or disabled. The Plugin descriptor provides
a connect_port() method, which the host must call to connect a buffer
pointer to a Port. Once connected, a Port remains connected to the
specified buffer until the Host disables it or connects it to a different
buffer. All Plugins are assumed to not use the same input buffers as
output buffers, unless the Plugin flags indicate that it safely handles
in-place processing.
Plugins may allow the Host to disable Ports, rather than connect them.
The Plugin descriptor provides an optional disable_port() method. If this
method is provided, and it returns successfully, the Host can ignore this
port. Once disabled, a Port remains disabled until the Host connects it,
at which point it becomes enabled.
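A rough sketch of the connect/disable rules follows. The method names
connect_port() and disable_port() are from the text above, but their
signatures are not defined yet, so the pd_* functions, the struct layouts
and the return codes here are all assumptions. The "optional" flag stands
in for whatever mechanism a Plugin uses to say a Port may be disabled.

```c
#include <stddef.h>

#define PD_NPORTS 2

/* Hypothetical per-Port bookkeeping inside a Plugin instance. */
typedef struct {
    float *buffer;    /* NULL until connected */
    int    disabled;  /* nonzero once the Host has disabled the Port */
    int    optional;  /* nonzero if this Port may be disabled at all */
} pd_port;

typedef struct {
    pd_port ports[PD_NPORTS];
} pd_plugin;

/* Connect a buffer. Connecting a disabled Port re-enables it. */
static int pd_connect_port(pd_plugin *p, unsigned idx, float *buf)
{
    if (idx >= PD_NPORTS || buf == NULL)
        return -1;
    p->ports[idx].buffer = buf;
    p->ports[idx].disabled = 0;
    return 0;
}

/* Disable a Port, if the Plugin allows it for that Port. */
static int pd_disable_port(pd_plugin *p, unsigned idx)
{
    if (idx >= PD_NPORTS || !p->ports[idx].optional)
        return -1;
    p->ports[idx].buffer = NULL;
    p->ports[idx].disabled = 1;
    return 0;
}

/* Host-side check: every Port must be connected or disabled. */
static int pd_ports_ready(const pd_plugin *p)
{
    for (unsigned i = 0; i < PD_NPORTS; i++)
        if (!p->ports[i].disabled && p->ports[i].buffer == NULL)
            return 0;
    return 1;
}
```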
2.5 Controls and EventQueues
//FIXME: kinda needs Channels
In order to be of any use, a Plugin must provide Controls. Control
changes are delivered via Events. Events are passed on EventQueues.
Each Control has an associated EventQueue, on which Events for that
Control are delivered. In addition, there is an EventQueue for each
Channel and a master Queue for the Plugin. The Plugin can internally use
the same EventQueue for multiple targets.
The Host queries the Plugin for EventQueues via the get_input_queue()
method. In order to allow sharing of an EventQueue, the get_input_queue()
method also returns a cookie, which is stored in each Event as it is
delivered. This allows the plugin to use a simpler EventQueue scheme
internally, while still being able to sort incoming Events.
Controls may also output Events. The Host will set up output Controls
with the set_output_queue() method.
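Here is a sketch of the cookie idea, with a Plugin that hands out one
shared EventQueue for all of its Controls and uses the cookie to tell them
apart. Only the name get_input_queue() comes from this document. The
signature, the sq_* types, the fixed-size queue, and the choice of using
the Control index as the cookie are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

enum { CTL_GAIN, CTL_PAN, CTL_COUNT };

/* Hypothetical Event and queue layouts. */
typedef struct {
    uint32_t when;     /* timestamp in sample-frames */
    uint32_t cookie;   /* copied from get_input_queue() by the sender */
    float    value;
} sq_event;

typedef struct {
    sq_event slots[16];
    unsigned count;
} sq_queue;

typedef struct {
    sq_queue shared;               /* one queue serves every Control */
    float    controls[CTL_COUNT];
} sq_plugin;

/* Plugin side: return the shared queue plus a cookie naming the Control. */
static sq_queue *sq_get_input_queue(sq_plugin *p, unsigned control,
                                    uint32_t *cookie)
{
    if (control >= CTL_COUNT)
        return NULL;
    *cookie = control;
    return &p->shared;
}

/* Sender side: store the cookie in the Event as it is delivered. */
static int sq_send(sq_queue *q, uint32_t cookie, uint32_t when, float value)
{
    if (q->count >= 16)
        return -1;
    q->slots[q->count++] = (sq_event){ when, cookie, value };
    return 0;
}

/* Plugin side: drain the queue, using the cookie to pick the Control. */
static void sq_dispatch(sq_plugin *p)
{
    for (unsigned i = 0; i < p->shared.count; i++) {
        const sq_event *e = &p->shared.slots[i];
        if (e->cookie < CTL_COUNT)
            p->controls[e->cookie] = e->value;
    }
    p->shared.count = 0;
}
```

The Host never has to know that both Controls share a queue. It just
stores whatever cookie it was given.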
2.6 Processing Audio
A Plugin which has been activated and properly set up, may be called upon
to process audio. This is done through the descriptor's run() method.
This method gets the current sample-frame timestamp as an argument. In
this method the Plugin is expected to examine and handle all new Events,
and to read from or write to its Ports.
This method may only be called from the ACTIVE state.
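To show what handling Events inside run() can mean in practice, here is a
sketch of a trivial gain Plugin that splits each block at Event timestamps
so that changes land on the exact sample-frame. The gain_* names, the
array-based Event list, and the 'frames' argument are assumptions (the
text above only says run() gets the current timestamp). Timestamp
wraparound is ignored.

```c
#include <stdint.h>

/* Hypothetical Event: set the gain at an absolute sample-frame. */
typedef struct {
    uint32_t when;
    float    gain;
} gain_event;

typedef struct {
    float             gain;
    const gain_event *events;   /* sorted by 'when' */
    unsigned          nevents;
    unsigned          next;     /* first Event not yet applied */
    const float      *in;       /* connected input Port buffer */
    float            *out;      /* connected output Port buffer */
} gain_plugin;

/* Process 'frames' sample-frames; the block starts at timestamp 'now'. */
static void gain_run(gain_plugin *p, uint32_t now, uint32_t frames)
{
    uint32_t pos = 0;

    while (pos < frames) {
        /* Apply every Event due at or before the current frame. */
        while (p->next < p->nevents &&
               p->events[p->next].when <= now + pos)
            p->gain = p->events[p->next++].gain;

        /* Render up to the next Event, or to the end of the block. */
        uint32_t end = frames;
        if (p->next < p->nevents &&
            p->events[p->next].when < now + frames)
            end = p->events[p->next].when - now;

        for (; pos < end; pos++)
            p->out[pos] = p->in[pos] * p->gain;
    }
}
```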
2.7 Errorcodes
All this stuff needs to be integrated somewhere...
Tempo and Meter
----
XAP uses Controls for transmitting tempo and meter information. If a
Plugin defines a TEMPO control, it can expect to receive tempo Events on
that control. The Host must define some unit of musical-time measurement (Tick),
which represents the smallest granularity the host wants to work with.
This is the basis for tempo and meter. The host publishes the current count of
Ticks/Beat via the host struct.
Control: TEMPO
Type: double
Units: ticks/sec
Range: [-inf, inf]
Events: Hosts must send a TEMPO Event at Plugin init and when tempo
changes.
Control: METER
Type: double
Units: ticks/measure
Range: [0.0, inf]
Events: Hosts must send a METER Event at Plugin init and when meter
changes. Hosts should send a METER Event periodically, such as
every measure or once per second.
Control: METERBASE
Type: double
Units: beats/whole-note
Range: [1.0, inf]
Events: Hosts must send a METERBASE Event at Plugin init and when meter
changes.
This mechanism gives Plugins the ability to be aware of tempo and meter
changes, without forcing information into plugins that don't care. A
Plugin can sync to various timeline events, easily.
The Host struct also provides a mechanism to query the timestamp of the
next Beat or Bar, and to convert timestamps into the following time formats:
* Ticks
* Seconds
* SMPTE frames
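The arithmetic behind such conversions is simple as long as the tempo is
constant. The sketch below shows the idea. The function names and
signatures are made up, not the Host struct interface, and it assumes a
positive TEMPO (ticks/sec) and a non-negative position. A real Host would
also have to deal with tempo changes along the timeline.

```c
#include <stdint.h>

/* Sample-frames to seconds. */
static double frames_to_seconds(uint32_t frames, double sample_rate)
{
    return (double)frames / sample_rate;
}

/* Sample-frames to Ticks, with 'tempo' in ticks/sec as in TEMPO. */
static double frames_to_ticks(uint32_t frames, double sample_rate,
                              double tempo)
{
    return frames_to_seconds(frames, sample_rate) * tempo;
}

/* Sample-frames from 'pos_ticks' to the next Beat boundary, rounded to
 * the nearest frame. Returns 0 when the position is exactly on a Beat. */
static uint32_t frames_to_next_beat(double pos_ticks, double ticks_per_beat,
                                    double sample_rate, double tempo)
{
    double beats = pos_ticks / ticks_per_beat;
    double whole = (double)(uint64_t)beats;   /* floor, for beats >= 0 */

    if (beats == whole)
        return 0;

    double ticks_left = (whole + 1.0) * ticks_per_beat - pos_ticks;
    return (uint32_t)(ticks_left / tempo * sample_rate + 0.5);
}
```

For example, with 480 Ticks/Beat at 120 beats per minute, TEMPO would be
960 ticks/sec. At 48000 Hz, one second of audio is then 960 Ticks.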
Sequencer Control
----
XAP plugins may be aware of certain sequencer events, such as transport
changes, positional jumps, and loop-points. These data are received on
Controls.
Control: POSITION
Type: double
Units: ticks
Range: [0.0, inf]
Events: Hosts must send a POSITION Event at Plugin init, when transport
starts, and when transport jumps. Hosts should send a POSITION
Event periodically, such as every beat, every measure or once per
second.
Control: TRANSPORT
Type: bool
Units: on/off
Events: Hosts must send a TRANSPORT Event at Plugin init and when
transport state changes.
Control: CUEPOINT
Type: double
Units: ticks
Range: [0.0, inf]
Events: Hosts must send a CUEPOINT Event when Cuepoints are added,
changed, or removed.
Control: SPEED
Type: double
Units: scalar
Range: [-inf, inf]
Events: Hosts must send a SPEED Event at Plugin init and when play speed
changes.
Instruments and Voices
----
XAP instruments can be either voiced or non-voiced. Non-voiced
instruments are essentially always on, and their output is controlled
purely by controls, such as the gate control of a modular synth or the
hand-distance of a theremin. Non-voiced instruments are monophonic
per-channel.
Voiced instruments are more structured. They must handle Virtual Voice IDs
(VVIDs). VVIDs are unsigned 32 bit integers, which are allocated by the
host and passed to instruments via the XAP_EV_VVID_* events. Instruments
may use VVIDs as a direct index into the host structure's VVID table,
which is an array of unsigned 32 bit integers. The data in the host VVID
table is for use by the instrument.
VVIDs must be allocated before use and de-allocated before re-use.
Instruments can maintain an internal mapping between VVIDs and actual
voices, which allows them to handle voice allocation in a purely
abstracted manner from the host. Once allocated, the host can send events
for a VVID until the VVID is de-allocated.
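One possible way for an instrument to use the VVID table is sketched
below: each table entry holds "voice index + 1", with 0 meaning no voice
is attached. All names here are invented, the two-voice limit and the
round-robin stealing are arbitrary, and the VVID allocation Events are
left out. The functions just show what might happen on VOICE on and off.

```c
#include <stdint.h>

#define NVOICES  2
#define NO_VOICE 0u   /* table entry meaning "no voice attached" */

typedef struct {
    int      playing;
    uint32_t vvid;    /* owning VVID, needed when the voice is stolen */
} voice;

typedef struct {
    voice     voices[NVOICES];
    uint32_t *vvid_table;   /* allocated by the Host, used by us */
    unsigned  next_steal;
} synth;

static void synth_voice_on(synth *s, uint32_t vvid)
{
    unsigned i;

    if (s->vvid_table[vvid] != NO_VOICE) {
        i = s->vvid_table[vvid] - 1;      /* restart the attached voice */
    } else {
        for (i = 0; i < NVOICES; i++)     /* look for a free voice */
            if (!s->voices[i].playing)
                break;
        if (i == NVOICES) {               /* none free: steal one */
            i = s->next_steal;
            s->next_steal = (s->next_steal + 1) % NVOICES;
            s->vvid_table[s->voices[i].vvid] = NO_VOICE;
        }
    }
    s->voices[i].playing = 1;
    s->voices[i].vvid = vvid;
    s->vvid_table[vvid] = i + 1;
}

static void synth_voice_off(synth *s, uint32_t vvid)
{
    uint32_t slot = s->vvid_table[vvid];

    if (slot == NO_VOICE)
        return;                           /* stolen or already ended */
    s->voices[slot - 1].playing = 0;      /* a real synth would start its
                                             release phase here */
    s->vvid_table[vvid] = NO_VOICE;
}
```

The Host only ever deals in VVIDs. Which physical voice, if any, is behind
a VVID stays the instrument's business.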
A VVID has two states - active and inactive. After allocation, the VVID
is inactive. Control events received while inactive can be assumed to set
the control state for the activation event. A VVID is activated via the
VOICE control, which all instruments must provide. Once in the active
state, the instrument may produce sound for the voice. A VVID is
deactivated via the VOICE control, which puts the VVID in the inactive
state. It is important to note that just because a VVID has received a
VOICE OFF event, it is not necessarily silent. It may have a long release
phase, which is dependent on the instrument.
Control: VOICE
Type: bool
Units: on/off
Events: Hosts send VOICE Events any time after a VVID has been allocated
but before it has been deallocated.
Some instruments will choose to only examine some controls at activation
time (such as velocity for MIDI-like instruments) or at deactivation time
(such as release velocity). These controls are referred to as latched.
Setting an init-latched control after activation may or may not set the
value, and may or may not have any effect - this is instrument dependent.
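A small sketch of an init-latched control: the instrument keeps tracking
velocity for the VVID at all times, but only takes a snapshot when the
voice is activated. The vvid_state struct and both functions are made up
for this example. Ignoring later changes is just one of the behaviours
allowed above.

```c
/* Hypothetical per-VVID state kept by an instrument. */
typedef struct {
    int   active;
    float velocity;          /* tracked value, may change at any time */
    float latched_velocity;  /* snapshot taken at activation */
} vvid_state;

/* A velocity Event: always tracked, even while the VVID is inactive. */
static void vvid_set_velocity(vvid_state *v, float vel)
{
    v->velocity = vel;
}

/* A VOICE Event: latch the velocity on the inactive -> active edge. */
static void vvid_voice(vvid_state *v, int on)
{
    if (on && !v->active)
        v->latched_velocity = v->velocity;
    v->active = on;
}
```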
A VVID may be re-used. That is to say, the host can leave a VVID
allocated after deactivation, and re-activate it later. This can be used
for emulation of MIDI synths, where the VVID is akin to the MIDI pitch.
A VVID may be deallocated before the associated voice is done playing.
The instrument should continue to play the voice, as no VOICE OFF event
has been received. Likewise, a voice may end before it has received a
VOICE OFF (such as a drum hit). The VVID can change to the inactive state
and track control changes until another VOICE ON is received.
Because VVIDs and voices have no exact correlation, instrument plugins
have a great deal of control over their voice operations. An Instrument
can be mono, and cut the voice and restart for each new VOICE ON, or it
may be massively polyphonic. The host always alerts the plugin to the
state of a VVID, and to changes of that state. Once a VVID has been
deallocated, it can be re-used.
Plugins can assume that VVIDs are at least global per-Plugin. The host
will not activate the same VVID in different Channels at the same time.