[linux-audio-dev] XAP spec - early scribbles

Tim Hockin thockin at hockin.org
Tue Feb 4 23:09:00 UTC 2003


All - here are some early scribbles of a XAP spec.  I took the doc that I
wrote as an overview and started formalizing it.  Please read it over and
point out the places where it is missing stuff.  The Table of Contents
needs a lot more meat.  Once I have that, I can write a lot or a little on
each subject.

Again - EARLY SCRIBBLES :)  That said, the ToC, content, organization,
structure, and spelling are all up for abuse.

Have at it


The XAP Audio Plugin API
Specification
$Id: xap_overview.txt,v 1.1 2003/01/15 10:55:50 thockin Exp $

0.0 Meta
  0.1 Guilty Parties
  0.2 Goals
  0.3 Terminology
  0.4 Conventions

1.0 Overview (introduce ideas, no details)
  1.1 Controls
  1.2 Events
  1.3 Channels
  1.4 Ports

2.0 The Plugin Descriptor
  2.1 Meta-Data
  2.2 State Changes
    2.2.1 Create
    2.2.2 Destroy
    2.2.3 Activate
    2.2.4 Deactivate
  2.3 Channels
  2.4 Port Setup
  2.5 Controls and EventQueues
  2.6 Processing Audio
  2.7 Errorcodes

Host struct
  - Threading
Control struct
 - RT
 - we have SYS controls
Port struct
Events
 - ramping
 -EVQ and Events
Channels
Ports
Instruments and Voices
   We have Voice Controls
Special controls
 - sys controls
 - MIDI controls
Tempo and Meter
Sequencer Controls
Macros

0.1 Guilty Parties

  XAP is a combined effort of many people on the linux-audio-dev
  (linux-audio-dev at music.columbia.edu) mailing list.  The discussion is
  open, and anyone interested is welcome to join in.

0.2 Goals

   The main goal of this project is to provide an API that is full-featured
   enough to be the primary plugin system for audio creation and playback
   applications, while remaining as simple, lightweight, and self-contained
   as possible.  The focus of this API is flexibility tempered with
   simplicity.

0.3 Terminology

  In order to read this document without having your head spin, you should
  probably understand the following terms first.

  * Plugin:
	A chunk of code, loaded or not, that implements this API (e.g. a .so
	file or a running instance).
  * Host:
	The program responsible for loading and controlling Plugins.
  * Instrument/Source:
	An instance of a Plugin that supports the instrument API and is used
	to generate audio signals.  Many Instruments will implement audio
	output but not input, though they may support both and be used as an
	Effect, too.
  * Effect:
	An instance of a Plugin that supports both audio input and output.
  * Output/Sink:
	An instance of a Plugin that can act as a terminator for a chain of
	Plugins.  Many Outputs will support audio input but not output,
	though they may support both and be used as an Effect, too.
  * Voice:
	A playing sound within an Instrument.  Instruments may have multiple
	Voices, or only one Voice.  A Voice may be silent but still active.
  * Event:
	A time-stamped notification of some change of something.
  * Control:
	A knob, button, slider, or virtual thing that modifies behavior of
	the Plugin.  Controls can be master (e.g. master volume),
	per-Channel (e.g. channel pressure) or per-Voice (e.g. aftertouch).
  * Port:
	An audio input or output.
  * EventQueue:
	A control input or output. Plugins may internally have as many
	EventQueues as they deem necessary. The Host will ask the
	Plugin for the EventQueue for each Control.
	FIXME: what is the full list of things that have a queue?
	     Controls, Plugin(master), each Channel?

  * VVID:
	Virtual Voice ID. Part of a system that allows sequencers and
	the like to control synth Voices without having detailed
	knowledge of how the synth manages Voices.
  * Channel:
	A grouping of Controls and Ports, similar to MIDI Channels.
  * Tick:
	The unit of musical time, used with the tempo and
	meter interfaces. The unit is decided by the
	maintainer of the timeline, in order to keep musical
	time calculations exact as far as possible.
  * Cue Point:
	A virtual marker on the musical time line, marking a
	position to which the plugin should be able to jump
	at any time, without delay. This is used by hard disk
	recorders, and other plugins that may need to perform
	time consuming and/or nondeterministic processing as
	a result of timeline jumps.

0.4 Conventions

  //FIXME: datatypes (XAP_foo) and documentation style

1.0 Overview

  XAP Plugins live in shared object files.  A shared object file holds
  one or more plugin descriptors, accessed by index.  Each descriptor holds
  all the information about a single plugin - its identification,
  meta-data, capabilities, controls, and access methods.  The Plugin
  descriptors are retrieved by an exported function in each shared object.

  XAP plugins are always in one of two states - ACTIVE or IDLE.  IDLE
  Plugins are not capable of processing audio, and must be activated.
  Plugins will spend most of their time in the ACTIVE state.  After loading
  Plugins, the Host instantiates them, establishes Port and EventQueue
  connections, and activates them.  Once ACTIVE, a Plugin expects to be run
  repeatedly on small blocks of data.  When audio processing is done, the
  Host can deactivate the Plugin.

  XAP is designed to be used in realtime scenarios.  XAP plugins specify
  their realtime capabilities, and Hosts can allow or disallow operations
  based on that information.

  All XAP audio data is processed in 32-bit floating point form.  Values are
  normalized between -1.0 and 1.0, with 0.0 being silence.

1.1 Controls

  XAP uses the idea of Controls as abstract carriers of Plugin parameters
  and other information.  Controls can represent things like knobs and
  buttons, but they can also represent things like filenames, MIDI
  aftertouch or channel pressure.  Not only do they represent audio
  parameters, but they can represent chunks of system information, such as
  tempo or transport position.

  Like audio hardware, knobs and other controls can be global to a Plugin.
  However, XAP also allows Instrument Plugins to provide per-Voice controls.

  Controls come in a few datatype flavors, and can have min/max limits and
  default values, as well as hints to the host about what they are,
  semantically.  Hosts can use the hints to automatically connect things,
  where appropriate.

  Controls get their data via Events.  Events can immediately set the value
  of a Control, or they can establish a ramp - a target and duration for the
  Plugin to change the control more smoothly.
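
  As a rough illustration of the set-versus-ramp distinction above, a
  Plugin might track each Control with a small state record.  This is only
  a sketch: the type and function names (ctl_state, ctl_set, ctl_ramp,
  ctl_tick) are made up for this example and are not part of XAP, and a
  linear ramp is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-Control state; not defined by the XAP draft. */
typedef struct {
    float value;          /* current Control value */
    float delta;          /* per-sample-frame increment while ramping */
    unsigned frames_left; /* frames remaining in the ramp, 0 = no ramp */
} ctl_state;

/* "Set" style Event: jump to the value immediately. */
static void ctl_set(ctl_state *c, float value)
{
    assert(c != NULL);
    c->value = value;
    c->delta = 0.0f;
    c->frames_left = 0;
}

/* "Ramp" style Event: reach target after `duration` sample-frames. */
static void ctl_ramp(ctl_state *c, float target, unsigned duration)
{
    assert(c != NULL);
    if (duration == 0) {
        ctl_set(c, target);   /* a zero-length ramp is just a set */
        return;
    }
    c->delta = (target - c->value) / (float)duration;
    c->frames_left = duration;
}

/* Advance by one sample-frame and return the value to use for it. */
static float ctl_tick(ctl_state *c)
{
    if (c->frames_left > 0) {
        c->value += c->delta;
        c->frames_left--;
    }
    return c->value;
}
```

  A Plugin would call something like ctl_tick() once per sample-frame
  inside its processing loop, so a ramp costs one add per frame.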

1.2 Events

  Almost everything that happens during the ACTIVE state is communicated via
  Events.  The Host can send Events to Plugins, Plugins can send Events to
  the Host, and Plugins can send Events to other Plugins (if they are so
  connected).

  All Events are timestamped.  That means that any Control change, or any
  other Event is sample-accurate.  XAP hosts have a running timer which
  counts sample-frames - this is what the timestamp is based on.

  Events are passed to Plugins on EventQueues.  This allows a Plugin to
  receive any number of Events with a minimal per-Event overhead.
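
  To make the idea of timestamped Events on a queue concrete, here is a
  minimal sketch of a queue as a singly linked list.  The Event layout and
  all names (ev, evq, evq_push, evq_pop_before) are assumptions for
  illustration, since the draft does not define them yet.  It also assumes
  senders post Events in timestamp order and ignores timestamp wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Event layout; the draft does not define one yet. */
typedef struct ev {
    struct ev *next;
    uint32_t when;    /* timestamp in sample-frames */
    int type;         /* illustrative type code */
    float value;
} ev;

typedef struct {
    ev *head;
    ev *tail;
} evq;

static void evq_init(evq *q)
{
    q->head = q->tail = NULL;
}

/* Append an Event.  Senders are assumed to post in timestamp order. */
static void evq_push(evq *q, ev *e)
{
    assert(q->tail == NULL || q->tail->when <= e->when);
    e->next = NULL;
    if (q->tail != NULL)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
}

/* Pop the next Event stamped earlier than `end`, or NULL if none. */
static ev *evq_pop_before(evq *q, uint32_t end)
{
    ev *e = q->head;
    if (e == NULL || e->when >= end)
        return NULL;
    q->head = e->next;
    if (q->head == NULL)
        q->tail = NULL;
    return e;
}
```

  Both operations are constant time, which is one way to keep the
  per-Event overhead small.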

1.3 Channels

  //FIXME:

1.4 Ports

  This is an audio API, so it wouldn't be complete without some mechanism to
  transport audio data.  That mechanism is the Port.  Each Port carries a
  single stream of mono audio data.

  Plugins may allow the Host to disable Ports.  If a Port is disabled, the
  Plugin will not read from or write to it.  If a Port is not disabled, it
  must be connected by the host to a valid buffer.

2.0 The Plugin Descriptor

  The Plugin descriptor is a static(*) data structure provided by the Plugin
  to describe what it can do.  Descriptors are retrieved by calling the
  xap_descriptor() function of a XAP shared object.  This function is called
  repeatedly with an index parameter, starting at 0 and incrementing by one
  on each call.  The function returns the Plugin descriptor for each index,
  up to the number of Plugins in the shared object file, at which time it
  returns NULL.  Plugin indices are always sequential, and once a NULL is
  returned, the host can assume there are no more Plugin descriptors to be
  queried.

  (*) The descriptor for wrapper Plugins may change upon loading of a
  wrapped Plugin.  This is a special case, and the Host must be aware of it.
  //FIXME: how?
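
  The enumeration loop described above might look roughly like this on the
  Host side.  The descriptor struct, the function-pointer signature, and
  the helper names (xap_desc, xap_descriptor_fn, scan_descriptors) are
  assumptions, and demo_descriptor() merely stands in for the function a
  real shared object would export.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor; only two of the meta-data fields are shown. */
typedef struct {
    const char *label;
    const char *name;
} xap_desc;

/* Assumed signature of the exported xap_descriptor() function. */
typedef const xap_desc *(*xap_descriptor_fn)(unsigned index);

/* Stand-in for a shared object that exports two Plugins. */
static const xap_desc *demo_descriptor(unsigned index)
{
    static const xap_desc table[] = {
        { "demo_gain", "Demo Gain" },
        { "demo_sine", "Demo Sine" },
    };
    if (index >= sizeof(table) / sizeof(table[0]))
        return NULL;    /* past the last Plugin */
    return &table[index];
}

/* Walk indices from 0 until NULL; return how many descriptors were seen. */
static unsigned scan_descriptors(xap_descriptor_fn get,
                                 const xap_desc **out, unsigned max)
{
    unsigned n = 0;
    const xap_desc *d;

    assert(get != NULL);
    while (n < max && (d = get(n)) != NULL)
        out[n++] = d;
    return n;
}
```

  In a real Host the function pointer would presumably come from dlsym()
  on a handle returned by dlopen(), rather than from a local function.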

2.1 Meta-Data

  The Plugin descriptor provides several fields for Plugin meta-data.  The
  fields are as follows:

  id_code:	the vendor and product encoding
  api_code:	the XAP API version code of this Plugin
  ver_code:	the Plugin version code - used to identify Plugin data
  flags:	Plugin-global flags
  label:	a short, unique Plugin identifier
  name:		the user-friendly Plugin name string
  version:	the version string
  author:	the author string
  copyright:	the copyright string
  license:	the license string
  url:		the URL string
  notes:	notes about the Plugin

2.2 State Changes

  As mentioned above, Plugins exist in one of two states - ACTIVE and
  IDLE (well, three if you count non-existence as a state).  The Plugin
  descriptor holds the methods to create and destroy instances of Plugins,
  and to change their state.

2.2.1 Create

  A Plugin is instantiated via its descriptor's create() method.  This
  method receives two key pieces of information, which the Plugin will use
  throughout its lifetime.  The first is a pointer to the Host structure
  (see below), and the second is the Host sample rate.  If the Host wants to
  change either of these, all Plugins must be re-created.  This is where the
  Plugin's internal structures can be allocated and initialized.  There is
  no required set of supported sample rates, but Plugins should support the
  common sample rates (44100, 48000, 96000) to be generally useful.  If the
  Plugin does not support the specified sample rate, this method should
  fail.  Hosts should always check that all Plugins support the desired
  sample rate.

  Once created, the Plugin instance is in the IDLE state.

2.2.2 Destroy

  Plugins are destructed via the descriptor's destroy() method.  Plugins
  can only be destroyed when in the IDLE state.  All Plugin-allocated
  resources must be released during this method.  After this method is
  invoked, the Plugin handle is no longer valid.  This function cannot
  fail.

2.2.3 Activate

  From the IDLE state, a Plugin can be changed to the ACTIVE state via the
  activate() method.  Passed to this method are two arguments which are
  valid for the duration of the ACTIVE state - quality level and
  realtime state.  The quality level is an integer between 1 and 10, with 1
  being lowest quality (fastest) and 10 being highest quality (slowest).
  Plugins may ignore this value, or may provide fewer than 10 discrete
  quality levels.  The realtime state is a boolean value: TRUE if the
  Plugin is in a realtime processing net, FALSE if it is not realtime
  (offline).

  This method can only be called from the IDLE state.  Once ACTIVE, a Plugin
  may process audio.

2.2.4 Deactivate

  From the ACTIVE state, a Plugin can be changed to the IDLE state via the
  deactivate() method.  This method can only be called from the ACTIVE
  state.
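
  The four state changes above amount to a small state machine.  The
  sketch below is one way a Plugin could enforce it.  All names (plug,
  plug_create, and so on) and the return-code convention (0 for success,
  -1 for failure) are hypothetical, and the Host structure pointer that
  create() receives is left out for brevity.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { PLUG_IDLE, PLUG_ACTIVE } plug_state;

/* Hypothetical Plugin instance. */
typedef struct {
    plug_state state;
    unsigned rate;    /* sample rate, fixed at creation */
    int quality;      /* 1 (fastest) to 10 (best), valid while ACTIVE */
    int realtime;     /* nonzero in a realtime net, valid while ACTIVE */
} plug;

/* create(): fails (returns NULL) for sample rates this Plugin rejects. */
static plug *plug_create(unsigned rate)
{
    plug *p;

    if (rate != 44100 && rate != 48000 && rate != 96000)
        return NULL;
    p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    p->rate = rate;
    p->state = PLUG_IDLE;   /* new instances start out IDLE */
    return p;
}

/* activate(): only legal from IDLE. */
static int plug_activate(plug *p, int quality, int realtime)
{
    if (p->state != PLUG_IDLE || quality < 1 || quality > 10)
        return -1;
    p->quality = quality;
    p->realtime = realtime;
    p->state = PLUG_ACTIVE;
    return 0;
}

/* deactivate(): only legal from ACTIVE. */
static int plug_deactivate(plug *p)
{
    if (p->state != PLUG_ACTIVE)
        return -1;
    p->state = PLUG_IDLE;
    return 0;
}

/* destroy(): cannot fail, so calling it while ACTIVE is a Host bug. */
static void plug_destroy(plug *p)
{
    assert(p->state == PLUG_IDLE);
    free(p);
}
```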

2.3 Channels

  //FIXME:

2.4 Port Setup

  //FIXME: kinda needs Channels
  Once a Plugin is loaded, the Host must connect the audio Ports.  All Ports
  in a Plugin must be connected or disabled.  The Plugin descriptor provides
  a connect_port() method, which the host must call to connect a buffer
  pointer to a Port.  Once connected, a Port remains connected to the
  specified buffer until the Host disables it or connects it to a different
  buffer.  All Plugins are assumed to not use the same input buffers as
  output buffers, unless the Plugin flags indicate that it safely handles
  in-place processing.

  Plugins may allow the Host to disable Ports, rather than connect them.
  The Plugin descriptor provides an optional disable_port() method.  If this
  method is provided, and it returns successfully, the Host can ignore this
  port.  Once disabled, a Port remains disabled until the Host connects it,
  at which point it becomes enabled.
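
  A sketch of the connect-or-disable rule follows.  The struct layout,
  the per-Port "optional" flag, and the function names are assumptions
  made for the example.  The draft only says that connect_port() and an
  optional disable_port() exist, not what their signatures are.

```c
#include <assert.h>
#include <stddef.h>

#define PORT_COUNT 2

/* Hypothetical Port bookkeeping inside a Plugin instance. */
typedef struct {
    float *buf;       /* NULL while unconnected or disabled */
    int disabled;     /* nonzero once the Host has disabled the Port */
    int optional;     /* nonzero if the Plugin permits disabling it */
} port_slot;

typedef struct {
    port_slot ports[PORT_COUNT];
} port_plugin;

/* Connect a buffer; connecting a disabled Port enables it again. */
static int port_connect(port_plugin *p, unsigned idx, float *buf)
{
    if (idx >= PORT_COUNT || buf == NULL)
        return -1;
    p->ports[idx].buf = buf;
    p->ports[idx].disabled = 0;
    return 0;
}

/* Disable a Port; fails if the Plugin needs this Port. */
static int port_disable(port_plugin *p, unsigned idx)
{
    if (idx >= PORT_COUNT || !p->ports[idx].optional)
        return -1;
    p->ports[idx].buf = NULL;
    p->ports[idx].disabled = 1;
    return 0;
}

/* Host-side check: every Port must be connected or disabled. */
static int port_all_ready(const port_plugin *p)
{
    unsigned i;

    assert(p != NULL);
    for (i = 0; i < PORT_COUNT; i++)
        if (!p->ports[i].disabled && p->ports[i].buf == NULL)
            return 0;
    return 1;
}
```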

2.5 Controls and EventQueues

  //FIXME: kinda needs Channels
  In order to be of any use, a Plugin must provide Controls.  Control
  changes are delivered via Events.  Events are passed on EventQueues.
  Each Control has an associated EventQueue, on which Events for that
  Control are delivered.  In addition, there is an EventQueue for each
  Channel and a master Queue for the Plugin.  The Plugin can internally use
  the same EventQueue for multiple targets.

  The Host queries the Plugin for EventQueues via the get_input_queue()
  method.  In order to allow sharing of an EventQueue, the get_input_queue()
  method also returns a cookie, which is stored in each Event as it is
  delivered.  This allows the plugin to use a simpler EventQueue scheme
  internally, while still being able to sort incoming Events.

  Controls may also output Events.  The Host will set up output Controls
  with the set_output_queue() method.
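
  The cookie scheme can be sketched as follows: a Plugin hands out the
  same EventQueue for every Control, but a different cookie for each, and
  then routes incoming Events by cookie.  The names and signatures here
  (cq_plugin, cq_get_input_queue, cq_dispatch) are invented for the
  example, and using the Control index as the cookie is just one possible
  choice.

```c
#include <assert.h>
#include <stddef.h>

#define CQ_NCONTROLS 3

/* Placeholder queue type; its contents do not matter for this sketch. */
typedef struct {
    int unused;
} cq_queue;

/* Hypothetical Event carrying the cookie the sender was given. */
typedef struct {
    unsigned cookie;
    float value;
} cq_event;

typedef struct {
    cq_queue shared;                /* one queue serves every Control */
    float controls[CQ_NCONTROLS];
} cq_plugin;

/* Return the queue for a Control, plus a cookie identifying the target. */
static cq_queue *cq_get_input_queue(cq_plugin *p, unsigned control,
                                    unsigned *cookie)
{
    if (control >= CQ_NCONTROLS)
        return NULL;
    *cookie = control;    /* here the cookie is simply the Control index */
    return &p->shared;
}

/* Plugin side: route an incoming Event by its cookie. */
static void cq_dispatch(cq_plugin *p, const cq_event *e)
{
    assert(e->cookie < CQ_NCONTROLS);
    p->controls[e->cookie] = e->value;
}
```

  The sender never interprets the cookie.  It only copies it into each
  Event, which leaves the Plugin free to pick whatever encoding is
  cheapest for it.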

2.6 Processing Audio

  A Plugin which has been activated and properly set up may be called upon
  to process audio.  This is done through the descriptor's run() method.
  This method gets the current sample-frame timestamp as an argument.  In
  this method the Plugin is expected to examine and handle all new Events,
  and to read from or write to its Ports.

  This method may only be called from the ACTIVE state.
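
  One common way to honor sample-accurate Events inside run() is to split
  the block at each Event timestamp.  The sketch below does this for a
  trivial gain Plugin.  Everything in it is an assumption: the names, the
  explicit frame-count argument, and passing Events as a sorted array
  instead of reading them from an EventQueue.  It also assumes every Event
  falls within the current block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Event: set the gain at an absolute timestamp. */
typedef struct {
    uint32_t when;    /* timestamp in sample-frames */
    float gain;
} gain_event;

typedef struct {
    float gain;
    const float *in;  /* connected input Port buffer */
    float *out;       /* connected output Port buffer */
} gain_plugin;

/*
 * Process `frames` sample-frames starting at timestamp `now`.
 * `evs` holds `nevs` Events sorted by timestamp.
 */
static void gain_run(gain_plugin *p, uint32_t now, unsigned frames,
                     const gain_event *evs, size_t nevs)
{
    unsigned pos = 0;
    size_t i = 0;

    assert(p->in != NULL && p->out != NULL);
    while (pos < frames) {
        unsigned end = frames;

        /* Apply every Event that is due at the current frame. */
        while (i < nevs && evs[i].when - now <= pos) {
            p->gain = evs[i].gain;
            i++;
        }
        /* Render up to the next Event, or to the end of the block. */
        if (i < nevs && evs[i].when - now < frames)
            end = evs[i].when - now;
        for (; pos < end; pos++)
            p->out[pos] = p->in[pos] * p->gain;
    }
}
```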

2.7 Errorcodes







All this stuff needs to be integrated somewhere...

Tempo and Meter
----
  XAP uses Controls for transmitting tempo and meter information.  If a
  Plugin defines a TEMPO control, it can expect to receive tempo Events on
  that control.  The Host must define some unit of musical-time measurement (Tick),
  which represents the smallest granularity the host wants to work with.
  This is the basis for tempo and meter.  The host publishes the current count of
  Ticks/Beat via the host struct.

  Control: TEMPO
  Type: double
  Units: ticks/sec
  Range: [-inf, inf]
  Events: Hosts must send a TEMPO Event at Plugin init and when tempo
          changes.

  Control: METER
  Type: double
  Units: ticks/measure
  Range: [0.0, inf]
  Events: Hosts must send a METER Event at Plugin init and when meter
          changes.  Hosts should send a METER Event periodically, such as
          every measure or once per second.

  Control: METERBASE
  Type: double
  Units: beats/whole-note
  Range: [1.0, inf]
  Events: Hosts must send a METERBASE Event at Plugin init and when meter
          changes.

  This mechanism gives Plugins the ability to be aware of tempo and meter
  changes, without forcing information into plugins that don't care.  A
  Plugin can sync to various timeline events, easily.

  The Host struct also provides a mechanism to query the timestamp of the
  next Beat or Bar, and to convert timestamps into the following time formats:
    * Ticks
    * Seconds
    * SMPTE frames

Sequencer Control
----
  XAP plugins may be aware of certain sequencer events, such as transport
  changes, positional jumps, and loop-points.  These data are received on
  Controls.

  Control: POSITION
  Type: double
  Units: ticks
  Range: [0.0, inf]
  Events: Hosts must send a POSITION Event at Plugin init, when transport
          starts, and when transport jumps.  Hosts should send a POSITION
          Event periodically, such as every beat, every measure or once per
          second.

  Control: TRANSPORT
  Type: bool
  Units: on/off
  Events: Hosts must send a TRANSPORT Event at Plugin init and when
          transport state changes.

  Control: CUEPOINT
  Type: double
  Units: ticks
  Range: [0.0, inf]
  Events: Hosts must send a CUEPOINT Event when Cuepoints are added,
          changed, or removed.

  Control: SPEED
  Type: double
  Units: scalar
  Range: [-inf, inf]
  Events: Hosts must send a SPEED Event at Plugin init and when play speed
          changes.

Instruments and Voices
----
  XAP instruments can be either voiced or non-voiced.  Non-voiced
  instruments are essentially always on, and their output is controlled
  purely by controls, such as the gate control of a modular synth or the
  hand-distance of a theremin.  Non-voiced instruments are monophonic
  per-channel.

  Voiced instruments are more structured.  They must handle Virtual Voice IDs
  (VVIDs).  VVIDs are unsigned 32 bit integers, which are allocated by the
  host and passed to instruments via the XAP_EV_VVID_* events.  Instruments
  may use VVIDs as a direct index into the host structure's VVID table,
  which is an array of unsigned 32 bit integers.  The data in the host VVID
  table is for use by the instrument.

  VVIDs must be allocated before use and de-allocated before re-use.
  Instruments can maintain an internal mapping between VVIDs and actual
  voices, which allows them to handle voice allocation in a purely
  abstracted manner from the host. Once allocated, the host can send events
  for a VVID until the VVID is de-allocated.

  A VVID has two states - active and inactive.  After allocation, the VVID
  is inactive.  Control events received while inactive can be assumed to set
  the control state for the activation event.  A VVID is activated via the
  VOICE control, which all instruments must provide.  Once in the active
  state, the instrument may produce sound for the voice.  A VVID is
  deactivated via the VOICE control, which puts the VVID in the inactive
  state.  It is important to note that just because a VVID has received a
  VOICE OFF event, it is not necessarily silent.  It may have a long release
  phase, which is dependent on the instrument.
  
  Control: VOICE
  Type: bool
  Units: on/off
  Events: Hosts send VOICE Events any time after a VVID has been allocated
          but before it has been deallocated.

  Some instruments will choose to only examine some controls at activation
  time (such as velocity for MIDI-like instruments) or at deactivation time
  (such as release velocity).  These controls are referred to as latched.
  Setting an init-latched control after activation may or may not set the
  value, and may or may not have any effect - this is instrument dependent.

  A VVID may be re-used.  That is to say, the host can leave a VVID
  allocated after deactivation, and re-activate it later.  This can be used
  for emulation of MIDI synths, where the VVID is akin to the MIDI pitch. 

  A VVID may be deallocated before the associated voice is done playing.
  The instrument should continue to play the voice, as no VOICE OFF event
  has been received.  Likewise, a voice may end before it has received a
  VOICE OFF (such as a drum hit).  The VVID can change to the inactive state
  and track control changes until another VOICE ON is received.

  Because VVIDs and voices have no exact correlation, instrument plugins
  have a great deal of control over their voice operations.  An Instrument
  can be mono, and cut the voice and restart for each new VOICE ON, or it
  may be massively polyphonic.  The host always alerts the plugin to the
  state of a VVID, and to changes of that state.  Once a VVID has been
  deallocated, it can be re-used.

  Plugins can assume that VVIDs are at least global per-Plugin.  The host
  will not activate the same VVID in different Channels at the same time.  


