[LAU] tool to separate elements of a song

David Olofson david at olofson.net
Sat Feb 23 05:36:50 EST 2008

On Saturday 23 February 2008, Sebastian Tschöpel wrote:
> > - we are able to collect and compute all the necessary information
> > with our senses and brain to perform this "switch off" ( i doubt
> > that ) 
> mmhmm.... maybe i shouldn't use such a superlative like "i doubt
> that" here :) But it's definitely not a piece of cake, i never heard
> of someone who can do such a thing and okay: There is a lot more to
> our brain than what I can think of - haha! thats like a dog hunting
> his own tail :)

I think that would be a matter of training, basically; much like many 
people can "hear" what something will sound like before playing it. 
Untrained ears have a much harder time picking out and following 
individual instruments from a mix, but this is something that 
improves over time, in my experience. It's a lot like learning to 
read by recognizing the words rather than the individual letters. The 
same basic mechanisms would seem to be at play, though I don't 
know if that's actually how the human brain implements it.

Now, theoretically, if you can make out the individual instruments 
from a mix, and know what these instruments would sound like on their 
own, you should basically be able to recreate any combination of 
these instruments in your mind.

Anyway, from the strictly technical POV, there's this major 
overlapping problem. Getting the individual frequency components out 
of a fragment of music is trivial (relatively speaking) - but how do 
you know which ones go with what instrument?
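
To make the "trivial" part above concrete, here is a minimal Python sketch
(standard library only) that pulls the frequency components out of one frame
with a naive DFT. The test signal, sample rate and function names are made up
for illustration; a real tool would more likely use an FFT library and a
window function.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Naive O(n^2) DFT; returns normalized magnitudes for bins 0..n/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc) / n)
    return mags


def spectral_peaks(frame, sample_rate, threshold=0.05):
    """Return (frequency_hz, magnitude) for local maxima above threshold."""
    mags = dft_magnitudes(frame)
    n = len(frame)
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k] > threshold and mags[k] >= mags[k - 1] and mags[k] > mags[k + 1]:
            peaks.append((k * sample_rate / n, mags[k]))
    return peaks


# Hypothetical mix: a 100 Hz "bass" with one overtone (200 Hz) plus a
# 300 Hz "melody" with one overtone (600 Hz). All values are assumptions
# chosen so that every tone lands exactly on a DFT bin (10 Hz spacing).
SAMPLE_RATE = 1600
N = 160
components = [(100.0, 1.0), (200.0, 0.5), (300.0, 0.8), (600.0, 0.4)]
frame = [
    sum(a * math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f, a in components)
    for i in range(N)
]

peaks = spectral_peaks(frame, SAMPLE_RATE)
print([round(f) for f, _ in peaks])  # [100, 200, 300, 600]
```

The output is just a flat list of frequencies. Nothing in it says whether
300 Hz is the melody's fundamental or the bass's third harmonic, which is
exactly the overlapping problem.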

Well, consider a simple example with a solo melody voice over a simple 
bass line. As long as you can make out the fundamentals (which can 
turn out to be quite hard enough in real applications!), you can look 
at the spectrum and figure out which components follow what melody. 
This would require multiple analysis passes (to learn the instrument 
sounds and melodies), and/or a database of "familiar sounds"; it's 
not something you can just do frame by frame on unknown data.
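
As a rough sketch of the bass-plus-melody example above: assuming the two
fundamentals are already known, each spectral peak can be matched against the
harmonic series of each voice. The function name, the tolerance and the
frequencies below are assumptions for illustration only, not an established
algorithm.

```python
def assign_partials(peak_freqs, fundamentals, tolerance=0.03):
    """Map each peak to every voice whose harmonic series it fits.

    fundamentals: dict of voice name -> fundamental frequency in Hz.
    A peak fits a voice if it lies within `tolerance` (relative to that
    voice's fundamental) of an integer multiple of the fundamental.
    """
    assignment = {}
    for f in peak_freqs:
        owners = []
        for name, f0 in fundamentals.items():
            harmonic = round(f / f0)
            if harmonic >= 1 and abs(f - harmonic * f0) <= tolerance * f0:
                owners.append((name, harmonic))
        assignment[f] = owners
    return assignment


# Hypothetical peaks from a mix of a 110 Hz bass note and a 440 Hz melody note.
peaks = [110.0, 220.0, 330.0, 440.0, 880.0, 1320.0]
voices = {"bass": 110.0, "melody": 440.0}

result = assign_partials(peaks, voices)
ambiguous = [f for f, owners in result.items() if len(owners) > 1]
print(ambiguous)  # [440.0, 880.0, 1320.0]
```

Every melody partial here also fits the bass's harmonic series, so a single
frame cannot settle who owns it. Following the components over time, or
knowing what each instrument sounds like, is what would break the tie.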

...and of course, there's a million "minor" issues around this that 
make it a lot harder than it appears to be. Logically, it has to be 
possible, but maybe the first step would be to dispel a few 
confusing myths about how the human brain does this stuff. I think 
the brain has access to a lot of data that algorithms of this sort 
generally don't have.

For example, I don't think it's realistically possible to do this 
without some sort of database of "familiar sounds", and/or some model 
of how the "average musically trained" human brain infers 
fundamentals from audible spectra. (There are plenty of instruments 
that have very little energy around the fundamental frequency, which 
makes even "simple" pitch tracking non-trivial.) Considering how the 
brain appears to work, I don't think there's a strict distinction 
between a "database" and a "model" in this regard. A neural network 
might be the proper model for software, and its state after 
appropriate training would be the "database."
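
To illustrate why a missing fundamental makes pitch tracking non-trivial,
here is a small sketch that infers a fundamental from partials alone by
looking for the highest frequency that all partials are integer multiples of.
It is only a crude stand-in for the kind of "model" discussed above, with
made-up names and tolerances, and makes no claim about how hearing works.

```python
def _fits(partial, f0, tolerance):
    """True if `partial` is close to an integer multiple of `f0`."""
    harmonic = round(partial / f0)
    return harmonic >= 1 and abs(partial - harmonic * f0) <= tolerance * partial


def infer_fundamental(partials, max_harmonic=10, tolerance=0.02):
    """Return the highest candidate f0 that explains every partial, or None.

    Candidates are each partial divided by 1..max_harmonic, so the true
    fundamental does not need to be present in the spectrum itself.
    """
    candidates = sorted(
        {p / n for p in partials for n in range(1, max_harmonic + 1)},
        reverse=True,
    )
    for f0 in candidates:
        if all(_fits(p, f0, tolerance) for p in partials):
            return f0
    return None


# Hypothetical instrument with no energy at its 200 Hz fundamental:
# only harmonics 2, 3 and 4 are present.
print(infer_fundamental([400.0, 600.0, 800.0]))  # 200.0
```

Picking the strongest or lowest peak would report 400 Hz here. Real spectra
add noise, inharmonic partials and other instruments on top, which is where a
trained model or a database of familiar sounds would have to take over.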

//David Olofson - Programmer, Composer, Open Source Advocate

.-------  http://olofson.net - Games, SDL examples  -------.
|        http://zeespace.net - 2.5D rendering engine       |
|       http://audiality.org - Music/audio engine          |
|     http://eel.olofson.net - Real time scripting         |
'--  http://www.reologica.se - Rheology instrumentation  --'

More information about the Linux-audio-user mailing list