On Wed, Oct 16, 2013 at 6:51 PM, Philipp Überbacher <murks(a)tuxfamily.org> wrote:
I was hoping for something that requires less DSP knowledge.
I think we all do... Note that although I dabble in DSP, I won't claim to
"know" DSP.
However given that those low-level tools are available, hints on how to
combine them or on possibly useful algorithms etc. would be appreciated
as well.
Of the three categories you mentioned (speech, music, noise), speech is
probably the easiest to find...
FFT the whole track in windows (of 8192 samples or so, perhaps), then check
for frequency content in the speech range[1]: 300 - 3,400 Hz.
If the content stays steadily within that range (allowing for some
FFT windowing error), then it is probably speech.
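
A rough sketch of that speech check in plain Python, in case it helps. The
function names, the Hann window and the small recursive FFT are my own
choices for illustration (a real FFT library would be much faster), and
what counts as "steadily in range" is something you would have to tune:

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def speech_band_ratio(samples, rate, window=8192, lo=300.0, hi=3400.0):
    """Per-window fraction of spectral energy between lo and hi Hz."""
    # Hann window to reduce leakage between FFT bins.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (window - 1))
            for i in range(window)]
    ratios = []
    # Non-overlapping windows; a trailing partial window is ignored.
    for start in range(0, len(samples) - window + 1, window):
        frame = [s * w for s, w in zip(samples[start:start + window], hann)]
        spectrum = fft(frame)
        total = in_band = 0.0
        # Skip the DC bin and use only the positive frequencies.
        for k in range(1, window // 2):
            power = abs(spectrum[k]) ** 2
            total += power
            if lo <= k * rate / window <= hi:
                in_band += power
        ratios.append(in_band / total if total > 0 else 0.0)
    return ratios
```

Each returned value is between 0 and 1. A track whose ratios stay high in
nearly every window is a speech candidate by the rule above, but band-limited
music would pass too, so treat it as a first filter only.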
Music (depending on type) is generally rhythmical, so transients should be
present, and somewhat evenly spaced. Easier to detect if the music hasn't
been compressed to a brick wall.
Noise (depending on type) is generally *not* rhythmical, so transients
should be present but not evenly spaced...
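
One possible way to put a number on "evenly spaced", again only a sketch:
flag a transient wherever the short-time energy jumps sharply from one frame
to the next, then look at how much the gaps between transients vary. The
function names, the frame size and the `rise` and `floor` thresholds are
assumptions of mine and would need tuning on real material:

```python
import math

def onset_times(samples, rate, frame=512, rise=4.0, floor=1e-6):
    """Times (in seconds) of frames whose energy is more than `rise`
    times the energy of the previous frame."""
    onsets = []
    prev = None
    for start in range(0, len(samples) - frame + 1, frame):
        energy = sum(s * s for s in samples[start:start + frame]) / frame
        # `floor` keeps near-silent frames from triggering on tiny changes.
        if prev is not None and energy > floor and energy > rise * max(prev, floor):
            onsets.append(start / rate)
        prev = energy
    return onsets

def spacing_irregularity(onsets):
    """Coefficient of variation (std / mean) of the gaps between onsets.
    Returns None when there are fewer than two gaps to compare."""
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    if len(gaps) < 2:
        return None
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return math.sqrt(var) / mean
```

A value near 0 means the transients are close to evenly spaced (more
music-like by the rule of thumb above), while larger values mean irregular
spacing (more noise-like). Heavily compressed material may produce too few
onsets for this to say anything.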
The above is a suggestion only: I don't know whether it is the best way to go.
Depending on the content, you'll have some success with the above approach.
Advice on "music-information-retrieval" or content analysis is probably
better sought on the Music-DSP mailing list; perhaps ask there?
HTH, -Harry
[1]: Voice frequencies, http://en.wikipedia.org/wiki/Voice_frequency