Most of the recent innovations in speech recognition use neural networks
trained with back propagation. They can be trained to recognize a wide range
of voice types and can detect words strung together in normal speech. The
input to the neural net is a formant analysis that uses an FFT to create the
harmonic pattern. With a proper arrangement it will even accommodate
variations in the speed of speech, as well as whether the voice is male or
female. It can also return a signal indicating the inflections made by the
speaker.
Speech recognition has been studied for years in computer science, and there
is no quick way to do it well.
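
To give a rough idea of the FFT front end mentioned above, here is a minimal
sketch in plain Python. It is only an illustration under my own assumptions,
not what any particular recognizer does: the 8 kHz sample rate, the 512-sample
frame, the Hann window, and the three tone frequencies are all made up, and
the function names are hypothetical. Picking the strongest spectral peaks is
only a crude stand-in for formant analysis; a real system would estimate the
spectral envelope first. The magnitude vector in `features` is the kind of
thing that would be fed to the neural net, one frame at a time.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(frame):
    """Apply a Hann window to one frame and return magnitudes for bins 0..n/2."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(c) for c in fft(windowed)[: n // 2 + 1]]


def spectral_peaks(mags, sample_rate, n, count=3):
    """Return the frequencies (Hz) of the `count` strongest local maxima."""
    peaks = [(mags[k], k) for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    peaks.sort(reverse=True)
    return sorted(k * sample_rate / n for _, k in peaks[:count])


# Synthetic test frame: three sinusoids at made-up, formant-like frequencies.
SAMPLE_RATE = 8000  # assumed sample rate in Hz
N = 512             # assumed frame length (power of two)
TONES = [(700, 1.0), (1200, 0.6), (2600, 0.3)]  # (frequency Hz, amplitude)

frame = [sum(a * math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f, a in TONES)
         for i in range(N)]

features = magnitude_spectrum(frame)            # candidate input vector for the net
peaks = spectral_peaks(features, SAMPLE_RATE, N)
print(peaks)
```

With this frame length each FFT bin is 8000 / 512 = 15.625 Hz wide, so the
reported peaks can only land within about one bin of the true tone
frequencies.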
-----Original Message-----
From: linux-audio-dev-bounces(a)music.columbia.edu
[mailto:linux-audio-dev-bounces@music.columbia.edu]On Behalf Of Toby
Sent: Monday, June 06, 2005 10:07 AM
To: linux-audio-dev(a)music.columbia.edu
Subject: [linux-audio-dev] Re: Speech analysis
Jean-Marc Valin wrote:
you just can't make the difference between two words separated by
silence and a longer word.
I see your point. The pauses between every couple of vocal emissions
should be measured and taken into consideration in the larger picture,
just like the accent on each vocal emission, the probability that each
sub-emission represents a given phoneme, etc.
Alas, I don't know a thing about voice recognition, except that I can
play Chess on my Mac with voice alone. But I haven't booted MacOS X
in a while :-D
Cheers
Toby