On Mon, Mar 9, 2009 at 2:33 PM, Olivier Guilyardi <list@samalyse.com> wrote:
alex stone wrote:
>
> On Mon, Mar 9, 2009 at 1:10 PM, Olivier Guilyardi <list@samalyse.com> wrote:
>
>     I'm looking for material, docs and/or software to remove speech
>     noise, as caused
>     by the movements of the mouth.
>
> In the commercial world this is one of the strengths of protools, and I
> know 2 voice editors (full time) who use pt for precisely this task.
> (Others include Sequoia, Nuendo, etc...)

Does protools include some automatic speech noise detection and/or removal
utilities?

> There's a lot of work editing voice, so tools that make this type of
> workflow easier are at a premium. (Ask any foley engineer as an example)
>
> I respectfully suggest you take a look at Ardour, Audacity, and Rezound.

I know these of course, not as an advanced user though.

> Ardour, with its excellent user control over regions, gives the
> operator good editing opportunities, and although there are a couple of
> keystroke/editing tools missing when compared to pt, it is already solid
> and capable for fine tune editing.

It looks like you are talking about 100% manual noise removal. The users I'm
dealing with already do that on protools on Mac. They lose a lot of time with
this. I'm looking for a way to automate this, at least partly. I think they
really need this, so, if needed, I suppose I could convince them to use Ardour
for this specific task. In which case I'd try to code something to add this
feature to Ardour.

I suppose there are mainly two ways to do this:

1 - noise detection, manual removal (as mentioned by Dave on this thread). This
is certainly the most flexible and safest approach, and also the most language
independent. Because all selected regions could be removed at once, it could also
provide a high level of productivity.

2 - completely automatic removal. It would be more language specific (just think
about those tongue sounds in some African languages..), but may still work for a
certain range of European languages (I'm primarily targeting French).

The second solution has the advantage that it could be developed as a simple LV2
plugin or so, so that I wouldn't need to go and deal with the details of Ardour
editing routines, as in the first solution. I can't exactly tell how much work I
will be able to put into this, I might need to restrain my ambitions to stay
within some realistic professional requirements (read: money).
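
To make the first approach (detect, then remove by hand) a bit more concrete, here
is a minimal Python sketch. It is only an illustration, not anything Ardour or an
LV2 host provides: it assumes mouth noises show up as short, isolated bursts of
energy surrounded by near silence. The function names, the 5 ms frame size, the
0.02 level floor and the 30 ms maximum burst length are all hypothetical values
that would need tuning on real recordings. It only returns candidate regions and
removes nothing.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS level of each non-overlapping frame."""
    levels = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels

def find_short_bursts(samples, rate, frame_ms=5, floor=0.02, max_burst_ms=30):
    """Flag short isolated bursts as candidate mouth noises.

    Returns a list of (start_sample, end_sample) pairs. All thresholds
    are assumed placeholder values, not measured ones.
    """
    frame_len = max(1, int(rate * frame_ms / 1000))
    max_frames = max(1, int(max_burst_ms / frame_ms))
    levels = frame_rms(samples, frame_len)
    candidates = []
    run_start = None
    # A trailing 0.0 acts as a sentinel so a run at the very end is closed.
    for idx, level in enumerate(levels + [0.0]):
        if level > floor:
            if run_start is None:
                run_start = idx
        elif run_start is not None:
            # Only runs short enough to be a click or smack are kept;
            # longer runs are assumed to be speech.
            if idx - run_start <= max_frames:
                candidates.append((run_start * frame_len, idx * frame_len))
            run_start = None
    return candidates
```

The returned regions could then be presented to the operator for review, which
keeps the final decision manual.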

> Audacity, with its multi-track recording, and excellent importing
> framework, including many tools and effects to manipulate wavs, is a
> contender too.
>
> Rezound, although a bit older in the app stakes, is an excellent editor
> (in my humble opinion), and has the capability to give the user
> virtually complete control over recordings, to a degree of excellence
> that would fit in any studio, full time. I have a colleague in Britain
> who uses Rezound full time for all his album work and voice over editing
> he does for others, and he's constantly in admiration of what he
> achieves with this modest but powerful app, particularly when it's put
> up against some high priced commercial competitors.

I like Rezound too, but the above remarks still apply.

> 3 contenders as initial suggestions. No doubt there will be other
> offerings from those who do this more frequently than I do.

Thanks. But don't forget this is a LAD question, and that I'm not afraid to code
(that's my job actually).

 Olivier


Olivier, I wasn't aware you have coding skills, so feel free to ignore anything I've written that isn't relevant.

If you're intent on automating speech analysis with a voice noise removal tool of some sort, then you might do well to start with a 'pre and post' framework. Noises like lip smacking, and glottal and nasal noise at the end of a phrase, are fairly easy to identify, and they generally occur just before or just after a phrase. Handling those alone may well cover a decent percentage of any cleanup, and quickly. (This depends, of course, on the language. Cleaning up Russian would need a different 'module' from cleaning up French or Finnish.)
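
As a rough illustration of the 'pre and post' idea, the Python sketch below sorts candidate noise bursts (however they were detected) into those sitting just before a phrase and those sitting just after one. The function name, the (start, end) sample-position tuples and the margin parameter are assumptions made for the example, not part of any existing tool.

```python
def classify_pre_post(bursts, phrases, margin):
    """Sort bursts into 'pre' (just before a phrase) and 'post' (just after).

    bursts and phrases are lists of (start, end) sample positions.
    margin is the largest allowed gap, in samples, between a burst and
    the phrase edge it is attached to (a hypothetical tuning value).
    Bursts that are not near any phrase edge are left out.
    """
    result = {"pre": [], "post": []}
    for b_start, b_end in bursts:
        for p_start, p_end in phrases:
            if 0 <= p_start - b_end <= margin:
                result["pre"].append((b_start, b_end))
                break
            if 0 <= b_start - p_end <= margin:
                result["post"].append((b_start, b_end))
                break
    return result
```

Bursts that fall in neither list would be left alone, or handed to the operator for a manual decision.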

And maybe there's a coding clue there too: a modular, language-centric approach, based on loading a module designed specifically for a particular language and phrasal interpretation. (Module 1 spoken=French, Module 2 sung=French, Module 3 spoken=Finnish, etc....)
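
One way such modules might be organised is sketched below in Python, as a small registry keyed by language and by spoken or sung delivery. Everything here is hypothetical: the CleanupProfile fields, the register and load_profile helpers, and especially the numeric values, which are placeholders rather than measured settings for any language.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CleanupProfile:
    """Per-language settings; the fields are illustrative assumptions."""
    language: str
    mode: str           # "spoken" or "sung"
    floor: float        # level below which audio counts as silence
    max_burst_ms: int   # longest burst still treated as mouth noise

PROFILES = {}

def register(profile):
    """Add a profile to the registry under (language, mode)."""
    PROFILES[(profile.language, profile.mode)] = profile

def load_profile(language, mode):
    """Return the profile for a language and mode, or raise LookupError."""
    try:
        return PROFILES[(language, mode)]
    except KeyError:
        raise LookupError(f"no cleanup module for {mode} {language}") from None

# Placeholder values only, to show the shape of the registry.
register(CleanupProfile("French", "spoken", floor=0.02, max_burst_ms=30))
register(CleanupProfile("French", "sung", floor=0.05, max_burst_ms=20))
register(CleanupProfile("Finnish", "spoken", floor=0.02, max_burst_ms=25))
```

A detector would then call something like load_profile("French", "spoken") and read its thresholds from the result, so adding a language means adding a profile rather than changing the detection code.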




Alex.



--
Parchment Studios (It started as a joke...)