Analysis of monophonic audio signals on the commandline


Jeanette C.

27 Feb 2022

2:06 a.m.

Hey hey, I am looking for tools to analyse bird song. I have many recordings, which can probably be cleaned up enough to have single calls as good as isolated. On the other hand there are quite a few sonograms of bird calls, used to identify species. What I am looking for is a kind of mathematical analysis tool to get an idea of frequency and volume, perhaps with an estimate of curves. I suppose that requires statistical analysis. This is far from my forte, so I wouldn't know which keywords for the relevant math to enter and in which tool to look. Are there audio specific programs? Or is a math package like R, octave or maxima able to handle such data input and the task? If you are not immediately aware of a relevant tool, but could supply me with a few keywords and considerations, that would already be quite helpful. Best wishes, Jeanette -- * Website: http://juliencoder.de - for summer is a state of sound * Youtube: https://www.youtube.com/channel/UCMS4rfGrTwz8W7jhC1Jnv7g * Audiobombs: https://www.audiobombs.com/users/jeanette_c * GitHub: https://github.com/jeanette-c There's a girl in the mirror I wonder who she is Sometimes I think I know her Sometimes I really wish I did <3 (Britney Spears)


Fons Adriaensen

27 Feb

10:50 a.m.

On Sun, Feb 27, 2022 at 01:06:25AM +0100, Jeanette C. wrote:

...

I am looking for tools to analyse bird song. I have many recordings, which can probably be cleaned up enough to have single calls as good as isolated. On the other hand there are quite a few sonograms of bird calls, used to identify species.

The sonogram is probably the tool of choice here. Mathematically it's quite simple (compared to some other methods), but the entire interpretation is left to the eye and brain of the user. It's little more than a transformation that makes this interpretation easier. It probably also does not reveal anything that you couldn't just hear; it just allows you to put some numerical values on those features.

...

What I am looking for is a kind of mathematical analysis tool to get an idea of frequency and volume, perhaps with an estimate of curves. I suppose that requires statistical analysis.

I think there are three aspects to this:

1. Understanding the maths. Some signal analysis methods are simple, some are absolutely not.

2. Finding a tool to do the maths. This is probably the easiest part. My preference for that has been python + numpy + scipy for a long time. The advantage of those is that you have a general purpose programming language as well as tools for signal processing. This can make the third part a lot easier.

3. Present the results in a way that is accessible to you. This will very probably mean some data reduction, or algorithms that look for specific features.

A sonogram is a 2D picture. Curves for e.g. frequency or amplitude as a function of time can be reduced to 1D, probably making the presentation easier. I know of experiments with a 2D 'bed of needles' (like a Braille display but 2D) that with some training allows blind people to 'see' images. This is probably beyond reach. How useful would it be to present a relatively low-res image as ascii-art? Or a function of time as one line of ascii? Other formats?

Ciao, -- FA
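As a rough illustration of points 2 and 3, here is a minimal sketch, assuming numpy and scipy are installed. It reduces a sonogram to two 1D curves (dominant frequency and level per frame) and prints one of them as a single line of ASCII. The function names, the window length, and the character ramp are illustrative choices, not anything prescribed in this thread, and the test signal is a synthetic sweep rather than a real recording.

```python
import numpy as np
from scipy import signal


def track_frequency_and_level(samples, rate, nperseg=1024):
    """Return (times, dominant frequency in Hz, level in dB) per analysis frame."""
    freqs, times, power = signal.spectrogram(
        samples, fs=rate, nperseg=nperseg, noverlap=nperseg // 2
    )
    # Strongest frequency bin in each time frame.
    peak_freqs = freqs[np.argmax(power, axis=0)]
    # Total power per frame, in dB (small offset avoids log of zero).
    level_db = 10.0 * np.log10(power.sum(axis=0) + 1e-20)
    return times, peak_freqs, level_db


def ascii_line(values, ramp=" .:-=+*#%@"):
    """Map a 1D curve onto one line of characters, low values first in the ramp."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return ramp[0] * len(values)
    idx = ((values - values.min()) / span * (len(ramp) - 1)).round().astype(int)
    return "".join(ramp[i] for i in idx)


# Synthetic stand-in for a call: a 1 second sweep from 2 kHz to 4 kHz.
rate = 44100
t = np.arange(rate) / rate
sweep = signal.chirp(t, f0=2000.0, t1=1.0, f1=4000.0)

times, peak_freqs, level_db = track_frequency_and_level(sweep, rate)
print(ascii_line(peak_freqs))
for when, freq, level in zip(times[::10], peak_freqs[::10], level_db[::10]):
    print(f"{when:6.3f} s  {freq:7.1f} Hz  {level:6.1f} dB")
```

This only makes sense for sounds with one clear frequency at a time. For broadband or impulsive sounds the "dominant frequency" column is not meaningful.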

Jeanette C.

12:37 p.m.

Hi Fons, and first of all many thanks for coming to my aid, yet again! Feb 27 2022, Fons Adriaensen has written: ...

...

A sonogram is a 2D picture. Curves for e.g. frequency or amplitude as a function of time can be reduced to 1D, probably making the presentation easier.

Hm, if such images are clean, I suppose a program can be written to translate the sonogram to values.

...

I know of experiments with a 2D 'bed of needles' (like a Braille display but 2D) that with some training allows blind people to 'see' images. This is probably beyond reach.

... Definitely. I've seen those. As you say though, looking at the shape is just an aid. 2D representations of all kind are unfeasible really. I think I'd have to go with pure numbers representation and then at least identifying peaks in amplitude and pitch spitting them out with times. Thanks for clarifying the issues at hand and putting them so concisely. That alone has sparked off a few ideas, be they not as conclusive as I may have hoped. But real data from living organisms no less would have the unpredictability. Best wishes and thanks again, Jeanette -- * Website: http://juliencoder.de - for summer is a state of sound * Youtube: https://www.youtube.com/channel/UCMS4rfGrTwz8W7jhC1Jnv7g * Audiobombs: https://www.audiobombs.com/users/jeanette_c * GitHub: https://github.com/jeanette-c Make all the clouds disappear Put all your fears to rest <3 (Britney Spears)
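The "pure numbers" idea for amplitude could look something like the following minimal sketch, assuming numpy and scipy. It computes a smoothed amplitude envelope and prints the time and height of each peak. The function name, the smoothing and spacing values, and the relative threshold are assumptions chosen for the synthetic example, and would need tuning on real recordings.

```python
import numpy as np
from scipy import signal


def amplitude_peaks(samples, rate, smooth_ms=10.0, min_gap_ms=50.0, rel_height=0.2):
    """Return (peak times in seconds, envelope values at those peaks)."""
    # Amplitude envelope from the analytic signal.
    envelope = np.abs(signal.hilbert(samples))
    # Moving average to remove ripple from the envelope.
    win = max(1, int(rate * smooth_ms / 1000.0))
    envelope = np.convolve(envelope, np.ones(win) / win, mode="same")
    # Keep peaks above a fraction of the maximum and at least min_gap_ms apart.
    peaks, _ = signal.find_peaks(
        envelope,
        height=rel_height * envelope.max(),
        distance=max(1, int(rate * min_gap_ms / 1000.0)),
    )
    return peaks / rate, envelope[peaks]


# Synthetic stand-in: three short 3 kHz bursts at 0.2 s, 0.5 s and 0.8 s.
rate = 22050
t = np.arange(rate) / rate
bursts = sum(
    amp * np.exp(-0.5 * ((t - centre) / 0.02) ** 2)
    for centre, amp in [(0.2, 1.0), (0.5, 0.5), (0.8, 0.8)]
)
samples = bursts * np.sin(2 * np.pi * 3000.0 * t)

peak_times, peak_values = amplitude_peaks(samples, rate)
for when, value in zip(peak_times, peak_values):
    print(f"{when:.3f} s  amplitude {value:.2f}")
```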

Fons Adriaensen

2:24 p.m.

On Sun, Feb 27, 2022 at 11:37:45AM +0100, Jeanette C. wrote:

...

Hm, if such images are clean, I suppose a program can be written to translate the sonogram to values.

They are not always very clean and there are two reasons for this: - the quality of the recording (filtering will help), - the complexity of the sound.

...

2D representations of all kind are unfeasible really.

The problem here is that some bird sounds can only be represented correctly in 2D parameter space. Some contain a clear single frequency, usually sweeping and modulated. Such modulation can be quite fast, in the tens of Hz region. Some others contain very short broadband features, and the whole notion of a single frequency is not valid at all. I've seen impulsive waveforms of only a few milliseconds in some recordings. And many bird sounds are a mix of those two extremes. A sonogram deals with both of them, that is why it is useful.

So what would be needed is some form of analysis that produces less output but is still able to handle both cases and everything in between. Using classical analysis methods, there is a limit to the product of resolution in time and frequency, similar to the uncertainty principle in quantum physics. Human (and animal) hearing can in some cases go beyond that limit - this is possible only by making some a-priori assumptions about the signal.

The problem is similar to one that occurs in time-stretching of audio: the algorithm must decide if some feature should be regarded as significant in the time domain or in the frequency domain. Which is why software such as rubberband has both user options and some not-so-simple internal decision making.

As a simple example, take a 1 kHz sinewave that is amplitude modulated by a 10 Hz signal. The actual frequencies present then are 990, 1000, and 1010 Hz. Now how should this be analysed?

Option 1: as a modulated 1 kHz signal. When time-stretched, e.g. by a factor of 2, the amplitude as a function of time is preserved, the modulation frequency becomes 5 Hz, and the output frequencies are 995, 1000, and 1005 Hz.

Option 2: as three separate and unrelated frequencies. Each of them is stretched separately, and the output is 990, 1000, and 1010 Hz. So this will still sound as 10 Hz modulation, just longer.

Which one is correct? The simple fact is that both are, it is just a matter of interpretation. Deciding this is something our brains are good at, based on experience and expectations. Exactly the same question arises when trying to reduce a signal to something that can be described by a 1D function.

Ciao, -- FA
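The 1 kHz / 10 Hz example can be checked numerically. The sketch below, assuming only numpy, builds the modulated tone and lists the frequencies found in its spectrum. It then synthesises what each of the two interpretations would give for a stretch factor of 2. The modulation depth of 0.5, the sample rate, and the helper name `components` are assumptions made for illustration. No actual time-stretching algorithm is run here, the two outputs are simply constructed directly.

```python
import numpy as np


def components(x, rate, threshold=0.01):
    """Frequencies (Hz) whose spectral amplitude exceeds `threshold`."""
    n = len(x)
    amplitude = np.abs(np.fft.rfft(x)) / (n / 2)
    freqs = np.fft.rfftfreq(n, d=1.0 / rate)
    return freqs[amplitude > threshold]


rate = 8000
t1 = np.arange(rate) / rate          # 1 second, original length
t2 = np.arange(2 * rate) / rate      # 2 seconds, stretched length

# Original: 1 kHz carrier, amplitude modulated at 10 Hz.
am = (1.0 + 0.5 * np.sin(2 * np.pi * 10 * t1)) * np.sin(2 * np.pi * 1000 * t1)

# Option 1: treat it as a modulated 1 kHz tone, so the envelope is stretched
# and the modulation rate halves to 5 Hz.
option1 = (1.0 + 0.5 * np.sin(2 * np.pi * 5 * t2)) * np.sin(2 * np.pi * 1000 * t2)

# Option 2: treat it as three unrelated sinusoids, each simply made longer.
option2 = (
    np.sin(2 * np.pi * 1000 * t2)
    + 0.25 * np.cos(2 * np.pi * 990 * t2)
    - 0.25 * np.cos(2 * np.pi * 1010 * t2)
)

print("original:", components(am, rate))
print("option 1:", components(option1, rate))
print("option 2:", components(option2, rate))
```

The three printed lists should sit at about 990/1000/1010 Hz, 995/1000/1005 Hz, and 990/1000/1010 Hz respectively, matching the figures in the message.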

Jeanette C.

3:05 p.m.

Thank you for the detailed elaboration. Those were indeed factors I have so far ignored, even though I should be aware of them. Somehow I didn't link the analysis of a sonogram to FFT-like challenges. At least, analysing the song of the chaffinch is free of broadband features. Certainly going by ear alone. That together with some more already coded data from other bird calls may greatly help in a semi-analytical approach, which may yield good enough results. I will start a few experiments and see how good it gets. Best wishes, Jeanette Feb 27 2022, Fons Adriaensen has written:

...

On Sun, Feb 27, 2022 at 11:37:45AM +0100, Jeanette C. wrote:

Hm, if such images are clean, I suppose a program can be written to translate the sonogram to values.

They are not always very clean and there are two reasons for this: - the quality of the recording (filtering will help), - the complexity of the sound.

2D representations of all kind are unfeasible really.


Fons Adriaensen

3:48 p.m.

On Sun, Feb 27, 2022 at 02:06:06PM +0100, Jeanette C. wrote:

...

At least, analysing the song of the chaffinch is free of broadband features. Certainly going by ear alone. That together with some more already coded data from other bird calls may greatly help in a semi-analytical approach, which may yield good enough results.

For us humans, the gray area between time-domain and frequency-domain features is around 20 Hz. For example, a 10 Hz modulation we will hear as vibrato, but 30 Hz certainly will give a very different impression, that of a rough sound or a complex spectrum. Since birds are much smaller than humans, they can produce faster modulations and they also tend to generate higher frequencies. Very probably for them the gray zone occurs at a higher frequency. That means that one of the most revealing ways to analyse bird sounds is to listen to them slowed down (not time-stretched). That will allow you to hear modulations that are otherwise too fast for us to perceive as such. Ciao, -- FA
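Slowing down without time-stretching amounts to playing the same samples at a lower sample rate, so pitch and modulation rate drop together. A minimal sketch, assuming numpy and scipy: the helper name `write_slowed`, the factor of 4, and the synthetic 4 kHz tone with 40 Hz modulation are illustrative assumptions. An in-memory buffer stands in for a real output file path.

```python
import io

import numpy as np
from scipy.io import wavfile


def write_slowed(dest, samples, rate, factor):
    """Write `samples` unchanged, but labelled with a rate lowered by `factor`."""
    slow_rate = int(round(rate / factor))
    wavfile.write(dest, slow_rate, samples)
    return slow_rate


# Synthetic stand-in: 4 kHz tone with a 40 Hz modulation, 1 second at 48 kHz.
rate = 48000
t = np.arange(rate) / rate
tone = (1.0 + 0.5 * np.sin(2 * np.pi * 40 * t)) * np.sin(2 * np.pi * 4000 * t)
pcm = (tone * 0.5 * 32767).astype(np.int16)

buffer = io.BytesIO()  # a file name such as "slow.wav" would work as well
slow_rate = write_slowed(buffer, pcm, rate, factor=4)

# Read it back to see what a player would now assume about the data.
buffer.seek(0)
read_rate, data = wavfile.read(buffer)
duration = len(data) / read_rate
spectrum = np.abs(np.fft.rfft(data.astype(float)))
apparent_carrier = np.fft.rfftfreq(len(data), d=1.0 / read_rate)[np.argmax(spectrum)]
print(f"{read_rate} Hz, {duration:.1f} s, carrier near {apparent_carrier:.0f} Hz")
```

At a factor of 4 the 40 Hz modulation of this test tone would be heard as 10 Hz, which is within the range described above as vibrato.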

Fernando Lopez-Lezcano

1 Mar

8:28 a.m.

Hard to add anything other than what Fons, Bill (his birds.clm are great!) and others have written. I thought I should maybe mention ATS (Analysis, Transformation, Synthesis) and associated programs: https://dxarts.washington.edu/wiki/analysis-transformation-and-synthesis-ats There is a command line analysis tool that can generate the analysis files (a mix of sinusoidal components and bandlimited noise bands). I have used it successfully for some of my music in the context of CLM and SuperCollider (there are SC UGens that can read those analysis files). In any case it shares all the limitations of analysis that have been mentioned before. Maybe slowing down the bird songs and then analyzing them, as Fons suggested? Can't wait to hear the results! Best, -- Fernando On 2/27/22 5:48 AM, Fons Adriaensen wrote:

...

On Sun, Feb 27, 2022 at 02:06:06PM +0100, Jeanette C. wrote:


linux-audio-user@lists.linuxaudio.org


participants (3)

Fernando Lopez-Lezcano
Fons Adriaensen
Jeanette C.