0:00:28 | okay, so this presentation is on a completely different project; it is far less theoretical, with far fewer convergence proofs |
---|
0:00:38 | basically, we have a bioacoustics group that is interested in learning species distributions, a connection between signal processing, machine learning, and ecology |
---|
0:00:54 | we formed a group, between myself and collaborators, that looks at that problem: placing microphones in the forest and listening to birds |
---|
0:01:06 | Mike, who is here, Lawrence Neal, was, I think at the time we wrote this paper, a third-year undergraduate student in the group |
---|
0:01:16 | he was very enthusiastic about this problem, so he wanted to help out; he got onto this problem and very quickly caught up with the signal processing and machine learning techniques used here |
---|
0:01:27 | Forrest Briggs is a PhD student in the group, and basically myself and Mike collaborate with them |
---|
0:01:37 | okay, so the bird bioacoustics project deals with a variety of ecological questions and how to solve them using audio |
---|
0:01:50 | the idea: set a microphone in the forest, listen to the audio, and determine species distributions and so on, down to individual birds |
---|
0:02:01 | the other motivation for this particular paper comes from audio segmentation |
---|
0:02:08 | the focus is that, rather than the classical one-dimensional audio segmentation typically used on this type of data, we present a time-frequency, so to say 2-D, audio segmentation |
---|
0:02:19 | rather than a threshold-based method, it is based on classification, that is, on learning from examples |
---|
0:02:27 | specifically, I will present a segmentation system that uses a random forest classifier to segment, and we will show some results |
---|
0:02:38 | so here is the general idea; I think this particular picture was taken at the Hubbard Brook long-term ecological research site |
---|
0:02:48 | the idea is the following: when people are interested in exploring species distributions for the purpose of ecology research, they send an individual |
---|
0:02:57 | that individual will go and, at each of these points (this is actually a topographic map), stand for ten minutes, then move on to the next point; that sounds pretty exhausting, so they send multiple individuals to do it |
---|
0:03:13 | from a sampling of ten minutes here and there in the day, the idea was to estimate the distribution of species in the area |
---|
0:03:22 | of course, everybody here who knows sampling realizes that something must be very wrong with just ten minutes during the day if we are to obtain an accurate distribution of species |
---|
0:03:34 | and of course, part of the idea here is to keep these maps across the years and learn how species distributions change over time |
---|
0:03:43 | for example, once we estimate the distribution, how do we integrate that with ecological research; in other words, how do we connect the distributions to environmental parameters |
---|
0:03:56 | so in this project we have actually placed microphones at the H.J. Andrews long-term ecological research site |
---|
0:04:05 | we are using the Song Meter, which is commonly used in bird sound analysis, and placing it in a variety of places; I think in this preliminary study we put it in fifteen places |
---|
0:04:17 | what happens is that this is not fully automatic: somebody has to replace the batteries every two weeks because of the amount of recording, and also replace the memory chips, because it takes a lot of memory to record over a couple of weeks |
---|
0:04:34 | so the system that we developed in order to estimate species distributions involves placing automated recorders, collecting data from those recorders, converting the recordings to spectrograms, and performing segmentation on the spectrograms in order to extract syllables |
---|
0:04:53 | the reason we do that is that birds tend to vocalize simultaneously, so a one-dimensional segmentation would not have been useful here |
---|
0:05:08 | the idea is that you get those syllables, extract features for each syllable, and then build a probabilistic model |
---|
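To make the pipeline above concrete, here is a minimal sketch of the recording-to-spectrogram step, assuming scipy and illustrative STFT parameters; the file name and settings are placeholders, not the speaker's actual configuration.

```python
# Minimal sketch of the recording-to-spectrogram step (assumed parameters).
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, audio = wavfile.read("recording.wav")   # hypothetical field recording
if audio.ndim > 1:                            # mix stereo down to mono
    audio = audio.mean(axis=1)

# Short-time Fourier transform; window and overlap are illustrative choices.
freqs, times, spec = spectrogram(audio, fs=rate, nperseg=512, noverlap=256)
log_spec = np.log(spec + 1e-10)               # log magnitude for visibility
print(log_spec.shape)                         # (frequency bins, time frames)
```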
0:05:16 | so the goal, and the focus of this project (not of this particular paper), is how to build a probabilistic model that takes a collection of recordings, where each recording is labeled with the species present, or a subset of the recordings is labeled with the species present and the species absent |
---|
0:05:39 | the idea here is to learn from the collection of recordings and to be able to analyze a new recording which does not have labels; in other words, to pinpoint in the recording which syllable belongs to which bird and which birds are present in that recording |
---|
0:05:55 | so as I mentioned before, birds tend to have many independent vocalizations, and often these vocalizations overlap in time |
---|
0:06:07 | in other words, in order to communicate, birds tend to pick different frequency channels so that they are not overlapping; this communication is used for attracting a mate and for calls, which are short, simple vocalizations |
---|
0:06:21 | and the idea is, what is the point of communicating with your mate when they cannot hear you? so birds tend to space themselves in frequency as well |
---|
0:06:33 | so as I mentioned, a one-D segmentation may look like this when there is no overlap of birds singing |
---|
0:06:39 | but in practice, a lot of the recordings that we get look something like this; some of the detail is barely visible |
---|
0:06:47 | what I would like to point out is that this is the frequency axis and this is the time axis, and one can see that at an individual time instance there is overlap between syllables from one bird species and syllables from other species |
---|
0:07:04 | and so in this particular paper the focus is how to obtain a good segmentation |
---|
0:07:10 | even though segmentation apparently is not as fancy, for us the signal processing is in developing the models and in estimating species distributions in new recordings, and for those models segmentation is probably one of the most important problems, an important aspect of this project |
---|
0:07:28 | when you break a syllable, either over-segmenting a syllable or segmenting syllables jointly, a lot of the models seem to fail |
---|
0:07:39 | so the process is the following: we get these spectrograms, and we have somebody sit and use either Paint or another program to go in and mark what is noise and what is signal |
---|
0:07:53 | the idea here is that we will give the computer enough examples of this: we take the audio, generate spectrograms, extract features, and feed them into a classifier along with the labels from those particular masks that were created |
---|
0:08:09 | the hope is that, given a new recording, the algorithm will allow us to classify a particular pixel of the spectrogram as belonging to background or foreground |
---|
0:08:20 | the framework is very similar to what is done in image processing and computer vision, so we borrow some of the principles from there |
---|
0:08:32 | so the details are as follows: we obtain the spectrograms, and what you can see on the screen, as well as on my laptop here, is that after converting the recordings into spectrograms there is stationary background noise, corresponding to a stream or other environmental noise, at low frequency, so we apply a whitening filter to get rid of it |
---|
0:08:58 | then at this point we take the spectrogram and extract features: for each pixel in the spectrogram we collect a neighborhood of values and extract features from that neighborhood |
---|
0:09:09 | once we have the features, we apply the random forest classifier; I will skip the details of the classifier |
---|
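A minimal sketch of the whitening and per-pixel neighborhood features just described, under assumed details (the paper may estimate the noise floor and choose features differently): each frequency band is normalized by a per-band noise estimate, and each pixel's feature vector is simply its surrounding patch of values.

```python
import numpy as np

def whiten(spec):
    """Flatten stationary background noise: normalize each frequency band
    by a robust per-band noise estimate (assumption: median over time)."""
    noise = np.median(spec, axis=1, keepdims=True)
    return spec / (noise + 1e-10)

def pixel_features(spec, radius=2):
    """For every spectrogram pixel, collect the surrounding
    (2*radius+1) x (2*radius+1) neighborhood as its feature vector."""
    padded = np.pad(spec, radius, mode="reflect")
    feats = []
    for i in range(spec.shape[0]):
        for j in range(spec.shape[1]):
            patch = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            feats.append(patch.ravel())
    return np.array(feats)        # shape: (num_pixels, patch_size**2)
```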
0:09:16 | so the idea is to then be able to predict, in a new recording, the right label, either foreground or background |
---|
0:09:27 | another advantage of the random forest classifier, which makes it very suitable for segmentation compared, for example, to SVMs or other classifiers, is the fact that it can give you a soft output that you can threshold |
---|
0:09:40 | in a random forest classifier you obtain multiple trees, and each tree provides a classification, either zero or one; by taking, for example, fifty of those and averaging, you get the probability of belonging to one class or the other, which allows you to threshold, and that helps in segmentation |
---|
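A sketch of this soft-output property using scikit-learn's RandomForestClassifier, an assumed stand-in for the implementation used: predict_proba averages the per-tree predictions, which for fully grown trees amounts to averaging their 0/1 votes, giving a per-pixel probability that can be thresholded at any operating point.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# X: per-pixel feature vectors, y: human labels (0 = background, 1 = bird).
# Toy data here, just to show the mechanics.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 25))
y = (X[:, 0] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=50).fit(X, y)

# Averaging across the 50 trees yields a probability per pixel...
p_foreground = forest.predict_proba(X)[:, 1]

# ...which can then be thresholded at any operating point, unlike a
# classifier that only outputs hard 0/1 decisions.
mask = p_foreground > 0.5
```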
0:09:59 | and so here is an example: after applying the classifier for segmentation, we obtain this result, and of course we filter out small segments, that is, small connected components |
---|
0:10:16 | once you do that, you are then able to extract the individual syllables from the recordings, which is really the input to all the work that we do later |
---|
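A sketch of the small-component filtering and syllable-extraction step, assuming scipy's ndimage labeling; the minimum-size cutoff is a made-up value.

```python
import numpy as np
from scipy import ndimage

def remove_small_components(mask, min_size=50):
    """Drop connected foreground regions smaller than min_size pixels;
    the surviving components are the candidate syllables."""
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    keep = {i + 1 for i, s in enumerate(sizes) if s >= min_size}
    return np.isin(labels, list(keep))

def syllable_boxes(mask):
    """Bounding boxes of surviving components = extracted syllables."""
    labels, _ = ndimage.label(mask)
    return ndimage.find_objects(labels)   # list of (freq_slice, time_slice)
```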
0:10:27 | so the data that we worked on for the experiment is stereo PCM data recorded at 16 kHz; we will increase that in the future because some birds vocalize at frequencies higher than 8 kHz |
---|
0:10:43 | we have a dataset of 625 audio segments that we collected, fifteen seconds each |
---|
0:10:49 | basically, the way the data was collected was two segments out of every twenty-four hours per site, from thirteen sites, so there is a mix of sites and a mix of hours of the day |
---|
0:11:00 | in order to do this evaluation properly, we had to label all of the spectrograms and then see whether the segmentation algorithm can predict the human labeling |
---|
0:11:13 | in this particular experiment we used forty trees for the random forest, and since we cannot use all the patches in the spectrograms, we cut out random neighborhoods, half a million examples |
---|
0:11:31 | and we are considering two evaluations of the ROC: one is in terms of time-frequency area, the number of spectrogram units correctly classified, where you look at the number of pixels that were correctly classified |
---|
0:11:44 | the other considers energy weighting: in other words, some pixels are harder to get than others, and perhaps you want to take that into account, so how obvious a pixel is, that is, how high the energy is there, is incorporated in the second form |
---|
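The two evaluations could be computed as follows, assuming scikit-learn's roc_curve; the arrays here are toy placeholders, with the energy-weighted variant obtained through the sample_weight argument.

```python
import numpy as np
from sklearn.metrics import roc_curve

# y_true: human mask flattened to 0/1, scores: classifier probabilities,
# energy: spectrogram magnitude at each pixel (all hypothetical values).
y_true = np.array([0, 0, 1, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])
energy = np.array([1.0, 0.5, 3.0, 5.0, 4.0, 0.2])

# 1) Time-frequency area: every pixel counts equally.
fpr, tpr, _ = roc_curve(y_true, scores)

# 2) Energy weighting: high-energy ("more obvious") pixels count more.
fpr_w, tpr_w, _ = roc_curve(y_true, scores, sample_weight=energy)
```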
0:12:02 | next I am presenting the results: we ran the classifier and scanned through the threshold in order to obtain the receiver operating characteristic, the ROC |
---|
0:12:13 | the first question is, why not just use energy thresholding? here is the performance that we get with energy thresholding |
---|
0:12:24 | energy thresholding works pixel-wise, and that can be inaccurate, so perhaps one should take advantage of the neighborhood, thresholding after blurring; we tried that as well, and then of course we compared both to the classifier |
---|
0:12:40 | the closer the ROC curve is to this corner, the better, and we can see that the classifier does far better than energy thresholding, which is the common method |
---|
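For reference, a sketch of the two baselines being compared here; the blur width and threshold handling are guesses: a plain per-pixel energy threshold, and the same threshold applied after a Gaussian blur so that a pixel's neighborhood contributes as well.

```python
from scipy.ndimage import gaussian_filter

def energy_threshold(spec, thresh):
    """Baseline 1: classify each pixel by its own energy alone."""
    return spec > thresh

def blur_then_threshold(spec, thresh, sigma=1.5):
    """Baseline 2: smooth first, so a pixel's neighborhood votes too."""
    return gaussian_filter(spec, sigma=sigma) > thresh
```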
0:12:54 | and of course we also look at the ROC in terms of total acoustic energy: rather than counting pixels as zeros and ones, we assign each pixel the weight corresponding to its value in the spectrogram |
---|
0:13:07 | once again we see the same relationship: basically, the classifier does better than thresholding with blurring, and thresholding with blurring does somewhat better than simple thresholding |
---|
0:13:22 | so part of the idea, in terms of future work, is that once we do this classification, our goal is to use these syllables as a dictionary, and once you have a dictionary a lot of methods can be applied |
---|
0:13:38 | for example, topic models are one of our research interests, and we are interested in applying topic models to identifying bird species in these recordings |
---|
0:13:49 | here are some examples of how distinct the clusters that were formed are after applying the segmentation: you can see fairly repetitive patterns within each class, though you can barely see it here, I guess due to the contrast |
---|
0:14:06 | so the point here is that we are getting a segmentation at the level we are interested in: we are not over-segmenting or under-segmenting, and that allows us to perform clustering and to basically convert this audio into documents |
---|
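To illustrate the audio-to-documents idea, here is a hedged sketch under assumed choices (a k-means dictionary and an LDA topic model), not the authors' published method: cluster syllable feature vectors into a dictionary of "words", count word occurrences per recording, and fit a topic model to the counts.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

# syllable_feats: one feature vector per extracted syllable (toy data here);
# rec_ids: which recording each syllable came from.
rng = np.random.default_rng(0)
syllable_feats = rng.normal(size=(300, 10))
rec_ids = rng.integers(0, 20, size=300)

# Build the "dictionary": each cluster is one syllable type ("word").
words = KMeans(n_clusters=15, n_init=10).fit_predict(syllable_feats)

# Each recording becomes a bag-of-words over syllable types.
docs = np.zeros((20, 15), dtype=int)
for rec, w in zip(rec_ids, words):
    docs[rec, w] += 1

# Topic model over the recording-by-syllable-type count matrix.
topics = LatentDirichletAllocation(n_components=5).fit(docs)
```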
0:14:26 | questions |
---|
0:14:41 | this is a very interesting problem; do you happen to have, a priori, the number of classes of birds that you are looking at, or is this an open set? |
---|
0:14:51 | so, yes and no; we do have people going there and annotating, and we think the humans are better than the computer, so we trust that what they tell us is right, and they do know about the species present in that area |
---|
0:15:04 | the "no" part is that there are some rare species that often get into the area, that may be present and vocalize in the recordings, and we will not be able to detect that |
---|
0:15:13 | yeah, because my idea is that if you know the classes, you can probably adapt the segmentation to the classifier |
---|
0:15:23 | right; but one of the problems that is interesting to us is new class detection, and we would like to avoid relying on the classes that we know |
---|
0:15:46 | thanks for the presentation, really impressive work; since you got pretty good results with the random forest, could you describe the kinds of features that you used? because since you already have really good results, the question is whether you can improve even further |
---|
0:16:09 | I think that is a good question; I do not know that I can answer it directly, but what I want to point out is that this is highly dependent on how people segment the data |
---|
0:16:19 | we have tried this process many times with different features and with different people doing the labeling, and what we noticed is that the results are highly dependent, in terms of performance, on how it is done |
---|
0:16:33 | we still see the same relationship between the methods, but the curves themselves vary quite a lot depending on which features you pick to estimate syllables and depending on how you segment |
---|
0:16:45 | in fact, we are using, I forget its name right now, a tool as simple as it gets to mark the syllables, and everybody uses a different brush, so it actually does generate different estimates, and the classifier comes out different |
---|
0:17:06 | but I am happy to talk afterwards to get to the point of your question |
---|
0:17:20 | I do not know; maybe it is worth considering different features or a different representation; we have not tried that, and I know that in this field people use correlation-based methods as well, among other methods we simply did not try |
---|
0:17:38 | yeah, we will make it available; we basically have a paper in submission right now, and once it gets accepted we will make the data available for everyone to try |
---|