0:00:13 | so now we move away from the perceptual side of things to the statistical properties of
---|
0:00:18 | sound
---|
0:00:19 | you know, the sound of wind, or the sound of a babbling brook
---|
0:00:22 | has no formants; it's not a deterministic signal
---|
0:00:26 | which is a lot of what we deal with in speech
---|
0:00:28 | so how do you recognise the statistical behaviour of a sound
---|
0:00:31 | dan ellis of columbia and josh mcdermott will
---|
0:00:34 | describe their sound texture representation
---|
0:00:37 | and talk about its use in sound classification
---|
0:00:40 | well |
---|
0:00:40 | alright, thanks malcolm, and thanks everybody for showing up to our session
---|
0:00:45 | so as malcolm said, i'm gonna talk about texture
---|
0:00:47 | today
---|
0:00:48 | and textures are sounds that result from large numbers of acoustic events; they include things you hear
---|
0:00:53 | all the time, like the sound of rain
---|
0:00:59 | wind
---|
0:01:02 | birds |
---|
0:01:05 | running water |
---|
0:01:09 | and
---|
0:01:13 | crowd noise |
---|
0:01:16 | applause |
---|
0:01:19 | and fire |
---|
0:01:23 | so these kinds of sounds are common in the world, and it seems like they are important for a lot
---|
0:01:26 | of tasks that humans have to perform and that we might want to get machines to perform
---|
0:01:30 | like figuring out where you are or what the weather's like, for instance
---|
0:01:34 | but in contrast to the vast literature on visual texture, both in human and machine vision, sound
---|
0:01:39 | textures are largely unstudied
---|
0:01:41 | so |
---|
0:01:42 | the question that we've been looking into is how textures can be represented
---|
0:01:46 | and recognised. so there is some previous work on modeling sound texture; this is probably not a
---|
0:01:51 | completely exhaustive list of the publications, but it's certainly a big chunk of them, so it's a pretty small
---|
0:01:56 | literature
---|
0:01:57 | there's also a lot of work on environmental sounds that is often inclusive of texture
---|
0:02:00 | the work i'll be talking about is
---|
0:02:02 | a little bit different from these approaches
---|
0:02:04 | in that our perspective is that machine recognition might be able to get some clues from human texture
---|
0:02:09 | perception, and so in this sense
---|
0:02:10 | this is very much in the spirit of the work that the previous speakers were
---|
0:02:14 | just talking about
---|
0:02:17 | so |
---|
0:02:18 | so we've been looking into how humans represent and recognise textures, and
---|
0:02:22 | the starting point for the work is the observation that unlike
---|
0:02:26 | the sounds made by individual events, like a spoken word
---|
0:02:29 | textures are stationary, so their essential properties don't change over time, and that's sort of one of the defining
---|
0:02:34 | properties. so whereas
---|
0:02:35 | the waveform of a word here clearly has a beginning and an end and a temporal evolution
---|
0:02:41 | the sound of rain is just kind of there
---|
0:02:42 | right, so the qualities that make it rain
---|
0:02:45 | don't change over time
---|
0:02:46 | and so the key proposal is that because they're stationary, textures can be captured by statistics
---|
0:02:51 | that is, just time averages of acoustic measurements
---|
0:02:54 | if the thing doesn't change, we can just make the measurements, average them over time, and that ought to do
---|
0:02:57 | a good job of capturing its qualities
---|
0:03:00 | so what we propose is that
---|
0:03:01 | when you recognise the sound of fire or the sound of rain
---|
0:03:05 | you're recognising these summary statistics
---|
0:03:08 | and whatever statistics your auditory system is measuring are presumably derived from peripheral auditory representations that we
---|
0:03:15 | know something about; you've heard a bit about this in the first few talks
---|
0:03:19 | so we know that sound is filtered by the cochlea; you can think of
---|
0:03:22 | the output of the cochlea as sort of a subband representation
---|
0:03:26 | we know that a lot of the information in the subbands is conveyed
---|
0:03:28 | by their amplitude envelopes
---|
0:03:30 | after they've been compressed by the cochlea
---|
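The front end described here (bandpass "cochlear" filtering, then a compressed amplitude envelope per subband) can be sketched as follows; a minimal illustration assuming NumPy/SciPy, with a Butterworth band and a 0.3 power-law compression standing in for the talk's actual cochlear model:

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def subband_envelope(x, fs, lo, hi, power=0.3):
    """One 'cochlear' channel: bandpass filter, then the compressed
    amplitude envelope (Hilbert magnitude raised to a power < 1)."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfilt(sos, x)
    env = np.abs(hilbert(band))   # amplitude envelope of the subband
    return env ** power           # compressive nonlinearity

# demo: a 1 kHz carrier with slow (4 Hz) amplitude modulation
fs = 16000
t = np.arange(fs) / fs
x = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
env = subband_envelope(x, fs, 800, 1200)
```

The filter order, band edges, and compression exponent are placeholders; real auditory models use a bank of such channels spaced along the cochlear frequency axis.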
0:03:33 | and there's now quite a bit of evidence that
---|
0:03:35 | the envelopes, or the spectrogram-like representation that
---|
0:03:39 | they comprise
---|
0:03:41 | are then
---|
0:03:41 | subsequently filtered by
---|
0:03:43 | another stage of filters that are often called modulation filters, and so
---|
0:03:46 | the receptive fields of cortical neurons that the earlier speaker was showing
---|
0:03:50 | are like this, although he was showing examples from the cortex, where the tuning is a little
---|
0:03:54 | bit more complicated; you see
---|
0:03:56 | patterns in both frequency and time
---|
0:03:58 | the modulation filters that we typically
---|
0:04:00 | look at are those that mimic the things you find subcortically, in the inferior colliculus and
---|
0:04:05 | thalamus, where things are
---|
0:04:06 | primarily tuned for temporal modulation, so these little things here
---|
0:04:09 | represent
---|
0:04:11 | the passbands in temporal modulation
---|
0:04:13 | frequency
---|
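A temporal modulation filter of the kind described is just a bandpass applied to a subband envelope in modulation frequency; a hedged sketch, again with SciPy Butterworth bands as stand-ins for the model's actual modulation filterbank:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def modulation_band(envelope, env_fs, lo, hi):
    """Filter a subband envelope into one temporal-modulation band
    (a bandpass in modulation frequency, e.g. 2-8 Hz)."""
    sos = butter(2, [lo, hi], btype="bandpass", fs=env_fs, output="sos")
    return sosfilt(sos, envelope)

# demo: an envelope modulated at 4 Hz passes a 2-8 Hz modulation filter
# but is strongly attenuated by a 16-64 Hz one
env_fs = 400                      # envelope sampling rate (Hz)
t = np.arange(4 * env_fs) / env_fs
env = 1 + 0.5 * np.sin(2 * np.pi * 4 * t)
slow = modulation_band(env, env_fs, 2, 8)
fast = modulation_band(env, env_fs, 16, 64)
```

The band edges here are illustrative; modulation filterbanks typically tile logarithmically-spaced rates.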
0:04:15 | so the question that we've been looking into is how much of texture perception can be captured with
---|
0:04:21 | relatively simple summary statistics of representations like these, which we believe to be present in biological auditory systems
---|
0:04:27 | and whether these
---|
0:04:27 | summary statistics would then be useful for machine recognition tasks
---|
0:04:32 | so the methodological proposal that underlies most of the work
---|
0:04:35 | is that synthesis is a very powerful way to test a perceptual theory
---|
0:04:39 | and the notion is that if your brain represents sounds with some set of measurements, like statistics
---|
0:04:45 | then signals that have the same values of those measurements ought to sound the same to you
---|
0:04:49 | and in particular
---|
0:04:50 | sounds that we synthesise
---|
0:04:52 | to have the same measurements as some real-world recording
---|
0:04:55 | ought to sound like another example of the same kind of thing, if the measurements that we use in the
---|
0:05:00 | synthesis
---|
0:05:01 | are like the ones the brain uses to represent sound
---|
0:05:04 | and we've been taking this approach with sound texture perception, synthesizing textures from statistics measured
---|
0:05:09 | in real-world sounds
---|
0:05:11 | so the basic idea is to take some example signal, like
---|
0:05:14 | a recording of rain
---|
0:05:15 | measure some statistics, and then synthesize new signals
---|
0:05:18 | constraining them only to have the same statistics, and in other respects making them as random as possible
---|
0:05:23 | and the approach that we've taken here is very much inspired by work that was done
---|
0:05:27 | quite a while back on visual texture, some of the authors of which
---|
0:05:31 | are mentioned here
---|
0:05:33 | so |
---|
0:05:34 | i'm just gonna give you a very simple toy example to illustrate the logic. let's suppose that we
---|
0:05:38 | want to test the role of the power spectrum
---|
0:05:40 | you might think that the power spectrum plays a role in texture, so what we do is measure the spectrum of some real-
---|
0:05:44 | world texture like this
---|
0:05:49 | now we just want a random signal with the same spectrum; this is really easy, we just filter noise
---|
0:05:53 | and then we listen to them and see what they sound like, and
---|
0:05:56 | unfortunately for the hypothesis that the power spectrum underlies texture
---|
0:06:00 | things generally sound like noise when you do this
---|
0:06:04 | so for instance rain sounds like noise
---|
0:06:11 | as opposed to this
---|
0:06:18 | right, so this is not realistic
---|
0:06:20 | and this tells us that we're not simply registering the spectrum when we recognise textures
---|
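The toy manipulation described here (a random signal with the spectrum of a real texture) can be sketched by keeping a recording's Fourier magnitudes and randomizing its phases; a minimal NumPy illustration, not the talk's actual filtering code:

```python
import numpy as np

def match_spectrum(target, seed=0):
    """Random signal with the same power spectrum as `target`:
    keep the Fourier magnitudes, randomize the phases."""
    rng = np.random.default_rng(seed)
    mags = np.abs(np.fft.rfft(target))
    phases = rng.uniform(0, 2 * np.pi, mags.shape)
    phases[0] = phases[-1] = 0.0          # DC and Nyquist bins stay real
    return np.fft.irfft(mags * np.exp(1j * phases), n=len(target))

x = np.sin(2 * np.pi * 50 * np.arange(1024) / 1024)  # stand-in "recording"
y = match_spectrum(x)                                # same spectrum, new signal
```

Phase randomization and filtering noise to a target spectrum are equivalent ways of producing a spectrum-matched noise.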
0:06:24 | alright so the question is |
---|
0:06:26 | would additional simple statistics do any better
---|
0:06:28 | and so we've been mostly looking at statistics of these two stages of representation: the envelopes of the
---|
0:06:34 | subbands
---|
0:06:35 | and the modulation bands you can derive from them with simple linear filters
---|
0:06:40 | and we've looked into how far we can get with very simple statistics, things like marginal moments, like
---|
0:06:44 | the variance and the skew, the kurtosis and the mean
---|
0:06:47 | as well as pairwise correlations between different pieces of the representation, for instance different
---|
0:06:52 | subbands
---|
0:06:54 | or different modulation bands
---|
0:06:56 | these statistics are generic
---|
0:06:58 | they're not tailored to any specific natural sound
---|
0:07:00 | but they are simple and they're easy to measure
---|
0:07:02 | on the other hand, because of this it's not obvious that they would account for much of sound recognition, but
---|
0:07:06 | maybe they're a reasonable place to start
---|
0:07:09 | now for these statistics to have any hope of being useful for recognition, at a minimum they have to yield
---|
0:07:14 | different values for different types of sounds, and so
---|
0:07:16 | i'm gonna quickly give you a couple of examples to give you some intuition for what kinds
---|
0:07:20 | of things these might capture
---|
0:07:22 | so let's quickly look
---|
0:07:23 | at some of the marginal moments of cochlear envelopes, the envelopes of the cochlear subbands
---|
0:07:29 | these moments, again, are things like the mean and the variance and the skew
---|
0:07:32 | statistics that describe how the amplitude is distributed. so you take
---|
0:07:36 | a stripe of a
---|
0:07:37 | cochlear spectrogram
---|
0:07:39 | you take the envelope
---|
0:07:40 | and collapse it across time to give you a histogram, which gives you the frequency of occurrence of different amplitudes
---|
0:07:45 | and this is a very simple sort of representation of sound, but as many of you will know, these
---|
0:07:50 | kinds of amplitude distributions generally differ
---|
0:07:53 | between natural sounds and noise, and they vary between different kinds of natural sounds. so here's a
---|
0:07:57 | quick example
---|
0:07:57 | quick example |
---|
0:07:58 | these are amplitude histograms for noise, a recording of a stream
---|
0:08:02 | and a recording of geese
---|
0:08:04 | from one particular channel
---|
0:08:06 | and the thing to note here is that although these distributions have about the same mean
---|
0:08:10 | indicating that there's roughly the same acoustic power in this channel
---|
0:08:13 | the distributions have different shapes
---|
0:08:16 | and you can also see this visually
---|
0:08:17 | if you just look at the spectrograms: you see that the pink noise is mostly grey
---|
0:08:22 | whereas the stream and the geese have got more black and white, and so in this case the white
---|
0:08:25 | would be down here
---|
0:08:26 | and the black would be up here, so they deviate more
---|
0:08:29 | from the mean, with more high amplitudes and more low amplitudes
---|
0:08:33 | so many of you will probably recognise that
---|
0:08:35 | this is an indication of the common observation that natural signals are sparser than noise
---|
0:08:39 | so the intuition is that natural sounds contain events, like raindrops and geese calls, and
---|
0:08:43 | these are infrequent, but when they occur they produce large amplitudes; when they don't occur the amplitude
---|
0:08:48 | tends to be low
---|
0:08:49 | and this sparsity behaviour
---|
0:08:51 | which alters the shape of these histograms
---|
0:08:53 | is reflected in pretty simple statistics, like the variance, which measures the spread of the distribution
---|
0:08:58 | and the skew, which measures the asymmetry about the mean
---|
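These envelope moments are straightforward to compute; a minimal sketch contrasting a dense, noise-like envelope with a sparse, event-driven one (both are synthetic stand-ins, not real recordings):

```python
import numpy as np

def envelope_moments(env):
    """Marginal moments of one subband envelope: mean, variance,
    skewness, and kurtosis of its amplitude distribution over time."""
    mu = env.mean()
    var = env.var()
    sd = np.sqrt(var)
    skew = ((env - mu) ** 3).mean() / sd ** 3   # asymmetry about the mean
    kurt = ((env - mu) ** 4).mean() / var ** 2  # heaviness of the tails
    return mu, var, skew, kurt

rng = np.random.default_rng(0)
# noise-like envelope vs. a sparse one (rare large "events")
dense = np.abs(rng.normal(size=50000))
sparse = np.where(rng.random(50000) < 0.01, 10.0, 0.1)
```

The sparse envelope produces much larger skew and kurtosis than the dense one, mirroring the histogram differences described above.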
0:09:02 | alright, so one more example: let's take a quick look at
---|
0:09:04 | what kinds of correlations we can observe
---|
0:09:06 | between the envelopes of different channels
---|
0:09:09 | these things also vary across sounds, and one of the main reasons for this is the presence of broadband
---|
0:09:13 | events
---|
0:09:13 | so if you listen to the sound of fire
---|
0:09:19 | fire has got all these crackles and pops and clicks
---|
0:09:21 | and those crackles and pops are visible in the spectrogram as these vertical streaks
---|
0:09:27 | so these broadband events produce dependencies between channels, because they excite them all at once, and you can
---|
0:09:31 | see this if you look at correlations between channels. so this is just a big matrix of correlation coefficients
---|
0:09:36 | between pairs of
---|
0:09:37 | cochlear filters
---|
0:09:39 | going from low frequency to high, and low to high
---|
0:09:41 | so the diagonal here has got to be one
---|
0:09:43 | but the off-diagonal channels can be whatever, and you can see that for fire there's a lot of yellow
---|
0:09:47 | and a lot of red, indicating that there are correlations between channels. and not all sounds are like this; here's a
---|
0:09:52 | stream
---|
0:09:53 | and you can see that there's mostly green here; it looks yellow on the screen, but
---|
0:09:56 | trust me, it's green
---|
0:09:58 | and that's because for a lot of water sounds the envelopes of the channels are mostly uncorrelated
---|
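The cross-channel correlation matrix described here can be sketched directly with `np.corrcoef`; the "fire-like" and "stream-like" envelope sets below are synthetic stand-ins in which shared broadband clicks induce the correlations:

```python
import numpy as np

# correlation matrix between subband envelopes; broadband events
# (clicks exciting all channels at once) push off-diagonal entries up
rng = np.random.default_rng(0)
n = 20000
clicks = (rng.random(n) < 0.005).astype(float)      # shared broadband events
fire_like = np.abs(rng.normal(size=(6, n))) * 0.2 + 5 * clicks   # correlated
stream_like = np.abs(rng.normal(size=(6, n)))                    # independent

C_fire = np.corrcoef(fire_like)      # strong off-diagonal structure
C_stream = np.corrcoef(stream_like)  # near-zero off-diagonal entries
```

Each row of the input is one channel's envelope; the diagonal of the resulting matrix is identically one, as in the plot the talk describes.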
0:10:02 | okay, so these statistics, although they're simple, capture variation across sounds
---|
0:10:07 | and the question we're
---|
0:10:08 | trying to get at is whether they actually capture the sound of real-world textures
---|
0:10:12 | so our strategy
---|
0:10:13 | is to synthesize signals constrained only to have the same statistics as some real-world sound
---|
0:10:18 | but in other respects being as random as possible. the way we do that
---|
0:10:21 | is by starting with a noise signal
---|
0:10:23 | and then adjusting the noise signal to get it to have the desired statistics
---|
0:10:26 | turning it into some new signal
---|
0:10:28 | the basic idea
---|
0:10:29 | is to filter the noise with the same set of filters, giving you a subband representation
---|
0:10:34 | and then to adjust the subband envelopes via gradient descent
---|
0:10:38 | to cause them to have the desired statistical properties. and so
---|
0:10:41 | the statistics are just functions of the envelopes, and we can compute their gradients
---|
0:10:44 | and then change the envelopes in the gradient direction till we get the desired statistics. so that gives us new
---|
0:10:49 | subbands, and we add them back up to get a new sound signal we can listen to
---|
0:10:53 | here's just a flowchart; i won't give you all the details here, but
---|
0:10:56 | the basic strategy is to first measure the statistics of a real-world sound texture after processing
---|
0:11:02 | it in the auditory model
---|
0:11:03 | and then processing noise in the same way and altering its envelopes to give the same statistics; there's an
---|
0:11:08 | iterative process that you have to do to get this to converge
---|
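The envelope-adjustment step can be illustrated on the simplest possible case, imposing just a target mean and variance on a noise envelope by gradient descent. The full procedure described in the talk handles many statistics jointly across all subbands and iterates to convergence; this is only a toy sketch (per-sample 1/N factors in the gradients are folded into the step size):

```python
import numpy as np

def impose_mean_var(env, target_mu, target_var, steps=1000, lr=0.05):
    """Gradient descent on an envelope until its mean and variance match
    target values: loss = (mean - mu*)**2 + (var - var*)**2."""
    env = env.copy()
    for _ in range(steps):
        mu, var = env.mean(), env.var()
        g_mu = 2.0 * (mu - target_mu)                  # mean-term gradient
        g_var = 4.0 * (var - target_var) * (env - mu)  # variance-term gradient
        env -= lr * (g_mu + g_var)
    return env

rng = np.random.default_rng(0)
noise_env = np.abs(rng.normal(size=1000))   # start from a noise envelope
out = impose_mean_var(noise_env, target_mu=2.0, target_var=0.25)
```

The mean term shifts every sample uniformly (leaving the variance untouched), while the variance term rescales deviations about the mean, so the two constraints converge without fighting each other.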
0:11:11 | but the end result is a sound signal that shares the statistics of a real-world sound. so the
---|
0:11:15 | question is, how do they sound
---|
0:11:16 | so
---|
0:11:17 | we're asking this question because if the statistics account for texture perception, then the synthetic signals should
---|
0:11:22 | sound like new examples of the real thing
---|
0:11:26 | and interestingly, in many cases they do. so i'm going to play you
---|
0:11:28 | a sequence of synthetic sounds that were just generated from noise
---|
0:11:32 | by forcing the noise to have some of the same statistics as various real-world sounds. so you get
---|
0:11:36 | things that sound like rain
---|
0:11:41 | streams |
---|
0:11:44 | bubbles |
---|
0:11:47 | fire |
---|
0:11:50 | applause |
---|
0:11:53 | wind
---|
0:12:01 | birds |
---|
0:12:05 | and crowd noise
---|
0:12:09 | and it also works for a lot of non-natural sounds, things like rustling paper
---|
0:12:16 | or a jackhammer
---|
0:12:21 | so the success of the synthesis suggests these statistics could underlie the representation and recognition of textures
---|
0:12:27 | so we did a quick experiment to see whether this was true in human listeners
---|
0:12:31 | people were presented with a five-second sound clip and had to identify it from five choices, so chance performance here
---|
0:12:37 | is twenty percent
---|
0:12:38 | and we presented them with
---|
0:12:40 | synthetic signals that were synthesized with different numbers of statistical constraints
---|
0:12:44 | as well as the originals
---|
0:12:46 | you can see here that when we just match the power spectrum, people are above chance but not
---|
0:12:50 | very good
---|
0:12:51 | but the performance improves as we add in more statistics, and with the full set
---|
0:12:55 | of
---|
0:12:55 | the model that i showed you previously
---|
0:12:57 | you do about as well as with the originals
---|
0:13:01 | so this all suggests that these simple statistics can in fact support recognition of real-world textures
---|
0:13:07 | another point that's just worth quickly mentioning is that the synthesis here is not simply reproducing the original
---|
0:13:11 | waveform, because the procedure
---|
0:13:14 | is initialised with noise
---|
0:13:15 | it turns out a different sound signal every time, and they share only the statistical properties. these are just three
---|
0:13:20 | examples of
---|
0:13:21 | waveforms that were synthesized from a single set of statistics measured in a single recording, and you get a
---|
0:13:26 | very different thing each time, and you can make as many of these as you want. so the synthesis is
---|
0:13:30 | really capturing
---|
0:13:31 | some more abstract property of the sound signal
---|
0:13:34 | alright, so one other question is whether these texture statistics, which seem to be implicated in human texture
---|
0:13:39 | perception, would also be useful for machine recognition
---|
0:13:43 | and at present we don't really have an ideal task
---|
0:13:45 | with which to test this, because we need lots and lots of labeled textures, and if any
---|
0:13:49 | of you have those i'd be interested to get them
---|
0:13:52 | but dan ellis had an idea that an interesting potential application for this
---|
0:13:56 | might be video soundtrack classification. so as everybody knows, there's lots of interest these days
---|
0:14:02 | in
---|
0:14:02 | being able to search for video clips depending on their content
---|
0:14:06 | and so we got our hands on
---|
0:14:08 | a dataset courtesy of a colleague at columbia, yu-gang jiang
---|
0:14:12 | who went
---|
0:14:13 | and had a bunch of people view video clips in an interface like this
---|
0:14:16 | so they would
---|
0:14:17 | watch something like this, and then
---|
0:14:22 | they would hear something like this
---|
0:14:37 | alright, so that was the soundtrack, and then they would look at this thing and check all
---|
0:14:41 | the boxes that applied. and so some of these things are attributes, you probably can't see in the back, but
---|
0:14:45 | some of them are attributes of the video
---|
0:14:46 | others describe the audio; in this case the person has checked cheering and clapping
---|
0:14:51 | they said this was an outdoor environment; so there's a whole bunch of labels that get attached to
---|
0:14:55 | each of these videos
---|
0:14:57 | and so the idea is that the texture statistics can be used as features for svm classification: you
---|
0:15:03 | can train up svms
---|
0:15:04 | to recognise these particular labels and distinguish them from others
---|
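The classification setup described here (texture statistics as fixed-length feature vectors, a classifier trained to pick out each label) can be sketched as follows. The talk uses SVMs; as a dependency-free stand-in this sketch substitutes a nearest-centroid rule, and the "rain-like" and "stream-like" envelopes are synthetic:

```python
import numpy as np

def texture_features(env_bands):
    """Concatenate simple statistics (mean, variance, skew) of each band."""
    feats = []
    for e in env_bands:
        mu, sd = e.mean(), e.std()
        feats += [mu, sd ** 2, ((e - mu) ** 3).mean() / sd ** 3]
    return np.array(feats)

def fit_centroids(X, y):
    """One centroid per class in feature space (stand-in for an SVM)."""
    return {c: X[y == c].mean(axis=0) for c in set(y)}

def predict(centroids, x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

rng = np.random.default_rng(0)
def sample(sparse):
    """Feature vector for one synthetic 4-band 'texture'."""
    bands = [np.where(rng.random(2000) < 0.02, 5.0, 0.1) if sparse
             else np.abs(rng.normal(size=2000)) for _ in range(4)]
    return texture_features(bands)

X = np.array([sample(True) for _ in range(20)] +
             [sample(False) for _ in range(20)])
y = np.array(["rain"] * 20 + ["stream"] * 20)
centroids = fit_centroids(X, y)
```

With scikit-learn available, `centroids`/`predict` would be replaced by an `SVC` per label, trained on the same feature matrix.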
0:15:09 | in order for this to work
---|
0:15:10 | of course the statistics have to give you different values for different labels, and what this plot shows is just
---|
0:15:15 | the average values of the different statistics for the different labels in the set
---|
0:15:20 | so the labels are going along the vertical axis and the different statistics along the horizontal axis, and the
---|
0:15:25 | point here is just that as you scan down the columns, the colours change
---|
0:15:28 | right, so
---|
0:15:29 | the different labels are on average associated with different statistics
---|
0:15:33 | and so
---|
0:15:34 | we find that you can do some degree of classification with these statistics
---|
0:15:39 | this is the average performance across all the categories
---|
0:15:42 | which is overall modest, but one of the things to point out here is that
---|
0:15:46 | you see a pattern that's qualitatively like what you see in the human observers; that is
---|
0:15:50 | performance is not that great when you just match the mean value of the envelopes, that is, the
---|
0:15:55 | spectrum
---|
0:15:56 | but it gets better as you add in more statistics
---|
0:15:59 | and some particular labels get categorized better than others; speech or music are pretty easy, as a lot of
---|
0:16:03 | you probably know
---|
0:16:04 | i'm showing this here in part
---|
0:16:06 | to show that the pattern that you get is a little different for different categories. so for music
---|
0:16:11 | for instance
---|
0:16:12 | the modulation power in the modulation bands seems to matter a lot; you get a gain in classification there
---|
0:16:17 | whereas for speech
---|
0:16:18 | the cross-band correlations
---|
0:16:20 | that measure things like comodulation matter more
---|
0:16:24 | performance is poor for some of the labels, in part because they're acoustically heterogeneous; that is, they're
---|
0:16:28 | just not really well suited to
---|
0:16:30 | the representation we have. so one of the labels is "urban", and you can imagine that that consists of
---|
0:16:35 | a lot of different kinds of sound textures
---|
0:16:37 | and so the statistics are not great there
---|
0:16:39 | so it's not really an ideal task; it's sort of just more of a proof of concept
---|
0:16:43 | i think to really use these statistics for classifying semantic categories like this, like urban
---|
0:16:48 | you'd probably have to first recognise particular textures like
---|
0:16:51 | traffic or crowd noise
---|
0:16:52 | and then link those labels to the category
---|
0:16:55 | so the take-home messages here: the first thing is just that textures are ubiquitous
---|
0:17:00 | and i think
---|
0:17:01 | important and worth studying, and i think they may involve
---|
0:17:04 | a unique form of representation relative to other kinds of auditory phenomena, namely summary statistics
---|
0:17:11 | and so we find that
---|
0:17:12 | naturalistic textures can be generated from
---|
0:17:15 | relatively simple summary statistics of early auditory representations: marginal moments and pairwise correlations of cochlear envelopes and modulation bands
---|
0:17:23 | and the suggestion is that listeners are using similar statistics
---|
0:17:26 | to recognise sound textures. so when you remember the sound of a fire or the sound of rain
---|
0:17:30 | we think you're just remembering
---|
0:17:32 | some of these summary statistics
---|
0:17:34 | and the suggestion is that these statistics should be useful for machine recognition of textures; that's something that we'll
---|
0:17:39 | continue to explore
---|
0:17:40 | thanks
---|
0:17:47 | questions?
---|
0:17:52 | i have one question
---|
0:17:53 | do you expect the same kind of statistics to be useful in recognizing speech, or other acoustic
---|
0:17:58 | signals
---|
0:17:59 | that are
---|
0:17:59 | different
---|
0:18:00 | somewhere in between, you know, or something completely different
---|
0:18:02 | yes, i think one interesting notion
---|
0:18:05 | is that textures are stationary, right, so it makes sense to compute these summary
---|
0:18:09 | statistics where you're averaging things over the
---|
0:18:12 | length of the signal
---|
0:18:13 | for signals where you're interested in the nonstationary structure
---|
0:18:18 | what you might wanna do is
---|
0:18:20 | compute those
---|
0:18:20 | same statistics, but averaged over local time windows, so that the statistics would give you sort of a
---|
0:18:25 | trajectory
---|
0:18:26 | over time
---|
0:18:27 | of the way that the local structure changes. and so there are a lot of sounds
---|
0:18:32 | that are kind of locally texture-like, but they have some kind of temporal evolution
---|
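The windowed idea described in this answer can be sketched by computing statistics over sliding windows instead of one global average; a minimal illustration on a synthetic envelope that switches texture halfway through:

```python
import numpy as np

def statistic_trajectory(env, win, hop):
    """Texture statistics (here mean and variance) in local windows
    instead of one global average, giving a trajectory over time."""
    out = []
    for start in range(0, len(env) - win + 1, hop):
        w = env[start:start + win]
        out.append([w.mean(), w.var()])
    return np.array(out)

# demo: an envelope that switches texture halfway through
rng = np.random.default_rng(0)
env = np.concatenate([
    np.abs(rng.normal(size=5000)),                    # dense, noise-like half
    np.where(rng.random(5000) < 0.02, 5.0, 0.1),      # sparse, event-like half
])
traj = statistic_trajectory(env, win=1000, hop=500)
```

The trajectory's rows change character at the switch point, which is the kind of locally-texture-like temporal evolution the answer describes.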
0:18:36 | yeah, so like
---|
0:18:37 | when you get into bed
---|
0:18:39 | like the sounds
---|
0:18:40 | you make as you kind of go over the sheets and everything's rustling
---|
0:18:44 | you know, those are different textures, but they're sort of sequenced in a particular way
---|
0:18:48 | and
---|
0:18:49 | i don't know how useful that would be for something like speech, but certainly for, i think, some kinds of
---|
0:18:53 | nonstationary sounds
---|
0:18:55 | that sort of approach
---|
0:18:56 | of looking at the temporal evolution of the statistics
---|
0:18:59 | may be useful
---|
0:19:08 | (inaudible audience question)
---|
0:19:09 | yeah, it's a very good question
---|
0:19:11 | so
---|
0:19:13 | i'm very interested in this, and i think that
---|
0:19:16 | what you might need to do is... so all these statistics
---|
0:19:20 | i guess are time averages, right; the correlation is an average of a product, or whatever
---|
0:19:24 | and the variance is the average of a squared deviation
---|
0:19:27 | and before you do the time average, what you might need to do is some kind of clustering
---|
0:19:31 | because
---|
0:19:32 | you know, for instance in the case of
---|
0:19:34 | let's say you have, you know, wind
---|
0:19:36 | which
---|
0:19:38 | isn't really very correlated across channels, and clapping, which is
---|
0:19:42 | right
---|
0:19:43 | so you're gonna have sort of a mixture of these two very different kinds of
---|
0:19:46 | events
---|
0:19:47 | some of which will be close to a hundred percent correlated and others of which will not
---|
0:19:50 | so if you just combine those, you're gonna get a correlation of point five, which is not a
---|
0:19:54 | really good representation
---|
0:19:59 | yeah |
---|
0:20:07 | yeah, so i think it's an interesting problem, and i mean it is related to
---|
0:20:11 | some of the ways that people are thinking about
---|
0:20:14 | sound segregation
---|
0:20:15 | in terms of clustering
---|
0:20:17 | and so you may really... you may have to do some kind of segregation, or have a model
---|
0:20:22 | there are cases where it works, but
---|
0:20:24 | cases where it doesn't
---|
0:20:28 | i just have a quick question: for all these experiments, especially the subject classification, what
---|
0:20:34 | was the sampling rate of the sound
---|
0:20:35 | or clips you used
---|
0:20:37 | what was the sampling rate of the sound signals?
---|
0:20:39 | probably
---|
0:20:40 | twenty k
---|
0:20:47 | yes |
---|
0:20:48 | yeah |
---|
0:20:49 | that's correct yeah |
---|
0:20:50 | i mean, the envelopes... you know, this all runs on envelopes, and of course those
---|
0:20:54 | are
---|
0:20:55 | you know, slow things, right, so
---|
0:20:58 | they have an effective sampling rate of something much lower, like, you know
---|
0:21:01 | a few hundred hertz
---|
0:21:04 | that's right, the actual sound files have, you know, a pretty high
---|
0:21:06 | normal sampling rate
---|
0:21:08 | did you check to see how
---|
0:21:10 | length affects the accuracy of the statistical estimates
---|
0:21:14 | especially for short lengths
---|
0:21:17 | do the higher-order statistics really start failing when the length gets short? yeah, it's a great question
---|
0:21:23 | how
---|
0:21:24 | how long were your files... so i tend to really work with pretty long things, like
---|
0:21:29 | five seconds, but i have looked at shorter things, and
---|
0:21:34 | most of these statistics are robust down to pretty short lengths
---|
0:21:38 | when you start measuring things like kurtosis, it gets a bit less robust, but
---|
0:21:42 | that ends up not being that important for the synthesis
---|
0:21:47 | well, the variance kind of does most of the work there
---|
0:21:53 | and the correlation stuff... did you try higher-order statistics?
---|
0:21:57 | nothing higher than kurtosis, yeah
---|
0:22:00 | (largely inaudible audience question, apparently about using a nonparametric or kernel-based approach instead of fixed parametric statistics)
---|
0:22:45 | yeah, i think that's an interesting direction, i mean
---|
0:22:49 | we've been moving in directions like that, but we haven't tried that yet
---|
0:22:55 | in the tutorial on music signal processing there was a discussion of texture-like effects and the
---|
0:23:00 | perception of
---|
0:23:02 | combination tones and chords
---|
0:23:04 | have you looked at any correlations between your work and that kind of thing
---|
0:23:11 | i don't know exactly which you're referring to, but i mean, i know that the
---|
0:23:14 | word texture is used a lot in the context of music
---|
0:23:18 | where they're typically talking about kind of higher-level
---|
0:23:20 | types of structure
---|
0:23:22 | i mean, i have tried to synthesize musical textures
---|
0:23:25 | it doesn't work that well
---|
0:23:27 | and there's
---|
0:23:28 | a lot of interesting reasons for why i think that is; we need some more complicated statistics
---|
0:23:33 | essentially, for that
---|
0:23:34 | but music is one of the things that really doesn't work very well. in general, things that
---|
0:23:38 | are composed of sounds with pitch don't come out
---|
0:23:40 | that
---|
0:23:40 | well
---|
0:23:41 | so it works great on things like environmental sounds and
---|
0:23:44 | machine noise and things like that
---|
0:23:47 | thank you very much
---|