0:00:13 | So, looking at the time, I realize I'm running a bit behind, so let me get right into it. |
0:00:19 | To give you a bit of the motivation I have behind this work: |
0:00:23 | One complaint we have all had when it comes to audio is that with all the models we use, you sort of always have to impose all these constraints: you want harmonic sounds, or you want the noise to be stationary. That is a lot of contribution from the user, |
0:00:39 | and I'm somewhat allergic to that idea. I don't want to input a lot of information myself; I want the system to learn that information. |
0:00:48 | The other thing that motivates me a lot is that when you look at a lot of the work we do in audio, we always have this term at the end, a "plus n(t)", and that basically has to stand in for any kind of interference. And of course, because we are not comfortable with the math otherwise, we assume it is Gaussian; it makes life easy. But of course, if you have two people speaking at the same time, the second person is not going to be just a Gaussian signal; it is something much more complicated. |
0:01:11 | So a lot of that work does not really carry over well. |
0:01:14 | And the third motivating point is that, especially nowadays, there are good examples showing that when you have a lot of data, some very, very simple, very, very stupid algorithms can outperform very complicated systems, which is sort of a very humbling experience. |
0:01:30 | So these are the things I want to keep in mind during this talk, and there is going to be a fourth point, which is also very important, but that is going to come later. |
0:01:39 | So, the first observation I am going to make is that, as far as I am concerned, a lot of the fun stuff you can do with audio does not have that much to do with exact sound quality. If you want to do classification, separation, or sound deformations, just fun things, what you really care about is the magnitude spectrum, because it so happens that our hearing attends to that much more than, say, phase. |
0:02:02 | Somebody will probably want to argue with me about that, but not for now. |
0:02:06 | And the other observation I am going to make is that if we normalize spectra, that does not really change things: if I speak twice as loud, I am still going to be saying the same thing. It does not make a difference. |
0:02:16 | So here is the representation I am going to be using throughout this talk, and I hope the people in the back can see this. Whenever you see a spectrum with a little hat on it, that means a spectrum that has been normalized: we basically divide it by the sum of all of its elements, so the magnitude spectrum is normalized and its elements sum to one. |
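A minimal sketch of that normalization step, assuming the magnitude spectrogram is a numpy array with one spectrum per column (the function name and layout are illustrative, not from the talk):

```python
import numpy as np

def normalize_spectra(S, eps=1e-12):
    """Map each column (one magnitude spectrum) onto the simplex by
    dividing it by the sum of its elements, as described in the talk."""
    return S / (S.sum(axis=0, keepdims=True) + eps)
```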
0:02:37 | What that means is that we start dealing with a rather strange space, because now the entire space of spectra is basically going to lie on a simplex, which is a subspace of the overall space of spectra. |
0:02:52 | Here is what that looks like. Let's assume we have a spectrum that only has three frequencies. The way we can represent it, if we know it is normalized, is with the simplex shown here. Each vertex of the simplex corresponds to a frequency: on the lower left we have the low frequency, on the top we have the middle frequency, and on the lower right we have the high frequency. |
0:03:15 | Within that simplex we can represent any kind of spectrum that has only three frequencies, assuming you do not care about amplitude. So, for example, all the low-frequency sounds are going to end up in this region, all the high-frequency sounds will be in this region, and any point toward the middle is going to be a broadband sound that is using all of the frequencies at the same time. So it is sort of a simple mixing model. |
0:03:37 | So, the contrast here is not great, but anyway. |
0:03:41 | So let's talk about a very, very basic sound model based on that representation. I can go out and record five hours of speech of somebody speaking, and then, every time I have a new recording of that person, what I can do is basically go through the normalized spectra of the new recording, so, for example, I would pick the spectrum over here, and do a simple matching: try to figure out which spectrum out of all the training data is the closest to it. It is a simple nearest-neighbor operation; I am just trying to find the spectrum that has more or less the same look to it. |
0:04:17 | There is nothing special about doing something like that. I will just give a quick example to get oriented. Here is what the input was. |
0:04:24 | We have some sound; here is the input. [plays audio] |
0:04:34 | And here is what it ends up being approximated as, if we just swap every spectrum with the closest-looking spectrum from the training data. [plays audio] |
0:04:43 | It is not a good representation, but it sort of gets the gist of it. |
0:04:47 | What happens, in the geometric picture we have been thinking about, looks like this: the red points are the training data, and we are given some blue points, which are the spectra of the sound we are trying to analyze. We are always trying to find the closest training point to the point we are observing right now, and then we swap that into our representation. |
0:05:04 | So there is nothing super special here. |
0:05:07 | There is one point I want to make: this is not a Euclidean space. Because we are constrained on that simplex, things get a little funny, so we cannot really use Euclidean distances in it; that would ignore a lot of the properties of that space which make it unique. I will not dwell on the details, but basically, if you are working on a simplex you cannot assume just anything; the natural distribution there is something like a Dirichlet distribution, and what that means is that a proper distance measure in that space is going to be the cross-entropy between the spectra. |
0:05:36 | So when we are doing that nearest-neighbor search, we are looking at the cross-entropy between the normalized spectra, not something like the L2 distance. |
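A rough sketch of that nearest-neighbor search under a cross-entropy measure, assuming the normalized training spectra are stored one per row in a numpy array (the function and variable names are my own, not from the talk):

```python
import numpy as np

def nearest_spectrum(x, train, eps=1e-12):
    """Return the index of the normalized training spectrum (one per row of
    `train`) closest to the query magnitude spectrum x under the
    cross-entropy H(p, q) = -sum_f p_f * log(q_f)."""
    p = x / (x.sum() + eps)                    # project the query onto the simplex
    cross_entropy = -(np.log(train + eps) @ p)  # one value per training spectrum
    return int(np.argmin(cross_entropy))
```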
0:05:43 | Now, the whole point of this talk is to analyze mixtures, because we do not want to have that "plus n" term in there. So here is how I am going to start. I am going to make a small assumption to begin with: whenever you have a mixture of two sounds, and here we have an adding operation, the magnitude spectrogram of that mixture is, I am going to assume, equal to the sum of the magnitude spectrograms of the individual sounds, had we been able to observe them on their own. |
0:06:11 | This is not exactly correct, because there is a little nonlinearity you add by taking the magnitude, but on average it is a fine assumption. |
0:06:19 | The other thing is that mixing is not necessarily a process that is Markovian in any way, so we can just look at one vector at any single point in time in that mixture. What we are saying here is that this particular spectrum we observe from the mixture is going to be a sum of the corresponding spectra of the original sources at that same time. A very simple idea. |
0:06:42 | And guess what happens when you look at this statement in the space we are working in. We are again going to have our three-frequency simplex, and we will be observing a point which is a mixture of our two sources. That point will have to lie on the line segment that connects the two points of the spectra that combined to create that mixture. So, in the previous example, we had these two spectra, which is what the clean sounds looked like; those are going to be represented by these two points, and any spectrum that lies on the segment between those two points would be a plausible blend of these two spectra. |
0:07:17 | And then how far along that line you are tells you how much each spectrum is contributing. |
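A tiny numerical illustration of that geometric statement, using made-up three-bin spectra (all names here are illustrative): after normalization, the mixture lands on the segment between the two normalized sources, and its position along that segment is each source's share of the total energy.

```python
import numpy as np

a = np.array([4.0, 1.0, 0.0])     # a low-frequency-heavy source
b = np.array([0.0, 1.0, 3.0])     # a high-frequency-heavy source
x = a + b                         # magnitude spectra assumed to add

norm = lambda v: v / v.sum()      # projection onto the simplex
lam = a.sum() / x.sum()           # how far along the segment the mixture sits

# the normalized mixture is a convex combination of the normalized sources
assert np.allclose(norm(x), lam * norm(a) + (1 - lam) * norm(b))
```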
0:07:24 | So, now that we have this model, we can have a slightly updated version of the nearest-neighbor idea. Now what I am going to have is a mixture of sounds, and that is the only thing I have; I do not know exactly what the original sounds were. I can go to my database and say, well, it sounds like in this mixture I have somebody speaking and a whole bunch of chirping birds. I can get a gigabyte of speech and a gigabyte of chirping birds; no big deal nowadays. And what I have to do is, for every spectrum that I observe in the mixture, try to find one spectrum from the speech database and one spectrum from the bird-chirping database that combine well together to approximate what I am observing. |
0:08:03 | So it becomes this humongous search. |
0:08:05 | And the assumption is that if I do that and find those two spectra, they will be good approximations of the spectra I would have observed from the original clean sources. |
0:08:14 | What that means, again, in the space we are working in, is that on our frequency simplex, or spectrum simplex, there will be a subspace where you find, say, the red source, because it has its particular spectral character, and a neighborhood or subspace that holds the blue source. I am going to try to look at all the lines that connect a blue point and a red point, and figure out which one passes closest to each of my mixture spectra. |
0:08:41 | At this point you are probably thinking I must be nuts, because this is a horrible search problem. Just to give you some numbers: if you have ten minutes of training data, which is not a lot of data, we are talking about seventy-five thousand spectra per source, each of which is something like two thousand dimensions. A ten-second mixture is going to be about twelve hundred spectra, and that boils down to about five and a half billion searches for every spectrum of our input. |
0:09:05 | So it is not going to happen if you sit and wait for it. |
0:09:11 | But there is a way to relax the problem and make it more of a continuous optimization problem. I will not get into the details, because they are extremely boring, but the way I would roughly describe it is that we are going to use all of our training data as one huge basis set: every spectrum in our training data ends up being a basis vector, and we sort of concatenate all that data together. Our goal is to find how to combine this overcomplete basis in such a way that I am only using one spectrum from each of the two sources. |
0:09:43 | If I state it that way, it sounds like a sparse coding problem, and it is not particularly hard to solve; I will not go into it too much. It gives an approximate solution, but it is a lot faster. |
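One plausible way to set up that relaxed optimization, sketched here without the sparsity term the talk alludes to: treat every training spectrum as a column of an overcomplete dictionary and fit simplex-constrained weights by minimizing the KL divergence (the cross-entropy criterion from earlier) with standard multiplicative updates. This is an illustration under my own assumptions, not necessarily the exact formulation used in the talk.

```python
import numpy as np

def decompose_mixture(x, dict_a, dict_b, n_iter=200, eps=1e-12):
    """Explain one normalized mixture spectrum x as a weighted combination of
    normalized training spectra from two sources.  dict_a and dict_b hold one
    spectrum per column; the weights h live on the simplex and are fit with
    KL-minimizing multiplicative updates (no sparsity term in this sketch)."""
    W = np.concatenate([dict_a, dict_b], axis=1)   # (freqs, all training spectra)
    h = np.full(W.shape[1], 1.0 / W.shape[1])      # start with uniform weights

    for _ in range(n_iter):
        xhat = W @ h + eps                         # current approximation of x
        h *= W.T @ (x / xhat)                      # multiplicative KL update
        h /= h.sum()                               # stay on the simplex

    n_a = dict_a.shape[1]
    src_a = dict_a @ h[:n_a]                       # estimated contribution of source A
    src_b = dict_b @ h[n_a:]                       # estimated contribution of source B
    return src_a, src_b, h
```

Adding a sparsity penalty on the weights, or keeping only the strongest atom per source, would recover the "one spectrum from each source" behavior described above.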
0:09:58 | Let me give you an example of how this behaves when you have a mixture. We have the two original sounds at the top; these are the sounds I never get to observe. Let me just play one of them. [plays audio] |
0:10:12 | It is TIMIT material; you will have heard it before. And then I have a mixture of the two speakers. [plays audio] |
0:10:19 | I also have a lot of training data from these two particular speakers. So what I can do is run this huge search, or rather the optimization, and try to approximate every spectrum of the mixture as a superposition of any two spectra from those two speakers' data. If I do that, I can reconstruct the two sources; here is, for example, one of them. [plays audio] |
0:10:42 | That sort of thing. |
0:10:44 | Now, I see a lot of familiar faces, and you are probably thinking: what the hell, other approaches do this so much better, so what was the point of doing it this way? And this is going to be my fourth point. |
0:10:54 | The whole point of this representation is that we do not necessarily want to separate sounds. I cannot, for the life of me, think why anybody would want to separate a sound for its own sake; you only want to separate a sound because you want to do speech recognition, or pitch detection, or something else afterwards. Separation by itself is pretty much useless; in fact, there is no need to do it at all. |
0:11:12 | So the whole point of this representation is that we have a very nice semantic way of describing the mixture, by saying that we have these two clean spectra that come together to approximate it. That gives you the ability to do a lot more smart processing, because those spectra carry some semantic information with them. |
0:11:33 | So here is one quick example. Suppose we have a mixture of two instruments playing, which is the one up here, and I also have some training data of those two instruments in isolation. It is very easy for me to run a pitch tracker and pitch-tag all of the spectra in the training data, so that every spectrum I have there is associated with a pitch value. |
0:11:54 | By doing this kind of decomposition, I am basically explaining every spectrum in the mixture as a superposition of two spectra from my training data, and each of those comes with a pitch label attached to it. That means that, at that point, I know exactly what the two pitches are that are sounding at that particular time in the mixture. |
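Continuing the hypothetical `decompose_mixture` sketch above, the label transfer being described could look roughly like this (names and structure are my own):

```python
import numpy as np

def transfer_labels(h, labels_a, labels_b):
    """Given the fitted weights h over the concatenated dictionaries and a
    per-spectrum label for each training set (e.g. a pitch value, a phoneme,
    or a speaker tag), report the label of the strongest atom per source."""
    n_a = len(labels_a)
    best_a = int(np.argmax(h[:n_a]))    # dominant training spectrum from source A
    best_b = int(np.argmax(h[n_a:]))    # dominant training spectrum from source B
    return labels_a[best_a], labels_b[best_b]
```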
0:12:10 | And what is nice about it is that we did experiments starting from a solo instrument, which is basically just doing a nearest-neighbor search, all the way to having five wind instruments playing at the same time. Here are the results, in terms of error and standard deviation in hertz. You can see that we go from an average error of about four hertz for the solo case to about forty-two hertz when we have five instruments playing, and that is not something we could have done with a monophonic pitch-tracking algorithm. |
0:12:43 | Because we are decomposing mixtures in terms of things that we have already labeled, we get this extra ability to obtain labels for mixtures. |
0:12:53 | Another example of that is phonetic tagging. If I have a mixture of two speakers, and I can associate it with spectra from the clean recordings, I also have a lot of labels that come with those spectra: I know what phoneme corresponds to each spectrum, I can maybe do some emotion recognition and know what the emotional state is, I know which speaker it was, and I know what pitch that speaker was speaking at. |
0:13:15 | And what happens is that we only see a very mild degradation when we try to analyze mixtures. In this case, just some simple numbers: for one speaker, if we just do a nearest-neighbor search and a little bit of smoothing, we can get a phoneme error of forty-five percent. With two speakers you get a phoneme error of fifty-four percent, which is a fairly mild increase in the error, even though the problem is considerably harder. We get about eight to ten percent worse results every time you add a speaker, so it degrades gracefully in the mixture case. |
0:13:49 | So, to wrap that up: this was just a simple geometric way to look at mixtures. The point I am trying to make is that you really have to incorporate the idea that sounds mix into your model. You cannot just say, well, I am going to have a clean model and hopefully people will figure out a way to deal with the extra sources; you want a model that starts from the idea that things are going to be mixed together, if that is what we really care about. |
0:14:19 | How am I doing on the schedule? Okay. |
0:14:23 | The whole idea of decomposing things this way is based on an interesting concept that we see a lot in the computer vision literature and a lot in the text mining literature, but not so much in the audio space: if you have lots and lots of recordings, you should be able to explain pretty much everything that comes in. |
0:14:41 | My dream is that at some point our speech databases are going to be so big that you can just do a nearest-neighbor search on a sentence and it will give you back the sentence, and you do not have to do all this other processing. It has already been done in text: if I search for a question, somebody has already asked it. So it is only a matter of time before we do the same with speech. |
0:14:59 | And the other thing is that thinking about separation by itself is really missing the point; there is always something else we have to do after separation. If there is any message I can leave you with, it is that being able to analyze a mixture in some smart way, and figure out what is in it, does not necessarily mean you have to extract that information out of it as a separate signal. |
0:15:19 | And that is most of what I had, so thank you. |
0:15:26 | We have got time for some questions. |
0:15:36 | [Question] Okay, one small comment. For some applications, like music, we actually do want to separate the sounds, because we are interested in remixing; I will just mention that. |
0:15:44 | [Answer] And my answer to that is: if you want to remix, then you just want to remix the music; you do not want to extract the source, because you are going to put it right back into the mix. |
0:15:56 | [Question] We can talk about remixing offline; fine. |
0:15:59 | [Question] The other thing I just wanted to ask: you made this provocative statement that you cannot use the Euclidean distance, and then two slides later you said, "but I can use the L2 norm." |
0:16:11 | [Answer] Oh, did I say that? [Question] I could have sworn you did. [Answer] If I did, it was a mistake; I did not mean to. |
0:16:18 | [Question] But you did say at some point that you could use the L2 norm to do something to enforce sparsity. [Answer] Ah, yes, that. [Question] Okay, so I was just curious what that meant. |
0:16:28 | [Answer] So, whenever we talk about sparsity, what you usually want to do is optimize the L1 norm of a signal; that is what the compressive sensing literature is about. It turns out that, because we are dealing with normalized spectra that already sum to one, there is no way to optimize the L1 norm: it is a non-negative bunch of numbers that sum to one, so the L1 norm is always going to be one, and you are sort of stuck if you try to optimize that. |
0:16:52 | But because those numbers are going to be between zero and one, the closer they are to zero, the smaller their contribution becomes. So by optimizing the L2 norm of a normalized vector that sums to one, you are essentially enforcing sparsity, because you are saying, I want all of the entries to be really, really close to zero, and only one of them to be close to one. Does that make sense? |
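A small numerical illustration of that answer (my own example, not from the talk): on the simplex the L1 norm is pinned at one, so pushing the L2 norm up favors putting all the weight on a single entry.

```python
import numpy as np

uniform = np.full(4, 0.25)                  # weight spread evenly over 4 atoms
sparse = np.array([1.0, 0.0, 0.0, 0.0])     # all weight on a single atom

# both vectors live on the simplex, so their L1 norms are identical
print(np.abs(uniform).sum(), np.abs(sparse).sum())        # 1.0 1.0

# the L2 norm tells them apart: it is maximized at the simplex vertices
print(np.linalg.norm(uniform), np.linalg.norm(sparse))    # 0.5 1.0
```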
0:17:15 | [Question] Hi. The two-speaker example that you played was artificially made by adding two signals. I am just curious how this actually works with real signals, where you have reverberation that smears the speakers in time. |
0:17:31 | [Answer] Yes. If you have a long reverberation, it is not a big deal, because the echoes of the reverberation are going to be spectra that correspond to some of the previous utterances of the speaker, and that is easily incorporated in the model, so it is fairly resistant to that. The place where you get into trouble is if you have very strange phase effects in the reverberation that actually change the spectrum. So if you record, say, in a bathroom, where you have this very ugly echo that is very short, it creates an odd resonance in your spectrum, and that is going to bias your testing data to be very different from the training data, so the fit is not going to be as good. But as long as the spectral character stays the same, reverberation is not a big deal. |
0:18:18 | [Question] Thank you for the talk. My question is about representing a source: it seems that we need a lot of data to represent a source, and then you do a search over it. But I think audio data lies on some kind of manifold, so it seems we might not need all that data in itself. Do you think you could introduce a way to reduce the representation, or the amount of training data? |
0:18:50 | [Answer] You do need a lot of data in order to properly represent the manifold that every source is lying on, and that is the only reason; it becomes a problem very similar to supervised learning, where if your data is dense enough to represent your input, you are fine. That could mean you only need about a dozen data points, or it could mean you need five million data points. So in this case, if you are dealing with simple sources without much variation, you can get away with a little data. If you want to do something more ambitious, say model everybody's voice at all different pitches, and all different phonemes, in all sorts of languages, then obviously you need to have a pretty good representation of all the possible cases, and that makes for a big database. |
0:19:31 | [Question] Yeah, Paris, mine is more of a comment, I guess. We have seen throughout these talks a lot of modeling, a lot of different ways of modeling the sound. I was wondering what your philosophy is; maybe you went over it a bit quickly. You said you have a mixture and you are going to take two examples, or many examples, so it is as if you know the components of the mixture already, and then you figure out how they are put together. What if you have a more complicated situation, or maybe you do not know what the components are? And as it gets more complicated, the database size grows, as you were showing earlier, and the search can grow very big, although you have done some clever stuff to make the search smaller. So what is your philosophy overall? |
0:20:15 | [Answer] The overall dream is that eventually you have a database that has pretty much every sound that was ever played; you do a search over that, and chances are that whatever you are analyzing has been heard in the past. So that is what we are driving at. One defense I can offer is that in the quintet example here, even when I had a solo recording, I basically had recordings of all five instruments in the dictionary and had to pick the one spectrum that was the closest; in the duets I again had all five instruments available and it would bid on two of them. So yes, I did know which instruments were in there, and they were part of the database, but I think of that more as a logistics problem: I had to get some data somehow. Ultimately you just want to have this humongous database of everything and pick through that. |