0:00:15 | mean |
---|
0:00:15 | and |
---|
0:00:16 | i'm started to just one right but it's it's not simon |
---|
0:00:19 | i |
---|
0:00:20 | um |
---|
0:00:21 | so i'm gonna talk about the music transcription work um from my master's project last year |
---|
0:00:25 | and |
---|
0:00:27 | so |
---|
0:00:28 | uh just to go through what we mean by transcription so a musical signal might look a |
---|
0:00:33 | bit like this |
---|
0:00:34 | so this is just a time domain signal it's gonna be roughly periodic and it's |
---|
0:00:38 | gonna have a whole set of |
---|
0:00:40 | um sinusoidal |
---|
0:00:41 | and |
---|
0:00:42 | components each with a a a different time varying amplitude |
---|
0:00:46 | but that's not how we perceive music |
---|
0:00:48 | and this is what we |
---|
0:00:49 | what we sort |
---|
0:00:51 | of think of we think of a series of notes |
---|
0:00:53 | and then some higher level properties |
---|
0:00:57 | uh |
---|
0:00:57 | such as the expression and the timbre of the instrument |
---|
0:01:00 | so |
---|
0:01:01 | and so what we'd like is a system that can take something like this and turn it into something like |
---|
0:01:06 | this |
---|
0:01:07 | now that's quite an ambitious thing to do in one step |
---|
0:01:09 | so |
---|
0:01:10 | we're gonna aim for an intermediate result |
---|
0:01:12 | that's something like this |
---|
0:01:14 | this is a piano roll |
---|
0:01:15 | uh and we've got |
---|
0:01:17 | um um like |
---|
0:01:19 | we've got the pitch of the notes up the side here and time along the bottom and the lines indicating which |
---|
0:01:23 | notes are present |
---|
0:01:24 | and this is from them and you work silence |
---|
0:01:26 | and just on a single byte modeling |
---|
0:01:30 | um |
---|
0:01:30 | so what i'm gonna do is just talk about a um |
---|
0:01:34 | uh sequential um |
---|
0:01:35 | framework for |
---|
0:01:37 | doing this note estimation |
---|
0:01:39 | and then i'll talk about the models |
---|
0:01:40 | um |
---|
0:01:41 | that we're using so we've got a likelihood model using some point processes and |
---|
0:01:46 | then some simple dynamic models for the note evolution |
---|
0:01:49 | and then i'll talk about an mcmc scheme and show some results |
---|
0:01:52 | and so first of all um music is |
---|
0:01:55 | a continuous |
---|
0:01:56 | signal and |
---|
0:01:58 | uh we want to look at |
---|
0:01:59 | and |
---|
0:02:01 | frequency domain models so the first thing we're gonna do is chop it up into frames and i |
---|
0:02:05 | will reference the frames with this subscript t here |
---|
0:02:09 | and then for each frame we'd like to estimate the set of notes present which we'll |
---|
0:02:14 | call theta t given the data that we've got for that frame which we'll call y t |
---|
0:02:18 | and the way we can do this is by looking at this joint posterior of |
---|
0:02:23 | the |
---|
0:02:23 | notes in the current frame and the previous frame and you'll recognise this from the previous talk it's |
---|
0:02:28 | that the same |
---|
0:02:29 | um |
---|
0:02:29 | that |
---|
0:02:30 | and say we've got |
---|
0:02:31 | we can expand this into three terms a likelihood a transition density and |
---|
0:02:36 | and then this |
---|
0:02:38 | um |
---|
0:02:40 | uh posterior term from the previous |
---|
0:02:42 | processing step |
---|
0:02:44 | say and you might in this uh |
---|
0:02:46 | T minus one implements to then |
---|
0:02:48 | just a marginal of that |
---|
0:02:50 | so if we've got a |
---|
0:02:51 | particle approximation for the previous frame we can substitute that |
---|
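The three-term expansion described above can be written out as a short sketch. The symbols are assumptions, since the slide notation does not survive in the transcript: $\theta_t$ stands for the notes in frame $t$, $y_t$ for that frame's data, and $N_p$ for a hypothetical number of particles. The factorisation also assumes the usual Markov structure, where $y_t$ depends only on $\theta_t$ and $\theta_t$ depends only on $\theta_{t-1}$.

```latex
p(\theta_t, \theta_{t-1} \mid y_{1:t}) \;\propto\;
  \underbrace{p(y_t \mid \theta_t)}_{\text{likelihood}}\;
  \underbrace{p(\theta_t \mid \theta_{t-1})}_{\text{transition density}}\;
  \underbrace{p(\theta_{t-1} \mid y_{1:t-1})}_{\text{previous posterior}},
\qquad
p(\theta_{t-1} \mid y_{1:t-1}) \;\approx\; \frac{1}{N_p} \sum_{i=1}^{N_p}
  \delta_{\theta_{t-1}^{(i)}}(\theta_{t-1})
```

The second expression is the particle approximation from the previous frame that gets substituted into the last term.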
0:02:56 | so let's look first at the models |
---|
0:02:59 | that we're using for the likelihood |
---|
0:03:01 | at um |
---|
0:03:02 | so |
---|
0:03:03 | um i mentioned we're gonna use frequency domain models so this is just the fourier |
---|
0:03:08 | transform of uh one of the frames |
---|
0:03:10 | and you can see that what we're interested in is |
---|
0:03:12 | this set of peaks here |
---|
0:03:14 | and there's a lot of redundant information down here at the noise level |
---|
0:03:17 | so the first thing we're gonna do straight away |
---|
0:03:19 | is run a peak detection algorithm |
---|
0:03:22 | and this is very simple we're just looking at the first order difference |
---|
0:03:26 | and then applying a median threshold on it |
---|
0:03:29 | and so we reduce |
---|
0:03:30 | the spectrum down to just this set of red circled peaks |
---|
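The peak detection step described above could look roughly like the following. This is a minimal sketch, not the speaker's implementation: the talk only says "first order difference" and "median threshold", so the exact rule here (a sign change in the difference, plus a magnitude above a multiple of the spectrum median) and the names `detect_peaks` and `threshold_scale` are assumptions.

```python
from statistics import median


def detect_peaks(magnitudes, threshold_scale=3.0):
    """Return indices of local maxima that exceed a median-based threshold.

    threshold_scale is a hypothetical tuning knob, not a value from the talk.
    """
    threshold = threshold_scale * median(magnitudes)
    peaks = []
    for k in range(1, len(magnitudes) - 1):
        # First-order differences on either side of bin k.
        rising = magnitudes[k] - magnitudes[k - 1] > 0
        falling = magnitudes[k + 1] - magnitudes[k] <= 0
        if rising and falling and magnitudes[k] > threshold:
            peaks.append(k)
    return peaks


# Toy magnitude spectrum: two clear peaks and one small bump in the noise floor.
spectrum = [1, 1, 2, 9, 2, 1, 1, 7, 1, 1, 1, 2, 1]
print(detect_peaks(spectrum))
```

On the toy spectrum the small bump near the end is a local maximum but falls below the median-based threshold, so only the two large peaks are kept.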
0:03:35 | um |
---|
0:03:36 | now |
---|
0:03:37 | what we'd like to model is both the frequency and the amplitude |
---|
0:03:40 | uh it turns out the amplitude of these peaks |
---|
0:03:43 | is dependent on an awful lot of factors |
---|
0:03:44 | including the instrument that's playing |
---|
0:03:46 | and uh the |
---|
0:03:48 | recording environment |
---|
0:03:49 | and most of them vary over time |
---|
0:03:51 | so and |
---|
0:03:53 | putting together a simple and robust model |
---|
0:03:55 | is difficult so what we're gonna start off by just looking at a model for the |
---|
0:03:59 | the frequencies of the set of peaks |
---|
0:04:04 | and |
---|
0:04:06 | say |
---|
0:04:08 | if we have one note playing in a |
---|
0:04:10 | um |
---|
0:04:12 | frame then what we |
---|
0:04:14 | see characteristically is a peak at some fundamental frequency that's |
---|
0:04:18 | the lowest peak here the fundamental |
---|
0:04:21 | and then we see a set of peaks |
---|
0:04:22 | that's |
---|
0:04:23 | um overtones or partial frequencies |
---|
0:04:25 | which is the set up here and they're approximately integer multiples of the uh fundamental |
---|
0:04:31 | but we don't always get |
---|
0:04:32 | a peak in all these locations so there's one missing here |
---|
0:04:35 | and we don't know how many of them |
---|
0:04:37 | there'll be |
---|
0:04:38 | how high we have to go up |
---|
0:04:40 | and |
---|
0:04:40 | in addition we're gonna get some clutter peaks and that's gonna be due to |
---|
0:04:44 | um |
---|
0:04:46 | noise or transient effects which we're not really modeling |
---|
0:04:49 | or other non musical |
---|
0:04:51 | sounds in the recording |
---|
0:04:52 | i |
---|
0:04:54 | uh |
---|
0:04:54 | so if we have a lot of |
---|
0:04:56 | notes |
---|
0:04:57 | present in the frame |
---|
0:04:58 | we end up with a horrible data association issue where we'd like to link every peak we've observed |
---|
0:05:03 | with one of the notes present or |
---|
0:05:05 | the clutter process |
---|
0:05:07 | and |
---|
0:05:08 | but that gives us some horrible scaling in complexity as we increase the number of notes or the number |
---|
0:05:12 | of overtones |
---|
0:05:14 | um so we can get around this by um |
---|
0:05:16 | using a poisson |
---|
0:05:19 | process assumption about the uh the generation of peaks in a spectrum |
---|
0:05:24 | so |
---|
0:05:25 | we assume that |
---|
0:05:27 | for each overtone the peaks are generated in the spectrum according to a poisson process |
---|
0:05:33 | and we can construct an intensity function for this |
---|
0:05:36 | um |
---|
0:05:37 | poisson process which has a maximum at the expected uh frequency of the |
---|
0:05:42 | overtone |
---|
0:05:44 | uh now this is quite a significant assumption |
---|
0:05:46 | um |
---|
0:05:48 | generally we'd only |
---|
0:05:49 | expect to see no peak |
---|
0:05:50 | or one peak or maybe in some rare cases a second peak |
---|
0:05:54 | um |
---|
0:05:55 | now with this assumption we're gonna have a poisson distribution of the number of peaks at |
---|
0:05:59 | each overtone |
---|
0:06:00 | and so |
---|
0:06:02 | and that's the bad thing the good thing is that |
---|
0:06:05 | um because of the union property of poisson processes we can just add the intensity functions |
---|
0:06:09 | uh for each overtone |
---|
0:06:11 | to give us |
---|
0:06:12 | uh an intensity |
---|
0:06:13 | function like this for the whole note as a poisson process |
---|
0:06:17 | and you can see we've constructed this |
---|
0:06:19 | um |
---|
0:06:20 | with a |
---|
0:06:21 | a very |
---|
0:06:22 | narrow large component at the fundamental |
---|
0:06:25 | showing that we're pretty certain there's gonna be a peak there and we're quite sure |
---|
0:06:29 | uh what frequency it will be at |
---|
0:06:31 | and we've got some smaller components at the high frequencies where we're less sure exactly what frequency the |
---|
0:06:36 | peak will occur |
---|
0:06:39 | um |
---|
0:06:39 | and then if we have more than one note present then again we can just add these intensity functions together |
---|
0:06:45 | for all the different notes |
---|
0:06:46 | to give us a |
---|
0:06:47 | poisson process for all the peaks in the whole spectrum |
---|
0:06:51 | uh say just that |
---|
0:06:53 | he's a mac |
---|
0:06:54 | and |
---|
0:06:55 | this is uh |
---|
0:06:56 | we've been using a gaussian mixture model to construct these note |
---|
0:07:00 | intensity |
---|
0:07:01 | functions |
---|
0:07:03 | and then we're just |
---|
0:07:04 | uh adding them together to give the entire frame |
---|
0:07:06 | intensity function |
---|
0:07:08 | and then adding on a little bit extra um uniformly |
---|
0:07:11 | to account for the clutter peaks so |
---|
0:07:13 | the |
---|
0:07:14 | clutter poisson process |
---|
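The construction just described (one Gaussian component per overtone, summed over notes, plus a uniform clutter term) can be sketched as below. Only that overall structure comes from the talk. The function names, the default numbers, and the specific choice of weights shrinking as `1/k` and widths growing as `k` are illustrative assumptions, chosen to mimic the "narrow large component at the fundamental, smaller components at high frequencies" shape.

```python
import math


def gaussian(f, mean, std):
    """Normal density evaluated at frequency f."""
    return math.exp(-0.5 * ((f - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def note_intensity(f, f0, num_partials, weight0=1.0, std0=2.0):
    """Intensity for one note: a Gaussian mixture with one component per partial.

    Assumed shape: partial k sits at k * f0, has weight weight0 / k and
    standard deviation std0 * k (both scalings are illustrative only).
    """
    total = 0.0
    for k in range(1, num_partials + 1):
        total += (weight0 / k) * gaussian(f, k * f0, std0 * k)
    return total


def frame_intensity(f, notes, clutter_rate=1e-4):
    """Whole-frame intensity: sum of the note intensities plus uniform clutter.

    notes is a list of (f0, num_partials) pairs. Adding intensities is valid
    because a union of independent Poisson processes is again Poisson.
    """
    return clutter_rate + sum(note_intensity(f, f0, K) for f0, K in notes)


# Intensity is high at a partial of a present note and near the clutter level elsewhere.
print(frame_intensity(440.0, [(220.0, 5)]))
print(frame_intensity(330.0, [(220.0, 5)]))
```

Because the intensities simply add, evaluating a frame with several notes costs one pass over the notes and their partials, with no peak-to-note assignment needed.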
0:07:16 | and then once we've got this we |
---|
0:07:18 | uh can write likelihood |
---|
0:07:20 | uh expressions for the |
---|
0:07:22 | um |
---|
0:07:23 | frame |
---|
0:07:23 | so |
---|
0:07:24 | um |
---|
0:07:27 | just integrating the intensity function over each um bin of the fourier transform that will give us the |
---|
0:07:32 | expectation |
---|
0:07:34 | um for the presence of a peak in that bin |
---|
0:07:36 | and then |
---|
0:07:37 | uh we can just um |
---|
0:07:39 | take a likelihood like this |
---|
0:07:41 | um |
---|
0:07:42 | for each bin and then multiply them all together to give the whole frame likelihood |
---|
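A rough sketch of the per-bin likelihood described above follows. The talk does not spell out the per-bin expression, so this assumes each bin carries a binary "peak or no peak" observation and that the probability of seeing a peak in a bin with expected count `mu` is `1 - exp(-mu)`, the Poisson probability of at least one point. The names `bin_means` and `frame_log_likelihood` are hypothetical, the integral is approximated with a simple midpoint rule, and the product over bins is computed as a sum of logs.

```python
import math


def bin_means(intensity, bin_edges, steps=20):
    """Approximate the integral of the intensity over each bin (midpoint rule)."""
    means = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        width = (hi - lo) / steps
        area = sum(intensity(lo + (i + 0.5) * width) for i in range(steps)) * width
        means.append(area)
    return means


def frame_log_likelihood(peak_present, means):
    """Log of the product of per-bin likelihoods.

    Assumes every mean is strictly positive, which holds whenever the
    intensity includes a positive clutter rate.
    """
    log_lik = 0.0
    for present, mu in zip(peak_present, means):
        if present:
            log_lik += math.log(-math.expm1(-mu))  # log(1 - exp(-mu))
        else:
            log_lik += -mu  # log P(no points in the bin)
    return log_lik


# Constant intensity of 2 per unit frequency over two unit-width bins.
means = bin_means(lambda f: 2.0, [0.0, 1.0, 2.0])
print(frame_log_likelihood([0, 0], means))
```

A frame in which peaks fall in high-intensity bins scores higher than one where they fall in low-intensity bins, which is what lets the sampler compare candidate note sets without assigning peaks to notes.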
0:07:48 | and |
---|
0:07:48 | now i said earlier the uh |
---|
0:07:50 | overtones occur approximately at integer multiples of the fundamental |
---|
0:07:54 | um |
---|
0:07:55 | and |
---|
0:07:56 | it turns out that for um |
---|
0:07:58 | especially for stringed instruments |
---|
0:08:00 | uh the overtones tend to spread out at high frequencies |
---|
0:08:03 | so |
---|
0:08:03 | and we've been using a um a model of this inharmonicity according to this formula |
---|
0:08:08 | i can from the that |
---|
0:08:09 | and this introduces another parameter that we're gonna have to estimate which is this b here which is |
---|
0:08:14 | an inharmonicity parameter for each note |
---|
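The formula on the slide is not captured in the transcript. A commonly used stiff-string model, assumed here purely for illustration, places partial `k` at `k * f0 * sqrt(1 + B * k**2)`, where `B` plays the role of the inharmonicity parameter mentioned above. The function name `partial_frequency` is hypothetical.

```python
import math


def partial_frequency(f0, k, B):
    """Frequency of partial k under the assumed stiff-string model.

    With B = 0 the partials are exact integer multiples of f0; with B > 0
    they are stretched progressively further above k * f0.
    """
    return k * f0 * math.sqrt(1.0 + B * k * k)


# Partials of a 100 Hz note with and without inharmonicity.
for k in (1, 2, 5, 10):
    print(k, partial_frequency(100.0, k, 0.0), partial_frequency(100.0, k, 1e-3))
```

Under this assumed model the stretch grows with the partial number, which matches the remark that the overtones spread out at high frequencies.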
0:08:18 | um |
---|
0:08:19 | so |
---|
0:08:20 | the things we have to estimate are now adding up |
---|
0:08:22 | that that speech to that we had a idea |
---|
0:08:24 | and |
---|
0:08:25 | if we |
---|
0:08:26 | take this to be the set of parameters we need to estimate we've got n the number of |
---|
0:08:30 | notes and then for each note a fundamental frequency |
---|
0:08:33 | the number of partials and also the inharmonicity |
---|
0:08:39 | um moving on to the um |
---|
0:08:40 | transition density now we've been using some very simple models at least so far um and they've been based on |
---|
0:08:46 | two |
---|
0:08:47 | quite basic observations so first that if a note is present in one frame then it's likely that it |
---|
0:08:53 | is also |
---|
0:08:54 | present in the next frame |
---|
0:08:56 | um and second that uh if this is the number of notes present in one frame then it's likely you |
---|
0:09:00 | will have the same number of notes in the next frame |
---|
0:09:02 | and obviously there are more um higher levels of modeling that we could do here looking at |
---|
0:09:07 | how the number of |
---|
0:09:09 | partial frequencies change between frames |
---|
0:09:11 | so you'd expect that to decay |
---|
0:09:12 | um |
---|
0:09:14 | and also um |
---|
0:09:15 | modeling note onsets and things like that |
---|
0:09:18 | is |
---|
0:09:21 | um |
---|
0:09:22 | but we've now got everything we need to do some inference |
---|
0:09:24 | so |
---|
0:09:25 | this is that |
---|
0:09:27 | this is the thing we're trying to estimate remember and we've defined a model for the likelihood |
---|
0:09:30 | that's the poisson model and we've got our simple models for the |
---|
0:09:33 | the transition |
---|
0:09:34 | density |
---|
0:09:35 | and so now we can use the um mcmc particle filter algorithm uh which |
---|
0:09:40 | was covered in the last talk |
---|
0:09:41 | and |
---|
0:09:43 | uh to |
---|
0:09:44 | estimate this joint density |
---|
0:09:46 | um |
---|
0:09:48 | now |
---|
0:09:50 | the problem is that |
---|
0:09:51 | if we've got a large number of notes then we've got |
---|
0:09:53 | a lot of parameters now um |
---|
0:09:55 | about three for each note remember |
---|
0:09:58 | which means if we try and change all of them at once we end up with very low acceptance |
---|
0:10:01 | rates in our |
---|
0:10:02 | um markov chain |
---|
0:10:06 | um |
---|
0:10:08 | okay so the way to get around this um we're gonna have two sorts of move we're gonna have |
---|
0:10:12 | moves where we only change the |
---|
0:10:14 | current frame parameters |
---|
0:10:15 | and then moves where we try to change both the previous frame and the current frame parameters |
---|
0:10:20 | and for the current frame |
---|
0:10:21 | it's nice and easy we can just use metropolis within gibbs and just choose to change some |
---|
0:10:26 | subset of the parameters at once we'll just change |
---|
0:10:28 | the three parameters associated with one note |
---|
0:10:31 | uh in each step |
---|
0:10:33 | um |
---|
0:10:34 | for the joint moves it gets a little more complex |
---|
0:10:36 | um what we'd like to do is |
---|
0:10:38 | sample the t minus one parameters from the |
---|
0:10:41 | um |
---|
0:10:42 | particle distribution from the previous frame |
---|
0:10:45 | and then uh propose the current frame parameters from some proposal |
---|
0:10:49 | uh so the problem here is that |
---|
0:10:51 | when we do the sampling we will be changing all of the t minus one parameters in one go |
---|
0:10:55 | and again that gives us very low acceptance rates |
---|
0:10:59 | uh so |
---|
0:11:01 | a solution to this has been to take the particle distribution and sort of collapse it onto |
---|
0:11:06 | a single |
---|
0:11:06 | univariate histogram uh for all the different possible notes that we have in the previous frame |
---|
0:11:12 | and then we use this |
---|
0:11:13 | as an approximation for the |
---|
0:11:14 | the marginal |
---|
0:11:16 | distribution of |
---|
0:11:18 | um each note and then make an assumption of |
---|
0:11:21 | uh independence |
---|
0:11:22 | at t minus one and this means that we can sample |
---|
0:11:25 | um one note at a time |
---|
0:11:28 | of the t minus one parameters |
---|
0:11:31 | um and that again gives |
---|
0:11:33 | acceptable uh |
---|
0:11:34 | acceptance rates |
---|
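The histogram trick described above might be sketched as follows. This is an illustration under simplifying assumptions rather than the speaker's scheme: each particle is reduced to a set of hypothetical note labels (MIDI-style integers), the per-note parameters are ignored, and only the proposal draw for a single note at T minus one is shown, not the accept or reject step of the sampler.

```python
import random
from collections import Counter


def note_marginals(particles):
    """Collapse particles (each a set of note labels) into per-note presence probabilities."""
    counts = Counter(note for particle in particles for note in set(particle))
    return {note: count / len(particles) for note, count in counts.items()}


def resample_note(previous_notes, note, marginals, rng):
    """Redraw the presence of one note at T minus one, leaving the others untouched.

    This relies on the independence approximation: each note's presence is
    drawn from its own marginal, regardless of which other notes are present.
    """
    updated = set(previous_notes)
    updated.discard(note)
    if rng.random() < marginals.get(note, 0.0):
        updated.add(note)
    return updated


# Four particles from the previous frame; note 60 appears in all of them.
particles = [{60, 64}, {60}, {60, 67}, {60, 64}]
marginals = note_marginals(particles)
proposal = resample_note({60, 64}, 64, marginals, random.Random(0))
print(marginals, proposal)
```

Changing one note at a time keeps each proposal close to the current state, which is the reason given in the talk for the improved acceptance rates.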
0:11:36 | uh and finally we want to estimate the number of notes present in each frame |
---|
0:11:39 | and that can |
---|
0:11:40 | be done very nicely just by putting the whole thing into a reversible jump |
---|
0:11:44 | um formulation |
---|
0:11:46 | and |
---|
0:11:47 | so let's look at some results |
---|
0:11:49 | and so this is the output from a couple of markov chains this is a simple |
---|
0:11:53 | case where we've just got one note |
---|
0:11:54 | and we're not looking at reversible jump yet we're fixing the number of notes at one |
---|
0:12:00 | and |
---|
0:12:00 | you can see that it |
---|
0:12:02 | picks up the correct note |
---|
0:12:04 | on the first iteration in fact |
---|
0:12:06 | um |
---|
0:12:08 | and the other a from just on the tree can you a green i think but it |
---|
0:12:11 | so takes about twenty frames segments |
---|
0:12:14 | and |
---|
0:12:15 | and then here on the right we've got a three note case um and we're doing reversible jump mcmc now |
---|
0:12:19 | so um estimating the number of notes there |
---|
0:12:22 | um and that's |
---|
0:12:23 | and yeah so yeah that's |
---|
0:12:25 | pretty much correct |
---|
0:12:26 | um |
---|
0:12:28 | with the frequencies we see it fixes |
---|
0:12:30 | two of the notes very quickly and then it struggles to choose between three possibilities here |
---|
0:12:34 | and |
---|
0:12:35 | these three cases are in fact |
---|
0:12:37 | spaced |
---|
0:12:38 | an octave apart and the reason there's some confusion there is 'cause the three notes have pretty |
---|
0:12:42 | much the same sets of overtones |
---|
0:12:44 | or partial frequencies |
---|
0:12:46 | um |
---|
0:12:48 | i finally just |
---|
0:12:49 | a few results |
---|
0:12:49 | uh this is |
---|
0:12:51 | and a simple um sort of |
---|
0:12:53 | a loud test piece |
---|
0:12:54 | so it's just three chords each of three notes |
---|
0:12:57 | um |
---|
0:12:57 | so we've got time along the bottom here and then the frequency of the notes present up the |
---|
0:13:02 | side |
---|
0:13:03 | and |
---|
0:13:03 | and the blue dots |
---|
0:13:06 | um are the estimates and we can see it's picked up |
---|
0:13:08 | um all the notes quite nicely here |
---|
0:13:10 | and this one |
---|
0:13:12 | one just dropping out here um as the overtones decay at the end of the note |
---|
0:13:17 | um |
---|
0:13:19 | a few errors here at the beginning of each note |
---|
0:13:21 | and these are |
---|
0:13:22 | caused by transient effects at the beginning of the note which we're not modelling at the moment |
---|
0:13:28 | and then finally we tried it on some real music |
---|
0:13:31 | um so this is a kind of piece |
---|
0:13:34 | and you can see it picks up these the bass notes |
---|
0:13:37 | but |
---|
0:13:37 | quite nicely |
---|
0:13:39 | and |
---|
0:13:40 | so the treble notes it's |
---|
0:13:42 | doing a bad job up here there's a lot of false alarms and things going on and |
---|
0:13:46 | again that's due to um transient effects at the beginning of each note which we're not modeling |
---|
0:13:52 | well |
---|
0:13:54 | and sorry the i've a late or just the the ground |
---|
0:13:59 | so you just to each um but that and the um |
---|
0:14:01 | the on |
---|
0:14:02 | point process model which we using so it's you on a search of you for each |
---|
0:14:06 | um |
---|
0:14:07 | uh each frame given the notes and some simple dynamic models |
---|
0:14:11 | so |
---|
0:14:12 | for the evolution of the notes over time |
---|
0:14:14 | and these allow us to do sequential inference using the mcmc particle filter algorithm |
---|
0:14:19 | to find the number of notes in each frame and |
---|
0:14:22 | and estimates of their |
---|
0:14:23 | um frequencies and parameters |
---|
0:14:26 | and |
---|
0:14:28 | so there's lots of ways we could extend this so i mentioned earlier that we weren't |
---|
0:14:31 | looking at peak amplitudes 'cause that's too hard |
---|
0:14:33 | so obviously we'd get better performance if we looked at them |
---|
0:14:37 | and also at the phase data that we haven't looked at at all yet |
---|
0:14:41 | um and also by looking at more complex |
---|
0:14:44 | dynamical |
---|
0:14:45 | uh |
---|
0:14:46 | models but given the simplicity of them it seems to work quite well now |
---|
0:15:00 | um |
---|
0:15:12 | it's quite a long way off real time at the moment |
---|
0:15:15 | it's a |
---|
0:15:24 | we haven't been aiming to get it real time |
---|
0:15:27 | oh |
---|
0:15:39 | uh yes or something |
---|
0:15:42 | and |
---|
0:15:43 | so |
---|
0:15:47 | and i was able to look through what looks simple peak detection we is possible to find a the of |
---|
0:15:51 | features would like to more spectrum to simple peter good score but |
---|
0:15:55 | your term record more as you are used to but spike to hear more just can be limited to from |
---|
0:16:00 | do |
---|
0:16:00 | spurt maybe or |
---|
0:16:02 | you you such teams still |
---|
0:16:04 | so a i i i a by the your we use to do are go to be to measurements just |
---|
0:16:08 | peak detection like and the peach that you're detecting great |
---|
0:16:12 | and number uh uh is better to four two peaks for example of after some some doing to get to |
---|
0:16:16 | do to use like more still |
---|
0:16:18 | sparks some stuff you that was a some smoothing moving that's |
---|
0:16:23 | but it's doing a |
---|
0:16:24 | sure that for |
---|
0:16:25 | all source or they're them |
---|
0:16:27 | i was to the this to the the errors and the of real hard we can what because of to |
---|
0:16:33 | use from minimum or maybe |
---|
0:16:35 | yeah that's that a trade between if you give a pretty much averaging in that it news the sum of |
---|
0:16:39 | the different now or site it's got just a very sure on the which seems |
---|
0:16:50 | a |
---|
0:16:55 | i |
---|
0:16:58 | oh |
---|
0:17:00 | i |
---|
0:17:01 | i five |
---|
0:17:03 | oh |
---|
0:17:04 | oh |
---|
0:17:05 | a |
---|
0:17:06 | i |
---|
0:17:06 | i |
---|
0:17:07 | a |
---|
0:17:08 | i |
---|
0:17:09 | i |
---|
0:17:10 | oh |
---|
0:17:11 | i |
---|
0:17:13 | which |
---|
0:17:21 | and i think another silence do is that something along |
---|
0:17:49 | in in |
---|