0:00:13 | okay, so, [inaudible] of our conference |
---|
0:00:18 | all the practical remarks were given this morning |
---|
0:00:24 | and i guess there isn't any closing ceremony in the afternoon |
---|
0:00:29 | [inaudible] |
---|
0:00:40 | so i would like to thank you all for being here |
---|
0:00:45 | and see you next year [inaudible] |
---|
0:00:49 | and actually, the only thing, after having done the slides |
---|
0:00:55 | is that we have again a problem [inaudible] |
---|
0:01:00 | [inaudible] |
---|
0:01:04 | so for that we have been working for a while |
---|
0:01:11 | we will have the last plenary |
---|
0:01:14 | given by michael jordan |
---|
0:01:18 | from the |
---|
0:01:20 | university of california, berkeley |
---|
0:01:25 | [inaudible] will introduce our speaker |
---|
0:01:32 | thank you very much. so |
---|
0:01:35 | it is a great honour to |
---|
0:01:38 | introduce michael jordan |
---|
0:01:40 | you all know of his work |
---|
0:01:54 | he is a legend |
---|
0:01:55 | in the fields of artificial intelligence and machine learning |
---|
0:01:59 | [inaudible] |
---|
0:02:04 | his work is in fact [inaudible] |
---|
0:02:17 | his interests |
---|
0:02:18 | span machine learning and |
---|
0:02:19 | [inaudible] in general |
---|
0:02:23 | for the last [inaudible] years |
---|
0:02:25 | he has made fundamental and |
---|
0:02:35 | absolutely incredible contributions to the field |
---|
0:02:40 | the topic he is going to talk about today is nonparametric bayesian |
---|
0:02:45 | methods |
---|
0:02:48 | although in general, it must be stressed, his research is quite diverse |
---|
0:02:51 | he is well known for his work on graphical models |
---|
0:02:55 | which have found application |
---|
0:02:58 | in speech |
---|
0:02:59 | and in many statistical signal processing areas |
---|
0:03:02 | and in other areas like natural language processing |
---|
0:03:06 | [inaudible] |
---|
0:03:08 | and statistical genetics |
---|
0:03:12 | his work has of course been recognised |
---|
0:03:14 | there has been tremendous recognition of his work |
---|
0:03:16 | both in statistics |
---|
0:03:18 | and engineering; for instance |
---|
0:03:20 | this year he was elected |
---|
0:03:23 | a member of the national academy of sciences |
---|
0:03:27 | and of the national academy of engineering |
---|
0:03:31 | and he is also |
---|
0:03:32 | a fellow |
---|
0:03:34 | of the american association for the advancement of science |
---|
0:03:37 | [inaudible] |
---|
0:03:41 | and other distinctions [inaudible] |
---|
0:03:43 | he is also a fellow |
---|
0:03:46 | [inaudible] |
---|
0:03:47 | statistics |
---|
0:03:51 | [inaudible] machine learning [inaudible] |
---|
0:03:55 | a statistics [inaudible] lecture |
---|
0:03:58 | as well as |
---|
0:04:02 | other special distinctions |
---|
0:04:05 | before that, let me just read |
---|
0:04:10 | [inaudible] basketball [inaudible] |
---|
0:04:13 | some |
---|
0:04:16 | some words from students |
---|
0:04:18 | michael has a large number of students |
---|
0:04:22 | with extremely successful positions |
---|
0:04:26 | here's what i read: |
---|
0:04:29 | michael jordan [inaudible] artificial |
---|
0:04:36 | [inaudible] |
---|
0:04:40 | [inaudible] students [inaudible] |
---|
0:04:52 | thank you |
---|
0:04:59 | i'm delighted to be here and i thank the organisers very much for inviting me |
---|
0:05:04 | so the main goal of the talk is to tell you a little bit about what |
---|
0:05:07 | the title means: what is bayesian nonparametrics |
---|
0:05:10 | just to anticipate a little bit: first, bayesian just means you use probability pretty seriously |
---|
0:05:16 | and this is already a community where probability has been used |
---|
0:05:20 | probably more than in any other applied community i know of, for a long long time, so that's the easy part of |
---|
0:05:23 | my story |
---|
0:05:25 | nonparametrics doesn't mean there are no parameters; it is just the opposite, it means there's a growing number of parameters |
---|
0:05:31 | so as you get more and more data, a bayesian nonparametric model will have more and more parameters |
---|
0:05:35 | it grows at some rate |
---|
0:05:37 | a rate slower than the number of data points, but it grows |
---|
0:05:40 | i think that's kind of a key |
---|
0:05:42 | modern perspective |
---|
0:05:44 | on statistics and signal processing |
---|
0:05:47 | and the bayesian has a particular take on that |
---|
0:05:50 | there really is a toolbox here, and what i'm going to try to do in the talk is give you an |
---|
0:05:54 | idea of what this toolbox is |
---|
0:05:55 | you can use it to solve applied problems |
---|
0:05:57 | it has a beautiful mathematical structure, and in some sense it's really just getting going, just getting started |
---|
0:06:03 | so i'm going to try to convince you that you too can contribute here |
---|
0:06:06 | both on the fundamental mathematical side and on the applied side |
---|
0:06:10 | a big thanks to my collaborators; their names will appear throughout the talk |
---|
0:06:15 | much of this is due to them |
---|
0:06:18 | now, i have one slide of sort of historical philosophy; i put |
---|
0:06:22 | computer science here, but this could equally well be your field |
---|
0:06:27 | so all these fields in the forties and fifties were somehow together, you know, people like |
---|
0:06:32 | kolmogorov and turing and so on |
---|
0:06:34 | and somewhere along the way |
---|
0:06:36 | they separated, because the problems got really hard, i believe |
---|
0:06:40 | computer science went off and started looking at data structures and algorithms |
---|
0:06:43 | and it didn't focus much on uncertainty |
---|
0:06:45 | statistics and engineering focused on uncertainty, and didn't focus too much on algorithms and data |
---|
0:06:50 | structures |
---|
0:06:51 | and so maybe bayesian nonparametrics is one |
---|
0:06:53 | venue where these two things are coming together; and mathematically it really amounts to using stochastic processes |
---|
0:06:59 | instead of classical parametric distributions |
---|
0:07:02 | you use a growing number of parameters, so you have |
---|
0:07:04 | distributions indexed by |
---|
0:07:06 | large spaces; those are just called stochastic processes |
---|
0:07:10 | so |
---|
0:07:10 | you're going to see, as we go through the talk, some of the stochastic processes we're interested in looking |
---|
0:07:15 | at |
---|
0:07:15 | and to put this in a little bit more of a classical statistical context: if you pick |
---|
0:07:19 | up a book on bayesian analysis |
---|
0:07:20 | you will see the posterior written as proportional to the likelihood times the prior |
---|
0:07:25 | usually written out in a parametric way, with theta indexing the parameter space |
---|
0:07:30 | so there's the prior, there's the likelihood, and there's the posterior |
---|
0:07:34 | in this talk we don't want theta to be a finite-dimensional object, we want it to be an infinite- |
---|
0:07:39 | dimensional object |
---|
0:07:40 | so instead of writing theta we write G |
---|
0:07:43 | p of x given G is still going to be a likelihood; in this talk this is mainly going to be a hidden |
---|
0:07:47 | markov model actually |
---|
0:07:49 | so G will be the structure and all the parameters of a hidden markov model, the state space and |
---|
0:07:53 | so on, and the likelihood is just the usual hmm |
---|
0:07:55 | and the prior will be on some kind of structural component of hmms; we'll put all our priors on that |
---|
0:08:00 | so i'll be talking mostly about the prior and less about the likelihood |
---|
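the two versions of bayes' rule just described, written compactly in the talk's notation:

```latex
% parametric version: theta indexes a finite-dimensional parameter space
p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta)
% nonparametric version: G is an infinite-dimensional object (here, HMM structure and parameters)
p(G \mid x) \propto p(x \mid G)\, p(G)
```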
0:08:03 | all right, so the mathematical story is just that instead of having a classical prior distribution |
---|
0:08:08 | you have a distribution on an infinite-dimensional space |
---|
0:08:11 | and that's not so strange mathematically; that is what a stochastic process is |
---|
0:08:15 | so we have a prior stochastic process, and it's going to be multiplied by some kind of fairly classical |
---|
0:08:20 | likelihood |
---|
0:08:21 | because once G is fixed |
---|
0:08:23 | it doesn't matter what the size of the space is, it's just a fixed object |
---|
0:08:26 | so the probability of the data given G can be easily specified, typically |
---|
0:08:30 | and as we multiply this prior by this likelihood, we get ourselves a |
---|
0:08:34 | posterior stochastic process |
---|
0:08:36 | so to get some more concrete idea of what these things are |
---|
0:08:38 | we'll be talking a little bit about |
---|
0:08:40 | distributions on trees of unbounded depth and unbounded fan-out; you get these in genetics and |
---|
0:08:45 | natural language processing quite a bit |
---|
0:08:47 | stochastic processes on partitions arise a lot; we'll talk about clustering models where the number of |
---|
0:08:51 | clusters is not known a priori |
---|
0:08:53 | we can put distributions on grammars, on sparse matrices, and |
---|
0:08:57 | somewhat recursively, we can put distributions on distributions with these tools as well, so you can get kind of |
---|
0:09:01 | the recursion |
---|
0:09:02 | that you see in computer science |
---|
0:09:04 | i'm going to talk about one particular large class of stochastic processes, the random measures; we'll |
---|
0:09:10 | get to what those are |
---|
0:09:11 | those are |
---|
0:09:12 | one of the main ingredients of the toolbox i'm going to try |
---|
0:09:14 | to talk about |
---|
0:09:16 | okay, so here's a familiar problem; i learned about this from my colleagues at icsi, and we've |
---|
0:09:20 | worked on this a bit |
---|
0:09:22 | and i'll use it as a concrete way to show you how some of the methods i'm talking |
---|
0:09:25 | about can be rolled out and applied to a domain |
---|
0:09:28 | so i think everyone here knows what this problem is: there's a single microphone, there's a meeting going on |
---|
0:09:32 | here's the waveform |
---|
0:09:33 | and the ground truth is like this: bob spoke for a while, then john, then jill, then bob, and so on |
---|
0:09:38 | we'd like to infer this from this |
---|
0:09:40 | and we don't know how many people are in the room, and we don't know anything about the spectral characteristics of |
---|
0:09:44 | their speech |
---|
0:09:46 | here's the other problem i'll talk about; it's a little less traditional for you guys, i think |
---|
0:09:49 | but it's a segmentation problem |
---|
0:09:51 | a multivariate, multiple-time-series segmentation problem |
---|
0:09:57 | okay, so someone came into a room, and this is motion capture: we have sensors on their |
---|
0:10:01 | body |
---|
0:10:03 | and i think it's about sixty-dimensional |
---|
0:10:05 | so we have a sixty-dimensional time series for this person |
---|
0:10:08 | and you can see it's segmentable: they did a little bit of jumping jacks, a little bit of |
---|
0:10:11 | splits, and so on and so forth |
---|
0:10:14 | but we're not supposed to know that a priori: we don't know what the library of exercise routines |
---|
0:10:18 | was, and we don't know, for this particular person, which routines out of the big library they decided to use |
---|
0:10:24 | how long they lasted, and where the break points were, all that |
---|
0:10:27 | we'd like to infer all of that |
---|
0:10:28 | moreover, this is just one person's sixty-dimensional time series; we have these for many different people |
---|
0:10:33 | so each person is going to come in, and we'll get a sixty-dimensional time series for each one of them |
---|
0:10:37 | and there will be some overlap: some people do twists, some people do touch-your-toes, and so forth |
---|
0:10:41 | not everyone will do all the routines, but there will be some overlap |
---|
0:10:44 | we'd like to find that and then exploit it, so if i learn a little bit about what twists look like |
---|
0:10:48 | for one person |
---|
0:10:50 | i'd like that to be available to be used to segment the time series data for some other person |
---|
0:10:55 | so it's a joint inference problem over multiple |
---|
0:10:58 | high-dimensional time series |
---|
0:11:00 | okay |
---|
0:11:02 | okay, so everyone in this audience knows hmms; you're a pleasant audience to talk to for that reason. here are |
---|
0:11:07 | three of the diagrams that are often used; i like the graphical model one. here are the |
---|
0:11:11 | states |
---|
0:11:12 | they form a markov chain |
---|
0:11:14 | and they're multinomial random variables |
---|
0:11:16 | you have emissions coming off of those, and here are the parameters sitting on there |
---|
0:11:20 | the core object of course is the transition matrix, the K by K matrix; each row |
---|
0:11:24 | of it |
---|
0:11:25 | is a next-state transition probability distribution. so given you're in a state |
---|
0:11:29 | say three |
---|
0:11:31 | or two, then you have a transition distribution out of that, and we represent it as this little |
---|
0:11:35 | discrete measure here: it's just a bunch of atoms at locations which are the integers |
---|
0:11:40 | with masses associated with them, namely the transition probabilities going out from state two, if my current state |
---|
0:11:45 | is two |
---|
0:11:46 | to all the next states. there are K of these things, so it's a measure that has finite support |
---|
0:11:51 | and we'll be talking about measures that have infinite support here |
---|
0:11:54 | pretty soon |
---|
0:11:55 | so there, that would be pi two; pi one might be a different measure, that maybe is sparse, it |
---|
0:11:59 | has two non-zero atoms |
---|
0:12:01 | and so on, pi three and so on |
---|
0:12:04 | okay, so those are the representations: the lattice, the graphical model |
---|
0:12:07 | and these |
---|
0:12:08 | measure representations of the |
---|
0:12:10 | next-state transitions, which will be used throughout the talk |
---|
0:12:12 | alright, so there are lots of issues with hmms that are still in some sense unresolved. how many states |
---|
0:12:17 | do we use? |
---|
0:12:18 | for the diarisation problem we don't know the number of speakers, so we have to infer that in some |
---|
0:12:22 | way |
---|
0:12:23 | and in the segmentation problem we don't know the number of behaviours: we've got twists and |
---|
0:12:28 | touch-your-toes and jumping jacks and all |
---|
0:12:30 | some unknown number of behaviours |
---|
0:12:32 | i think even more interesting is this latter problem, which raises issues about the structure of the state |
---|
0:12:37 | space |
---|
0:12:38 | how do we encode the notion that a particular time series makes use of a particular subset of the states? |
---|
0:12:42 | it's a combinatorial notion |
---|
0:12:43 | and i don't know how to encode that in a classical hmm with just a set of states |
---|
0:12:47 | how do we share states among time series? |
---|
0:12:50 | i don't quite know how to think about that: there's a structured, combinatorial notion of the state space |
---|
0:12:54 | that we don't have classical tools for. so that's going to go into my P of G |
---|
0:12:58 | i'm going to put |
---|
0:12:59 | structural information into my prior, and given a particular choice, i'm going to have that |
---|
0:13:04 | information available for inference at the hmm level |
---|
0:13:07 | alright, so bayesian nonparametrics solves these, i think, in a very elegant way, and lots of other |
---|
0:13:11 | problems too, so i'm going to try to show you |
---|
0:13:13 | what we do |
---|
0:13:14 | and how to do that |
---|
0:13:16 | okay, so again, i'm going to be talking about random measures here; i'm going to kind of skip this slide |
---|
0:13:21 | there's not really much being said here; let's go right to an example of a random measure |
---|
0:13:26 | alright, so everyone here knows what a measure is: it's just a |
---|
0:13:29 | set function; you take a set in and you get a number out |
---|
0:13:32 | and the sets are in some sigma algebra |
---|
0:13:34 | we want to talk now about random measures |
---|
0:13:37 | if we're going to be probabilistic and do inference on measures, we've got to have random measures |
---|
0:13:42 | so first of all, let's start with something really easy: let's put a random distribution on the integers |
---|
0:13:47 | which just go from one to infinity |
---|
0:13:49 | a distribution on the integers is a countable set of numbers that sum to one |
---|
0:13:53 | we want a random distribution, so there should be a set of random numbers that sum to one |
---|
0:13:58 | how do we do that? they have to be tied together in some way so they can sum to one |
---|
0:14:02 | and there are many ways you could think of doing that. here's a way that turns out to have beautiful combinatorial |
---|
0:14:06 | properties that allow us |
---|
0:14:07 | to develop inference algorithms based on this idea |
---|
0:14:10 | it's called stick breaking; it's an old idea in probability theory |
---|
0:14:14 | so what we do is we take a number of beta random variables |
---|
0:14:18 | we take an infinite number of them, and we draw them independently; so beta is a beta |
---|
0:14:22 | random variable |
---|
0:14:23 | it has two parameters normally, but i'll pin the first one to one, and the other will be free |
---|
0:14:27 | beta random variables live between zero and one, and if you take alpha-naught to be bigger and |
---|
0:14:32 | bigger and bigger, this thing tilts up to the left |
---|
0:14:34 | so most of the mass is near the origin and less of the mass is near one |
---|
0:14:38 | so think of getting lots of small numbers out of these |
---|
0:14:42 | beta draws |
---|
0:14:43 | so what we do is we have a stick that goes from zero to one, and we break off the |
---|
0:14:47 | first fraction of it according to beta one |
---|
0:14:49 | and we call that pi one |
---|
0:14:51 | and the remainder of the stick is one minus beta one, and then i take a fraction of that, beta |
---|
0:14:56 | two |
---|
0:14:56 | and this little part right here is the amount beta two times the remainder |
---|
0:15:00 | and we call that pi two |
---|
0:15:01 | here's pi three, pi four, pi five, and so on |
---|
0:15:05 | we keep breaking off pieces of the stick, and the total stick has mass one |
---|
0:15:10 | and so as we do this out to infinity, the pis will eventually sum to one |
---|
0:15:16 | it's actually easy to prove, if you want to prove it |
---|
0:15:18 | that these pis will sum to one under this procedure |
---|
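a minimal code sketch of the stick-breaking procedure just described; the concentration value, truncation level, and function name are illustrative choices, not from the talk:

```python
import numpy as np

def stick_breaking(alpha0, num_weights, rng=None):
    """Break a unit stick with independent Beta(1, alpha0) fractions:
    pi_k = beta_k * prod_{j<k} (1 - beta_j)."""
    rng = rng or np.random.default_rng()
    betas = rng.beta(1.0, alpha0, size=num_weights)   # independent break fractions
    # stick remaining just before each break: 1, (1-b1), (1-b1)(1-b2), ...
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * leftover

pis = stick_breaking(alpha0=5.0, num_weights=1000)
print(pis.sum())   # approaches 1 as the truncation level grows
```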
0:15:22 | alright, so that's a way to get it: we now have these pis; they are a distribution for |
---|
0:15:26 | any fixed draw, and they're random, so we have a random distribution on the integers |
---|
0:15:30 | having learned how to do that, we can now promote these up to distributions on arbitrary |
---|
0:15:34 | spaces, using that tool. here's how that's done |
---|
0:15:37 | you use these same pis from before as weights in a little mixture model, an infinite mixture |
---|
0:15:42 | and the mixture components are these delta functions; these are dirac deltas |
---|
0:15:48 | they're unit masses |
---|
0:15:49 | at locations phi K |
---|
0:15:51 | and the phi K are independent draws from a distribution G-naught on some space, and this space can |
---|
0:15:56 | be some arbitrary space |
---|
0:15:57 | it doesn't have to be a euclidean space; it can be a function space, it can be |
---|
0:16:01 | just about anything |
---|
0:16:04 | and so these atoms are living on that space, and they're being weighted by these mixing proportions |
---|
0:16:09 | pi K |
---|
0:16:10 | so each one of these has unit mass, and they're being weighted by these things which sum to one, so |
---|
0:16:14 | this total object G is a spiky object; it's a measure |
---|
0:16:19 | its total mass is one |
---|
0:16:21 | and it lives on some space |
---|
0:16:23 | and it's random in two ways: because the pis came from stick breaking, and because these phi Ks |
---|
0:16:28 | were drawn from some distribution G-naught, which is the source of these things i'm calling atoms |
---|
0:16:33 | so G-naught is the source of the atoms |
---|
0:16:35 | so we have these weighted atoms on some space |
---|
0:16:38 | alright, so that is a random measure: if i take G of some set A, it'll give me a number |
---|
0:16:43 | so it's a measure |
---|
0:16:45 | it's additive and so on |
---|
0:16:47 | and it's random, as we said, so it's a random measure. this is a very general way now of |
---|
0:16:51 | getting random measures. if i do the pis according to stick breaking |
---|
0:16:54 | this particular object G has a name: it's called a dirichlet process, and we usually write it like this; it has |
---|
0:16:59 | these two parameters, the stick-breaking parameter |
---|
0:17:01 | and the source of the atoms |
---|
0:17:03 | but we can break sticks in different ways, and get the atoms in different ways, so this is |
---|
0:17:08 | actually |
---|
0:17:09 | a very useful tool |
---|
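a minimal sketch of drawing a truncated dirichlet process random measure G = sum_k pi_k delta(phi_k), reusing the `stick_breaking` helper above; the gaussian base measure is an illustrative choice:

```python
def draw_dp_measure(alpha0, base_sampler, truncation=1000, rng=None):
    """Approximate draw of G ~ DP(alpha0, G0): stick-breaking weights on i.i.d. atoms from G0."""
    rng = rng or np.random.default_rng()
    weights = stick_breaking(alpha0, truncation, rng)
    atoms = base_sampler(truncation, rng)     # phi_k ~ G0, i.i.d.
    return weights, atoms

# illustrative base measure G0 = N(0, 1) on the real line
weights, atoms = draw_dp_measure(5.0, lambda n, rng: rng.normal(0.0, 1.0, size=n))
# evaluating the measure on a set A = [0, 1): sum the weights of atoms falling in A
print(weights[(atoms >= 0.0) & (atoms < 1.0)].sum())
```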
0:17:11 | alright, so we can use this as a component of various kinds of statistical models. here's something |
---|
0:17:16 | called the dirichlet process mixture model, and it's just what you'd think it is |
---|
0:17:19 | we use |
---|
0:17:20 | a draw G, which is now drawn this way; it's on some space, but for this picture here it's the |
---|
0:17:25 | real line, just so i can draw it |
---|
0:17:27 | it has atom locations given by draws from some underlying distribution G-naught |
---|
0:17:32 | and it has |
---|
0:17:33 | heights on these atoms given by stick breaking with this parameter alpha-naught |
---|
0:17:36 | so i draw a specific G, again an object looking like that |
---|
0:17:40 | and then i use it the way any mixture model is used: as the distribution |
---|
0:17:44 | it's not your typical gaussian; it's a distribution that's random |
---|
0:17:48 | and i draw from it: so i might draw this atom in the middle here, which, you know |
---|
0:17:52 | has high probability, it has a big height |
---|
0:17:55 | and having drawn that, that's a parameter now for some underlying likelihood |
---|
0:17:59 | i wrote it down here: x_i given theta_i comes from some distribution indexed by |
---|
0:18:04 | theta |
---|
0:18:05 | so i do that again and again; that's this box here |
---|
0:18:08 | and that gives me |
---|
0:18:09 | a mixture model. in fact, there's some probability i get the same atom |
---|
0:18:13 | on successive |
---|
0:18:14 | draws of theta, and so those |
---|
0:18:18 | indices i would be coming from the same atom; we would think of those as belonging to the |
---|
0:18:22 | same cluster |
---|
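written out, the generative model just described is:

```latex
G \sim \mathrm{DP}(\alpha_0, G_0), \qquad
\theta_i \mid G \overset{\text{iid}}{\sim} G, \qquad
x_i \mid \theta_i \sim F(\theta_i), \qquad i = 1, \dots, n
```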
0:18:25 | alright, so that's a dirichlet process mixture model. here's some kind of |
---|
0:18:28 | data drawn from one of these things, and i draw more and more data as i go across; you start seeing |
---|
0:18:33 | the blue dots, more and more data |
---|
0:18:34 | and the parameters theta in this case are means and covariances of gaussians, so theta is a whole big |
---|
0:18:40 | long vector |
---|
0:18:41 | and you see that the number of distinct ones is growing; there are only a few distinct ones, and |
---|
0:18:45 | you get more and more |
---|
0:18:47 | and they grow at some rate; in fact that rate turns out to be logarithmic in n |
---|
0:18:51 | the logarithm of the number of data points |
---|
0:18:55 | so the number of parameters we have in this system is growing: we have a small number of parameters here |
---|
0:18:59 | and as we keep drawing from G again and again and again, we get more and more parameters |
---|
0:19:02 | and like i said, it grows at rate log n. we're nonparametric here: we don't have a fixed parameter space |
---|
0:19:07 | no matter how much data we have |
---|
0:19:08 | as you give me more and more data, i'm going to give you more and more parameters |
---|
0:19:14 | okay, so let's go back to this little slide here |
---|
0:19:17 | as i've been alluding to, with some probability i picked theta one equal to this atom here |
---|
0:19:22 | in the middle |
---|
0:19:23 | with some probability theta two will equal that exact same atom, and so those two data points will come from the |
---|
0:19:28 | same cluster |
---|
0:19:29 | you can ask what kind of combinatorial structure is induced by this kind of procedure: how often does that |
---|
0:19:34 | happen? |
---|
0:19:34 | what's the probability that theta two is equal to theta one, no matter where |
---|
0:19:39 | theta one occurred? |
---|
0:19:40 | the probability that lightning strikes twice in the same place |
---|
0:19:43 | and what is that probability, in fact, not for this particular G, but over all possible choices of G under |
---|
0:19:49 | this procedure that i outlined |
---|
0:19:51 | this P of G |
---|
0:19:52 | you'd think that would be a really hard problem to solve, it sounds complicated, and it turns out it's really |
---|
0:19:58 | easy to solve, in terms of what the answer is |
---|
0:20:01 | so to understand that, you need an object called the chinese restaurant process; it's another stochastic process |
---|
0:20:07 | this one is on partitions |
---|
0:20:08 | not on parameters, this is on partitions |
---|
0:20:11 | and i have a little drawing of it down here |
---|
0:20:14 | you have customers coming to a restaurant with an infinite number of tables |
---|
0:20:18 | so here are these round tables; it's a chinese restaurant, that's why they're round |
---|
0:20:22 | and the first customer sits here with probability one |
---|
0:20:25 | the second customer joins them with probability a half, and starts a new table with probability a half |
---|
0:20:29 | and in general you join a table with probability proportional to the number of people already at the table |
---|
0:20:33 | that's often called preferential attachment: you start to get a big cluster emerging, and some small clusters after that |
---|
0:20:39 | and you can easily prove, from this little setup here, that the number of occupied tables grows |
---|
0:20:43 | at rate log n |
---|
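a minimal code sketch of the seating rule just described; `alpha` is the new-table weight (alpha = 1 reproduces the one-half probabilities in the example), and is otherwise an illustrative choice:

```python
def crp_seating(num_customers, alpha=1.0, rng=None):
    """Seat customers one at a time: join a table with probability proportional to its
    occupancy, or start a new table with probability proportional to alpha."""
    rng = rng or np.random.default_rng()
    table_sizes = []
    for _ in range(num_customers):
        weights = np.array(table_sizes + [alpha], dtype=float)
        choice = rng.choice(len(weights), p=weights / weights.sum())
        if choice == len(table_sizes):
            table_sizes.append(1)          # open a new table
        else:
            table_sizes[choice] += 1       # preferential attachment to big tables
    return table_sizes

sizes = crp_seating(10_000)
print(len(sizes))   # the number of occupied tables grows like log n
```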
0:20:45 | okay, so it's a beautiful mathematical fact that if you do the integrals i was talking about for the |
---|
0:20:49 | dirichlet process, this turns out to be the marginal probability, under the dirichlet process, of who sits with whom |
---|
0:20:56 | so it's another perspective: it's the marginal |
---|
0:20:59 | under that other big probability measure |
---|
0:21:02 | P of G |
---|
0:21:03 | you can make this into an explicit clustering model, a mixture model effectively, but now |
---|
0:21:08 | phrased in terms of clustering instead of mixture components |
---|
0:21:10 | all you do is: the first person to sit at a table |
---|
0:21:13 | draws a parameter for that table |
---|
0:21:15 | from some prior, which we'll call G-naught, the same G-naught as before actually |
---|
0:21:20 | and everybody who sits at that table inherits that same parameter vector |
---|
0:21:23 | so if phi one here is a mean and a variance of a gaussian, then all the data points for |
---|
0:21:27 | customers around this table |
---|
0:21:28 | will all come from the same gaussian; they have the same mean and covariance |
---|
0:21:32 | okay, so this is actually |
---|
0:21:35 | exactly a marginal of the dirichlet process mixture model |
---|
0:21:40 | okay, so that was a little bit of tutorial; this is, you know, forty years old, the |
---|
0:21:45 | dirichlet process mixture model; it's been available all along, and it's slowly getting taken up in applied |
---|
0:21:50 | communities |
---|
0:21:51 | but very, very slowly |
---|
0:21:53 | now |
---|
0:21:54 | to use this in a richer class of problems than just mixture models and clustering |
---|
0:21:58 | you have to face the problem that you have multiple estimation problems: you don't have just |
---|
0:22:03 | one set of data, you have multiple sets of data |
---|
0:22:05 | in the hmm case that's going to come up because we have the current state, which indexes the next state |
---|
0:22:10 | we have a bunch of distributions |
---|
0:22:12 | the next-state distributions for all of the current states |
---|
0:22:15 | so we don't have one distribution, we have a whole bunch of distributions |
---|
0:22:18 | all the rows of the transition matrix |
---|
0:22:21 | and we need to estimate all those rows |
---|
0:22:23 | so in statistics we often have this arise: we have multiple estimation problems; here's one parameter |
---|
0:22:29 | and there's data based on that; here's another parameter and data based on that |
---|
0:22:32 | and we often want to tie these things together, because there might be very little data over here and a lot |
---|
0:22:36 | of data over here, and if these are related problems it makes sense to tie them together |
---|
0:22:41 | that's called a hierarchical model, and it's one of the main reasons to be a bayesian: it is very easy |
---|
0:22:45 | to make these kinds of hierarchical models that tie things together |
---|
0:22:47 | and so what you do is you assume that the parameters |
---|
0:22:51 | for all these subproblems are related by coming from an underlying parameter |
---|
0:22:57 | you draw these randomly from a hyperparameter |
---|
0:22:59 | and now all the data over here kind of goes up this tree and down to here, and the posterior |
---|
0:23:03 | estimate of theta i |
---|
0:23:05 | depends on all the data |
---|
0:23:07 | it's a convex combination of all the data |
---|
0:23:09 | so that arises out of the hierarchical bayesian perspective, and here's kind of a picture of that |
---|
0:23:13 | here's what i just showed |
---|
0:23:14 | and here's the graphical model representation, using these boxes to represent replication |
---|
0:23:19 | so that outer box represents these |
---|
0:23:21 | M replications down below |
---|
0:23:25 | okay, so we can do that now with the dirichlet process, and this is a paper we published a few |
---|
0:23:28 | years ago, the hierarchical dirichlet process |
---|
0:23:31 | this is, i think, really a very useful tool; it's just the hierarchical idea applied to the dirichlet process |
---|
0:23:36 | we have the same kind of setup as before |
---|
0:23:38 | but now we don't just have one G, we have G one, G two, G three, G four |
---|
0:23:43 | for the different |
---|
0:23:44 | estimation problems, the different groups |
---|
0:23:46 | and if you want to be concrete about it, these are the different rows of the hmm transition matrix |
---|
0:23:49 | each one of them is a random measure |
---|
0:23:52 | so we have a set of measures |
---|
0:23:53 | all the different rows, and we want to tie them together so they have the same next-state space that they all |
---|
0:23:58 | transition to |
---|
0:23:59 | so what we do is: we're not going to draw these things independently, and we're not going to lump them all together; we |
---|
0:24:04 | want different transition |
---|
0:24:05 | probabilities, but we don't want them to be on completely separate state spaces; they've got to be tied together in some |
---|
0:24:10 | way |
---|
0:24:10 | the way you do this is that you have an extra layer in this graph: you |
---|
0:24:14 | first of all draw, from some |
---|
0:24:16 | source of atoms |
---|
0:24:18 | a |
---|
0:24:18 | mother distribution G-naught |
---|
0:24:21 | it's now random, instead of being a fixed parameter like before |
---|
0:24:24 | and now that's the base measure that you use |
---|
0:24:26 | going into each of the children |
---|
0:24:28 | they draw their atoms from G-naught |
---|
0:24:31 | and they re-weight them according to their own stick-breaking process |
---|
0:24:35 | okay |
---|
0:24:35 | so this ties together a set of random measures; it makes them cooperate on what atoms they're |
---|
0:24:41 | all going to use |
---|
0:24:42 | and on what weights they're going to use |
---|
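written out, with groups indexed by j, the hierarchy just described is (gamma and H, the mother's own concentration and base measure, are the conventional symbols; the talk just calls the mother distribution G-naught):

```latex
G_0 \sim \mathrm{DP}(\gamma, H), \qquad
G_j \mid G_0 \sim \mathrm{DP}(\alpha_0, G_0), \qquad j = 1, \dots, J
```

because G_0 is discrete, the children share its atoms but re-weight them, which is exactly what ties the groups to a common set of atoms.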
0:24:45 | this also has a restaurant underneath it, which is a really easy way to understand these |
---|
0:24:49 | ideas |
---|
0:24:50 | we now don't have just one restaurant, we have multiple restaurants, a franchise of restaurants; and again, in |
---|
0:24:55 | my application to hmms, these will be the different current states |
---|
0:24:59 | or the rows of the transition matrix |
---|
0:25:01 | right, so if you're in restaurant number two |
---|
0:25:03 | you go in and you sit at a table proportional to the number of people already at the table, like before; that gives |
---|
0:25:07 | you |
---|
0:25:08 | sharing, or clustering, within a restaurant |
---|
0:25:11 | and then, if i'm the first person to sit at a table, i go up to a global menu of |
---|
0:25:15 | dishes for the entire franchise |
---|
0:25:17 | and i pick a dish from this menu; maybe i pick |
---|
0:25:21 | the chicken up here |
---|
0:25:23 | and i bring it down to my table, and everybody who joins me at my table |
---|
0:25:27 | has to eat that same dish |
---|
0:25:29 | we all share the same parameter |
---|
0:25:30 | but i also put a check mark next to the dish |
---|
0:25:33 | and if you are in some other restaurant |
---|
0:25:35 | and you're the first person at your table, you go to this menu |
---|
0:25:39 | and you pick a dish proportional to the number of check marks next to the dish |
---|
0:25:43 | so you get some dishes that are popular across all of the restaurants, not just within |
---|
0:25:47 | one restaurant |
---|
0:25:48 | and they get transferred between the restaurants according to this same kind of preferential attachment |
---|
0:25:54 | okay, so again, in the hmm setting |
---|
0:25:56 | these will be the rows of the transition matrix |
---|
0:25:59 | the number of possible transitions grows as you get more and more data, so instead of having K states i |
---|
0:26:03 | have a growing number of states |
---|
0:26:06 | and the states will be shared among all the different rows of the transition matrix, according to this restaurant franchise |
---|
0:26:14 | now, i have a really nice application of this that we published this year, which i'm going to, i think |
---|
0:26:18 | skip in the interest of time in this talk, but there will be a paper on this |
---|
0:26:22 | i think it's kind of a killer app for this problem |
---|
0:26:24 | you're basically trying to model some of the geometry of proteins |
---|
0:26:27 | here are the kinds of angles that you get in proteins |
---|
0:26:30 | and if you put all the data for a particular amino acid on top of each other, you get |
---|
0:26:33 | kind of these fuzzy diagrams, where they're not well represented |
---|
0:26:37 | and so what you really want to do is not just have one estimation problem for the density of angles |
---|
0:26:42 | you want to break them up according to the context they're in |
---|
0:26:45 | you want not just one |
---|
0:26:47 | distribution, you want a bunch of them, depending on the context |
---|
0:26:50 | but when you break them up according to the context, you have a sparse data problem: many of the contexts |
---|
0:26:55 | have very little data |
---|
0:26:56 | it's a very similar kind of problem to lots of settings, including signal processing and speech |
---|
0:27:00 | and so we want to have models like this, that depend on the context of the |
---|
0:27:05 | neighbouring amino acids |
---|
0:27:06 | so this setup is exactly that: the groups are the neighbourhoods |
---|
0:27:10 | for each context you have a group of data |
---|
0:27:12 | and you don't treat them separately, and you don't lump them together; rather, you have them cooperate via this tree |
---|
0:27:17 | and each one of these densities looks kind of like the dirichlet process mixture on the synthetic data i showed you earlier |
---|
0:27:22 | but you really have |
---|
0:27:23 | twenty amino acids on the left and twenty on the right, so we get four hundred estimation problems |
---|
0:27:27 | four hundred contexts |
---|
0:27:28 | so we have four hundred diagrams like this |
---|
0:27:30 | and we get them to share |
---|
0:27:32 | atoms according to this procedure |
---|
0:27:34 | i'll skip over some of the results, but they're impressive: this is the log |
---|
0:27:39 | probability improvement on a test set for all the twenty amino acids |
---|
0:27:43 | here's no improvement, and this is over what people had previously done in the protein folding literature |
---|
0:27:47 | so it's really quite a massive improvement |
---|
0:27:50 | okay, let's go back to hidden markov models, where the likelihood is a hidden markov model |
---|
0:27:55 | and our prior now is the structural |
---|
0:27:57 | machinery i've been telling you about |
---|
0:27:59 | okay, so now we have a hidden markov model where we don't have a fixed number of states; we call |
---|
0:28:03 | this the hdp-hmm |
---|
0:28:05 | we have an infinite number of states; so here's time, here's the state space |
---|
0:28:09 | and when you're in one of these current states, you have a distribution on next states |
---|
0:28:13 | you get a chinese restaurant process: here's table one, table two, table three, table |
---|
0:28:16 | four |
---|
0:28:17 | and there's a growing number of tables |
---|
0:28:20 | i go to that restaurant every time i hit state number two |
---|
0:28:24 | and that's great, i can get a growing number of tables, but i want to share those tables between state |
---|
0:28:29 | number two and three and four and so on |
---|
0:28:31 | so, as i've been talking about, i need this |
---|
0:28:33 | hierarchical dirichlet process to tie together all those transitions |
---|
0:28:37 | okay, so i'm going to go kind of quickly through these slides; again, the details aren't what i'm trying to convey |
---|
0:28:41 | here, just kind of the big picture |
---|
0:28:45 | briefly, what you do is you draw a mother transition distribution over all states |
---|
0:28:49 | and then the distribution for each |
---|
0:28:52 | current state is a kind of perturbation of that; so here's pi three |
---|
0:28:55 | it takes this and kind of re-weights all these atoms |
---|
0:28:59 | and so on and so forth |
---|
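a compact way of writing the hdp-hmm just sketched, following the standard formulation (GEM denotes stick breaking; beta is the mother transition distribution over states):

```latex
\beta \sim \mathrm{GEM}(\gamma), \qquad
\pi_k \mid \beta \sim \mathrm{DP}(\alpha_0, \beta), \qquad
z_t \mid z_{t-1} \sim \pi_{z_{t-1}}, \qquad
x_t \mid z_t \sim F(\theta_{z_t})
```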
0:29:01 | okay |
---|
0:29:02 | alright, so |
---|
0:29:04 | i hope that was kind of clear: this is a way of doing hmms where you don't know |
---|
0:29:08 | the number of states a priori |
---|
0:29:09 | you're actually putting a prior distribution on the number of states, and it can grow as you get more and more |
---|
0:29:13 | data |
---|
0:29:14 | it's kind of a nice solution to a classical problem in hmm land |
---|
0:29:18 | so we implemented this; there is a simple sampling procedure |
---|
0:29:23 | we did a slightly non-standard sampling procedure, but there's a whole bunch of procedures that can be used to do |
---|
0:29:27 | posterior inference with this: given some emissions, some data |
---|
0:29:31 | you infer |
---|
0:29:32 | the most probable hmm, and infer the trajectory, the viterbi path, and everything else you want |
---|
0:29:38 | all the parameters of this |
---|
0:29:39 | you can do this pretty easily on a computer |
---|
0:29:41 | just kind of |
---|
0:29:42 | with standard methodology |
---|
0:29:44 | it's hmm-like |
---|
0:29:46 | right, so we did this: we applied it to the diarisation data, and we did okay, but not particularly |
---|
0:29:51 | well |
---|
0:29:52 | we identified one problem we needed to solve before we could start to get performance which was satisfactory |
---|
0:29:57 | and that was that we had a little bit too much state splitting going on. so here's a little synthetic |
---|
0:30:01 | problem, in which here is time |
---|
0:30:02 | there were three states in our synthetic data, as you can see here; so, you know, this state was on |
---|
0:30:06 | for a few time frames, and then this state, and so on |
---|
0:30:10 | here was the data, and you can sort of see that there are three states here, but you know |
---|
0:30:13 | it's pretty noisy; it'd be kind of hard to tell that just by looking at it |
---|
0:30:15 | here's the output of our inference system, our computer program |
---|
0:30:19 | for the hdp-hmm |
---|
0:30:21 | and it did find three states here, but then it actually came up with a fourth state, and the problem was that |
---|
0:30:26 | the parameters for state three and state four, the emission distributions |
---|
0:30:29 | happened to be just about the same |
---|
0:30:32 | that could happen, and it happened here |
---|
0:30:34 | and so whether you're in state three or four, from the outside world's point of view, the emission probabilities are |
---|
0:30:37 | basically the same |
---|
0:30:39 | so the system was doing perfectly well |
---|
0:30:41 | it had high likelihood |
---|
0:30:42 | and it was flickering between state three and four |
---|
0:30:44 | and it still had high likelihood, so why not? |
---|
0:30:47 | there's nothing that prevents it from doing that |
---|
0:30:48 | but we don't want that, of course, because now we wouldn't do very well on the diarisation problem: it thought there were four |
---|
0:30:52 | people in the room when there are really only three |
---|
0:30:54 | okay, so we're being bayesian in this problem, and so we don't mind putting in a little bit more prior |
---|
0:30:58 | knowledge; at this point we put a little bit more in, which |
---|
0:31:02 | is that people don't tend to talk |
---|
0:31:05 | for microsecond or millisecond time intervals |
---|
0:31:07 | they tend to talk for seconds |
---|
0:31:09 | we have one extra parameter, a self-transition probability in the hmm: the diagonal of this infinite matrix |
---|
0:31:15 | is going to be treated a little bit special and get an extra boost |
---|
0:31:18 | right, so we have one extra parameter, which is the extra boost for self-transitions, something like a |
---|
0:31:26 | semi-markov |
---|
0:31:27 | or semi-markovian hmm |
---|
0:31:29 | okay |
---|
0:31:30 | so we call this the sticky hdp-hmm; it just has the transition |
---|
0:31:34 | distributions as before |
---|
0:31:35 | plus one parameter, which boosts the self-transitions |
---|
0:31:38 | so if this is the distribution you had before |
---|
0:31:41 | then we add a little bit to that |
---|
0:31:44 | a little bit of boost |
---|
0:31:45 | for the self-transition |
---|
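in the hdp-hmm notation above, the published sticky construction adds the boost as a point mass on the row's own state (kappa is the extra self-transition parameter):

```latex
\pi_k \mid \beta \sim \mathrm{DP}\!\left(\alpha_0 + \kappa,\; \frac{\alpha_0\,\beta + \kappa\,\delta_k}{\alpha_0 + \kappa}\right)
```

so each row's base measure gets a delta at its own state k, which biases that row toward self-transition.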
0:31:47 | okay |
---|
0:31:48 | that parameter, since we're being bayesian, is again |
---|
0:31:50 | a parameter to be inferred given the data: it's random |
---|
0:31:53 | it has a posterior distribution |
---|
0:31:57 | okay, so we put that into our system, and we're now ready to report some results |
---|
0:32:02 | so we went back to the speaker diarisation problem |
---|
0:32:04 | and |
---|
0:32:06 | ran this on data; i think this was the two thousand seven nist rt competition |
---|
0:32:11 | data. we didn't compete in this; we simply took the data and compared to what had been done |
---|
0:32:16 | by people who do compete |
---|
0:32:18 | okay, so this is diarisation error rate |
---|
0:32:21 | and the icsi results had been the state of the art for this at that time; i don't remember if they |
---|
0:32:25 | are still, but they were the best results |
---|
0:32:29 | and by a wide margin at the time |
---|
0:32:32 | and this is over the twenty-one meetings |
---|
0:32:34 | and these are comparisons to the icsi results, just to get a flavour |
---|
0:32:37 | so our sticky results are basically comparable to the red ones, which are the icsi ones |
---|
0:32:42 | so if you just kind of scan through here |
---|
0:32:44 | the green ones are the non-sticky hdp-hmm, and you can see they're worse; and that was a signal to us |
---|
0:32:48 | before we added stickiness, that something was not quite working |
---|
0:32:51 | so if you do a head-to-head |
---|
0:32:53 | basically we were comparable with the icsi results at this time |
---|
0:32:56 | i do want to say that |
---|
0:32:58 | my goal in showing these numbers is not to say that we |
---|
0:33:01 | beat anybody, or that we're competitive; this was two thousand seven |
---|
0:33:05 | and we're not |
---|
0:33:06 | speech people; we didn't try to compete |
---|
0:33:08 | but we got results that are in fact, you know, comparable with a state-of-the-art system, and the |
---|
0:33:13 | main point i want to make is that this was done by |
---|
0:33:15 | emily fox, a grad student who visited my group in the summer, and she learned about |
---|
0:33:20 | the hdp-hmm, and she implemented this and did all of that as one summer project |
---|
0:33:25 | so this is not that hard to do: it's a tool, just like the hmm, that you can learn, and you |
---|
0:33:29 | can use it and you can get |
---|
0:33:30 | you know, competitive results doing that |
---|
0:33:33 | we have not pursued this, and, you know, i know that the icsi team actually did a discriminative method |
---|
0:33:37 | after that that gets better numbers |
---|
0:33:40 | but all these numbers are getting pretty good, and i think that this particular approach |
---|
0:33:44 | if pursued, can be |
---|
0:33:46 | you know, up with the best that's out there |
---|
0:33:48 | i recall when hmms first came about; i was actually a grad student at the |
---|
0:33:53 | time |
---|
0:33:54 | the very first papers on hmms gave numerical results compared to dynamic time warping |
---|
0:33:59 | and they were okay, but not greatly better |
---|
0:34:01 | but of course that was enough, and that set the whole field off |
---|
0:34:04 | heading in that direction; i would like to think that this could be the case also for these |
---|
0:34:08 | bayesian nonparametric methods |
---|
0:34:10 | they're easy to implement |
---|
0:34:11 | they're just a small twist on what you're used to: they're just hmms with a little bit extra, that you |
---|
0:34:15 | can move to a new table in the restaurant |
---|
0:34:17 | a little bit of extra machinery for multiple restaurants; they're really easy to implement |
---|
0:34:20 | and they're robust: you know, one student can just implement it and it works |
---|
0:34:24 | well |
---|
0:34:25 | so |
---|
0:34:26 | i do think they're going to play a role |
---|
0:34:30 | so here are some examples of actual meetings: these are ones where the diarisation error rates for both approaches are |
---|
0:34:34 | quite small, and you can see here we basically solved the problem |
---|
0:34:38 | and here's a meeting where things were a little bit worse |
---|
0:34:40 | there was one meeting on which we did particularly badly |
---|
0:34:42 | it turned out that it wasn't the model that was bad; it turned out that the mixing of the mcmc algorithm |
---|
0:34:47 | we were using for this was quite slow on that meeting |
---|
0:34:49 | it hadn't mixed by the time we stopped |
---|
0:34:51 | and so that's still an open issue: how to |
---|
0:34:53 | make sure the markov chains mix |
---|
0:34:55 | if we're going to use markov chain monte carlo |
---|
0:34:59 | okay, so that's all i want to say about diarisation |
---|
0:35:02 | let me just |
---|
0:35:03 | briefly mention a couple of other applications of the hdp; it's been used in many other kinds |
---|
0:35:08 | of problems, not just for hmms |
---|
0:35:10 | in fact, you can use these for pcfgs, for probabilistic context-free grammars |
---|
0:35:14 | in this case you have a parse tree |
---|
0:35:16 | and the number of rules |
---|
0:35:18 | (rules are like clusters when you're doing statistical nlp) |
---|
0:35:21 | grows as you see more and more data |
---|
0:35:24 | and the same rule can appear in multiple places in the parse tree |
---|
0:35:27 | that's why you have multiple restaurants of the chinese restaurant franchise, for exactly the same reason |
---|
0:35:31 | you want to share |
---|
0:35:32 | these rules |
---|
0:35:33 | in different locations of the parse tree. so we have |
---|
0:35:35 | built a system that does that, and |
---|
0:35:38 | we are able to then infer the most probable |
---|
0:35:41 | rule sets |
---|
0:35:42 | from data |
---|
0:35:43 | do parsing |
---|
0:35:44 | and build parsers |
---|
0:35:48 | okay, so that was the dirichlet process |
---|
0:35:51 | and i could talk about dirichlet processes a lot more, and a lot of people have done more work on them, but |
---|
0:35:56 | i want to move on |
---|
0:35:57 | and tell you about some of the other stochastic processes that are in the toolbox |
---|
0:36:01 | so one i'm particularly interested in these days is called the beta process |
---|
0:36:05 | where, in the dirichlet process, when you go to the restaurant, you have to sit at one |
---|
0:36:09 | and only one table |
---|
0:36:10 | the beta process allows you to go into the restaurant and sit at multiple tables |
---|
0:36:15 | so you should think about the tables now not as clusters but as features, a bit-vector description |
---|
0:36:20 | of entities |
---|
0:36:22 | so if you sit at tables one, three, and seventeen |
---|
0:36:25 | and i sit at three, seventeen, and thirty-five, we have a bit of overlap with each other |
---|
0:36:28 | so the bit vectors can overlap in interesting ways |
---|
0:36:31 | and that's what the beta process allows us to do that the dirichlet process does not |
---|
0:36:38 | okay, now, before i get to the beta process itself, let me |
---|
0:36:41 | tell you briefly about the general framework |
---|
0:36:43 | kingman had a very important paper, in nineteen sixty-seven or sixty-eight, called |
---|
0:36:49 | "completely random measures" |
---|
0:36:50 | the beta process is an example of a completely random measure |
---|
0:36:53 | and these random measures are really simple to talk about and to work with |
---|
0:36:56 | all they are is measures, random measures on a space, just like before |
---|
0:37:01 | but what's new here is that they assign independent masses to non-intersecting subsets |
---|
0:37:05 | of the space. here's a picture of this |
---|
0:37:07 | here we've got some arbitrary space |
---|
0:37:10 | here's a set A and here's a set B |
---|
0:37:13 | and here, this red object, is a random measure, a discrete measure, on this space |
---|
0:37:17 | and so there's a random amount of mass that fell into set A, and a random amount that fell into |
---|
0:37:21 | B |
---|
0:37:22 | and if those random variables |
---|
0:37:25 | the random masses here |
---|
0:37:27 | are independent |
---|
0:37:29 | because these are non-overlapping sets |
---|
0:37:31 | then this random measure is called completely random |
---|
0:37:34 | so it's a really nice concept; it leads to divide-and-conquer algorithms, so it's basically |
---|
0:37:40 | you know |
---|
0:37:42 | also a computational concept |
---|
0:37:45 | there are lots of examples: you know what brownian motion is, and |
---|
0:37:46 | brownian motion, it turns out, is a special case of this |
---|
0:37:50 | gamma processes are, and all kinds of other interesting objects |
---|
0:37:53 | the dirichlet process is not a completely random process, but it's a normalized gamma process |
---|
0:38:00 | okay, so now |
---|
0:38:01 | kingman had a very beautiful result |
---|
0:38:04 | characterising completely random processes |
---|
0:38:07 | and it turns out that they can all be derived from poisson point processes |
---|
0:38:11 | so the poisson process lies behind all of this |
---|
0:38:14 | and it's a really beautiful construction, so let me tell you about it really briefly. what we're trying to do, remember |
---|
0:38:18 | is we have this space omega |
---|
0:38:20 | and we're trying to put random measures on it; i already showed you how to put dirichlet process measures on spaces |
---|
0:38:24 | and i'm now going to be more general and show you how to put all kinds of other random measures on spaces |
---|
0:38:28 | what you do is you take the original space omega and you cross it with the real |
---|
0:38:31 | line; you look at the product space, omega cross R |
---|
0:38:35 | okay |
---|
0:38:36 | i'm going to put a poisson process in that product space |
---|
0:38:39 | how do you put a poisson process on things? |
---|
0:38:41 | well, you equip it with a rate function |
---|
0:38:43 | you're probably familiar with the homogeneous poisson process, where you have a flat rate, a constant rate function |
---|
0:38:49 | and then in every little interval, or in this case every little set |
---|
0:38:53 | the number of points falling there is a poisson random variable |
---|
0:38:56 | with some rate number |
---|
0:38:58 | here we let the rate vary, so now the poisson rate is an integral |
---|
0:39:03 | of a rate function over that little set |
---|
0:39:05 | so you have to write down the rate function; here it is |
---|
0:39:08 | and so i integrate that rate function to get |
---|
0:39:10 | poisson rates for every little set of this space |
---|
0:39:14 | and here's a draw from that poisson point process; you see the rate function was tilted up on the left |
---|
0:39:18 | so i get more points down here and a few up here |
---|
0:39:22 | now, having drawn from this poisson process with this rate function, i forget all this machinery and look |
---|
0:39:26 | at this red thing here |
---|
0:39:28 | i take each point X, and i drop a line from that X down to the omega axis |
---|
0:39:32 | and now the resulting object is a measure on the omega space; it's a discrete measure |
---|
0:39:37 | and it's random |
---|
0:39:38 | and it's a completely random measure, because any mass that falls in some set A here |
---|
0:39:43 | and some set B here will be independent |
---|
0:39:45 | because of the underlying poisson process |
---|
0:39:49 | the beautiful fact is that all completely random measures can be gotten this way |
---|
0:39:53 | one direction is trivial; the other direction is quite non-trivial |
---|
0:39:57 | so if you like completely random measures, and i do |
---|
0:39:59 | then this |
---|
0:40:01 | theorem says you can reduce the study of |
---|
0:40:03 | these measures to the study of the poisson process |
---|
0:40:05 | and just the rate functions of the poisson process |
---|
0:40:08 | so i think that's a tool |
---|
0:40:09 | i think that |
---|
0:40:10 | in this field, in the years ahead, we will be studying rate measures for poisson processes |
---|
0:40:13 | as a way to specify |
---|
0:40:16 | combinatorial structures on things |
---|
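schematically, the construction just described is kingman's representation: draw a poisson point process with rate (lévy) measure nu on the product space, then drop the points down to omega:

```latex
\{(\omega_i, p_i)\}_{i \ge 1} \sim \mathrm{PoissonProcess}(\nu) \ \text{on}\ \Omega \times \mathbb{R}, \qquad
\mu = \sum_{i \ge 1} p_i\, \delta_{\omega_i}
```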
0:40:20 | okay, so the particular example of the beta process has this thing as its rate function, a function of |
---|
0:40:24 | two arguments, sorted into the omega part and the real part |
---|
0:40:27 | the omega part is just some measure that gives you |
---|
0:40:32 | the prior on atoms |
---|
0:40:33 | just like G-naught was before; now it's called B-naught |
---|
0:40:36 | and then here, for the beta process, is the beta density |
---|
0:40:40 | an improper beta density |
---|
0:40:43 | which gives you an infinite collection of points when we draw from the point process, which is what we want; we |
---|
0:40:47 | don't want a finite number, that would be a parametric |
---|
0:40:49 | prior; we want a nonparametric prior, so we need this thing to be |
---|
0:40:53 | improper |
---|
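for reference, the rate measure just described is usually written as the beta process lévy measure (c is a concentration parameter, B-naught the base measure):

```latex
\nu(d\omega, dp) = c\, p^{-1} (1 - p)^{c - 1}\, dp\, B_0(d\omega), \qquad p \in (0, 1)
```

the p^{-1} factor is the singularity at the origin described next: it makes the measure improper, so a draw has infinitely many atoms, almost all with tiny heights.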
0:40:55 | okay, that's probably too much math; let me just draw the picture
---|
0:40:58 | here is that rate function: it has a singularity at the origin, it really tilts up sharply at the origin
---|
0:41:04 | and it breaks off; it stops at one
---|
0:41:06 | and so when we draw the poisson point process, you get lots and lots of
---|
0:41:10 | points that are very near the origin
---|
0:41:12 | and you now take this, dropping down from the
---|
0:41:17 | y-coordinate to the x-axis
---|
0:41:20 | so p_i is the height of an atom and omega_i is the location of the atom
---|
0:41:25 | and you take that infinite sum, and that now is a random measure; it's another way of getting
---|
0:41:30 | a random measure, like we did with stick breaking before
---|
0:41:32 | this is another way of getting a random measure, and it's much more general
---|
0:41:35 | and in particular these p_i do not sum to one; they're between zero and one
---|
0:41:40 | but they do not sum to one
---|
0:41:42 | and they're independent |
---|
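a sketch of drawing (a truncation of) this object; B_0 is taken uniform on [0, 1] and the cutoff eps is an artifact of the truncation, both assumptions of this illustration rather than part of the model:

```python
import numpy as np
from scipy import integrate

rng = np.random.default_rng(1)
c, eps = 1.0, 1e-3                    # concentration c; eps truncates the p = 0 singularity

def levy(p):
    return c * p**-1.0 * (1.0 - p)**(c - 1.0)   # the improper beta density

# on p in [eps, 1] the total rate is finite, so the poisson process there can
# be sampled exactly: a poisson number of points, heights by inverse cdf
total, _ = integrate.quad(levy, eps, 1.0)
n = rng.poisson(total)
grid = np.linspace(eps, 1.0, 10_000)
cdf = np.cumsum(levy(grid))
cdf /= cdf[-1]
ps = np.interp(rng.uniform(size=n), cdf, grid)  # atom heights p_i in (0, 1)
omegas = rng.uniform(size=n)                    # atom locations drawn from B_0

# B = sum_i p_i * delta_{omega_i} is the (truncated) beta-process draw;
# shrinking eps adds ever more atoms with heights near zero
```
---|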
0:41:44 | so i like to think of these p_i as coin-tossing probabilities, and i like to think of this as
---|
0:41:48 | now an infinite collection of coins
---|
0:41:50 | and most of the coins have probability nearly zero. so i flip all these coins
---|
0:41:55 | and i'll get a few ones and lots of zeros
---|
0:41:58 | an infinite set of zeros and a few ones
---|
0:42:01 | and i do that again, and i'll again get a few ones and a lot of zeros
---|
0:42:05 | and as i keep doing that, there will be a few places where i'm gonna get lots of ones
---|
0:42:09 | a lot of overlap
---|
0:42:10 | and lots of other places where there's less
---|
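the coin-flipping step is one line given the atom heights; here `ps` is a stand-in draw (a beta sample mimicking "mostly near zero" heights), not an actual beta-process draw:

```python
import numpy as np

rng = np.random.default_rng(2)
ps = rng.beta(0.1, 1.0, size=500)      # stand-in coin probabilities, mostly near zero
Z = rng.random((100, ps.size)) < ps    # entry (n, k) is one w.p. ps[k]: 100 rows of flips
```
---|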
0:42:13 | here's a picture showing that
---|
0:42:14 | here i've drawn from the beta process: that blue thing right there is between zero and one, and it's mostly
---|
0:42:19 | nearly zero
---|
0:42:20 | and then here are a hundred draws
---|
0:42:23 | think of these as coins, and think of these heights as the probability of heads
---|
0:42:28 | when i draw, this atom has a big probability, so there are relatively lots of
---|
0:42:33 | ones as you go down the column
---|
0:42:34 | and in some regions over here
---|
0:42:36 | it's nearly all zeros, where the coin probability was almost zero
---|
0:42:41 | okay, so think of a row of this matrix as a sparse, binary
---|
0:42:46 | infinite-dimensional random vector
---|
0:42:48 | think of it as a feature vector for a bunch of entities
---|
0:42:51 | so here are a hundred entities, and if you think of this like a chinese restaurant, entity number
---|
0:42:56 | one hundred came to the restaurant
---|
0:42:58 | and didn't just sit at one table: it sat at this table, this table, this table, and this table
---|
0:43:02 | four different tables
---|
0:43:04 | the next person comes in, number ninety-nine, and didn't sit at the first table but sat at the
---|
0:43:08 | fourth table, and so on
---|
0:43:10 | right, so this captures the sitting pattern of a hundred people in this restaurant and all the
---|
0:43:15 | tables they sat at
---|
0:43:16 | and the total number of tables, as you keep doing this, is gonna grow; it fills out denser and denser
---|
0:43:21 | but again it grows at a slow rate
---|
0:43:25 | okay, and
---|
0:43:27 | you know, there are different settings of the parameters of this process, and you can get different growth rates depending on
---|
0:43:31 | the settings
---|
0:43:33 | so that's probably way too abstract for you to appreciate why
---|
0:43:37 | you'd want to do that
---|
0:43:38 | let me just say that there's also a restaurant metaphor here
---|
0:43:41 | there's something called the indian buffet process which captures the sitting pattern directly on the matrix
---|
0:43:46 | without talking about the underlying beta process, just like the chinese restaurant
---|
0:43:50 | captures the
---|
0:43:51 | sitting pattern for the dirichlet process without talking about the underlying dirichlet process; it's literally the marginal probability
---|
0:43:56 | under the beta process
---|
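the indian buffet process itself is easy to simulate directly; a minimal sketch, with alpha and the sizes chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(3)

def indian_buffet(n_customers, alpha=4.0):
    """binary feature matrix sampled from the indian buffet process."""
    counts = []                         # how many earlier customers took each dish
    rows = []
    for n in range(1, n_customers + 1):
        row = [rng.random() < m / n for m in counts]   # old dish k w.p. m_k / n
        counts = [m + took for m, took in zip(counts, row)]
        new = rng.poisson(alpha / n)                   # poisson(alpha / n) new dishes
        row += [True] * new
        counts += [1] * new
        rows.append(row)
    Z = np.zeros((n_customers, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = indian_buffet(100)   # the number of dishes grows slowly, like alpha * log n
```
---|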
0:43:58 | so i'll skip that; i want to move on
---|
0:44:02 | all the way back to this problem, so i can make it concrete again
---|
0:44:06 | okay, so remember this problem of the multiple time series: i have these people coming into the room and
---|
0:44:11 | doing these exercise routines
---|
0:44:12 | and i don't know how many
---|
0:44:15 | routines there are in the library, and i don't know who does what routine
---|
0:44:19 | and each person doesn't do just one routine; they do a subset of routines
---|
0:44:25 | right, so how am i gonna model that?
---|
0:44:27 | well, it's a time series and it has segments, so i'm gonna use an hmm as my basic block
---|
0:44:32 | but now i've got to put some structure around that, to capture all this combinatorial structure in my problem
---|
0:44:37 | right |
---|
0:44:37 | so the way i'm gonna do that is to use the beta process
---|
0:44:41 | and i have a slide on this; i'll try to say it in english while you
---|
0:44:44 | stare at the slide
---|
0:44:46 | the math is there if you care to follow it
---|
0:44:48 | but i encourage you to just listen to me tell you how it works
---|
0:44:52 | okay, so let's suppose everybody in this room is gonna come up here on the stage and do the exercise
---|
0:44:56 | routines, and every one of you is gonna do your own subset
---|
0:44:59 | there's an infinite library up there of possible routines you could do
---|
0:45:03 | you choose, before you come up on stage, which subset of that library you're good at and actually pick out
---|
0:45:08 | maybe i'm gonna do jumping jacks and twists; those are in there
---|
0:45:12 | right |
---|
0:45:13 | now |
---|
0:45:13 | up in heaven there is an infinite-by-infinite
---|
0:45:16 | transition matrix
---|
0:45:19 | which god possesses and we don't get to possess
---|
0:45:22 | and i pick out
---|
0:45:24 | twists
---|
0:45:25 | and jumping jacks
---|
0:45:26 | from that infinite matrix; so i'm gonna pick out a little two-by-two submatrix: maybe twist is
---|
0:45:30 | column number thirty-seven
---|
0:45:32 | and jumping jacks is number forty-two
---|
0:45:34 | so i pick out those columns and the corresponding rows, i get a little two-by-two
---|
0:45:38 | matrix, and i bring it down from the infinite matrix
---|
0:45:40 | and i instantiate a classical hmm
---|
0:45:44 | it's actually an autoregressive hmm, so there's a little bit of oscillation, 'cause these are oscillatory movements
---|
0:45:49 | and now i run a classical hmm for my exercise routine; my emissions are the
---|
0:45:53 | six-dimensional
---|
0:45:54 | vector of positions
---|
0:45:56 | it's just an hmm
---|
0:45:58 | right, now i run the forward-backward algorithm and i get an em update to the parameters
---|
0:46:02 | then i take my local update, go back up to the infinite matrix, and that little two-by
---|
0:46:06 | two gets written back with the updates from baum-welch
---|
0:46:10 | right, now here comes the next person up on the stage, and he takes, say, column thirty-five
---|
0:46:14 | but also one-oh-one and one-seventeen
---|
0:46:17 | so he pulls out a three-by-three subset of the matrix
---|
0:46:20 | he runs his hmm on his data
---|
0:46:22 | gets a baum-welch update, and then goes back to the infinite matrix and changes that three-by-three submatrix
---|
0:46:29 | and as we all keep doing that, we're gonna be changing overlapping subsets of that infinite matrix
---|
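the bookkeeping being described is just indexing into a shared matrix; a sketch, with the "infinite" matrix truncated to a finite size and the column numbers hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
big = rng.dirichlet(np.ones(200), size=200)   # truncated stand-in for the infinite matrix

me = np.array([37, 42])                # my routines: twist, jumping jacks
local = big[np.ix_(me, me)].copy()     # pull down my 2x2 submatrix
# ... renormalize, run baum-welch on my data, and update `local` ...
big[np.ix_(me, me)] = local            # write the update back

him = np.array([35, 101, 117])         # the next person's overlapping 3x3 block
```
---|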
0:46:35 | okay |
---|
0:46:36 | and so that's the beta process |
---|
0:46:38 | ar-hmm |
---|
0:46:40 | so i hope that you kind of got the spirit of that: it is just an hmm; there's one hmm
---|
0:46:45 | for each of the people doing the exercise routines
---|
0:46:47 | and then the beta process is this machinery up here, which gives us a
---|
0:46:50 | feature vector saying which subset of the infinite set of states each person actually uses
---|
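putting the pieces together, a toy generative sketch of the idea (not the authors' code; the truncation level, the fixed feature set, and the noise scales are all assumptions of this illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

K = 10                                  # truncation of the global library of behaviors
A = rng.uniform(-0.9, 0.9, K)           # ar(1) coefficient for each global behavior
mu = rng.normal(0.0, 2.0, K)            # offset for each global behavior

def simulate_person(active, T=200, gamma=1.0):
    """one person's ar-hmm over just their active subset of global states."""
    k = len(active)
    P = rng.dirichlet(gamma * np.ones(k), size=k)   # k x k transition matrix
    y = np.zeros(T)
    s = rng.integers(k)
    for t in range(1, T):
        s = rng.choice(k, p=P[s])                   # move within the active subset
        g = active[s]
        y[t] = mu[g] + A[g] * y[t - 1] + rng.normal(0.0, 0.1)
    return y

# in the full model this subset comes from the beta/bernoulli process (ibp);
# here it is fixed: this person does behaviors 2, 5 and 7
y = simulate_person(np.array([2, 5, 7]))
```
---|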
0:46:57 | so this maybe looks a little bit complicated; it's not, it's actually very easy to put on the
---|
0:47:01 | computer
---|
0:47:02 | and again, emily came and implemented this in her second summer with us
---|
0:47:10 | and built the software to do this
---|
0:47:13 | okay, so anyway, it's just an hmm, with a beta-process prior on the way
---|
0:47:18 | the parameters are structured
---|
0:47:20 | now, this actually really worked
---|
0:47:22 | so here are motion capture results
---|
0:47:26 | and this is a nontrivial problem; lots of methods don't do well at all on this
---|
0:47:30 | problem
---|
0:47:31 | it's a bit qualitative how well we're doing, but i think if you look you'll see
---|
0:47:34 | we're doing well. so here is the first feature
---|
0:47:37 | if you pick out that state, i.e. that feature
---|
0:47:39 | and you just
---|
0:47:40 | hold that fixed in the hmm (it's an autoregressive hmm, so it'll oscillate)
---|
0:47:44 | the oscillations are the arms going up and down
---|
0:47:47 | this one kind of picked out jumping jacks
---|
0:47:51 | this next one, if you take that feature and put it in the hmm and don't let any transitions happen
---|
0:47:56 | you get the knees wobbling back and forth
---|
0:47:58 | here you get some kind of twisting motion of the hips
---|
0:48:00 | here's more wobbling of something or other, and here's where the arms go in circles
---|
0:48:04 | so down at the bottom you start to get a little bit more subdivision of the states than maybe we
---|
0:48:08 | might like, although emily thinks that there are kind of
---|
0:48:11 | good kinematic reasons these are actually distinct
---|
0:48:14 | but this really nailed the problem: it took in these sixty-dimensional time series, multiple at once, knowing nothing
---|
0:48:18 | about the number of segments or where the segments occur
---|
0:48:21 | and jointly segmented them across all the different users
---|
0:48:28 | alright
---|
0:48:29 | how am i doing on time?
---|
0:48:31 | okay, so i'm about out of time, so i think i'm gonna skip some slides here
---|
0:48:35 | and just say something about
---|
0:48:38 | this model
---|
0:48:40 | so this is something i worked on a number of years ago; it's an exchangeable model, a bag-of-words
---|
0:48:44 | model for text, latent dirichlet allocation; it's fairly widely known
---|
0:48:48 | it's extremely simple; too simple, really, for lots of real-world phenomena
---|
0:48:53 | and so we've been working to make it more interesting
---|
0:48:56 | so what the lda model does is it takes a bag-of-words representation of text
---|
0:49:01 | and re-represents documents in terms of topics
---|
0:49:05 | a topic is a probability distribution on words
---|
0:49:07 | and so a given document can express a subset of topics, maybe sports and travel
---|
0:49:12 | and so all the words in the document come from
---|
0:49:14 | the sports topic
---|
0:49:15 | or the travel topic; you get mixtures, so-called admixtures
---|
0:49:19 | of topics within a single document
---|
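for reference, the lda generative process takes only a few lines; the vocabulary size, topic count, and hyperparameters below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(6)
V, K, doc_len = 1000, 5, 50

topics = rng.dirichlet(0.1 * np.ones(V), size=K)  # each topic: a distribution on words
theta = rng.dirichlet(0.5 * np.ones(K))           # one document's admixture over topics
z = rng.choice(K, size=doc_len, p=theta)          # a topic assignment for each word
words = np.array([rng.choice(V, p=topics[t]) for t in z])
```
---|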
0:49:22 | now there are many problems with this approach; one of them is that
---|
0:49:26 | words like
---|
0:49:27 | function words
---|
0:49:29 | tend to occur in every single topic
---|
0:49:31 | because if i have a document only about travel, and the function words don't occur in that topic
---|
0:49:35 | then i can't get function words in that document
---|
0:49:37 | and i get a very low-probability document
---|
0:49:39 | right, so we'd like to separate out things like function words and other kinds of abstract words from more concrete words
---|
0:49:45 | and make an abstraction hierarchy
---|
0:49:47 | so we've done that; there was a paper on this in the jacm
---|
0:49:51 | and i look at this, in some sense more so than lda, as a kind
---|
0:49:55 | of a
---|
0:49:56 | path forward
---|
0:49:57 | for this field
---|
0:49:58 | we call this the nested chinese restaurant process
---|
0:50:01 | so this is a whole chinese restaurant up here, and here's another chinese restaurant; all these
---|
0:50:05 | chinese restaurants are now organised in a tree
---|
0:50:08 | when you go to a chinese restaurant you pick a table like before, and that tells you what branch to take
---|
0:50:13 | to go to the next restaurant
---|
0:50:15 | so think about this as the first night here in prague
---|
0:50:18 | you pick some restaurant, and then you know what restaurant you need
---|
0:50:22 | to go to on the second night
---|
0:50:24 | and then the third night, and so on, as you keep going
---|
0:50:27 | through the nights of the conference
---|
0:50:29 | alright, so one document comes down here and picks a path down this tree
---|
0:50:33 | another document comes in and picks another path, and the paths can overlap
---|
0:50:37 | and so you get these
---|
0:50:38 | overlapping branching structures
---|
0:50:40 | alright, and now you put a topic at every node of this tree; each has a distribution on words
---|
0:50:45 | and then a document has a path down the tree; that gives it a set of topics it can draw from
---|
0:50:50 | and it draws its words from that set of topics
---|
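a sketch of walking documents down a nested crp tree; the depth and the concentration gamma are arbitrary choices here:

```python
import numpy as np

rng = np.random.default_rng(7)

class Node:
    """a restaurant in the nested crp; in the full model each node also holds a topic."""
    def __init__(self):
        self.children = []   # tables, each leading to a restaurant one level down
        self.counts = []     # how many documents chose each table

def ncrp_path(root, depth=3, gamma=1.0):
    """one document's path: an ordinary crp choice at each level."""
    path, node = [root], root
    for _ in range(depth - 1):
        n = sum(node.counts)
        probs = np.array(node.counts + [gamma], dtype=float) / (n + gamma)
        k = rng.choice(len(probs), p=probs)
        if k == len(node.children):      # a new table opens a new subtree
            node.children.append(Node())
            node.counts.append(0)
        node.counts[k] += 1
        node = node.children[k]
        path.append(node)
    return path

root = Node()
paths = [ncrp_path(root) for _ in range(100)]   # 100 documents share one growing tree
```
---|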
0:50:54 | right |
---|
0:50:55 | now, the topic up at the top has been used by all documents
---|
0:50:58 | so it makes a lot of sense to put the function words up there
---|
0:51:02 | whereas this node down here has only been used by a small subset of the documents, so you might
---|
0:51:06 | well put some more concrete words down there
---|
0:51:09 | and that's what the statistics does when we fit this to data
---|
0:51:11 | it develops topics at the top of the tree which are more abstract, and more concrete topics as
---|
0:51:15 | you go down the tree
---|
0:51:17 | so let me just get to it
---|
0:51:18 | here's my last slide; actually, to see this result you'll have to turn your head
---|
0:51:28 | this is psych review: these are abstracts from a particular journal in psychology
---|
0:51:32 | and this is the most probable tree; we put a distribution on trees
---|
0:51:36 | since everything gets a
---|
0:51:38 | distribution here
---|
0:51:39 | and within the highest-probability tree, this shows the highest-probability topic
---|
0:51:44 | or the high-probability words of the topic at each node; so we get "a", "and", "of", "is" as the high
---|
0:51:47 | probability words at the root
---|
0:51:49 | so we didn't have to strip away the function words; in this case they just popped
---|
0:51:52 | right up to the top
---|
0:51:54 | at the next level we get a model, memory; self, social psychology; motion, vision, binocular
---|
0:51:59 | drive, food, brain
---|
0:52:00 | so this looks kind of like social psychology, cognitive psychology, physiological psychology, and visual psychology
---|
0:52:07 | and so on as you go down; it's actually an infinite tree, and we're showing the first three levels of it
---|
0:52:12 | so anyway, i hope that gives you a bit more flavour of the toolbox notion here
---|
0:52:15 | we put
---|
0:52:16 | chinese restaurants together in a tree
---|
0:52:18 | and got an object which puts distributions on trees, and we could suddenly do things like abstraction hierarchies
---|
0:52:23 | and reason about them
---|
0:52:25 | so i'm done; that was just kind of a tour of a few highlights of a literature
---|
0:52:30 | there are probably about a hundred people worldwide working on this topic
---|
0:52:34 | quite actively; there's a conference called bayesian nonparametrics, which is being held in veracruz later this month
---|
0:52:41 | it's held every two years
---|
0:52:44 | and it's just getting going, so for the younger people in the audience
---|
0:52:48 | i highly encourage you to look at this; there's been a little bit of this work
---|
0:52:51 | brought into speech and signal processing, but there's gonna be a whole lot more
---|
0:52:55 | so on my publication page there are two papers which i'd point you to if you enjoyed the talk
---|
0:52:59 | they're written
---|
0:53:01 | for a non-expert audience and give you lots of pointers to more
---|
0:53:05 | of the literature
---|
0:53:06 | thank you very much |
---|
0:53:16 | there's time for questions
---|
0:53:17 | yes
---|
0:53:17 | thank you
---|
0:53:20 | so we have time for
---|
0:53:22 | some questions, yeah
---|
0:53:31 | (unintelligible remark from the floor)
---|
0:53:37 | okay, so i have two questions; first of all, thank you very much, this is very valuable
---|
0:53:43 | for our community
---|
0:53:44 | the first question i have is whether this will scale to full-size problems
---|
0:53:52 | since this setup would require monte carlo simulation
---|
0:53:58 | doesn't that make it really, really difficult?
---|
0:54:01 | you know, i don't believe that at all, and really no one knows how these go to large scale
---|
0:54:07 | so
---|
0:54:08 | you know, the em algorithm: does that apply to large-scale problems or not?
---|
0:54:11 | yes and no
---|
0:54:12 | i mean, for some problems
---|
0:54:14 | things
---|
0:54:15 | settle down really quickly, after a very small number of iterations, maybe like two iterations of em
---|
0:54:20 | and for some problems
---|
0:54:21 | these algorithms
---|
0:54:22 | that we use are just gibbs samplers
---|
0:54:25 | and so it's like em with one extra step
---|
0:54:28 | you literally do em like before, but now you might change an indicator from this to this
---|
0:54:33 | or go to a brand-new table
---|
0:54:35 | right, so we actually don't know whether posterior inference algorithms at large scale are gonna mix
---|
0:54:40 | maybe they mix more quickly than we're used to from small data
---|
0:54:44 | and the last point to make is that this is not tied to mcmc; that's just what we happened
---|
0:54:48 | to use, because we had three months for the project
---|
0:54:50 | many other procedures, split-and-merge algorithms, variational methods and so on for posterior inference, can
---|
0:54:55 | be used here
---|
0:54:59 | now, the second question i have is that many people in this hall
---|
0:55:05 | are well aware that the hmm is, well, a simple model
---|
0:55:08 | and not a really good model
---|
0:55:11 | for speech
---|
0:55:14 | in the first place
---|
0:55:15 | we use it because inference in it is easy
---|
0:55:20 | and training is easy; so i just want to know your take
---|
0:55:24 | on to what extent
---|
0:55:27 | this technique is general
---|
0:55:28 | okay, that's a great question; i like it very much. so those two
---|
0:55:33 | papers i mentioned a minute ago
---|
0:55:34 | answer your question by trying to show the toolbox; it shows a range of other kinds of models we can
---|
0:55:39 | consider
---|
0:55:40 | the dirichlet process, for example, doesn't have
---|
0:55:43 | power-law behavior
---|
0:55:44 | you might want power-law behavior
---|
0:55:46 | there's something else called the pitman-yor process, which gives you power laws
---|
0:55:48 | and you could do a pitman-yor version of the hdp-hmm
---|
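the pitman-yor version is a one-line change to the chinese restaurant seating rule; a sketch (alpha and the discount d are arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(8)

def pitman_yor_tables(n, alpha=1.0, d=0.5):
    """table counts under the pitman-yor crp; discount d in [0, 1) gives
    power-law table growth, and d = 0 recovers the dirichlet process."""
    counts = []
    for i in range(n):
        k = len(counts)
        # existing table j w.p. (counts[j] - d) / (i + alpha),
        # a new table w.p. (alpha + k * d) / (i + alpha)
        probs = np.array([c - d for c in counts] + [alpha + k * d]) / (i + alpha)
        j = rng.choice(k + 1, p=probs)
        if j == k:
            counts.append(1)
        else:
            counts[j] += 1
    return counts

print(len(pitman_yor_tables(10_000)))   # grows roughly like n**d, not log n
```
---|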
0:55:52 | and so there are many, many generalisations; there are inverse-gamma forms of the weights, and so on and so forth
---|
0:56:00 | so, you know
---|
0:56:01 | this really is a toolbox
---|
0:56:03 | and i think also, to your point: you know, i'm a statistician; we're used to models which, if you
---|
0:56:07 | find the right cartoon, can really work surprisingly well
---|
0:56:10 | and so the hmm: yes, it's not the right model for speech, but, as you said
---|
0:56:15 | it still was, you know, the engine that got speech all the way to where it
---|
0:56:19 | is
---|
0:56:20 | maybe badly, maybe wrongly
---|
0:56:22 | right |
---|
0:56:23 | but, you know, it is very useful to make cartoons that have nice statistical
---|
0:56:28 | properties, are computationally tractable, and can be generalized
---|
0:56:31 | so i think the hmm alone was too much of a box that couldn't be generalized easily, and i think that these
---|
0:56:35 | methods go beyond that
---|
0:56:36 | once again
---|
0:56:37 | but still staying with the problem
---|
0:56:42 | could we back up?
---|
0:56:43 | oh, we have to go back?
---|
0:56:45 | i have control over that
---|
0:56:48 | another question?
---|
0:57:00 | i can summarise it this way
---|
0:57:11 | yeah, so the question is about overfitting: why aren't you getting killed by overfitting in the nonparametric world?
---|
0:57:16 | well, you know
---|
0:57:18 | to a first approximation, bayesians don't have such problems with overfitting
---|
0:57:21 | i'm not always a bayesian, but when i am a bayesian
---|
0:57:24 | you know, it's one of the reasons i am
---|
0:57:27 | so, you know, to first order, we don't have overfitting troubles with these systems, even on
---|
0:57:31 | pretty large-scale problems
---|
0:57:34 | in fact
---|
0:57:36 | you know, we compared this, for example, with our context-free grammar work
---|
0:57:40 | against a parser trained with a split-merge em algorithm
---|
0:57:43 | and there they had
---|
0:57:44 | a big overfitting problem; they dealt with that
---|
0:57:47 | eventually, in a pretty effective way
---|
0:57:49 | but we didn't have to think about that
---|
0:57:51 | it just didn't
---|
0:57:51 | you know
---|
0:57:52 | come up
---|
0:57:54 | so that's the first order; at second order, you have a couple of hyperparameters, and you have to get
---|
0:57:58 | them in the right range
---|
0:57:59 | if you don't get the right range you can have some overfitting, but we're very robust to that
---|
0:58:03 | and so you're right, it didn't totally work right out of the box
---|
0:58:07 | we had a little overfitting; there was one more state than there should have been
---|
0:58:10 | but that's not too bad |
---|
0:58:11 | and it was not too hard: you do a little bit of engineering and think about the problem a little bit
---|
0:58:14 | more; there's a little bit of time-scale knowledge that, you know, the prior needs to put
---|
0:58:18 | in
---|
0:58:20 | so i think that's fine; we're sort of artists and engineers; we don't wanna build
---|
0:58:24 | a box
---|
0:58:25 | that you give to a high-school student and it's done
---|
0:58:27 | there will always be a little bit of thinking
---|
0:58:29 | and a lot of engineering
---|
0:58:30 | to do
---|
0:58:32 | but really, you know, i'm not always a bayesian; often i want to be a non-
---|
0:58:36 | bayesian; i go back and forth every single day of my life
---|
0:58:39 | you know, but for a lot of these really high-dimensional, hard
---|
0:58:42 | inference problems, where you have multiple things which need to kind of
---|
0:58:45 | collaborate, the hierarchy
---|
0:58:47 | just gives you, you know, from the get-go, a lot of control over those sorts of things
---|
0:58:56 | do we have another question?
---|
0:59:05 | so for speech we use mixture models within each state; is there a way to
---|
0:59:10 | decide when to grow, to use a mixture versus one component, to add a new component? yeah, excellent question; i should have
---|
0:59:15 | made that clear
---|
0:59:16 | so
---|
0:59:16 | the state-specific emission distribution here is not a single gaussian
---|
0:59:20 | for reasons you guys know extremely well
---|
0:59:22 | is it a plain mixture of gaussians?
---|
0:59:24 | no
---|
0:59:24 | it's a dirichlet process mixture of gaussians
---|
0:59:27 | right, and it turned out that was critical for us to get this to work
---|
0:59:32 | absolutely, we needed more, just like with the classical gmm
---|
0:59:35 | a single gaussian won't do for speech; you need more
---|
0:59:37 | and we don't wanna have a
---|
0:59:38 | fixed number L of distributions there; we want it to grow, so as you get more data you get more and more components in that
---|
0:59:44 | emission distribution, which arises when we put in the hierarchical dirichlet process
---|
0:59:47 | it's a hierarchical dirichlet process
---|
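a crp-style sketch of drawing from a dirichlet process mixture of gaussians, the kind of emission model being described; the base measure and noise scale are made up:

```python
import numpy as np

rng = np.random.default_rng(9)

def dp_mixture_draws(n, alpha=1.0):
    """n draws from a dp mixture of gaussians via the chinese restaurant process."""
    means, counts, x = [], [], []
    for i in range(n):
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(means):                   # a new component from the base measure
            means.append(rng.normal(0.0, 3.0))
            counts.append(0)
        counts[k] += 1
        x.append(rng.normal(means[k], 0.5))
    return np.array(x)

x = dp_mixture_draws(500)   # the number of components grows with the data
```
---|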
0:59:49 | and that helps to make the point about the toolbox even better
---|
0:59:55 | well, we have time for one more question; let's take one
---|
1:00:04 | we actually use quite a bit of the
---|
1:00:08 | hidden markov model inference
---|
1:00:11 | and it's very good, so i recommend everybody
---|
1:00:14 | take a look. but one of the major problems is in the
---|
1:00:18 | prior elicitation, because we have to estimate a lot
---|
1:00:23 | of hyperparameters
---|
1:00:25 | so sometimes
---|
1:00:26 | the number of hyperparameters is more than
---|
1:00:29 | the parameters themselves
---|
1:00:31 | now you're talking about
---|
1:00:33 | an infinite number of
---|
1:00:35 | parameters
---|
1:00:36 | so would we have an infinite number of hyperparameters? no, we don't, and that's really critical. so
---|
1:00:41 | the classical bayesian hmm had a lot of hyperparameters, because it had
---|
1:00:46 | a fixed K states, yes, and the number of hyperparameters scaled with the size K
---|
1:00:50 | right, and you had aic or something else like that to choose K
---|
1:00:53 | oh
---|
1:00:54 | we don't do any of that
---|
1:00:55 | we have a distribution on the number of states, and the number of hyperparameters is constant
---|
1:00:59 | okay, a small number
---|
1:01:01 | so there's sharing here; there's a sharing induced by these hierarchical choices
---|
1:01:06 | across branches
---|
1:01:07 | the hyperparameters really live at the top level
---|
1:01:10 | okay, so
---|
1:01:12 | there is a very small number of them; so there is a prior-elicitation question according to how you give
---|
1:01:17 | them their values. correct: there's a very small number of hyperparameters here, in some sense almost
---|
1:01:22 | too small; you can't believe it, it probably has to grow a little bit
---|
1:01:26 | no, this is very good; but the point is that
---|
1:01:29 | this is not your classical
---|
1:01:30 | bayesian hmm, where you have the number of states fixed
---|
1:01:33 | we're inferring the number of states; it's random
---|
1:01:35 | okay, it's the seating pattern in the restaurant
---|
1:01:38 | it grows under the prior like log n, and under the posterior it's just random
---|
1:01:41 | okay |
---|
1:01:43 | thank you thank you |
---|
1:01:44 | well, i would like us to thank
---|
1:01:46 | the speaker once again
---|
1:01:56 | and now we have the coffee break outside
---|