0:00:15 | this work was done together with my colleague |
---|
0:00:20 | and there are no i-vectors in this work; we really tried |
---|
0:00:25 | to put i-vectors in, but we didn't find where to put them |
---|
0:00:32 | we don't claim that this is the state of the art; however |
---|
0:00:42 | to do something new, sometimes the best thing is to use something very |
---|
0:00:47 | old that everyone has forgotten about |
---|
0:00:51 | so this work basically goes back to nineteen fifty-five |
---|
0:00:58 | when these models were first defined |
---|
0:01:03 | and |
---|
0:01:03 | at that time |
---|
0:01:06 | two types of HMMs were defined. one is the type we know well: |
---|
0:01:12 | we have transitions |
---|
0:01:16 | from one state to another, and then at each state we have a |
---|
0:01:22 | distribution that defines the distribution of the data at that |
---|
0:01:27 | state; this type is named |
---|
0:01:29 | Moore HMM. the other type was defined such that |
---|
0:01:35 | everything depends on where the data came from, so |
---|
0:01:39 | both the transition probability and the distribution of the data are on the arcs, |
---|
0:01:46 | not at the states; this type is named |
---|
0:01:49 | Mealy HMM |
---|
0:01:51 | in the control systems |
---|
0:01:54 | community they worked a lot on both types of HMMs |
---|
0:01:59 | but they were more theoretical: |
---|
0:02:01 | they didn't try to estimate the parameters or find |
---|
0:02:05 | the best path with Viterbi; |
---|
0:02:08 | they worked more on |
---|
0:02:10 | discrete distributions and asked questions such as what is an equivalent model that they can find, |
---|
0:02:17 | or what is the minimal |
---|
0:02:20 | HMM that |
---|
0:02:21 | they can find |
---|
0:02:23 | we will look |
---|
0:02:25 | at the Mealy HMM from the other perspective |
---|
0:02:29 | and compare it to the Moore HMM, the HMM we know, |
---|
0:02:34 | and try to apply it to |
---|
0:02:36 | diarization of telephone conversations |
---|
0:02:40 | so |
---|
0:02:42 | I will give a short summary of HMMs, just to establish common notation, not |
---|
0:02:46 | to say something new |
---|
0:02:48 | and then I will present |
---|
0:02:50 | the Mealy HMM |
---|
0:02:53 | and |
---|
0:02:54 | show how we applied it to speaker diarization |
---|
0:02:57 | and some results |
---|
0:02:59 | will follow |
---|
0:03:03 | so in the HMM we know, if we have a K-state |
---|
0:03:09 | model |
---|
0:03:11 | it is defined by the |
---|
0:03:13 | initial probability vector, the transition matrix, |
---|
0:03:17 | and |
---|
0:03:18 | the vector of |
---|
0:03:22 | distributions; in the |
---|
0:03:23 | GMM case |
---|
0:03:25 | each state distribution will be a |
---|
0:03:28 | GMM |
---|
0:03:29 | so the triple |
---|
0:03:32 | (π, A, B) defines the |
---|
0:03:34 | model |
---|
0:03:38 | in a Moore HMM, |
---|
0:03:40 | as we know, |
---|
0:03:42 | there are three problems: to define the probability, or the likelihood, |
---|
0:03:49 | of the model given the data; |
---|
0:03:52 | the Viterbi problem, to find the best path; |
---|
0:03:56 | and |
---|
0:03:57 | to estimate the model |
---|
0:03:59 | we can estimate it via Viterbi |
---|
0:04:03 | statistics |
---|
0:04:04 | or by Baum-Welch |
---|
0:04:07 | in our case, in diarization, |
---|
0:04:09 | we are more interested in Viterbi statistics |
---|
0:04:14 | the motivation to use a Mealy HMM can be seen in this |
---|
0:04:19 | toy example |
---|
0:04:22 | we can see that: |
---|
0:04:25 | suppose we are looking at state two |
---|
0:04:31 | there is |
---|
0:04:33 | little data which arrives from state one to state two, namely |
---|
0:04:39 | only two hundred points |
---|
0:04:42 | on the other hand, from state three to state |
---|
0:04:46 | two |
---|
0:04:47 | arrives much more data, |
---|
0:04:50 | nine times more data |
---|
0:04:53 | the distributions of the data which arrive from each |
---|
0:04:57 | state |
---|
0:05:00 | are different Gaussians |
---|
0:05:04 | but if we try to |
---|
0:05:06 | estimate for state two a GMM of |
---|
0:05:10 | size two |
---|
0:05:12 | it will basically look |
---|
0:05:15 | almost like the data that arrived from state three; |
---|
0:05:18 | the state |
---|
0:05:20 | one data will |
---|
0:05:21 | have very small influence |
---|
0:05:24 | on the distribution |
---|
0:05:29 | so |
---|
0:05:31 | if we want to emphasize |
---|
0:05:35 | the data of this state which derived from state one |
---|
0:05:39 | then we want |
---|
0:05:42 | the transitions into this state |
---|
0:05:46 | to be treated properly when the data came from state one |
---|
0:05:53 | these are the distributions |
---|
0:05:56 | of the two Gaussians |
---|
0:05:58 | and above, |
---|
0:06:04 | those |
---|
0:06:05 | from state one and from state two; if we multiply each of them |
---|
0:06:12 | by the transition probability, |
---|
0:06:15 | what |
---|
0:06:16 | we can see is that |
---|
0:06:20 | we will never |
---|
0:06:22 | have any transition |
---|
0:06:26 | from state one to state two, because the blue line is above |
---|
0:06:32 | and |
---|
0:06:34 | therefore |
---|
0:06:35 | we will always decide to stay at state one |
---|
0:06:40 | but |
---|
0:06:42 | if we look only at the data of the transitions, on the arcs, |
---|
0:06:48 | at the specific data, |
---|
0:06:50 | then we see |
---|
0:06:51 | a totally different picture: we see that it is much preferable to move from state |
---|
0:06:58 | one to state two than to stay at state one, which is the |
---|
0:07:04 | blue line |
---|
0:07:06 | so |
---|
0:07:08 | if we have |
---|
0:07:11 | a specific distribution on each arc, and not at the state level, |
---|
0:07:18 | we can |
---|
0:07:20 | better decide |
---|
0:07:23 | to move from one state to another |
---|
0:07:27 | than when we assume that |
---|
0:07:29 | the in-state data |
---|
0:07:31 | is the same no matter from which |
---|
0:07:34 | previous state we arrived |
---|
0:07:37 | so this was the motivation to try to move |
---|
0:07:41 | from |
---|
0:07:43 | Moore HMM to Mealy HMM |
---|
0:07:48 | in this case we define our model so that we have |
---|
0:07:53 | an initial vector, but that initial vector is not |
---|
0:07:58 | a vector of probabilities but a vector of PDFs, of distribution functions; |
---|
0:08:04 | it depends also |
---|
0:08:06 | on the data, |
---|
0:08:08 | not only on |
---|
0:08:09 | which state you are going to |
---|
0:08:12 | and we have a |
---|
0:08:14 | matrix A(y), which is again a matrix of functions |
---|
0:08:20 | that depend |
---|
0:08:21 | on from which state to which state the |
---|
0:08:25 | transition is, and |
---|
0:08:26 | also on the data; so now we have a model which |
---|
0:08:32 | is |
---|
0:08:33 | a couple, |
---|
0:08:34 | only of π and A |
---|
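To make the two parameterizations concrete, here is a minimal Python sketch (not from the talk; class and method names are illustrative): a Moore model carries the triple (π, A, B) with per-state densities, while a Mealy model carries the couple (π(·), A(·)) whose entries are functions of the observation, e.g. GMM densities scaled by transition weights.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable, List

Density = Callable[[np.ndarray], float]   # y -> pdf value, e.g. a GMM density

@dataclass
class MooreHMM:
    pi: np.ndarray        # (K,) initial state probabilities
    A: np.ndarray         # (K, K) transition probabilities
    B: List[Density]      # K state-conditional densities (GMMs in the talk)

@dataclass
class MealyHMM:
    pi: List[Density]         # K initial functions pi_k(y): probability * pdf
    A: List[List[Density]]    # K x K arc functions a_ij(y)

    def pi_of(self, y: np.ndarray) -> np.ndarray:
        """Evaluate the initial vector pi(y) at one observation."""
        return np.array([f(y) for f in self.pi])

    def A_of(self, y: np.ndarray) -> np.ndarray:
        """Evaluate the data-dependent matrix A(y) at one observation."""
        K = len(self.pi)
        return np.array([[self.A[i][j](y) for j in range(K)]
                         for i in range(K)])
```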
0:08:41 | we have the same three problems as in the Moore HMM: to define the |
---|
0:08:46 | likelihood of the model given the data, |
---|
0:08:50 | to find the best path, |
---|
0:08:52 | and to estimate |
---|
0:08:55 | the parameters via |
---|
0:08:57 | Viterbi statistics or Baum-Welch |
---|
0:09:01 | again, Baum-Welch is not of interest in this talk; we will |
---|
0:09:05 | touch just a little bit |
---|
0:09:07 | on it |
---|
0:09:09 | later |
---|
0:09:13 | so we can see: |
---|
0:09:15 | if we want to estimate the likelihood |
---|
0:09:18 | it becomes very easy |
---|
0:09:21 | it is just a |
---|
0:09:23 | product of the |
---|
0:09:25 | initial vector multiplied by the matrices A(y_t) |
---|
0:09:28 | and then, to sum it, we multiply by |
---|
0:09:33 | a vector, |
---|
0:09:34 | a vector of ones |
---|
0:09:38 | if we compare it to the Moore HMM, |
---|
0:09:40 | of course, |
---|
0:09:42 | we know we have to sum |
---|
0:09:44 | over all the possible |
---|
0:09:49 | paths, and we use the |
---|
0:09:51 | forward or backward |
---|
0:09:54 | coefficients to do it; but still the recursion is much more complex |
---|
0:09:58 | than |
---|
0:10:00 | the |
---|
0:10:01 | matrix multiplication that we have in the |
---|
0:10:04 | Mealy representation |
---|
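A hedged sketch of that matrix-product likelihood, p(y_1..y_T) = π(y_1) A(y_2) ⋯ A(y_T) 𝟙, continuing with the hypothetical MealyHMM class above:

```python
def mealy_likelihood(model: MealyHMM, Y: np.ndarray) -> float:
    """p(y_1..y_T) = pi(y_1) A(y_2) ... A(y_T) 1, as a row-vector recursion.

    Y has shape (T, d). This is the plain product form from the talk; a real
    implementation would rescale per frame (or work in the log domain) to
    avoid numerical underflow on long sequences.
    """
    alpha = model.pi_of(Y[0])               # row vector, shape (K,)
    for t in range(1, len(Y)):
        alpha = alpha @ model.A_of(Y[t])    # one matrix product per frame
    return float(alpha.sum())               # times the vector of ones
```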
0:10:12 | to find the best Viterbi path |
---|
0:10:14 | is also a known problem: we just have to take the product |
---|
0:10:21 | of the |
---|
0:10:23 | best transitions we have |
---|
0:10:29 | and we want to maximize, |
---|
0:10:33 | argmax over the |
---|
0:10:34 | sequence of states we really |
---|
0:10:37 | want to have |
---|
0:10:40 | I will briefly |
---|
0:10:43 | go through it |
---|
0:10:45 | because it is well known: now we have, at each time stamp, a vector |
---|
0:10:51 | of |
---|
0:10:52 | best |
---|
0:10:55 | likelihoods of the sequence, |
---|
0:10:57 | of a partial sequence, and |
---|
0:11:00 | a vector of where we derived from, |
---|
0:11:04 | just as in the Moore |
---|
0:11:06 | case |
---|
0:11:09 | we initialize |
---|
0:11:12 | the |
---|
0:11:13 | delta vector and psi vector |
---|
0:11:16 | very simply |
---|
0:11:20 | and in the recursion |
---|
0:11:22 | the equation becomes very simple, much simpler than |
---|
0:11:27 | it was in the Moore HMM: you just have the product, |
---|
0:11:32 | I mean |
---|
0:11:35 | element-wise, |
---|
0:11:38 | between the vector of |
---|
0:11:41 | previous likelihoods and |
---|
0:11:43 | a column of the matrix A(y_t) |
---|
0:11:47 | we take the maximum of these products |
---|
0:11:50 | and then we have the |
---|
0:11:53 | value of the maximum likelihood of the path, and the argmax gives the previous |
---|
0:11:59 | place, |
---|
0:12:00 | the state where we came from |
---|
0:12:04 | and then, like in the Moore HMM, we have a termination |
---|
0:12:09 | and |
---|
0:12:10 | the equation |
---|
0:12:11 | never |
---|
0:12:12 | changes at all |
---|
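A minimal sketch of that recursion, δ_j(t) = max_i δ_i(t-1)·a_ij(y_t) with ψ recording the argmax, again using the hypothetical MealyHMM class; working in the log domain is my addition for numerical safety:

```python
def mealy_viterbi(model: MealyHMM, Y: np.ndarray):
    """Best state path: the same delta/psi bookkeeping as Moore Viterbi, but
    the emission is folded into the arc, so each step is an element-wise
    product of the previous delta with a column of A(y_t) (here in logs)."""
    T, K = len(Y), len(model.pi)
    delta = np.log(model.pi_of(Y[0]) + 1e-300)          # initialization
    psi = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(model.A_of(Y[t]) + 1e-300)
        psi[t] = scores.argmax(axis=0)                  # best predecessor of j
        delta = scores.max(axis=0)
    path = [int(delta.argmax())]                        # termination
    for t in range(T - 1, 0, -1):                       # backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], float(delta.max())               # path, best log-lik
```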
0:12:20 | if we want to estimate |
---|
0:12:23 | the parameters using Viterbi statistics |
---|
0:12:30 | then this |
---|
0:12:32 | is the cost function |
---|
0:12:37 | and |
---|
0:12:38 | the difference to the Moore case is |
---|
0:12:40 | in the Lagrange multiplier: now we have a constraint, |
---|
0:12:44 | not that |
---|
0:12:47 | the sum of |
---|
0:12:49 | the weights of each GMM has to be one, |
---|
0:12:52 | but |
---|
0:12:53 | the summation to one |
---|
0:12:55 | has to be over all the weights |
---|
0:12:58 | of |
---|
0:13:00 | all the transitions from a state: |
---|
0:13:02 | if we are going from state one, we take all the weights which are on the |
---|
0:13:07 | self-loop to state one, plus all the weights to state two and state three |
---|
0:13:12 | and |
---|
0:13:13 | this is the only difference |
---|
0:13:18 | and |
---|
0:13:19 | at the end it converges to a very simple re-estimation: we just look |
---|
0:13:25 | at the data on each transition and train a GMM |
---|
0:13:30 | like |
---|
0:13:32 | we do in Moore, but then we have to |
---|
0:13:35 | scale the |
---|
0:13:36 | weights |
---|
0:13:37 | of each GMM |
---|
0:13:40 | by |
---|
0:13:42 | this fraction |
---|
0:13:44 | everyone here knows what this fraction is, yes? |
---|
0:13:53 | and this fraction is actually the same as the transition probability in the Moore |
---|
0:14:01 | HMM |
---|
0:14:03 | also, we can see that on each arc |
---|
0:14:06 | it's not a PDF |
---|
0:14:09 | but a PDF multiplied by |
---|
0:14:12 | the probability of the transition |
---|
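A sketch of that M-step detail, under my own naming (n_ij for the Viterbi frame counts on arc i→j): each arc's GMM is trained on its own frames, then its mixture weights are rescaled so that all weights leaving a state sum to one; the scaling fraction n_ij / Σ_j n_ij is exactly the Moore transition probability.

```python
def scale_arc_weights(arc_gmm_weights: dict, n_ij: np.ndarray) -> dict:
    """arc_gmm_weights maps (i, j) -> mixture-weight array summing to one;
    n_ij[i, j] counts the Viterbi-aligned frames on arc i -> j.  After
    scaling, the weights over ALL arcs leaving state i sum to one, and the
    factor n_ij / n_i. plays the role of the Moore transition probability."""
    n_from = n_ij.sum(axis=1, keepdims=True)       # frames leaving state i
    a = n_ij / np.maximum(n_from, 1)               # Moore-style a_ij
    return {(i, j): w * a[i, j] for (i, j), w in arc_gmm_weights.items()}
```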
0:14:22 | if we want to do Baum-Welch: I will not give the equations, |
---|
0:14:26 | they are big and ugly, and |
---|
0:14:30 | there is not much information in them; I just show that we have to |
---|
0:14:34 | define the hidden variables we need a little bit differently |
---|
0:14:39 | the hidden variable on the |
---|
0:14:41 | first state, on the initial state: we define |
---|
0:14:46 | t_1^{k,m} |
---|
0:14:47 | to be one if |
---|
0:14:48 | the m-th mixture |
---|
0:14:51 | of |
---|
0:14:52 | the k-th initial state emitted |
---|
0:14:55 | x_1, and similarly |
---|
0:14:59 | we define the hidden variables |
---|
0:15:02 | for any other |
---|
0:15:04 | time which is not one |
---|
0:15:13 | then |
---|
0:15:15 | the question can arise: does it really matter to use a Moore HMM or a Mealy |
---|
0:15:20 | HMM |
---|
0:15:24 | yes and no |
---|
0:15:28 | yes: we will see that it makes life easier, as we will show shortly |
---|
0:15:33 | no: because it was already shown that |
---|
0:15:39 | any |
---|
0:15:41 | Moore HMM can be represented as a Mealy HMM, and vice versa: any Mealy |
---|
0:15:48 | HMM can be represented as a Moore HMM |
---|
0:15:51 | so if we define |
---|
0:15:54 | the set of all possible sequences |
---|
0:15:58 | we give an example with binary sequences: |
---|
0:16:03 | let's say the values can only be zero and one, so X* is |
---|
0:16:09 | the set of all possible sequences |
---|
0:16:13 | then the string probability p is |
---|
0:16:17 | a mapping from X* to [0, 1] |
---|
0:16:22 | and we can define equivalent models |
---|
0:16:27 | like |
---|
0:16:28 | this: two models, |
---|
0:16:30 | which can be both Moore, both Mealy, or one Moore and one Mealy, |
---|
0:16:35 | are |
---|
0:16:36 | defined to be equivalent if |
---|
0:16:43 | for each p and p' |
---|
0:16:46 | we get that p equals p' |
---|
0:16:56 | then we can define |
---|
0:17:00 | the minimal Moore model: |
---|
0:17:02 | it's the model that, |
---|
0:17:07 | among the equivalent models, has the smallest number of states |
---|
0:17:15 | and the same for the minimal Mealy model: |
---|
0:17:18 | we define it the same way: if we have two Mealy models |
---|
0:17:23 | with the same |
---|
0:17:27 | p, we use the one with the smaller number of states |
---|
0:17:32 | it's still an open question |
---|
0:17:35 | how to find the minimal model |
---|
0:17:39 | but |
---|
0:17:40 | what is more interesting is that we can show that for any K- |
---|
0:17:45 | state |
---|
0:17:46 | Moore HMM we can find |
---|
0:17:49 | an equivalent Mealy HMM with the same number of states, |
---|
0:17:54 | with no more than K states |
---|
0:17:57 | but |
---|
0:17:58 | vice versa it's not so easy: for a K- |
---|
0:18:01 | state |
---|
0:18:03 | Mealy HMM |
---|
0:18:05 | it can happen that the |
---|
0:18:07 | minimal Moore model will have K-squared states, so we increase |
---|
0:18:13 | the number of states to the power of two |
---|
0:18:17 | it is very easy to show how to move |
---|
0:18:21 | from |
---|
0:18:23 | Moore to Mealy: you just put on the arcs the PDFs of |
---|
0:18:29 | the destination state multiplied by the transition probability, and we have an equivalent model |
---|
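That Moore-to-Mealy direction is one line per arc, a_ij(y) = A[i,j] · b_j(y); a sketch with the hypothetical classes above (the Mealy-to-Moore direction, as the talk says, needs a carefully specified structure and possibly up to K² states):

```python
def moore_to_mealy(moore: MooreHMM) -> MealyHMM:
    """Equivalent Mealy model with the same K states: each arc i -> j carries
    the destination state's pdf scaled by the transition probability."""
    K = len(moore.pi)
    def arc(i: int, j: int) -> Density:
        return lambda y: moore.A[i, j] * moore.B[j](y)
    def init(k: int) -> Density:
        return lambda y: moore.pi[k] * moore.B[k](y)
    return MealyHMM(pi=[init(k) for k in range(K)],
                    A=[[arc(i, j) for j in range(K)] for i in range(K)])
```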
0:18:34 | but if we are going from Mealy to Moore HMM |
---|
0:18:38 | we have to build |
---|
0:18:39 | a structure where |
---|
0:18:42 | part of the transitions are zero, and |
---|
0:18:46 | so we |
---|
0:18:48 | specify |
---|
0:18:49 | in a very precise way how to build it |
---|
0:18:52 | and |
---|
0:18:53 | I'm not sure that this will be the minimal Moore model |
---|
0:18:57 | but it was shown that this Moore model will be equivalent to the Mealy |
---|
0:19:01 | model |
---|
0:19:02 | but we increase the transition matrix, and so on |
---|
0:19:06 | this is in the case when we |
---|
0:19:08 | know which state belongs to which event |
---|
0:19:13 | if we don't know, |
---|
0:19:14 | we will have to somehow estimate which state |
---|
0:19:17 | belongs where, |
---|
0:19:19 | state one to event one and state two to event two; it's not |
---|
0:19:25 | very simple |
---|
0:19:27 | we applied it to speaker diarization: we have some voice activity detection, overlapped speech |
---|
0:19:34 | removal, and then the initialization of the HMM |
---|
0:19:38 | and then we |
---|
0:19:39 | apply fixed-duration |
---|
0:19:43 | HMM |
---|
0:19:44 | clustering |
---|
0:19:46 | both for |
---|
0:19:48 | Mealy and for Moore |
---|
0:19:50 | the minimum duration was two hundred milliseconds; it means we stay twenty states |
---|
0:19:58 | in the same model |
---|
0:20:02 | so we have three hyper-states, for speaker one, speaker two, and non-speech, because we |
---|
0:20:08 | know that this is a telephone conversation; we know in advance there are |
---|
0:20:12 | only |
---|
0:20:13 | two speakers |
---|
0:20:14 | in the case of the Moore HMM, this is the picture; it tells that in our case |
---|
0:20:20 | we stay twenty times in the same model, and then we can transition to any other |
---|
0:20:32 | in the Mealy HMM it is very similar, |
---|
0:20:35 | but now we stay |
---|
0:20:38 | D minus one times, nineteen times, in the same model |
---|
0:20:40 | and we have |
---|
0:20:44 | now on the transition arcs |
---|
0:20:48 | distributions |
---|
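An illustrative sketch of that duration topology (my own construction, assuming the usual 10 ms frame step so 200 ms is 20 frames): each hyper-state is a chain of D = 20 sub-states, and only the last sub-state of a chain may self-loop or jump to the entry of any hyper-state.

```python
def min_duration_mask(n_events: int = 3, D: int = 20) -> np.ndarray:
    """Boolean mask of allowed transitions: three hyper-states (speaker 1,
    speaker 2, non-speech), each a chain of D sub-states enforcing the
    minimum duration; the chain's last sub-state is ergodic between events."""
    K = n_events * D
    mask = np.zeros((K, K), dtype=bool)
    for e in range(n_events):
        s = e * D                                # entry sub-state of event e
        for d in range(D - 1):
            mask[s + d, s + d + 1] = True        # forced advance in the chain
        mask[s + D - 1, s + D - 1] = True        # self-loop once duration met
        for e2 in range(n_events):
            mask[s + D - 1, e2 * D] = True       # jump to any event's entry
    return mask
```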
0:20:48 | the results were on an |
---|
0:20:50 | LDC database, one hundred and eight conversations, |
---|
0:20:55 | approximately ten minutes each |
---|
0:21:01 | and |
---|
0:21:03 | we tried different models |
---|
0:21:08 | for the Moore models, of twenty-one and twenty-four Gaussians with full covariance, the |
---|
0:21:14 | bigger model gave the |
---|
0:21:16 | best results |
---|
0:21:18 | above twenty-four the results dropped, so we do not show them here |
---|
0:21:23 | and then we tried |
---|
0:21:26 | different |
---|
0:21:28 | models of the Mealy HMM |
---|
0:21:31 | on the left side |
---|
0:21:33 | we see the total number of Gaussians that we have |
---|
0:21:37 | in the whole |
---|
0:21:39 | HMM |
---|
0:21:40 | and on the right side the diarization error rate |
---|
0:21:44 | and we can see basically that |
---|
0:21:47 | we have more GMMs to estimate |
---|
0:21:50 | but we can achieve the same results as in the |
---|
0:21:56 | Moore HMM |
---|
0:21:58 | with about twenty percent |
---|
0:22:02 | fewer Gaussians overall |
---|
0:22:04 | why? because |
---|
0:22:06 | we are able |
---|
0:22:08 | to model the data on the transitions |
---|
0:22:12 | because |
---|
0:22:13 | we cannot be sure that speaker one, speaking after speaker two, |
---|
0:22:19 | has the same dynamics |
---|
0:22:21 | or at the onset phase: if you start speaking after silence |
---|
0:22:26 | maybe you speak differently, and we want to |
---|
0:22:30 | capture these transition effects |
---|
0:22:33 | and we define them on the arcs, |
---|
0:22:36 | not only as probabilities |
---|
0:22:39 | so |
---|
0:22:41 | we can |
---|
0:22:43 | have the same results |
---|
0:22:45 | with fewer Gaussians, or |
---|
0:22:48 | a little bit |
---|
0:22:49 | better results |
---|
0:22:51 | when we |
---|
0:22:52 | use more Gaussians |
---|
0:22:55 | so: we presented the Mealy HMM, |
---|
0:22:59 | showed |
---|
0:23:01 | that it works similarly, |
---|
0:23:05 | and the relation between Mealy and Moore |
---|
0:23:09 | we saw that we can do telephone diarization |
---|
0:23:14 | without any loss of |
---|
0:23:17 | performance when we use Mealy, and even better performance with less complexity |
---|
0:23:30 | we know that the HMM is usually, though not always, used as a standalone |
---|
0:23:37 | diarization system |
---|
0:23:38 | but also, |
---|
0:23:40 | when we use |
---|
0:23:41 | bigger diarization systems, we have a fine-tuning |
---|
0:23:45 | at the end which is done by an |
---|
0:23:49 | HMM |
---|
0:23:50 | we know that in i-vector based diarization, |
---|
0:23:54 | between phase one and phase two there is |
---|
0:23:58 | an HMM that does re-segmentation; we can replace the Moore HMM by a Mealy |
---|
0:24:05 | HMM |
---|
0:24:07 | and maybe |
---|
0:24:08 | get some improvement |
---|
0:24:12 | in those systems |
---|
0:24:14 | so |
---|
0:24:15 | this is the |
---|
0:24:17 | last thing I wanted to say |
---|
0:24:20 | thank you |
---|
0:24:40 | now, do we have |
---|
0:24:42 | questions? over there |
---|
0:24:45 | so |
---|
0:24:46 | in speaker diarization usually we use GMMs, right, |
---|
0:24:54 | which is well established, and you are using an ergodic HMM |
---|
0:24:59 | so can you comment on the advantage of using an ergodic approach? |
---|
0:25:07 | compared to that: in diarization we use |
---|
0:25:12 | not plain GMMs but, like in |
---|
0:25:16 | this system, an HMM; the use of the ergodic structure is because |
---|
0:25:21 | we |
---|
0:25:22 | assume that we can move from |
---|
0:25:24 | each speaker to each speaker, in an ergodic way |
---|
0:25:28 | and to the question what the state distribution is now: |
---|
0:25:32 | the state distributions so far were |
---|
0:25:35 | GMMs |
---|
0:25:36 | and they are realized also with GMMs, but on the arcs instead of at |
---|
0:25:42 | the states |
---|
0:25:43 | so they stay GMMs |
---|
0:25:47 | okay |
---|
0:25:48 | but you are not using the notion of, you know, the universal |
---|
0:25:52 | background model? no, this we don't use, because |
---|
0:25:59 | we work with several |
---|
0:26:04 | companies, and when |
---|
0:26:07 | we tried to get data for a universal background model, they said that they have |
---|
0:26:12 | no data; |
---|
0:26:14 | the channels are changing very much, and maybe they can give us one |
---|
0:26:19 | or one and a half hours |
---|
0:26:20 | of data |
---|
0:26:22 | and I'm not sure that we can build a very good UBM model |
---|
0:26:27 | using |
---|
0:26:28 | one or even two hours of data |
---|
0:26:32 | so we use standalone models that do not rely on some background |
---|
0:26:38 | model; but |
---|
0:26:40 | if there is |
---|
0:26:42 | a background model and we |
---|
0:26:44 | have the data, then we can use an extended HMM, like an i-vector |
---|
0:26:48 | based system, and just encapsulate |
---|
0:26:52 | the GMMs as part of it |
---|
0:26:54 | it's not a problem. the next paper is on broadcast data, so it may have |
---|
0:27:00 | more details. thank you |
---|