0:00:15 | so, hi everyone, this presentation is about speaker diarization, and i will speak about the ilp clustering that we introduced at the last edition of odyssey
0:00:30 | because we added some improvements it was necessary, and i will speak as well about a graph clustering
0:00:39 | so, the outline of the presentation: first i will speak about the context and the diarization architecture we are using, to show you where the ilp clustering is used
0:00:55 | then i will show you what's wrong with the original formulation, and then i will show you the graph clustering
0:01:06 | the context is the same challenge as the previous speakers talked about, the repere challenge
0:01:13 | the goal was to detect, at any time during the video, who is speaking, who is visible on the screen, and who is cited; speaker diarization was just one of the subtasks of the challenge
0:01:31 | so, in this paper and this presentation i present results on the test set of this corpus; the total duration is about forty hours, with twenty-eight tv shows recorded from french tv channels
0:01:49 | so it's broadcast news, video broadcast news, and it is more or less balanced between prepared and spontaneous speech
0:01:59 | the architecture we used is a two-stage architecture: there is a first segmentation-and-clustering part, which gives us the first segmentation
0:02:14 | there is a bic segmentation followed by a bic clustering and a viterbi re-segmentation, and then we detect the speech/non-speech areas and the genders
0:02:25 | after the first segmentation pass, each cluster contains the voice of only one speaker, but several clusters can be related to the same speaker, so we have to do another clustering
0:02:42 | that's where we propose to use the ilp clustering, to replace the hac, the traditional clustering used in speaker diarization
0:02:55 | i will quickly go over those two clusterings, and then i will give you some results in order to compare them in terms of diarization error rate
0:03:02 | so, from the bic-based segmentation, we do a hierarchical agglomerative clustering with complete linkage; we use the cross likelihood ratio to estimate the similarities
0:03:20 | the speaker clusters are modeled with gaussian mixture models: we use twelve mfccs plus the energy, we remove the channel contribution, and the models are obtained with a map adaptation of a two-hundred-fifty-six-component ubm
0:03:39 | so it's a really classic clr clustering
0:03:43 | and on the other side, the ilp: the clustering is expressed as an ilp problem, and the speaker clusters are modeled with i-vectors of dimensionality sixty, so not that much
0:03:57 | we use mfccs, the energy, and the first and second order derivatives, and we use as well a one-thousand-twenty-four-component ubm; the i-vectors are length-normalized
0:04:10 | the training data we used came from the ester 1 french broadcast news dataset; it was a common evaluation campaign, so this is radio data
0:04:24 | and we estimate the similarities between the i-vectors with a mahalanobis distance
0:04:31 | and so, sorry, the clustering: we express it with integer linear programming
0:04:41 | which consists in jointly minimizing the number of clusters and the dispersion within the clusters
0:04:52 | as for the constraints: equation 1.2 says that we use binary variables, so if a cluster g is assigned to a center k, the variable is equal to one
0:05:07 | equation 1.3 says that each cluster g has to be assigned to a single center k
0:05:16 | the next one enforces that a center k is selected if a cluster g is assigned to it
0:05:24 | and the last one is about the distance: the distance between a cluster g and a center k has to be shorter than a threshold
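the formulation described above can be sketched in code; this is a minimal brute-force illustration, not the actual ilp solver used in the talk: it enumerates center sets and enforces the same constraints (every cluster assigned to exactly one selected center, with a distance under the threshold delta), minimizing first the number of centers and then the dispersion; the function name and the toy distance matrix are made up for illustration

```python
from itertools import combinations

def ilp_cluster_bruteforce(dist, delta):
    """Solve the tiny clustering problem described by the ILP formulation
    by exhaustive search: choose a set of centers and assign every cluster
    g to one selected center k with dist[g][k] <= delta, minimizing first
    the number of centers, then the total dispersion."""
    n = len(dist)
    best = None  # (num_centers, dispersion, {cluster: center})
    for size in range(1, n + 1):
        for centers in combinations(range(n), size):
            assign, disp, feasible = {}, 0.0, True
            for g in range(n):
                k = min(centers, key=lambda c: dist[g][c])  # closest center
                if dist[g][k] > delta:  # distance constraint violated
                    feasible = False
                    break
                assign[g] = k  # each g is assigned to a single center
                disp += dist[g][k]
            if feasible and (best is None or (size, disp) < best[:2]):
                best = (size, disp, assign)
        if best is not None:
            break  # fewer centers always dominates the objective
    return best

# toy distance matrix: two obvious speakers, threshold delta = 2
toy_dist = [[0, 1, 9, 9],
            [1, 0, 9, 9],
            [9, 9, 0, 1],
            [9, 9, 1, 0]]
```

on this toy matrix the search selects two centers, one per group of clusters, which is the solution a real ilp solver would also return; a real solver only finds it without enumerating every center set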
0:05:38 | and about the comparison, some results; honestly we cannot fully compare them, because they do not use the same acoustic features and modeling
0:05:48 | but what we see is that with the hac using gmms we obtain a 16.22% diarization error rate, and we went down to 14.7% with the ilp clustering
0:06:02 | this was done on the data i presented first
0:06:06 | so, what's wrong with the ilp formulation? actually nothing is wrong, it's just that we have to use an external solver to obtain the clustering
0:06:20 | most solvers use the branch-and-bound algorithm, which is a general algorithm to determine the optimal solution of discrete programs
0:06:32 | its complexity is not bounded: it may result in a systematic enumeration of all the possible solutions in order to give you the optimal one
0:06:46 | and so big problems may lead to unreasonable processing durations
0:06:53 | so, in order to decrease the complexity of the solving, we have to minimize the paths the algorithm has to explore; with the ilp, it means we have to reduce the number of binary variables and constraints defined in the problem to be solved
0:07:16 | and because the distances between the cluster i-vectors are computed before defining the ilp problem itself, we already know which pairs of clusters can be merged, since we already know the distance between the i-vectors
0:07:42 | so it is useless to construct the big ilp problem with all the variables, when we can just use the interesting ones
0:07:56 | so we reformulate the clustering: for each cluster g, we use a subset of the set of clusters, which corresponds to all the possible values of k for which the distance is shorter than the threshold delta
0:08:26 | with that, we don't need the distance constraint anymore
0:08:31 | and so the problem leads to a reduction in terms of number of binary variables and constraints
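the pruning just described can be sketched as follows; a hypothetical helper (function names and the toy distance matrix are illustrative, not from the paper) keeps, for each cluster g, only the candidate centers under the threshold, and compares the resulting variable count with the full formulation's one-variable-per-(g, k)-pair

```python
def candidate_centers(dist, delta):
    """For each cluster g, keep only the centers k whose distance to g is
    under the threshold delta, as in the reduced ILP formulation."""
    n = len(dist)
    return {g: [k for k in range(n) if dist[g][k] <= delta]
            for g in range(n)}

def variable_counts(dist, delta):
    """Number of binary variables in the full formulation (one per (g, k)
    pair) versus the reduced one (only pairs under the threshold)."""
    n = len(dist)
    reduced = sum(len(ks) for ks in candidate_centers(dist, delta).values())
    return n * n, reduced

# toy distance matrix: two well-separated groups of clusters
toy_dist = [[0, 1, 9, 9],
            [1, 0, 9, 9],
            [9, 9, 0, 1],
            [9, 9, 1, 0]]
```

even on this toy matrix the variable count is halved; on real shows, where most cluster pairs are far apart, the reduction is far larger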
0:08:40 | so i took the ilp files which are submitted to the solver, and i counted the number of binary variables and constraints; i did that for each show of the corpus, but here i present only the statistics
0:09:00 | on average, the number of binary variables was reduced from about one thousand seven hundred to fifty-three, and the number of constraints was reduced from about three thousand four hundred to fifty-three as well
0:09:17 | the diarization error rate didn't change: it's just a reformulation of the problem in order to decrease the complexity of the solving process
0:09:30 | and because we reduced a lot the number of variables and constraints, we can now think about a graph speaker clustering
0:09:46 | when using a distance matrix, which associates a distance with each pair of clusters, it can be interpreted as a connected graph: the clusters are represented by the nodes, and the distances by the edges
0:10:00 | this gives an easy representation of the original ilp formulation, which is complex, with all the distances
0:10:11 | so, if we decompose that graph into connected components, by removing the edges which are longer than the threshold delta, we obtain several connected components which constitute independent subproblems, so we can process those components separately
0:10:36 | instead of doing one big clustering, we just perform several small clusterings, which are much more tractable
0:10:44 | and as you can see, some components don't have to be processed at all, because the solution is obvious, even that one
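the decomposition step can be sketched with a plain breadth-first search; the function name and the toy distance matrix are illustrative, not from the paper

```python
from collections import deque

def connected_components(dist, delta):
    """Split the cluster graph into independent subproblems: keep only the
    edges whose distance is under delta, then collect the connected
    components with a breadth-first search."""
    n = len(dist)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            g = queue.popleft()
            comp.append(g)
            for k in range(n):
                if k not in seen and k != g and dist[g][k] <= delta:
                    seen.add(k)
                    queue.append(k)
        components.append(sorted(comp))
    return components

# toy matrix: two pairs of close clusters plus one isolated cluster
toy_dist = [[0, 1, 9, 9, 9],
            [1, 0, 9, 9, 9],
            [9, 9, 0, 1, 9],
            [9, 9, 1, 0, 9],
            [9, 9, 9, 9, 0]]
```

each returned component can then be clustered on its own; singleton components like the isolated cluster need no processing at all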
0:10:59 | so, instead of doing an ilp clustering, or whatever the clustering is (we used the ilp, but hac could be used as well), we actually look for the obvious centers, which can be formulated as the search for star-graph components
0:11:22 | a star graph is just a kind of tree, sorry, a tree which is composed of one central node and a set of leaves, with just the one level
0:11:38 | it's really easy to find, and the solution of those components is obvious, so all of them don't have to be processed with a clustering algorithm
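one possible way to detect such components is sketched below; this is a strict check of my own (names and the strictness assumption are mine, not from the paper): the candidate center must be within delta of every other node while the leaves are pairwise farther than delta apart, otherwise the component is treated as complex

```python
def find_star_center(component, dist, delta):
    """Return the center of a star-graph component: one node within delta
    of every other node, while the leaves are pairwise farther than delta
    apart. Returns None for 'complex' components, which still need a real
    clustering pass (ILP or HAC)."""
    if len(component) == 1:
        return component[0]  # an isolated cluster is trivially solved
    for c in component:
        leaves = [g for g in component if g != c]
        center_ok = all(dist[c][g] <= delta for g in leaves)
        leaves_apart = all(dist[g][h] > delta
                           for i, g in enumerate(leaves)
                           for h in leaves[i + 1:])
        if center_ok and leaves_apart:
            return c
    return None
```

when a center is found, the obvious solution is to merge the whole component into that center's cluster; a component where the check fails is exactly the "complex subcomponent" case discussed next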
0:11:56 | but there are some more complex subcomponents, like that one, where we still need to use a clustering algorithm in order to have the optimal solution
0:12:06 | so we did it with the ilp, and of course we compared with the results of the previous slide, i mean the ones with the reduced number of variables and constraints
0:12:20 | on the right is the version with the star-graph and connected-component search, in which the ilp clustering is used only to process the complex subcomponents
0:12:33 | the number of constraints is further reduced, from fifty-three to almost seven on average, and the minimum is zero: it means that some of the shows didn't present any complex subcomponents
0:12:48 | on those data, only by finding the star subgraphs, we resolve the whole clustering problem
0:12:58 | and so we were wondering about the interest of the clustering method to process the complex components, because only eight of the twenty-eight shows which compose the corpus presented those complex connected components
0:13:19 | so we tried to do it without any clustering process
0:13:24 | there were two strategies: the "no clustering" strategy, where nothing is done with the complex components (we just say, okay, we have a complex subcomponent, let it be like that), and the other one, the "single cluster" strategy, is the opposite: we merge all the clusters of a complex component into a single cluster
0:13:49 | it appears that the no-clustering strategy, where nothing is done, doesn't give interesting results
0:13:59 | but if we look at the best results we have for each threshold, the star-graph search with a merging of all the clusters of the complex components gives better results than the one with an ilp clustering
0:14:21 | but it is still better to use a clustering method to have the really optimal values, because of the processing of the complex subcomponents
0:14:33 | but what we can say is, and i should have put the diarization error rates here: with the hac approach using gmms we had 16.22%, so the star-graph approach with a clustering algorithm to process the complex subcomponents gives a better diarization error rate
0:15:01 | so it's almost a no-clustering process
0:15:07 | so, the conclusion: we reformulated the ilp in order to reduce the complexity of the solving process, with no difference in diarization error rate
0:15:18 | then we expressed the clustering as a graph exploration, which allows the system to split the clustering problem into several independent subproblems, and can be used to search for star-graph connected components
0:15:35 | the star-graph approach solves almost the entire problem, but it's still preferable to use a clustering algorithm in order to process the complex subcomponents
0:15:54 | some clustering algorithms have already been studied to do that with a graph approach, but we find that the ilp gives better results than the hac approach, which was the conclusion of the others
0:16:10 | and we have some perspectives; so, i performed an experiment on a larger corpus; it's not really that large, around one hundred hours: i took the segmentation files from the bic clustering of several shows, and then i did one big ilp clustering on that
0:16:35 | it represents a clustering with a bit more than four thousand speaker clusters
0:16:41 | and i compared the durations of the ilp clustering: the original formulation took around two hours to be done, the reformulation reduced that considerably, and the graph approach brought it down to only five minutes
0:16:57 | this clustering duration includes the time required to compute the distances between the clusters, the definition of the problem, and the solving; well, i think most of the time is spent estimating the similarities between the clusters
0:17:18 | and that would be my last slide
0:17:37 | [audience] thanks, i have two remarks; first, it's quite normal to conclude that your star algorithm is able to solve, with a graph exploration let's say, almost by itself the clustering problem, because your hierarchical initial algorithm is a graph clustering algorithm
0:17:59 | so it's just a different version; it would be interesting to compare, in terms of formulation, with a real graph clustering algorithm directly
0:18:13 | the second point, or remark: one could be disappointed, after two years with the ilp, to see that there is no real improvement in der from using the ilp
0:18:27 | because you are not taking only local decisions like in your hierarchical clustering, so we could expect to have also an improvement in performance
0:18:39 | [speaker] i agree with you; well, the ilp is not the solution for the clustering, but when we use it to perform clustering on big data, what i want to say is that the processing duration is really interesting compared to the hac one
0:19:03 | well, i think it would still fail with a huge amount of data, i mean thousands of hours; i never tried, but i think there would be some issues
0:19:16 | hac, i think, would be able to do the job, but it would take time
0:19:25 | but indeed, the improvement in the number of constraints and variables really means nothing for the error rate; we had to add it because it was essential to process the data
0:19:55 | [audience] a question about the i-vector challenge: i was looking at the challenge, and i wanted to try to apply this, but the fact is that the mahalanobis distance needs some training data to compute the covariance matrix
0:20:14 | in the i-vector challenge we don't have the training data, which is needed to compute the mahalanobis distance
0:20:26 | [speaker] indeed; i don't have the slide, and these are not published results, but we switched: we are now using i-vectors of dimensionality three hundred, and we stopped using the mahalanobis distance, we use plda scoring instead, which in my comparisons works much better
0:20:47 | we have better results than these ones
0:20:49 | thanks
0:20:53 | thanks