So, hi everyone. This presentation is about speaker diarization. I will speak about the ILP clustering we introduced at the last edition of this conference, because we added some improvements that were necessary, and afterwards I will speak about a graph clustering.
So, the plan of the presentation: first I will speak about the context and the diarization architecture we are using, to show you where the ILP clustering is used; then I will show you what is wrong with the original formulation; and then I will show you the graph clustering.
So, the context is the same challenge the previous speakers talked about, the REPERE challenge. The goal, for those of you who are not familiar with it, was to detect, at any time during the video, who is speaking, who is visible on the screen, and who is cited. Speaker diarization was just one of the subtasks of the challenge.
So, in the paper and in this presentation, I present results on this corpus: it has a total duration of about forty hours, with twenty-eight TV shows recorded from French TV channels. So it is broadcast news, video broadcast news, roughly balanced between prepared and spontaneous speech.
The architecture we actually used is a two-stage architecture. There is a first segmentation-and-clustering part, which gives us the first segmentation: a segmentation followed by a clustering and a Viterbi re-segmentation, and then we detect the speech/non-speech areas and the genders. This first segmentation ensures that each cluster contains the voice of only one speaker, but several clusters can be related to a same speaker, so we have to do another clustering. That is where we propose to use the ILP clustering, to replace the HAC, the traditional clustering used in speaker diarization.
So I will say a word about those two clusterings, and then I will give you some results in order to compare them in terms of diarization error rate.
So, on one side, from the BIC-based segmentation, we do a hierarchical agglomerative clustering with a complete linkage. We use the cross-likelihood ratio to estimate the similarities, and the speaker clusters are modeled with Gaussian mixture models: we use twelve MFCCs plus the energy, we remove the channel contribution, and the models are obtained by MAP adaptation of a 256-component UBM. A really basic CLR clustering.
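To make the complete-linkage step concrete, here is a minimal sketch, assuming the pairwise CLR-based distances between speaker clusters have already been computed (the random symmetric matrix below is only a stand-in for them):

```python
# Minimal sketch of the complete-linkage agglomerative step over a
# precomputed distance matrix between speaker clusters.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n = 6                                    # number of initial speaker clusters
d = rng.random((n, n))
d = (d + d.T) / 2.0                      # symmetrize the stand-in distances
np.fill_diagonal(d, 0.0)

tree = linkage(squareform(d), method="complete")      # complete linkage
labels = fcluster(tree, t=0.5, criterion="distance")  # stopping threshold
print(labels)                            # one merged label per initial cluster
```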
And on the other side, the ILP: the clustering is expressed as an ILP problem, and the speaker clusters are modeled with i-vectors of dimensionality sixty, so not that much. We use MFCCs, the energy, and the first and second order derivatives, as well as a 1024-component UBM; the i-vectors are length-normalized. The training data we used came from the ESTER 1 French broadcast news dataset, which was a common evaluation campaign, so this is radio data. And we estimate the similarities between the i-vectors with a Mahalanobis distance.
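As an illustration, here is a minimal sketch of that similarity computation; the exact covariance estimate in the real system may differ (typically a within-speaker covariance learned on the training set), so a plain sample covariance on random stand-in i-vectors is used here:

```python
# Sketch: Mahalanobis distance between length-normalized i-vectors,
# with the covariance estimated on separate training data.
import numpy as np

rng = np.random.default_rng(1)
train = rng.standard_normal((1000, 60))                # stand-in training i-vectors
train /= np.linalg.norm(train, axis=1, keepdims=True)  # length-normalize
W_inv = np.linalg.inv(np.cov(train, rowvar=False))     # inverse covariance

def mahalanobis(u, v):
    u = u / np.linalg.norm(u)                          # length-normalize
    v = v / np.linalg.norm(v)
    diff = u - v
    return float(np.sqrt(diff @ W_inv @ diff))

print(mahalanobis(rng.standard_normal(60), rng.standard_normal(60)))
```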
And so, sorry, the clustering: we express it as an Integer Linear Program, which consists in jointly minimizing the number of clusters and the dispersion within the clusters. As for the constraints: equation (1.2) says that we use binary variables, so the variable is equal to one if a cluster g is assigned to a center k. Equation (1.3) says that a cluster g has to be assigned to a single center k. The next one enforces that the center k is selected if a cluster g is assigned to it. And the last one is about the distance: the distance between a cluster g and a center k has to be shorter than a threshold.
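To make this concrete, here is a reconstruction of the program from the constraints just described; the equation numbering and the exact normalization of the dispersion term are assumptions on my part, not a quote from the paper:

```latex
% Reconstructed ILP: x_{k,g} = 1 iff cluster g is assigned to center k,
% d(k,g) is the i-vector distance, \delta the decision threshold.
\begin{align}
\min \quad & \sum_{k} x_{k,k} + \frac{1}{\delta} \sum_{k} \sum_{g} d(k,g)\, x_{k,g} \\
\text{s.t.} \quad & x_{k,g} \in \{0, 1\} \quad \forall k,\, \forall g \\
& \sum_{k} x_{k,g} = 1 \quad \forall g \\
& x_{k,g} \le x_{k,k} \quad \forall k,\, \forall g \\
& d(k,g)\, x_{k,g} \le \delta \quad \forall k,\, \forall g
\end{align}
```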
Now, about the comparison of the results. We cannot really compare the two head to head, because they do not use the same technologies and modeling, but what we see is that with the HAC on GMMs we obtained a 16.22% diarization error rate, and we went down to 14.7% with the ILP clustering. This was done on the data I presented first.
So, what's wrong with the ILP formulation? Actually, nothing is wrong. It is just that we have to use an external solver to obtain the clustering, and most solvers use the branch-and-bound algorithm, a general algorithm to determine an optimal solution of discrete programs. Its complexity is not bounded: in the worst case it may result in a systematic enumeration of all the possible solutions before it can give you the optimal one. So big problems lead to unreasonable processing durations.
So, in order to decrease the complexity of the solving, we have to minimize the paths the algorithm has to explore. With the ILP, it means we have to reduce the number of binary variables and constraints defined in the problem to be solved. And because the distances between the cluster i-vectors are computed before defining the ILP problem itself, we already know which pairs of clusters can be associated: we already know the distance between each pair of i-vectors. So it is useless to construct the big ILP problem with all the variables, when we can just use the interesting ones.
So we reformulate the clustering: for each cluster g, we use only a subset of the set of clusters, which corresponds to all the possible values of k for which the distance is shorter than the threshold delta. And then we don't need the distance constraint anymore, so the problem is reduced in terms of the number of binary variables and constraints.
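A sketch of the reduced program, writing K(g) for the centers already known to be within the threshold of cluster g (the notation is mine):

```latex
% Reduced ILP: variables exist only for pairs with d(k,g) <= delta,
% so the explicit distance constraint disappears.
\begin{align}
K(g) &= \{\, k : d(k,g) \le \delta \,\} \\
\min \quad & \sum_{k} x_{k,k} + \frac{1}{\delta} \sum_{g} \sum_{k \in K(g)} d(k,g)\, x_{k,g} \\
\text{s.t.} \quad & \sum_{k \in K(g)} x_{k,g} = 1 \quad \forall g \\
& x_{k,g} \le x_{k,k} \quad \forall g,\, \forall k \in K(g)
\end{align}
```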
So I counted, in the ILP files submitted to the solver, the number of binary variables and constraints, for each show of the corpus; here I present only the statistics. On average, the number of binary variables was reduced from one thousand seven hundred to fifty-three, and the number of constraints from three thousand four hundred to fifty-three as well. The diarization error rate didn't change: it is just a reformulation of the problem in order to decrease the complexity of the solving process.
And so, because we reduced a lot the number of variables and constraints, we can now think about a graph-based speaker clustering. Here is the representation: a distance matrix, which associates a distance to each pair of clusters, can be interpreted as a connected graph, where the clusters are represented by the nodes and the distances by the edges. Here you can see a representation of the original ILP formulation, which is complex, with all the distances. If we decompose that graph into connected components, by removing the edges that are longer than the threshold delta, we obtain several connected components which constitute independent subproblems, so we can process those components separately. Instead of doing one big clustering, we just perform several small clusterings, which are much more tractable. And as you can see, there are some clusters that don't have to be processed at all, because the solution is obvious, even that one.
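As an illustration, a minimal sketch of that decomposition, assuming the pairwise distances are stored in a square matrix (the random matrix here is a stand-in):

```python
# Sketch: split the thresholded distance graph into connected components
# (clusters are nodes; an edge exists when the distance is below delta).
import numpy as np

def connected_components(dist, delta):
    """Return one list of node indices per connected component."""
    n = len(dist)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(v for v in range(n)
                         if v != u and dist[u][v] <= delta and v not in seen)
        components.append(comp)
    return components

rng = np.random.default_rng(2)
d = rng.random((8, 8))
d = (d + d.T) / 2.0
np.fill_diagonal(d, 0.0)
print(connected_components(d, delta=0.3))  # each list is an independent subproblem
```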
So, instead of running a clustering on everything (an ILP clustering, or whatever the clustering is, HAC would do the job as well), we first look for the obvious centers, which can be formulated as the search for star-graph components. A star graph is just a kind of tree, sorry, a tree composed of one central node and a set of leaves, with a depth of just one level. It is really easy to find. So those are obvious solutions: none of them has to be processed with a clustering algorithm.
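A minimal sketch of that test, under the same thresholded-graph assumptions as above:

```python
# Sketch: test whether a connected component is a star graph, i.e. one
# center within delta of every other node, with no edge between leaves.
def find_star_center(dist, comp, delta):
    """Return a center index if `comp` forms a star, else None."""
    if len(comp) <= 2:                  # one node or a pair: trivially a star
        return comp[0]
    for center in comp:
        leaves = [v for v in comp if v != center]
        if any(dist[center][v] > delta for v in leaves):
            continue                    # this node does not reach every leaf
        if all(dist[u][v] > delta
               for i, u in enumerate(leaves) for v in leaves[i + 1:]):
            return center               # leaves only connect through the center
    return None
```

Components for which this returns a center can be merged into one speaker immediately; only the components where it returns None, the complex ones, would be handed to the solver.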
But there are some more complex sub-components, like that one, for which we still need to use a clustering algorithm in order to get the optimal solution. We did it with the ILP, of course. Compare the result of the previous slide, I mean the one with the reduced number of binary variables and constraints: on the right is the one with the star-graph connected-component search, where the ILP clustering is used only to process the complex sub-components. The count is reduced from fifty-three to almost seven on average, and the minimum is zero, which means that some of the shows didn't present any complex sub-component at all. On those shows, only by finding the star subgraphs, we solved the whole clustering problem.
And so we questioned the interest of a clustering method to process the complex components, because only eight of the twenty-eight shows which compose the corpus presented such complex connected components. So we tried to do it without any clustering process. There were two strategies: the "no clustering" strategy, where nothing is done with the complex components, we just say, okay, we have a complex sub-component, let's leave it like that; and the other one, the "single cluster" strategy, is the opposite: we merge all the clusters of a complex component into a single cluster.
It appears that the no-clustering strategy, where nothing is done, does not give interesting results. But if we look at the best result we get for each threshold, the star-graph search with a merging of all the clusters of the complex components gives better results than the one with an ILP clustering at that threshold. Still, it is better to use a clustering method to reach the real optimal values for the processing of the complex sub-components.
But what we can say, and I should have put the diarization error rates here: with the HAC approach using GMMs we had 16.22 percent, so the star-graph approach, even without a clustering algorithm to process the complex sub-components, gives a better diarization error rate. So it is almost a clustering-free process.
So, as a conclusion: we reformulated the ILP in order to reduce the complexity of the solving process, with no difference in diarization error rate. Then we expressed the clustering as a graph exploration, which allows the system to split the clustering problem into several independent subproblems, and which can be used to search for star-graph connected components. The star-graph approach alone solves almost the entire problem, but it is preferable to use a clustering algorithm to process the complex sub-components. Some clustering algorithms have already been studied to do that with a graph approach, but we find that the ILP gives better results than the HAC approach, which was also the conclusion of those authors.
And finally, I performed an experiment on a larger corpus. It is not that large really, about one hundred hours: I took the segmentation files from the BIC clustering of all the shows, and then I did one big ILP clustering on them, so it represents a clustering with a bit more than four thousand speaker clusters. And I compared the durations: the original ILP took around two hours to be done, the reformulation brought that down to a matter of minutes, and the graph approach to only five. This duration includes the time required to compute the distances between the clusters, the definition of the problem, and the solving; and I think most of the time is spent estimating the similarities between the clusters. That would be my last slide.
Q: Thanks. I have two remarks. First, it is quite normal to conclude that your star-graph algorithm is able to solve almost the whole clustering problem by itself, because hierarchical agglomerative clustering is itself a graph clustering algorithm; it is just a different version of it. It could be interesting to compare the two directly, in terms of formulation, in terms of graph theory. The second remark: we could be disappointed, after two years with the ILP, to see that there is no real improvement in terms of error rate when using ILP, because you are not taking only local decisions like in hierarchical clustering, so we could expect an improvement in performance as well.
A: I agree with you. Well, the ILP is not the solution for clustering when we use it on big data; what I want to say is mostly that the processing duration is really interesting compared to the HAC one. I think it would still fail with a huge amount of data, thousands of hours I mean; I never tried, but I think there would be some issues. HAC, I think, could do the job, but it would take time. And the reduction of the number of constraints and variables really means nothing by itself, but we had to add it because it was essential to process the data.
Q: I was looking at the i-vector challenge, and I wanted to try to apply your method to it; but the Mahalanobis distance needs training data to compute the covariance matrix, and in the i-vector challenge we don't have the training data that is required to compute the Mahalanobis distance.
A: I don't have the slide for it, and these are not published results, but we switched: we are now using i-vectors of three hundred dimensions, and we stopped using the Mahalanobis distance; we use PLDA scoring instead, and we get better results that way.
Thanks.
Thanks.