So, hi everyone. This presentation is about speaker diarization. I will speak about the ILP clustering we introduced at the last edition of this conference, because we added some improvements that were necessary, and afterwards I will speak about a graph clustering.
So, the plan of the presentation: first I will speak about the context and the diarization architecture we are using, to show you where the ILP clustering is used; then I will show you what is wrong with the original formulation; and then I will show you the graph clustering.
So, the context is the same challenge the previous speakers talked about, the REPERE challenge. The goal, for those of you who are not familiar with it, was to detect, at any time during the video, who is speaking, who is visible on the screen, and who is cited. Speaker diarization was just one of the subtasks of the challenge.
So, in the paper and in this presentation, I present results on this corpus: it has a total duration of about forty hours, with twenty-eight TV shows recorded from French TV channels. So it is broadcast news, video broadcast news, roughly balanced between prepared and spontaneous speech.
The architecture we actually used is a two-stage architecture. There is a first segmentation-and-clustering part, which gives us the first segmentation: a segmentation followed by a clustering and a Viterbi re-segmentation, and then we detect the speech/non-speech areas and the genders. This first segmentation ensures that each cluster contains the voice of only one speaker, but several clusters can be related to a same speaker, so we have to do another clustering. That is where we propose to use the ILP clustering, to replace the HAC, the traditional clustering used in speaker diarization.
So I will say a word about those two clusterings, and then I will give you some results in order to compare them in terms of diarization error rate.
So, on one side, from the BIC-based segmentation, we do a hierarchical agglomerative clustering with a complete linkage. We use the cross-likelihood ratio to estimate the similarities, and the speaker clusters are modeled with Gaussian mixture models: we use twelve MFCCs plus the energy, we remove the channel contribution, and the models are obtained by MAP adaptation of a 256-component UBM. A really basic CLR clustering.
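To make the complete-linkage step concrete, here is a minimal sketch, assuming the pairwise CLR-based distances between speaker clusters have already been computed (the random symmetric matrix below is only a stand-in for them):

```python
# Minimal sketch of the complete-linkage agglomerative step over a
# precomputed distance matrix between speaker clusters.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n = 6                                    # number of initial speaker clusters
d = rng.random((n, n))
d = (d + d.T) / 2.0                      # symmetrize the stand-in distances
np.fill_diagonal(d, 0.0)

tree = linkage(squareform(d), method="complete")      # complete linkage
labels = fcluster(tree, t=0.5, criterion="distance")  # stopping threshold
print(labels)                            # one merged label per initial cluster
```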
And on the other side, the ILP: the clustering is expressed as an ILP problem, and the speaker clusters are modeled with i-vectors of dimensionality sixty, so not that much. We use MFCCs, the energy, and the first and second order derivatives, as well as a 1024-component UBM; the i-vectors are length-normalized. The training data we used came from the ESTER 1 French broadcast news dataset, which was a common evaluation campaign, so this is radio data. And we estimate the similarities between the i-vectors with a Mahalanobis distance.
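As an illustration, here is a minimal sketch of that similarity computation; the exact covariance estimate in the real system may differ (typically a within-speaker covariance learned on the training set), so a plain sample covariance on random stand-in i-vectors is used here:

```python
# Sketch: Mahalanobis distance between length-normalized i-vectors,
# with the covariance estimated on separate training data.
import numpy as np

rng = np.random.default_rng(1)
train = rng.standard_normal((1000, 60))                # stand-in training i-vectors
train /= np.linalg.norm(train, axis=1, keepdims=True)  # length-normalize
W_inv = np.linalg.inv(np.cov(train, rowvar=False))     # inverse covariance

def mahalanobis(u, v):
    u = u / np.linalg.norm(u)                          # length-normalize
    v = v / np.linalg.norm(v)
    diff = u - v
    return float(np.sqrt(diff @ W_inv @ diff))

print(mahalanobis(rng.standard_normal(60), rng.standard_normal(60)))
```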
And so, sorry, the clustering: we express it as an Integer Linear Program, which consists in jointly minimizing the number of clusters and the dispersion within the clusters. As for the constraints: equation (1.2) says that we use binary variables, so the variable is equal to one if a cluster g is assigned to a center k. Equation (1.3) says that a cluster g has to be assigned to a single center k. The next one enforces that the center k is selected if a cluster g is assigned to it. And the last one is about the distance: the distance between a cluster g and a center k has to be shorter than a threshold.
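To make this concrete, here is a reconstruction of the program from the constraints just described; the equation numbering and the exact normalization of the dispersion term are assumptions on my part, not a quote from the paper:

```latex
% Reconstructed ILP: x_{k,g} = 1 iff cluster g is assigned to center k,
% d(k,g) is the i-vector distance, \delta the decision threshold.
\begin{align}
\min \quad & \sum_{k} x_{k,k} + \frac{1}{\delta} \sum_{k} \sum_{g} d(k,g)\, x_{k,g} \\
\text{s.t.} \quad & x_{k,g} \in \{0, 1\} \quad \forall k,\, \forall g \\
& \sum_{k} x_{k,g} = 1 \quad \forall g \\
& x_{k,g} \le x_{k,k} \quad \forall k,\, \forall g \\
& d(k,g)\, x_{k,g} \le \delta \quad \forall k,\, \forall g
\end{align}
```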
Now, about the comparison of the results. We cannot really compare the two head to head, because they do not use the same technologies and modeling, but what we see is that with the HAC on GMMs we obtained a 16.22% diarization error rate, and we went down to 14.7% with the ILP clustering. This was done on the data I presented first.
So, what's wrong with the ILP formulation? Actually, nothing is wrong. It is just that we have to use an external solver to obtain the clustering, and most solvers use the branch-and-bound algorithm, a general algorithm to determine an optimal solution of discrete programs. Its complexity is not bounded: in the worst case it may result in a systematic enumeration of all the possible solutions before it can give you the optimal one. So big problems lead to unreasonable processing durations.
So, in order to decrease the complexity of the solving, we have to minimize the paths the algorithm has to explore. With the ILP, it means we have to reduce the number of binary variables and constraints defined in the problem to be solved. And because the distances between the cluster i-vectors are computed before defining the ILP problem itself, we already know which pairs of clusters can be associated: we already know the distance between each pair of i-vectors. So it is useless to construct the big ILP problem with all the variables, when we can just use the interesting ones.
So we reformulate the clustering: for each cluster g, we use only a subset of the set of clusters, which corresponds to all the possible values of k for which the distance is shorter than the threshold delta. And then we don't need the distance constraint anymore, so the problem is reduced in terms of the number of binary variables and constraints.
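A sketch of the reduced program, writing K(g) for the centers already known to be within the threshold of cluster g (the notation is mine):

```latex
% Reduced ILP: variables exist only for pairs with d(k,g) <= delta,
% so the explicit distance constraint disappears.
\begin{align}
K(g) &= \{\, k : d(k,g) \le \delta \,\} \\
\min \quad & \sum_{k} x_{k,k} + \frac{1}{\delta} \sum_{g} \sum_{k \in K(g)} d(k,g)\, x_{k,g} \\
\text{s.t.} \quad & \sum_{k \in K(g)} x_{k,g} = 1 \quad \forall g \\
& x_{k,g} \le x_{k,k} \quad \forall g,\, \forall k \in K(g)
\end{align}
```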
So I counted, in the ILP files submitted to the solver, the number of binary variables and constraints, for each show of the corpus; here I present only the statistics. On average, the number of binary variables was reduced from one thousand seven hundred to fifty-three, and the number of constraints from three thousand four hundred to fifty-three as well. The diarization error rate didn't change: it is just a reformulation of the problem in order to decrease the complexity of the solving process.
And so, because we reduced a lot the number of variables and constraints, we can now think about a graph-based speaker clustering. Here is the representation: a distance matrix, which associates a distance to each pair of clusters, can be interpreted as a connected graph, where the clusters are represented by the nodes and the distances by the edges. Here you can see a representation of the original ILP formulation, which is complex, with all the distances. If we decompose that graph into connected components, by removing the edges that are longer than the threshold delta, we obtain several connected components which constitute independent subproblems, so we can process those components separately. Instead of doing one big clustering, we just perform several small clusterings, which are much more tractable. And as you can see, there are some clusters that don't have to be processed at all, because the solution is obvious, even that one.
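As an illustration, a minimal sketch of that decomposition, assuming the pairwise distances are stored in a square matrix (the random matrix here is a stand-in):

```python
# Sketch: split the thresholded distance graph into connected components
# (clusters are nodes; an edge exists when the distance is below delta).
import numpy as np

def connected_components(dist, delta):
    """Return one list of node indices per connected component."""
    n = len(dist)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(v for v in range(n)
                         if v != u and dist[u][v] <= delta and v not in seen)
        components.append(comp)
    return components

rng = np.random.default_rng(2)
d = rng.random((8, 8))
d = (d + d.T) / 2.0
np.fill_diagonal(d, 0.0)
print(connected_components(d, delta=0.3))  # each list is an independent subproblem
```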
So, instead of running a clustering on everything (an ILP clustering, or whatever the clustering is, HAC would do the job as well), we first look for the obvious centers, which can be formulated as the search for star-graph components. A star graph is just a kind of tree, sorry, a tree composed of one central node and a set of leaves, with a depth of just one level. It is really easy to find. So those are obvious solutions: none of them has to be processed with a clustering algorithm.
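A minimal sketch of that test, under the same thresholded-graph assumptions as above:

```python
# Sketch: test whether a connected component is a star graph, i.e. one
# center within delta of every other node, with no edge between leaves.
def find_star_center(dist, comp, delta):
    """Return a center index if `comp` forms a star, else None."""
    if len(comp) <= 2:                  # one node or a pair: trivially a star
        return comp[0]
    for center in comp:
        leaves = [v for v in comp if v != center]
        if any(dist[center][v] > delta for v in leaves):
            continue                    # this node does not reach every leaf
        if all(dist[u][v] > delta
               for i, u in enumerate(leaves) for v in leaves[i + 1:]):
            return center               # leaves only connect through the center
    return None
```

Components for which this returns a center can be merged into one speaker immediately; only the components where it returns None, the complex ones, would be handed to the solver.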
But there are some more complex sub-components, like that one, for which we still need to use a clustering algorithm in order to get the optimal solution. We did it with the ILP, of course. Compare the result of the previous slide, I mean the one with the reduced number of binary variables and constraints: on the right is the one with the star-graph connected-component search, where the ILP clustering is used only to process the complex sub-components. The count is reduced from fifty-three to almost seven on average, and the minimum is zero, which means that some of the shows didn't present any complex sub-component at all. On those shows, only by finding the star subgraphs, we solved the whole clustering problem.
And so we questioned the interest of a clustering method to process the complex components, because only eight of the twenty-eight shows which compose the corpus presented such complex connected components. So we tried to do it without any clustering process. There were two strategies: the "no clustering" strategy, where nothing is done with the complex components, we just say, okay, we have a complex sub-component, let's leave it like that; and the other one, the "single cluster" strategy, is the opposite: we merge all the clusters of a complex component into a single cluster.
It appears that the no-clustering strategy, where nothing is done, does not give interesting results. But if we look at the best result we get for each threshold, the star-graph search with a merging of all the clusters of the complex components gives better results than the one with an ILP clustering at that threshold. Still, it is better to use a clustering method to reach the real optimal values for the processing of the complex sub-components.
But what we can say, and I should have put the diarization error rates here: with the HAC approach using GMMs we had 16.22 percent, so the star-graph approach, even without a clustering algorithm to process the complex sub-components, gives a better diarization error rate. So it is almost a clustering-free process.
So, as a conclusion: we reformulated the ILP in order to reduce the complexity of the solving process, with no difference in diarization error rate. Then we expressed the clustering as a graph exploration, which allows the system to split the clustering problem into several independent subproblems, and which can be used to search for star-graph connected components. The star-graph approach alone solves almost the entire problem, but it is preferable to use a clustering algorithm to process the complex sub-components. Some clustering algorithms have already been studied to do that with a graph approach, but we find that the ILP gives better results than the HAC approach, which was also the conclusion of those authors.
And finally, I performed an experiment on a larger corpus. It is not that large really, about one hundred hours: I took the segmentation files from the BIC clustering of all the shows, and then I did one big ILP clustering on them, so it represents a clustering with a bit more than four thousand speaker clusters. And I compared the durations: the original ILP took around two hours to be done, the reformulation brought that down to a matter of minutes, and the graph approach to only five. This duration includes the time required to compute the distances between the clusters, the definition of the problem, and the solving; and I think most of the time is spent estimating the similarities between the clusters. That would be my last slide.
Q: Thanks. I have two remarks. First, it is quite normal to conclude that your star-graph algorithm is able to solve almost the whole clustering problem by itself, because hierarchical agglomerative clustering is itself a graph clustering algorithm; it is just a different version of it. It could be interesting to compare the two directly, in terms of formulation, in terms of graph theory. The second remark: we could be disappointed, after two years with the ILP, to see that there is no real improvement in terms of error rate when using ILP, because you are not taking only local decisions like in hierarchical clustering, so we could expect an improvement in performance as well.
A: I agree with you. Well, the ILP is not the solution for clustering when we use it on big data; what I want to say is mostly that the processing duration is really interesting compared to the HAC one. I think it would still fail with a huge amount of data, thousands of hours I mean; I never tried, but I think there would be some issues. HAC, I think, could do the job, but it would take time. And the reduction of the number of constraints and variables really means nothing by itself, but we had to add it because it was essential to process the data.
Q: I was looking at the i-vector challenge, and I wanted to try to apply your method to it; but the Mahalanobis distance needs training data to compute the covariance matrix, and in the i-vector challenge we don't have the training data that is required to compute the Mahalanobis distance.
A: I don't have the slide for it, and these are not published results, but we switched: we are now using i-vectors of three hundred dimensions, and we stopped using the Mahalanobis distance; we use PLDA scoring instead, and we get better results that way.
Thanks.
Thanks.