hello everybody in this presentation i will show you some of my
work in speaker clustering
but before starting i would like to define two things the first one is the
speaker clustering problem that we want to solve we have an audio database in which the
audios belong to unknown speakers and also we have an unknown number of
speakers
and the second one is that we will talk about audio database characteristics in this presentation
when we refer to this term we are thinking of things such as the number
of audios or how many audios
each speaker has
so
first of all i will present you the outline of the presentation
we will start with the motivation
later i will present you the clustering algorithm that we have been using
later we will see
the three variables that we have studied and we will conclude
with some experiments studying the stopping criterion
so
if we talk about the motivation the question is why suppose that we
receive a client that is interested
in getting a clustering based solution
one common question that we have to deal with is okay
how well is your system working
and for that purpose we will ask them to give us an audio
database as similar as possible
to the one that will be used
later in the system and with that database we will run some tests and we
will be able to say okay we expect
to have results similar to these ones but
based on our experience we have seen that a clustering task
may obtain
very different results depending on the audio database so we also must say be
careful because if the distribution of audios and speakers in the database is different
from what we have now
you may have
very different results
and then
of course the question is how can we expect
those results to change
and to answer that we need to run several experiments and some of
those experiments are the ones i want to present here
okay so now we know what we want to do first of all i will
present the clustering algorithm that we are using
we consider agglomerative hierarchical clustering that is
a clustering algorithm that starts from a partition in which each audio is
identified with one single cluster and then iteratively we merge the closest
clusters
to completely define the algorithm we have to fix three things the
first one is the distance metric and for this purpose we consider
the scores provided by the plda system so
before running the clustering algorithm
we compute all the pairwise scores for the evaluation database and we use
those scores to build the similarity matrix
we also need to define a linkage method and we will use minimum
distance
and also we have to fix
the stopping criterion and we consider a score based threshold
if the distance between the two clusters to be merged at this iteration is
above a certain threshold we will stop
and otherwise we will continue
merging clusters
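
A minimal sketch of the procedure just described, assuming the pairwise scores have already been turned into a symmetric distance matrix (for example by negating them). The function name, the input format and the threshold value are illustrative assumptions, not the system used in the talk.

```python
import numpy as np

def agglomerative_clustering(distance, threshold):
    """Merge the two closest clusters until their distance exceeds threshold.

    distance: symmetric (n, n) array of pairwise distances between audios
              (hypothetical input, e.g. negated pairwise scores).
    Returns a list of clusters, each one a sorted list of audio indices.
    """
    distance = np.asarray(distance, dtype=float)
    # start from the partition in which each audio is its own cluster
    clusters = [[i] for i in range(len(distance))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # minimum-distance linkage: distance of the closest pair of members
                d = distance[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # stopping criterion: the closest clusters are too far apart
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters
```

The double loop is kept for readability; it is far too slow for large tasks, where a library implementation of hierarchical clustering would be the natural choice.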
regarding the performance measures we will use those
defined by david van leeuwen in one of his works
they are the speaker impurity and the cluster impurity speaker impurity measures how spread
the audios of a speaker are
over all the clusters
while cluster impurity measures how corrupt clusters are and when we say that a
cluster is corrupt we refer to the fact that it
has audios from many different speakers
if we compute
those two values at each iteration of the clustering process
and we plot
all those points in a graph
we will get impurity trade-off curves like the one that
we have here in this slide
we will use these graphs
to measure the performance of our clustering experiments during the
whole presentation
and as a reference
point we will take the equal impurity working point that is when we have
the same speaker impurity and cluster impurity
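
A minimal sketch of how the two measures can be computed from reference speaker labels and hypothesized cluster labels. It assumes one common formulation (the fraction of audios outside the dominant speaker of their cluster, and the fraction of audios outside the dominant cluster of their speaker); the exact definition in the cited work may differ, and the function name is hypothetical.

```python
from collections import Counter

def impurities(speakers, clusters):
    """Return (speaker_impurity, cluster_impurity) as fractions in [0, 1].

    speakers: reference speaker label of each audio
    clusters: cluster label assigned to each audio
    """
    n = len(speakers)
    by_cluster = {}  # cluster -> count of audios per speaker
    by_speaker = {}  # speaker -> count of audios per cluster
    for s, c in zip(speakers, clusters):
        by_cluster.setdefault(c, Counter())[s] += 1
        by_speaker.setdefault(s, Counter())[c] += 1
    # audios that do not belong to the dominant speaker of their cluster
    cluster_impurity = 1 - sum(max(cnt.values()) for cnt in by_cluster.values()) / n
    # audios that are not in the dominant cluster of their speaker
    speaker_impurity = 1 - sum(max(cnt.values()) for cnt in by_speaker.values()) / n
    return speaker_impurity, cluster_impurity
```

Evaluating this after every merge and plotting the resulting pairs gives one trade-off curve per clustering task; the initial partition has zero cluster impurity and the final single cluster has zero speaker impurity.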
before we continue with the presentation i will show you the database that we
have used we consider
audios that are
telephone channel
and with a three hundred second duration and here in the graph you can see
the audios per speaker distribution that we have in this database
okay
to analyze the characteristics of the audio database we first need to define some
variables that are important in this work so we consider three
the first one is the size of the task
that is the number of audios we have in the database
the second one is the number of speakers that is the number of speakers that
we have in the database and the third one is the balance of speakers that measures
how many audios each speaker has

regarding the first variable we will perform different experiments in which
we modify the size of the task
we start from the initial set of audios and we split it
into
subsets of smaller size
so for example as you can see in the table we have several
subsets of each smaller size and
for those sizes in which
we have more than one clustering task we average the
results so that with one single curve
we can compare results between different sizes of the task
here we have the impurity trade-off curves
on the horizontal axis we have cluster impurity and on the vertical axis we
have speaker impurity
and as we can see as we reduce
the size of the task we expect to have better results in our clustering problem
the second variable that we have analyzed is the number of speakers
and to characterize this experiment
we will use
a ratio that is defined as the number of speakers divided by the
number of audios
we can also have another interpretation of this variable
it allows us to know the
iteration in which we should stop since we want to stop when we have as
many clusters
as speakers
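
A small worked sketch of that interpretation, under the assumption that every iteration merges exactly two clusters, so a task with N audios and S speakers needs N - S merges to reach one cluster per speaker. The function names and the example numbers are illustrative only.

```python
def merges_to_reach_speaker_count(num_audios, speaker_ratio):
    """Number of merges after which there are as many clusters as speakers."""
    num_speakers = round(speaker_ratio * num_audios)
    return num_audios - num_speakers

def relative_stop_position(num_audios, speaker_ratio):
    """Position of the stopping point in the clustering tree.

    0 is the initial partition (one cluster per audio) and
    1 is the final partition (one single cluster).
    """
    total_merges = num_audios - 1
    return merges_to_reach_speaker_count(num_audios, speaker_ratio) / total_merges
```

With a hypothetical task of 100 audios, a ratio of 0.5 places the stopping point close to the middle of the tree, while a ratio of 0.05 places it near the end.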
we consider several groups of clustering tasks in which we vary
the number of speakers and all the tasks
have the same number of audios and given a task with a concrete number
of speakers
we will have the same number of audios per speaker
so as you can see in the table for example we will have a task with
five speakers and a hundred and twenty audios per speaker
and
here we have the results
and the graph is a little bit different from what we have seen in the previous experiment
but again we have exactly the same information
on the horizontal
axis
we have the ratio that we have just defined
on the vertical axis we have the speaker impurity
and each
of the lines represents a constant value of cluster impurity
so for example if we want to read
the results suppose we would like to get
in our experiments a cluster impurity of one percent
and we want to compare the results
for different values of the ratio we see that
with
point five we need higher speaker impurity values
this means that
if our optimal solution
is found in the middle of the clustering tree we will
expect to obtain worse results
the third variable that we have studied is the balance of speakers
in the balance of speakers we try to study the scenario that we
present in the slide that is
we have one main speaker that has most of the audios in the
database and we have
a number of other speakers that have much fewer audios
we also need to fix the ratio
that is the number of speakers divided by the number of audios and in
our tasks we always consider the size of the task fixed
so
fixing the ratio is equal to
fixing the number
of speakers
here we have
four scenarios in which we modify
the percentage of audios that the main speaker has
we start
from this one in which
the main speaker has
more or less the same number of audios as the others until this one in
which
the main speaker has many more audios than the others
if we
again
take a look at the results that is
the impurity trade-off curves
we see that
these two scenarios give similar results and as we increase the
percentage of audios that the main speaker has
we
get better results
so
we can conclude that if the main speaker
has enough audios to make a difference with the rest of the
speakers we will expect better clustering results
okay
stopping criterion if you remember when i presented the clustering algorithm
i talked about the stopping criterion but
so far the computation of the threshold value
has been avoided
in this section we will study two different methods
and both methods require a set of labelled audios as a training database
and in the experiments we will also introduce a mismatch
between the training
and the testing set
so
the first one is a maximum distance method
we will use
the labelled audio database to run a clustering process and
as we know
how many speakers we have we will be able to stop at the point in
which the number of speakers is equal to the number of clusters
if we
save the distance of that last iteration we will be able to use it
later
as the threshold value and that threshold value is the one that is used
for the testing set
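
A minimal sketch of this first method, assuming a symmetric distance matrix for the labelled training set and minimum-distance linkage. The function name and the input format are hypothetical, and the talk does not specify these implementation details.

```python
import numpy as np

def threshold_from_labelled_set(distance, num_speakers):
    """Cluster a labelled training set until there are as many clusters as
    speakers and return the distance of the last merge that was performed.

    Returns None when no merge is needed to reach num_speakers clusters.
    """
    distance = np.asarray(distance, dtype=float)
    clusters = [[i] for i in range(len(distance))]
    last_merge_distance = None
    while len(clusters) > num_speakers:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # minimum-distance linkage between the two candidate clusters
                d = float(distance[np.ix_(clusters[a], clusters[b])].min())
                if best is None or d < best[0]:
                    best = (d, a, b)
        last_merge_distance, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return last_merge_distance
```

The returned value would then be reused as the stopping threshold on the testing set, which is why this approach is sensitive to a mismatch between the two sets.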
the second method that is called maximum distance with unsupervised score calibration what we do
is instead of feeding the clustering algorithm
with the distance metric that we obtain directly from the plda system
we run a calibration process over the pairwise scores and
the calibrated score is the one that will be
used later in the clustering algorithm
as the scores are calibrated we will be able to choose the threshold value that
we want depending on
how many errors
we want to allow our clustering algorithm to make
keep in mind that if you allow
few errors you will stop at very high speaker impurity values and
we will not get the correct number
of speakers
we consider
four groups of clustering tasks
the first one in which we use similar training and testing
sets and the other three groups in which we have different audios
per speaker distributions in the training and the testing sets
as here we are interested in stopping when we have
as many clusters as speakers
we will define a performance measure as the difference between the number
of speakers and the number of clusters
relative to the number of speakers
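
The measure just defined can be written as (S - C) / S, with S the number of speakers and C the number of clusters at the stopping point. A minimal sketch follows; the function name and the sign convention (positive when too few clusters remain) are assumptions.

```python
def relative_cluster_count_error(num_speakers, num_clusters):
    """Difference between speakers and clusters, relative to the speakers.

    0 means the clustering stopped with exactly one cluster per speaker,
    a positive value means it stopped too late (too few clusters) and
    a negative value means it stopped too early (too many clusters).
    """
    return (num_speakers - num_clusters) / num_speakers
```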
so here we have the obtained results
we see here on the vertical axis the variable that is exactly the
one that i just defined
and here we have
in blue
the results obtained with the first maximum distance method
and the other one is the solution provided by the calibrated scores
and
we see that
the second method performs similarly no matter
the mismatch between
training and testing sets and
the first method may only be used
when we have
similar databases
in training and testing
so to conclude my presentation
i would like to say two things
we see that speaker clustering is
strongly affected by the characteristics of our audio database
and also we can use these conclusions to anticipate
possible changes but also to find possible solutions in the future for example
we see
that if we have a big
audio dataset
we will get
much worse results than if the database is smaller so
we would propose to split our database into smaller ones and
use those smaller sets to run clustering tasks
as
those clustering tasks
have better results than the big one
we will finally have
better results in
our clustering problem
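
A minimal sketch of the splitting step in this proposal, assuming a random partition of the audio identifiers into subsets of a fixed maximum size. The talk does not say how the subsets are chosen or how the per-subset clusterings are combined afterwards, so the function name and the random split are illustrative assumptions.

```python
import random

def split_into_subsets(audio_ids, subset_size, seed=0):
    """Randomly partition audio identifiers into subsets of at most subset_size."""
    ids = list(audio_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the split is reproducible
    return [ids[i:i + subset_size] for i in range(0, len(ids), subset_size)]
```

Each subset would then be clustered as an independent, smaller task.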
and
that is all now it is time for questions
any questions
so you mentioned you have some conclusions that are useful to anticipate
the accuracy of the system in a scenario
but it is based on the system i mean how dependent are these conclusions on the system
you use
is it possible to know that
well i would say
it is quite dependent
i believe that when you make
one decision
at the beginning of the clustering process
you will
carry that decision until the end of the process
so
i think the reason behind these conclusions is found in that
for example
we can think about why
we have
shown different results when we have different sizes of the task
as the size of the task gets bigger
errors that are made at the beginning of the clustering process
are propagated along the clustering tree
and
this
is
more harmful the
more iterations
we have so where the task is smaller we have
fewer iterations and it will be less harmful
and also for example
in the task in which we analyzed
the number of speakers we see that
the worst results were obtained when we were in the middle of
the clustering tree
and
if the solution was found
at the beginning of the tree or at the end of the tree we got
better results
that is also because
at the beginning and at the end there are
fewer possible
partitions
and
in the middle we have more but
we cannot access all of them because
of the decisions that we have previously made
the optimal one
may not be available
and that does not happen at the beginning
where we still have more
possible options
that is of course
due to the agglomerative clustering algorithm we are using
so
i would say
yes i think
clustering
i believe the conclusions
are very influenced by the algorithm you use
for example
here i have not shown all the experiments we have
made
but
if we change the
linkage method and we use for example
average
score
we saw that the difference
in the balance of speakers where we had
better results when there is a main speaker disappears because if we use
average score instead of maximum score all the results that we obtained
were similar so
that was an example that if we change the clustering algorithm we may have
different conclusions
so most of the conclusions depend on fundamental elements
the clustering method and the particular scoring you use so
what is your insight
what would you say affects these conclusions the most
i think it is quite affected
by the
clustering algorithm
thanks
sorry
i four u s i isn't
no way stance
one question about the database that you used you mentioned that you are using
only audios of three hundred seconds
so
there is no duration variability inside and so on
did you study the effect of this duration on
the conclusions that you drew
yes we also ran some experiments in which we tested
different durations
and the results showed that the conclusions
keep similar but we have
higher cluster impurity levels
in all of our
experiments
and as the impurity got higher the difference between the different databases was not so large