I will present the work that we did for our first participation in the i-vector challenge. There is actually some material in these slides that was not presented in the paper, but it was included in the system description that we submitted, and I think it is worth sharing with you.
Here is the outline of my talk. First I will present the progress of our system, and then I will detail two ideas: the clustering used for class training, and the score normalization used for computing the scores.
This slide shows the timeline of the minDCF of our system. Starting from the baseline, which has a minDCF of 0.386, we end up with a minDCF of 0.247, which corresponds to a relative improvement of about thirty-six percent.
I am going to present the main ideas in a graphical manner. We have the development set, and the evaluation set, which is split into enrollment and test. We have the three steps of the baseline: whitening, length normalization and cosine scoring. As we can see, only the whitening needs training, so we do not really need the labels of the development set for that.
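To make the baseline concrete, here is a minimal sketch of these three steps, assuming the i-vectors are given as NumPy arrays; this is illustrative code, not the official baseline script.

```python
import numpy as np

def train_whitening(dev_ivectors):
    """Estimate the mean and whitening matrix from the (unlabeled) development set."""
    mu = dev_ivectors.mean(axis=0)
    cov = np.cov(dev_ivectors - mu, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    return mu, eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T

def length_normalize(ivectors, mu, W):
    """Whiten the i-vectors and project them onto the unit sphere."""
    x = (ivectors - mu) @ W
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def cosine_score(enroll, test):
    """After length normalization, cosine scoring reduces to a dot product."""
    return float(np.dot(enroll, test))
```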
Starting from this baseline, something that can be done is to better choose the data for the whitening. If we take only the utterances with more than thirty-five seconds of speech, we already get some improvement in our experiments, with a minDCF of 0.372.
Afterwards, I am going to use these conditioned i-vectors, that is this reduced development set, in the later experiments and systems.
The next step that we did is the clustering. We actually tried different kinds of clustering, and I will come back to this later on, but one of the best clusterings that we obtained is what we call the cosine-PLDA clustering. After this clustering, we keep only the clusters that have more than two i-vectors in them, and we can then apply supervised techniques like LDA, PLDA, WCCN and so on. Here we just put LDA and the clustering in the loop, and you can see that we already get some improvement, with a minDCF of 0.356.
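As an illustration of how the cluster labels can drive these supervised transforms, here is a small sketch that trains LDA and WCCN on the pseudo-speaker labels produced by the clustering; the arrays `ivecs` and `labels` and the target dimensionality are assumptions, not values from the actual system.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_wccn(ivecs, labels):
    """WCCN: Cholesky factor of the inverse average within-class covariance."""
    dim = ivecs.shape[1]
    Sw = np.zeros((dim, dim))
    classes = np.unique(labels)
    for c in classes:
        Sw += np.cov(ivecs[labels == c], rowvar=False, bias=True)
    Sw /= len(classes)
    return np.linalg.cholesky(np.linalg.inv(Sw))

def train_lda_wccn(ivecs, labels, lda_dim=200):
    """LDA followed by WCCN, trained on pseudo-speaker labels from clustering."""
    lda = LinearDiscriminantAnalysis(n_components=lda_dim)
    projected = lda.fit_transform(ivecs, labels)
    B = train_wccn(projected, labels)
    return lda, B, projected @ B
```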
What we tried next was to replace the cosine scoring by other kinds of scoring. The first of them was the SVM. Here we trained a linear SVM for every target speaker, where we have only one positive sample, the length-normalized average i-vector of the target speaker, and the negative samples are the length-normalized i-vectors of the processed development set. With this we get another jump in performance, with a minDCF of 0.302.
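A minimal scikit-learn sketch of this per-speaker SVM is given below; the class weights are the ones mentioned later in the discussion (0.1 for the positive class and 0.9 for the negative class), and everything else is illustrative rather than the exact training setup.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_speaker_svm(target_avg_ivec, dev_ivecs):
    """One positive sample (the target average i-vector) against the dev set."""
    X = np.vstack([target_avg_ivec[None, :], dev_ivecs])
    y = np.concatenate([[1], np.zeros(len(dev_ivecs), dtype=int)])
    return LinearSVC(C=1.0, class_weight={1: 0.1, 0: 0.9}).fit(X, y)

def svm_score(svm, test_ivec):
    """The signed distance to the hyperplane is used as the verification score."""
    return float(svm.decision_function(test_ivec[None, :])[0])
```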
Next we added WCCN in the loop, just after the LDA. For the SVM we did not get any improvement, but for the PLDA, which I will explain on the next slide, the WCCN was helpful.
Here we use our scalable implementation of the standard PLDA. The scores are the likelihood ratio between the average i-vector of the target speaker and the test i-vector; note that the average i-vectors are not length-normalized in this case, which is not the case for the SVM. Here again we get an additional improvement, with a minDCF of 0.292.
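The actual system relies on our scalable PLDA implementation; just to illustrate the kind of likelihood ratio that is computed, here is a minimal two-covariance sketch, where the between-speaker covariance B, the within-speaker covariance W and the global mean mu are assumed to be already trained.

```python
import numpy as np
from scipy.stats import multivariate_normal

def plda_llr(enroll_avg, test, mu, B, W):
    """log p(pair | same speaker) - log p(pair | different speakers)."""
    T = B + W                                  # total covariance of one i-vector
    e, t = enroll_avg - mu, test - mu
    joint_cov = np.block([[T, B], [B, T]])     # same-speaker joint covariance
    log_same = multivariate_normal.logpdf(np.concatenate([e, t]),
                                          mean=np.zeros(2 * len(e)), cov=joint_cov)
    log_diff = (multivariate_normal.logpdf(e, mean=np.zeros(len(e)), cov=T)
                + multivariate_normal.logpdf(t, mean=np.zeros(len(t)), cov=T))
    return float(log_same - log_diff)
```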
Afterwards we tried some score normalization ideas. I tried the s-norm and others, and the one that was working the best is the adaptive symmetric normalization (AS-norm); I will also come back to this in a later slide. The AS-norm is usually used only at the recognition level, but I also applied it to the clustering. When we apply it to the clustering, we get an additional improvement, to a minDCF of 0.286. Then I applied it after the PLDA scoring, and we get another jump in performance, with a minDCF of 0.258; this was the system that was submitted at the deadline of the evaluation.
Afterwards I also had the idea of replacing this cosine-based clustering by an SVM-based clustering, which is also done in a hierarchical manner, and again we get an additional improvement, to a minDCF of 0.247, which is very close to the best performing system.
So this is more or less where our system stands now; note that, unlike others, we do not use any quality measure functions.
Let me now move to the clustering. Clustering of i-vectors has already been studied in the state of the art, in either an unsupervised or a supervised manner. For example, there is the work from MIT on cosine-based k-means clustering, in which the number of clusters is known a priori, and which was applied to conversational telephone speech. They then improved the system by using cosine-based spectral clustering, along with a simple heuristic for computing the number of clusters automatically.
Other work, from CRIM, uses cosine-based mean shift clustering. So most of these methods, if not all of them, use cosine-based scoring. Other methods were also used to perform the clustering, like the one where they used integer linear programming; but their distance metric, I think the Mahalanobis distance, requires labeled training data in order to compute the within-class covariance matrix. Other works use PLDA-based clustering, but of course the PLDA needs labeled external data to train the PLDA model, which is then used to compute the similarity measure and to do the hierarchical clustering.
We actually tried different kinds of clustering; I am not going to go into all of them. One of them was Ward clustering, a well-known agglomerative clustering whose goal is to optimize an overall objective function by minimizing the within-class scatter. This clustering is very fast, since it uses the Lance-Williams algorithm in a recursive manner. The problem of this algorithm is that it needs the Euclidean distance to work well, and it was shown in previous work that the Euclidean distance is not as good as the cosine distance for i-vectors.
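For reference, Ward clustering as described here can be run directly with SciPy, which implements it through the Lance-Williams recurrence; the array `ivecs` and the number of clusters are illustrative assumptions.

```python
from scipy.cluster.hierarchy import linkage, fcluster

def ward_cluster(ivecs, n_clusters):
    Z = linkage(ivecs, method="ward")                      # Euclidean by construction
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```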
The other clustering that we tried is what I call the cosine-PLDA clustering, which is a two-step clustering. The first step is based on the cosine measure: after each iteration, the similarity measure is updated by computing the cosine measure between the average i-vectors of the resulting clusters, and we decide to stop early in the clustering process in order to ensure high-purity clusters. Once we have this first set of clusters, we move to the second step of clustering, which is based on the PLDA. We did it somewhat differently from others: after each iteration we retrain the PLDA model and recompute the similarity matrix, but since doing this at every merge is somewhat costly, we only do it every five hundred merges.
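A minimal sketch of the first, cosine-based agglomerative step is shown below; the stopping threshold and the quadratic search over cluster pairs are for clarity only and are not the actual implementation.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_ahc(ivecs, stop_threshold=0.6):
    """Merge clusters by highest cosine similarity between their average i-vectors."""
    clusters = [[i] for i in range(len(ivecs))]
    means = [ivecs[i].copy() for i in range(len(ivecs))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(means[i], means[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < stop_threshold:            # stop early to keep high-purity clusters
            break
        i, j = pair
        clusters[i].extend(clusters.pop(j))
        means.pop(j)
        means[i] = ivecs[clusters[i]].mean(axis=0)   # update the average i-vector
    return clusters
```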
This figure shows the evolution of the minDCF along the clustering process, on the progress set, using PLDA model scoring as the backend. As we can see, with both Ward clustering, which is in blue, and cosine-PLDA clustering, which is in red, we get better performance than the baseline system, and we can also see that the cosine-PLDA clustering is much better than the Ward clustering. In this experiment, the best results were obtained with a number of clusters of around sixteen thousand.
Let me now look a bit at the score normalization. As I said, we tried different kinds of normalization; one of the most successful ones is the adaptive symmetric normalization (AS-norm), introduced by Professor Kenny and his colleagues. It works quite nicely with an unlabeled cohort set, which is the case in our scenario.
so as a set i use it for both a recognition and clustering so few
for recognition
the core set
that i used was all the development set
so the thirty six on the
i-vectors and what i took that the top-k neighbours
neighbours the i-vector to the propose but target the speech i-vector and the test i-vector
so use the formalize you see it's a symmetric form a lot
so we have mu and sigma involve this formal or more you the mean you
kate by for instance just means that
we take the top the one thousand five hundred the scores
that are scores that of the highest for
target speaker for the target speaker and then we do this and c and the
same for some there's the duration and that's one
We use more or less the same formula for the clustering, but here of course it is computed between a pair of clusters, and the cohort set in this case is all the average i-vectors of the clusters that are not concerned by this measure, that is all the other clusters but not these two.
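Putting the two uses together, here is a small sketch of the AS-norm as described; K = 1500 follows the talk, and the cohort scores are assumed to be precomputed (for recognition they are scores against the development set, for clustering they are scores against the average i-vectors of the other clusters).

```python
import numpy as np

def top_k_stats(cohort_scores, k=1500):
    """Mean and standard deviation of the K highest cohort scores."""
    top = np.sort(np.asarray(cohort_scores))[-k:]
    return top.mean(), top.std()

def as_norm(raw_score, enroll_cohort_scores, test_cohort_scores, k=1500):
    """Symmetric adaptive normalization of a single raw score."""
    mu_e, sigma_e = top_k_stats(enroll_cohort_scores, k)
    mu_t, sigma_t = top_k_stats(test_cohort_scores, k)
    return 0.5 * ((raw_score - mu_e) / sigma_e + (raw_score - mu_t) / sigma_t)
```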
With that, I am going to conclude. This evaluation was very helpful for us; we learned a lot of things, and our participation was quite successful. We also learned that the clustering is very important, and so is the adaptive symmetric normalization. These results can be reproduced with our open-source libraries, which you can find at this link, and you can also find more details in our ICASSP paper.
As future work, and we have actually started working on it, there is the question of how to automatically determine the stopping criterion of the clustering process. We have some ideas that I hope can later be shared with you, such as looking at the variation of the minDCF on the development set and the variation of the number of newly created clusters, and also the possible use of spectral clustering. One good idea that could be considered, maybe for the next evaluation, because of its potential applications, is semi-supervised clustering; there are many machine learning techniques that could be used here, like co-training and others.
Thank you.
Congratulations, that was a very good system; without fusion, getting these results is amazing. I have the slight impression that you make a distinction between supervised and unsupervised; could you go back a few slides? Well, I think this distinction is a little bit arbitrary. On the unsupervised side, like in Mohammed's work, we still try to use labels, and of course the best results we demonstrated were obtained with labels; it is always a good idea to use some labels if you have them. My impression is that the only way to get a fully unsupervised clustering, without knowing the number of classes, is something like a model-based Bayesian method, although even for the mean shift there are some tricks: if you check the original paper of Comaniciu, you will see that there are tricks, successful in image processing, with which you can somehow estimate the number of classes. But I think also that the guys from LIUM, who have the most supervised system, used the standard prewhitening without even caring about the labels, and the system works fine as well. So for me this distinction is a little bit arbitrary.
I think that, in my sense, supervised and unsupervised are just meant in the sense of labeled versus unlabeled training data.
Actually, I just have a question about your SVM: you said you used a single positive example, the averaged i-vector, instead of the five enrollment examples; did you try both?
I tried a number of combinations, actually many of them, and this one was working the best. I also forgot to mention that the SVM was weighted: the weights were not equal, something like 0.1 for the positive class and 0.9 for the negative class. But I think we only gained a little bit by doing this; it is more or less the same if you use the five examples.
That is what I wanted to say about the SVM. I just have a comment: I never had a progress slide like the one you showed on your third slide. When you were developing the system you had only progress, which is a wonderful, really wonderful situation to be in. It would also be very interesting for us to know your negative trials, what you tried and what was not efficient during the development of your systems; that is interesting for me. That is true; well, if I wanted to talk about the things that did not work, I think it would take a long time.
You showed some different approaches for clustering, but in the final system the backend was only the PLDA; did you try a combination of different backends? I tried a combination of different backends, and it was a bit different from the others, but for me it did not bring much, maybe some small gain from the fusion, I guess with logistic regression. And did you use quality measures? I think the adaptive score normalization was doing the work that the quality measure function was doing for others; I think that is what we found.