I would like to tell you about our system for the NIST i-vector challenge. The outline of my talk is as follows.
First, I will show you the overall system description. Then I will describe our i-vector clustering algorithms. Next I will present our subsystems: the i-vector PLDA subsystem, the b-vector RBM/DBN PLDA subsystem, and the i-vector LDA-SVM subsystem. After that I will talk about the quality measure function used to incorporate test duration information into the scoring. Then I will present the subsystem fusion, and finally I will present our results and draw conclusions.
Let me begin with the overall system description. As you can see, we explored different subsystems. The idea was to build standard, state-of-the-art systems for the speaker recognition task alongside some novel ones. Among the novel ones we used the RBM/DBN b-vector subsystem, which is based on a PLDA model in the b-vector space, and the last one is the well-known LDA-SVM subsystem based on i-vectors. We made a fusion of different combinations of our systems, and we also took a quality measure function into account: we incorporated test duration information to obtain good scoring results.
Our subsystems were developed by different authors simultaneously, and that led us to apply different clustering algorithms to the different subsystems. As you can see, for the PLDA and RBM subsystems we used clustering algorithm 1, while for the LDA-SVM subsystem we developed its own clustering algorithm, called algorithm 2.
A few words about the clustering problem we were dealing with. At first we tried standard clustering techniques such as k-means, but we did not succeed with them. However, there are two empirically established facts from speaker recognition that can help us. The first is that the cosine metric is a good comparison metric in the i-vector space, and the second is that averaging length-normalized i-vectors is considered the most efficient multi-session speaker model. So we decided to use the cosine distance, but only for the initial clustering step.
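To make these two facts concrete, here is a minimal sketch in Python, assuming toy random data and the 600-dimensional i-vectors of the challenge; the function names are illustrative:

```python
import numpy as np

def length_normalize(x):
    """Project each row onto the unit sphere."""
    x = np.atleast_2d(x)
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def multisession_model(ivectors):
    """Fact 2: average the length-normalized i-vectors of one speaker."""
    return length_normalize(ivectors).mean(axis=0)

def cosine_score(model, test_ivector):
    """Fact 1: cosine similarity between a speaker model and a test i-vector."""
    m = model / np.linalg.norm(model)
    t = test_ivector / np.linalg.norm(test_ivector)
    return float(m @ t)

# Toy usage with random 600-dimensional vectors (the challenge dimensionality).
rng = np.random.default_rng(0)
enroll = rng.standard_normal((5, 600))  # five enrollment sessions
test = rng.standard_normal(600)
print(cosine_score(multisession_model(enroll), test))
```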
Next we built an iterative clustering strategy: after the initial cosine clustering step, it makes sense to use a more powerful PLDA metric, which explicitly takes into account between-speaker and within-speaker variability. You can see the scheme of this iterative clustering on this slide, but we managed with only one iteration: we already obtained good results after the first iteration of PLDA retraining. So we did the cosine initialization, then the PLDA training, and then built the clustering again. We did this in two variants, using algorithm 1 and algorithm 2.
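A minimal sketch of this loop, assuming scikit-learn's agglomerative clustering for the cosine step and a WCCN-style within-class whitening as a simplified stand-in for the full PLDA metric (the helper names are illustrative, not the authors' code):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cosine_init(vectors, threshold):
    """Initial clustering step in the cosine metric (distance = 1 - cosine)."""
    X = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return AgglomerativeClustering(
        n_clusters=None, metric="cosine", linkage="average",
        distance_threshold=threshold).fit_predict(X)

def within_class_whitening(X, labels):
    """Estimate the within-speaker covariance from the current clusters and
    return a whitening transform for it (a WCCN-style simplification that,
    like PLDA, accounts for within-speaker variability)."""
    d = X.shape[1]
    W = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c] - X[labels == c].mean(axis=0)
        W += Xc.T @ Xc
    W = W / len(X) + 1e-6 * np.eye(d)  # regularize before inversion
    vals, vecs = np.linalg.eigh(W)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def iterative_clustering(vectors, threshold=0.29, n_iter=1):
    """Cosine initialization, then re-clustering in a variability-aware
    space; one iteration was reported to be enough."""
    labels = cosine_init(vectors, threshold)
    for _ in range(n_iter):
        T = within_class_whitening(vectors, labels)
        labels = cosine_init(vectors @ T, threshold)
    return labels
```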
Now I should mention the PLDA model, because I will need some parameter names on the next slides. In our model the number of eigenvoices (the columns of the eigenvoice matrix) is N1 and the number of eigenchannels is N2.
Our first clustering algorithm consists of two stages. The first stage is an iterative search for the clusters; it is like a mean-shift clustering algorithm, so step by step we find the clusters using mean shift. At the second stage we try to compensate for the error of one speaker's i-vectors being divided among different clusters. For this we used a simple bottom-up stage of agglomerative hierarchical clustering, in a simple repeat-until loop. You can also see the reference to mean-shift clustering here: our reviewers told us that our algorithm is very similar to the one described in that work.
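A minimal sketch of such a first stage, assuming a flat kernel on the unit sphere where the cosine threshold plays the role of the bandwidth; this illustrates the idea rather than reproducing the exact algorithm:

```python
import numpy as np

def cosine_mean_shift(vectors, threshold=0.5, max_iter=50):
    """Mean-shift-like cluster search in the cosine metric: each point's
    mode is repeatedly replaced by the mean of its cosine neighbours and
    re-normalized to the unit sphere (flat kernel, threshold = bandwidth)."""
    X = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    n = len(X)
    modes = X.copy()
    for _ in range(max_iter):
        neighbours = (modes @ X.T) > threshold     # support set of each mode
        neighbours[np.arange(n), np.arange(n)] = True  # keep the own point
        shifted = neighbours @ X
        shifted /= np.linalg.norm(shifted, axis=1, keepdims=True)
        if np.allclose(shifted, modes):
            break
        modes = shifted
    # points whose modes coincide (up to the threshold) form one cluster
    labels = np.full(n, -1)
    next_id = 0
    for i in range(n):
        if labels[i] == -1:
            mask = ((modes @ modes[i]) > threshold) & (labels == -1)
            labels[mask] = next_id
            next_id += 1
    return labels
```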
Our second algorithm is just a standard agglomerative bottom-up stage of an AHC algorithm. It also uses the cosine or PLDA metric, and a threshold τ3 is involved as the stopping criterion.
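A minimal sketch of this bottom-up stage with average linkage in the cosine metric, stopping once the best pairwise cluster similarity falls below τ3 (the brute-force loop is for clarity, not efficiency):

```python
import numpy as np

def ahc_cosine(vectors, tau3=0.43):
    """Bottom-up agglomerative clustering with average linkage in the cosine
    metric; merging stops once the best cluster similarity drops below tau3."""
    X = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average-link cosine similarity between clusters a and b
                sim = float(np.mean(X[clusters[a]] @ X[clusters[b]].T))
                if sim > best:
                    best, pair = sim, (a, b)
        if best < tau3:  # the stopping criterion from the slide
            break
        a, b = pair
        clusters[a].extend(clusters.pop(b))
    return clusters
```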
On the next slide I will show you the same scheme with the parameter values. For the initial cosine clustering we used the condition that the thresholds of the first and second stages were equal: τ1 = τ2 = 0.29. We used 16 random cluster initializations, and we also used the rule that no fewer than 2 and no more than 50 i-vectors could be in one cluster.
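A minimal helper for such a size rule (the bounds are the ones quoted above; vectors from rejected clusters would simply be excluded from the training lists):

```python
def apply_size_rule(clusters, min_size=2, max_size=50):
    """Keep only clusters that satisfy the cluster-size rule from the slide."""
    return [c for c in clusters if min_size <= len(c) <= max_size]
```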
It should also be mentioned that the PLDA clustering was done using a simplified PLDA model: we used 300 eigenvoices and a full-covariance noise model. For this case the threshold τ1 was equal to -0.2 and τ2 was 0.229.
For this clustering we used the rule that no fewer than 3 and no more than 50 i-vectors could be chosen per cluster. For algorithm 2 we used the threshold τ3, which was equal to 0.43. We also used a simplified PLDA model there, but the difference is that we used only a diagonal covariance noise matrix, and there was another rule: no fewer than 3 and no more than 60 i-vectors in a cluster.
For the subsystems themselves and for our experiments we used another PLDA model, which takes the channel factors into account and uses only a diagonal covariance matrix; in this case the number of eigenchannels N2 was 55.
Training of the i-vector PLDA system was done using the results of the algorithm 1 clustering. For the initialization of the eigenvoice matrix we used PCA. It should be mentioned that only one ML (maximum likelihood) iteration is needed; further iterations led to some degradation.
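A minimal sketch of such a PCA initialization, under the assumption that the eigenvoice matrix is seeded with the leading principal directions of the cluster means; this is an illustration, not the authors' exact procedure:

```python
import numpy as np

def init_eigenvoice_matrix(cluster_means, n_eigenvoices):
    """Seed the eigenvoice matrix with the top principal directions of the
    per-cluster (per-speaker) mean i-vectors, scaled by the singular values,
    before the single ML re-estimation step (n_eigenvoices stands for N1)."""
    centered = cluster_means - cluster_means.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_eigenvoices].T * s[:n_eigenvoices]
```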
A few words about the RBM PLDA system. We used an RBM to extract b-vectors from our i-vectors. Strictly speaking, it is not an extractor, but a non-linear projection of the raw i-vector space into the b-vector space which incorporates the label information of the speaker verification task. We simply trained it for the classification task to obtain the joint distribution of the i-vectors and their labels.
We also tried to use an additional hidden layer with unsupervised training. In this case the number of neurons of the first layer was 2000 and the number of neurons of the softmax layer was 500, just as in the previous configuration, where each layer had 500 neurons.
So what is the b-vector? We used the posteriors of the softmax layer to obtain our b-vectors by applying PCA: a PCA projection of the log-posteriors into a low-dimensional space. In our case this dimensionality was tied to the number of neurons of the hidden layer. For the b-vector space we used another PLDA model, different from the one in the i-vector space: the number of eigenvoices was 400, and in this case we used a simplified PLDA model.
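A minimal sketch of that projection, assuming the 500-unit softmax layer mentioned above and an illustrative target dimensionality; in practice the PCA would be fitted on development data and then applied to all utterances:

```python
import numpy as np
from sklearn.decomposition import PCA

def extract_b_vectors(log_posteriors, dim):
    """PCA-project softmax log-posteriors into a low-dimensional b-vector
    space, then length-normalize before PLDA scoring."""
    b = PCA(n_components=dim).fit_transform(log_posteriors)
    return b / np.linalg.norm(b, axis=1, keepdims=True)

# Toy usage: 1000 utterances, a 500-unit softmax layer, illustrative dim=200.
rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(500), size=1000)
b_vectors = extract_b_vectors(np.log(posteriors), dim=200)
```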
The LDA-SVM subsystem, as was mentioned before, used clustering algorithm 2 and the s-norm score normalization procedure.
A few words about the quality measure function. It is well known that the threshold of the minimum decision cost function depends on the test and enrollment segment durations. In the NIST i-vector challenge we deal with multi-session enrollment models, and the average duration of an enrollment model is much larger than the duration of the test segments. So we ignored the dependence on the enrollment durations and focused our investigation on the dependence on the test duration. We did this using our clustering results: we prepared several five-session enrollment protocols, obtained a number of operating points, and observed a linear dependence of the threshold on the logarithm of the test duration. It should be mentioned that the logarithm in this function could be replaced by a power function, for example the square root, because of the similar behavior of those functions.
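A minimal sketch of applying such a quality measure function, assuming the threshold shift is linear in the logarithm of the test duration; the coefficients a and b are illustrative and would be fitted on the development protocols:

```python
import numpy as np

def qmf_score(raw_score, test_duration, a, b):
    """Compensate a trial score for the duration-dependent threshold shift,
    modeled as a + b * log(duration). np.sqrt could replace np.log here,
    since the two functions behave similarly over the durations of interest."""
    return raw_score - (a + b * np.log(test_duration))
```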
For subsystem fusion we used a simple linear combination, a weighted sum of the scores, but we also needed some sigma normalization for the fusion. For the LDA-SVM subsystem the weight equals 1, but for the other subsystems it is 0.4.
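A minimal sketch of that fusion, under the assumption that sigma normalization means scaling each subsystem's scores by their standard deviation; the subsystem names and toy scores are illustrative, while the weights are the ones quoted above:

```python
import numpy as np

def fuse(scores, weights):
    """Sigma-normalize each subsystem's scores over the trial list, then
    combine them as a weighted sum."""
    fused = np.zeros_like(next(iter(scores.values())), dtype=float)
    for name, s in scores.items():
        fused += weights[name] * s / s.std()  # sigma normalization
    return fused

rng = np.random.default_rng(0)
trial_scores = {"lda_svm": rng.standard_normal(10),
                "plda": rng.standard_normal(10),
                "rbm_plda": rng.standard_normal(10)}
fused = fuse(trial_scores, {"lda_svm": 1.0, "plda": 0.4, "rbm_plda": 0.4})
```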
Now to the results. First I will show you our results with the incorporated test duration information. You can see that using the quality measure function lets us reduce the minimum decision cost function significantly: it gives a reduction of about 10 percent for the LDA-SVM subsystem, and for the final fusion with equal weights it also achieves a good improvement, about 7 percent relative.
Now about the fusion of the scores of the i-vector and b-vector space PLDA models. We obtained a reduction of the minimum decision cost function. This is due to the fact that the RBM or DBN performs a non-linear transform of the i-vector space, and that allows us to make an effective fusion of such systems.
For the PLDA and RBM PLDA subsystems, fusion with equal weights achieved good results, but the weights that we optimized over our submissions are unequal: they have the values 0.2, 0.4 and 1, and they led to our best result. That best result comes from the fusion of three subsystems: the LDA-SVM subsystem, the RBM PLDA subsystem and the DBN PLDA subsystem. In this case the DBN PLDA gave us a little bit more information for the verification, and we managed to achieve a result of 0.239, which is our best one.
In conclusion: we have presented our system, which consists of PLDA, LDA-SVM and RBM subsystems, and we have presented its agglomerative clustering algorithms. The combination of the PLDA and LDA-SVM subsystems, which use different clustering algorithms, resulted in an effective fusion, and the non-linear transformation of i-vectors into the b-vector space also leads to a successful fusion with classical i-vector systems. So, that's all.
Question: Congratulations. I just want to ask you about the use of the mean shift, your modified version of mean shift. Did you compare it, for example, with standard agglomerative clustering, to see how much you gain from using this algorithm?

Answer: Yes, we did. As you can see, we used algorithm 2, and we tried to use the algorithm 2 clustering for training the PLDA model. Algorithm 2 is just a bottom-up stage, like the one in algorithm 1, and it led to some degradation; the mean shift was better for this task, especially for the PLDA training.