Hello, my name is Omid Ghahabi, from the TALP research centre, and the topic of my talk is i-vector modeling using deep belief networks for multi-session speaker recognition.
As you know, acoustic modeling using deep belief networks has been shown to be effective in the speech recognition area, and it is getting popular nowadays, but very few works using only RBMs (restricted Boltzmann machines) or generative DBNs have been carried out in the speaker recognition area.
In our previous work, which was published at ICASSP 2014, we used both generative and discriminative DBNs. In that work we used only single-session target i-vectors as the inputs to the networks.
In this paper we extend our previous work from a single-session to a multi-session task. We have used the NIST i-vector challenge database in these experiments, and we have also modified our proposed impostor selection method to make it more accurate and more robust against its parameters.
First I will give a short background about deep belief networks, then I will describe our DBN-based system, then I will go into more detail on our proposed impostor selection method, and finally I will show the experimental results and the conclusions.
Deep belief networks are originally probabilistic generative models in which every two adjacent layers are treated as a restricted Boltzmann machine, and the outputs of one RBM are the inputs to the one above it. However, by adding a top label layer, this generative DBN can be converted to a discriminative one by doing standard back-propagation.
In this slide I have some information about how an RBM is trained and why it is a good fit for pre-training neural networks, but I think I can skip this; it is better to focus on our method.
Let me remind you what the problem is. The problem is to model each target speaker given the available i-vectors: we have five i-vectors per target speaker and a large amount of background i-vectors from the development set.
Our proposal is to use deep belief networks for two main reasons: first, to take advantage of unsupervised learning using the unlabeled background data of the development set, and second, to take advantage of supervised learning to train each target model discriminatively.
This is the whole block diagram of our proposed method. Let's divide it into three main steps.
The first step is balanced training. What is the problem of imbalanced training here? In this case we have a large amount of background i-vectors as negative samples and only a few target i-vectors as positive samples. As we are going to model each target speaker discriminatively, training the network with such unbalanced data will lead to overfitting. So the solution we have proposed here is to decrease the number of background i-vectors as much as possible, in an effective way.
We do this in three steps. First, we select only those background i-vectors that are more informative. Then we cluster the selected impostors with the k-means algorithm using the cosine distance criterion, and we use the impostor cluster centroids as the negative samples. Finally, we distribute the positive and negative samples equally in the minibatches.
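To make this balancing step concrete, here is a minimal Python sketch (not code from the paper): it assumes length-normalized i-vectors so that standard Euclidean k-means approximates clustering under the cosine criterion, and all names, shapes, and parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def balance_training_set(target_ivecs, impostor_ivecs, n_clusters, seed=0):
    """Cluster the selected impostor i-vectors and build a balanced pool.

    Length-normalizing the i-vectors first makes Euclidean k-means behave
    approximately like clustering under the cosine distance criterion.
    """
    imp = impostor_ivecs / np.linalg.norm(impostor_ivecs, axis=1, keepdims=True)
    centroids = KMeans(n_clusters=n_clusters, n_init=10,
                       random_state=seed).fit(imp).cluster_centers_

    # Replicate the few positive samples so positives and negatives
    # appear in equal numbers when minibatches are drawn from the pool.
    reps = int(np.ceil(len(centroids) / len(target_ivecs)))
    pos = np.tile(target_ivecs, (reps, 1))[: len(centroids)]

    x = np.vstack([pos, centroids])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(centroids))])
    order = np.random.default_rng(seed).permutation(len(x))  # interleave
    return x[order], y[order]
```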
The second step is the adaptation process that we proposed in our previous work. Using all the background i-vectors, we first train a deep network unsupervised, that is, without labels, and we call the trained model the universal deep belief network (UDBN). Then the network of each target speaker will be adapted from this universal DBN.
But how does the adaptation work? In adaptation, we initialize the networks not randomly but with the UDBN parameters, and then do unsupervised learning on the balanced data from step one for only a few iterations. In our previous work we have shown that pre-training in this case works better than random initialization, and the proposed adaptation works better than pre-training.
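As a rough illustration of the adaptation idea (a sketch under stated assumptions, not the paper's implementation), the following shows one Gaussian-Bernoulli RBM layer updated with a few CD-1 steps, starting from universal (UDBN) parameters instead of random ones; the function name, learning rate, and iteration count are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adapt_rbm(W, b_vis, b_hid, data, n_iters=5, lr=0.01):
    """Adapt one Gaussian-Bernoulli RBM layer with a few CD-1 updates.

    W, b_vis, b_hid come from the universal DBN (UDBN) rather than random
    initialization, and only a few unsupervised iterations are run on the
    balanced data from step one, following the adaptation idea above.
    """
    for _ in range(n_iters):
        # Positive phase: hidden activations given the real-valued data.
        h_prob = sigmoid(data @ W + b_hid)
        h_state = (np.random.rand(*h_prob.shape) < h_prob).astype(float)

        # Negative phase: one Gibbs step back to visible and up again.
        v_recon = h_state @ W.T + b_vis          # Gaussian visible units
        h_recon = sigmoid(v_recon @ W + b_hid)

        # Contrastive-divergence parameter updates.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_vis, b_hid
```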
The last step is fine-tuning, which is actually back-propagation through the neural network using the label layer. But we have to change something here in comparison to the standard procedure: we do only one-layer error back-propagation for a few iterations before the full back-propagation is carried out. Our experimental results in our previous work have shown that this works better, because adapting only the top label layer first is something like pre-training the top layer as well, and it works better than doing the whole back-propagation without this step.
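A minimal sketch of this two-stage fine-tuning, assuming one sigmoid hidden layer and a single logistic output unit as the label layer; the split between top-layer-only and full back-propagation iterations is the point being illustrated, while all names and hyper-parameters are assumptions.

```python
import numpy as np

def fine_tune(W1, b1, x, y, top_iters=10, full_iters=100, lr=0.1):
    """First update only the new label layer (W2, b2) with the adapted
    layer below frozen, then run full back-propagation through both."""
    rng = np.random.default_rng(0)
    W2 = rng.normal(0.0, 0.01, size=(W1.shape[1], 1))
    b2 = np.zeros(1)

    for step in range(top_iters + full_iters):
        h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))   # hidden layer
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # label layer
        err = (p.ravel() - y)[:, None]             # dLoss/dlogit, cross-entropy

        if step >= top_iters:                      # full backprop phase
            dh = (err @ W2.T) * h * (1.0 - h)
            W1 -= lr * x.T @ dh / len(x)
            b1 -= lr * dh.mean(axis=0)

        W2 -= lr * h.T @ err / len(x)              # label layer always updated
        b2 -= lr * err.mean()
    return W1, b1, W2, b2
```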
On the other hand, we can divide our block diagram into two main phases: the first phase is target-independent and the second is target-dependent. In the target-independent phase, using the whole set of background i-vectors, we train a universal deep belief network and compute the impostor centroids; this process is carried out only once for all the target speakers we have. In the second phase, using the UDBN, the impostor centroids, and the available target i-vectors, we train our networks discriminatively.
Let's go into more detail on the proposed impostor selection method. This method is similar to the support-vector-based approach proposed by Mitchell McLaren, but we have used the cosine distance criterion here and we have changed some other things. It is composed of four main steps.
Assume that we have the whole set of background i-vectors on one side and the client i-vectors on the other side, where each client i-vector in this case is the average of the five i-vectors per client. We compare each client i-vector with all the background i-vectors using the cosine distance criterion, and the top n closest background i-vectors to each client are kept at this stage. We do the same for all the client i-vectors, until all the client i-vectors we have are used.
Then we compute the impostor frequencies at this stage, and we normalize them by n, the number of top i-vectors kept for each client, and by the whole number of client i-vectors. We have seen that with this normalization the impostor frequencies are more robust against the threshold that we will define on these frequencies.
Then we set a threshold on the normalized impostor frequencies, and those impostors whose frequencies are higher than this threshold will be selected as the most informative impostors. In other words, once the impostor frequencies are computed, every background i-vector has one frequency, and those impostors whose frequencies exceed the defined threshold are selected. The threshold and the n parameter will be determined experimentally in the experimental section. If we order the impostor frequencies of the impostors, we see that many impostors have the same frequency; that is why we have defined a threshold on the impostor frequencies instead of just selecting a fixed number of top samples.
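The four steps might be sketched as follows; this is an illustration of the description above, with hypothetical names, and the exact normalization constant is only my reading of the talk (dividing the counts by n and by the number of clients).

```python
import numpy as np

def select_impostors(client_ivecs, background_ivecs, n_top, threshold):
    """For each client i-vector, keep the n_top closest background
    i-vectors under cosine similarity; count how often each background
    i-vector is kept over all clients; normalize the counts; and select
    those whose normalized frequency exceeds the threshold."""
    c = client_ivecs / np.linalg.norm(client_ivecs, axis=1, keepdims=True)
    b = background_ivecs / np.linalg.norm(background_ivecs, axis=1, keepdims=True)

    counts = np.zeros(len(b))
    for cv in c:
        sims = b @ cv                         # cosine similarity to one client
        counts[np.argsort(sims)[-n_top:]] += 1

    # Normalizing by n_top and the number of clients makes the threshold
    # less sensitive to both parameters, as discussed above.
    freqs = counts / (n_top * len(c))
    return np.where(freqs >= threshold)[0]    # indices of selected impostors
```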
In the experimental section: the dataset we have used is the NIST 2014 i-vector challenge, where the i-vector size, as you know, is 600. The post-processing we have carried out on the i-vectors is global mean normalization plus whitening. One hidden layer is used in these experiments, and the hidden layer size is 400.
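As an aside, the mean normalization plus whitening mentioned here is standard i-vector pre-processing; a minimal sketch, assuming the statistics are estimated on the development set, could look like this (a ZCA-style whitening; the paper may have used a different variant):

```python
import numpy as np

def fit_whitening(dev_ivecs):
    """Estimate a global mean and ZCA whitening transform on the
    development set; illustration only."""
    mu = dev_ivecs.mean(axis=0)
    cov = np.cov(dev_ivecs - mu, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    scale = 1.0 / np.sqrt(np.maximum(eigval, 0.0) + 1e-10)
    W = eigvec @ np.diag(scale) @ eigvec.T
    return mu, W

def whiten(ivecs, mu, W):
    # Subtract the global mean, then decorrelate and scale to unit variance.
    return (ivecs - mu) @ W
```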
For tuning the two parameters of the impostor selection method, the threshold and the n parameter, we plot the minimum DCF versus the threshold for different n. We see that if n is small, the results are not good, and if n is too high, the performance of the system will not vary a lot when changing the threshold. According to our experiments, the best choice is n equal to 100, and the figure shows that by setting the threshold to this value and setting n equal to 100 we obtain the minimum value of the minimum DCF.
For the experimental results in this challenge, we had one baseline system; everyone knows what the baseline is. Our proposed DBN-based system uses the target-independent impostors, that is, global impostors that are the same for all the target speakers. If we do this experiment we will have these results, and there is a big difference between the baseline system and our system.
And if we add target-dependent impostors to the target-independent ones (in this case 100 is the n parameter, and the pool contains both target-dependent and target-independent impostors), then we will have a better performance, as you see here.
But in this case, if we add the target-dependent impostors, the complexity of the system will be higher than in the first one, because for each target speaker we need to do the clustering separately, while in the first case we just compute the impostor centroids once for all the speakers.
Now for Z-norm score normalization on our baseline and on our DBN-based system: without the normalization, these are the results. If we apply the normalization using the whole impostor database we have, the development set, we will have worse results. If we select only the top 1000 closest i-vectors as impostors, we will have a bit better results, but it is still worse than without using the Z-norm normalization.
But if we use the same impostor selection method for the Z-norm, setting the parameters t and n again for this normalization, we will see that we have a big improvement here, and in comparison to the baseline system we will have about a twenty-three percent improvement. Actually, this improvement is in comparison with these results; with respect to the other results, the improvement is even larger.
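For reference, a minimal sketch of Z-norm with a selected cohort (assuming Z-norm is indeed the variant used here): each trial score is shifted and scaled by the statistics of the target model scored against the impostor cohort chosen by the same selection method.

```python
import numpy as np

def z_norm(raw_score, model_scores_vs_cohort):
    """Z-norm: normalize a trial score by the mean and standard deviation
    of the target model's scores against an impostor cohort."""
    mu = model_scores_vs_cohort.mean()
    sigma = model_scores_vs_cohort.std()
    return (raw_score - mu) / sigma
```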
But in these experiments, for the impostor selection method, we have used the client i-vectors. Our new experimental results have shown that if we don't use the client i-vectors and instead just select the same number n of i-vectors randomly from only the development set, we will have almost the same results, very similar ones. So actually, for our proposed system, it doesn't matter whether we use the client i-vectors in the impostor selection method or just randomly choose the same number of i-vectors from only the background i-vectors.
The main conclusions: in this paper we have improved the proposed impostor selection method, and we have shown that it helps the system to achieve a good performance in the multi-session task. Having more i-vectors available for each target speaker helped the DBN system to capture more speaker and session variabilities in comparison to the single-session task. Also, the final discriminative DBN-based approach showed a considerable performance improvement in comparison to the conventional baseline system proposed by NIST in this challenge.
Thank you.
We have time for questions.
Thanks for the talk. I liked the extension of the background dataset selection that you have done. One question that comes to mind is: when you are doing the selection, you are looking at all the clients that are going to be enrolled in the system. So when you are doing this dataset selection, you are not only looking at what is statistically important, but at the clients that are going to be enrolled in the system, so your system itself sort of contains information about what you are going to test on. Why wouldn't you just do closed-set speaker identification then?
So, rephrasing it: when you are choosing your impostors, before your DBN training or Z-norm, that selection process itself is aware of all your target speakers?
Yes, that's correct.
So why not take it further and just do closed-set speaker identification for the i-vector challenge?
Yes, that's why I said in the experimental results section that if we don't use the client i-vectors and just randomly select the same number n of i-vectors only from the development set, and use them in an iterative process (for instance, choose 1,300 i-vectors randomly from the development set, do the same process of computing the impostor frequencies, then again choose random i-vectors and do the same, and finally take the average over all the impostor frequencies and in the same way set the threshold and the parameters), we had results almost identical to the results we obtained using the target i-vectors.
So that is a selection that is not aware of the other clients in that sense.
Very nice.
Yes, technically looking at the other clients was against the rules of the i-vector challenge, but he has a solution that didn't need that. The other thing is that closed-set scoring wouldn't actually work here, because they are all different speakers.