I am from a university in Spain, and this talk is about i-vector speaker recognition with PLDA.
To get the parameters of the PLDA, we usually compute point estimates of the parameters by supervised maximum likelihood, and this requires plenty of development data from the domain of interest.
The PLDA model considers the i-vector decomposed into a speaker part and a channel part, where the priors are Gaussian. To use this model we need a large amount of data. If we do not have a large amount of data, we are forced to use a simplified model with only a speaker vector y, where the prior for y is Gaussian and the residual channel term is also Gaussian.
In this case we need less data; for example, with only around twenty speakers we can still train a model with a speaker-vector dimension of ninety.
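To make the simplified model concrete, here is a minimal generative sketch of the kind of decomposition I am describing, assuming a simplified PLDA where an i-vector is a global mean plus a speaker term V y plus a Gaussian residual with precision W. The dimensions and variable names are only illustrative, not the exact configuration of our system.

```python
import numpy as np

rng = np.random.default_rng(0)

d, ny = 400, 90                   # illustrative i-vector and speaker-vector dimensions
mu = np.zeros(d)                  # global mean
V = rng.standard_normal((d, ny))  # speaker subspace (eigenvoice) matrix
W = np.eye(d)                     # within-speaker (channel) precision matrix

def sample_speaker_ivectors(n_segments):
    """Draw i-vectors for one speaker: phi_ij = mu + V y_i + eps_ij."""
    y = rng.standard_normal(ny)   # speaker factor, y ~ N(0, I)
    eps = rng.multivariate_normal(np.zeros(d), np.linalg.inv(W), size=n_segments)
    return mu + y @ V.T + eps

print(sample_speaker_ivectors(3).shape)   # three segments of one speaker -> (3, 400)
```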
In the Bayesian approach, instead of point estimates for the parameters, we assume that they are random variables and place priors on the model parameters, and then we compute the posterior of the parameters given the i-vectors and the speaker labels.
So the method works as follows. First we compute the posterior of the model parameters on the out-of-domain data; from now on we call this posterior the prior. Then we compute the posterior of the model parameters given the target data, and finally we take point estimates of the parameters by computing their expected values given the target posterior.
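As a toy illustration of this recipe (not our actual model), here is the same chain for a single scalar precision with a conjugate Gamma prior: a broad prior, the posterior on a large out-of-domain set, that posterior reused as the prior for the target set, and a final point estimate taken as the posterior expectation. All numbers and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma_posterior(a, b, x):
    """Posterior over the precision of zero-mean Gaussian data x,
    given a Gamma(a, b) prior (conjugate update)."""
    return a + 0.5 * len(x), b + 0.5 * np.sum(x ** 2)

x_out = rng.normal(0.0, 1.0, size=5000)   # large out-of-domain set
x_tgt = rng.normal(0.0, 2.0, size=100)    # scarce target set (different variance)

a0, b0 = 1.0, 1.0                                     # broad initial prior
a_out, b_out = gamma_posterior(a0, b0, x_out)         # posterior on out-of-domain data...
a_tgt, b_tgt = gamma_posterior(a_out, b_out, x_tgt)   # ...reused as prior for target data

print("point estimate of the precision:", a_tgt / b_tgt)  # posterior expectation
```

Note that with this many out-of-domain observations the prior dominates the scarce target data, which is exactly why we need to control the weight of the prior, as I explain next.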
To get the posterior of the model parameters there are no closed-form solutions, so what we do is decompose the posterior into factors: we assume the model parameters and the hidden speaker variables are independent, and then we compute each factor in a cyclic fashion. Finally we approximate the posterior for the speaker subspace, where the relevant count is the number of speakers in the database, and the posterior for the channel matrix W, where the relevant count is the number of segments in the database.
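To show what I mean by computing the factors in a cyclic fashion, here is a deliberately reduced toy example, one-dimensional and not the actual update equations of our model: the hidden speaker variables get a Gaussian factor, the shared within-speaker precision gets a Gamma factor, and the two are updated in turn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: M speakers, a few 1-D "i-vectors" per speaker
M, n_per = 20, 5
true_means = rng.normal(0.0, 1.0, size=M)
x = true_means[:, None] + rng.normal(0.0, 0.5, size=(M, n_per))

a0, b0 = 1.0, 1.0          # Gamma prior on the within-speaker precision w
E_w = a0 / b0              # initial expectation of w

for _ in range(50):        # cyclic (coordinate-ascent) updates
    # q(y_i): Gaussian, given the current expectation of w
    prec_y = 1.0 + n_per * E_w                 # prior precision 1 plus data term
    m_y = E_w * x.sum(axis=1) / prec_y
    var_y = 1.0 / prec_y
    # q(w): Gamma, given the current moments of the y_i
    a = a0 + 0.5 * M * n_per
    b = b0 + 0.5 * np.sum((x - m_y[:, None]) ** 2 + var_y)
    E_w = a / b

print("E[w] ~", E_w, " (true noise precision = 1/0.5^2 =", 1 / 0.5 ** 2, ")")
```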
Then we can compute the adapted posterior for the target data set. When we move from the original data set to the target data set, we can control the weight that the prior has relative to the target data.
To do that we should modify the prior distribution. The weight of the prior depends on the number of speakers that we had in the original data set, so we change the parameters of the prior: if we want to multiply the weight of the prior by some factor, we need to modify the alpha parameters. We modify these two parameters so that, at the same time, they give the same expectation values for the model parameters. We can do the same with the prior of W. Finally, instead of the actual number of speakers and number of segments, the priors are characterized by an effective number of speakers and segments.
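A minimal numerical sketch of this re-weighting idea, using a Gamma prior over a scalar precision as a stand-in for the matrix-variate priors of the real model: scaling both hyperparameters by the same factor leaves the prior expectation unchanged while multiplying the prior's effective weight, that is, its effective count of speakers or segments. The names and numbers are illustrative.

```python
def reweight_gamma_prior(alpha, beta, tau):
    """Multiply the prior's effective weight by tau while keeping E[w] = alpha / beta."""
    return tau * alpha, tau * beta

alpha, beta = 400.0, 100.0            # illustrative prior built from out-of-domain data
for tau in (1.0, 0.1, 0.01):          # progressively down-weight the prior
    a, b = reweight_gamma_prior(alpha, beta, tau)
    print(f"tau={tau:5}: E[w] = {a / b:.2f}, effective weight ~ {a:.1f}")
```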
We are going to compare our method with length normalization. This normalization does centering and whitening of the i-vectors to make them more Gaussian, and then projects them onto the unit hypersphere; it also helps to reduce the data set mismatch.
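A minimal sketch of this normalization, assuming we estimate the centering and whitening statistics on development data and then project each i-vector onto the unit hypersphere. The function and variable names are mine, not from any toolkit.

```python
import numpy as np

def length_normalize(ivectors, dev_ivectors):
    """Center and whiten with development statistics, then project
    each i-vector onto the unit hypersphere."""
    mu = dev_ivectors.mean(axis=0)
    cov = np.cov(dev_ivectors, rowvar=False)
    # Whitening transform from the eigendecomposition of the covariance
    eigval, eigvec = np.linalg.eigh(cov)
    whiten = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
    x = (ivectors - mu) @ whiten
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
dev = rng.standard_normal((1000, 400))
test = rng.standard_normal((10, 400))
print(np.linalg.norm(length_normalize(test, dev), axis=1))  # all ~1.0
```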
Now I will explain the data sets. The target data set we will use has telephone channels and contains 30 male and 30 female speakers; the data has similar conditions, and each segment is two to three minutes long.
For the data set with a large amount of data, we use five databases that together contain more than five hundred males and seven hundred females, with a variety of channels.
For speaker verification, we extract twenty MFCCs plus delta coefficients and build the i-vector system; we also use length normalization, and finally we use s-norm score normalization with cohorts from the development data.
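A minimal sketch of the s-norm step, assuming raw scores between an enrollment and a test i-vector are already available and that each side has been scored against a cohort; the cohort scores here are random placeholders.

```python
import numpy as np

def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization: z-normalize the raw score with the
    cohort statistics of each side and average the two normalized scores."""
    ze = (raw_score - enroll_cohort_scores.mean()) / enroll_cohort_scores.std()
    zt = (raw_score - test_cohort_scores.mean()) / test_cohort_scores.std()
    return 0.5 * (ze + zt)

# Illustrative numbers only
rng = np.random.default_rng(0)
enroll_vs_cohort = rng.normal(0.0, 1.0, size=200)  # enrollment model vs cohort segments
test_vs_cohort = rng.normal(0.2, 1.1, size=200)    # test segment vs cohort models
print(s_norm(2.5, enroll_vs_cohort, test_vs_cohort))
```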
First, here we compare the systems; we can see an improvement when we use the prior distribution. If we compare, for instance, the first line and the last line, the equal error rate improves by forty percent for males and fourteen percent for females, and the minimum DCF improves by twelve percent for males and forty-six percent for females.
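Just to be clear about how these relative figures are read, here is the arithmetic with made-up numbers, not the actual results of the experiments: a relative improvement compares the baseline metric with the adapted one.

```python
def relative_improvement(baseline, adapted):
    """Relative improvement in percent for a metric where lower is better."""
    return 100.0 * (baseline - adapted) / baseline

# Illustrative only: an EER going from 5.0% to 3.0% is a 40% relative improvement
print(relative_improvement(5.0, 3.0))
```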
Here is a table comparing different parameter settings; we can see an improvement.
Here we show length normalization with s-norm and without s-norm. When we use length normalization we get an improvement over the plain i-vector system, but not as much as with the adaptation. We can also see that, in this data set, length normalization is better or comparable in some conditions; here we show some improvements for males, and also for females. Finally, we can also see the behavior without normalization.
Finally, the conclusions. We have developed a method to adapt a PLDA i-vector classifier from a domain with a large amount of development data to a domain with scarce development data. We have conducted experiments, and we can see that this technique improves the performance of the system, and that this improvement mainly comes from the adaptation of the channel matrix W. We have compared this method with length normalization and we get better results, and we have also discussed length normalization. As future work, we plan Bayesian adaptation of the UBM and the i-vector extractor.
No, the i-vector length means the norm of the i-vector, not the dimension of the i-vector. Maybe we can do the same as we get more normalization data.