Given an i-vector, the Gaussian PLDA model assumes that it can be decomposed into two parts: a speaker part, equal to the overall mean plus the eigenvoice matrix Φ, whose columns span the speaker subspace, times a speaker factor, which is normally distributed with zero mean and identity covariance; and a residual part, which is also normally distributed, with zero mean and a full covariance matrix Λ.
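To fix the notation, here is a minimal sketch of this generative model in standard PLDA symbols (the exact letters may differ from the paper's):

    w = \mu + \Phi\beta + \varepsilon, \qquad \beta \sim \mathcal{N}(0, I), \qquad \varepsilon \sim \mathcal{N}(0, \Lambda)

where w is the i-vector, \mu the overall mean, \Phi the eigenvoice matrix, \beta the speaker factor and \varepsilon the residual.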
This is the most commonly used PLDA setting for i-vectors, in which the channel effect is kept inside the residual. The decision score, proposed by Prince, is a log-likelihood ratio, and we can see that computing this score depends only on the speaker variability matrix Φ Φᵀ and on the total covariance matrix Φ Φᵀ + Λ.
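For reference, a standard way of writing this verification log-likelihood ratio for a pair of centered i-vectors w_1, w_2, with B = \Phi\Phi^T and \Sigma = \Phi\Phi^T + \Lambda (a sketch consistent with the model above, not necessarily the paper's exact formula):

    s(w_1, w_2) = \log \mathcal{N}\!\left(\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}; 0, \begin{bmatrix} \Sigma & B \\ B & \Sigma \end{bmatrix}\right) - \log \mathcal{N}\!\left(\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}; 0, \begin{bmatrix} \Sigma & 0 \\ 0 & \Sigma \end{bmatrix}\right)

which indeed involves only the total covariance \Sigma and the between-speaker matrix \Phi\Phi^T.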
The Gaussian PLDA modeling can provide good performance, but it has been shown that the best performances are achieved only if a conditioning procedure follows the extraction of the i-vectors. The most commonly used conditioning is a whitening: a standardization followed by a length normalization. The variability matrix chosen for the standardization can be the total covariance matrix or the within-speaker covariance matrix, and eventually we iterate this process. The parameters are computed on the i-vectors of the training corpus and then applied to the test i-vectors.
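As an illustration, here is a minimal numpy sketch of this conditioning, with the total covariance chosen as the standardization matrix (the array X of development i-vectors and the function names are illustrative assumptions, not from the paper):

    import numpy as np

    def fit_whitening(X):
        """Estimate the mean and whitening transform on development i-vectors."""
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)              # total covariance matrix
        vals, vecs = np.linalg.eigh(cov)
        W = vecs @ np.diag(vals ** -0.5) @ vecs.T  # inverse square root of cov
        return mu, W

    def condition(X, mu, W):
        """Standardization followed by length normalization."""
        Y = (X - mu) @ W.T
        return Y / np.linalg.norm(Y, axis=1, keepdims=True)  # project onto unit sphere

The pair (mu, W) is estimated once on the training corpus and reused unchanged on the test i-vectors; repeating fit and apply gives the iterated variant mentioned above.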
The assumptions of the Gaussian PLDA are the following. First, the Gaussianity. Second, the linearity of the eigenvoices: it means that the speaker part is constrained to lie in a linear subspace. Third, the homoscedasticity of the residual: it means that the model assumes that the speaker classes share the same statistics, so that channel effects can be modeled in a speaker-independent way and the class distributions share the same covariance matrix. Fourth, the independence between the residual and the speaker factor. And fifth, the equality of covariance: it means that the residuals between the actual mean of a class and the model parameter computed by the PLDA are assumed to be uncorrelated, normally distributed, and explained by the random sampling of the development corpus, so that they do not vary with the effects being modeled.
On the left, the graph shows the ideal condition of the PLDA model: a speaker factor of dimension one, a one-dimensional speaker subspace, a standard normal prior for the speaker factor, and some classes sharing the same variability matrix.
The problem is that i-vectors no longer lie in a linear space but on a nonlinear and finite surface, so the distribution of the normalized i-vectors, which is referred to as a spherical distribution, makes questionable the assumption that there exists a single speaker-independent parameter of within-speaker variability; it is not clear that channel effects can still be modeled in a speaker-independent way. It is difficult to prove that such an assumption is right or wrong, but, for example, if we find a significant correlation between the residual and the class position parameter, this effect would invalidate the estimation of the random variables.
First, we present the deterministic approach to compute the PLDA parameters. Why a deterministic approach? Because we suspect that the EM-ML estimation is not optimal for the i-vector spherical distribution. Can we replace this sophisticated expectation-maximization maximum-likelihood estimation of the parameters by a simple and straightforward deterministic approach? In other words, we want to know whether the application of the maximum-likelihood approach to compute the parameters of the PLDA brings a significant improvement of performance. We start from the between-speaker covariance matrix computed on our development corpus.
A singular value decomposition of the between-speaker covariance matrix gives a matrix P, whose columns are the eigenvectors of the between-speaker variability, and a diagonal matrix D of eigenvalues sorted in decreasing order. Given a rank R less than p, we can extract the R principal directions of between-speaker variability and summarize them in the matrix defined by P_{1:R} D_{1:R} P_{1:R}ᵀ, where P_{1:R} is the truncated matrix composed of the first R columns of P, and the diagonal matrix D_{1:R} only comprises the R highest eigenvalues.
So we propose to carry out experiments with the W conditioning only, that is, a standardization according to the within-class covariance matrix followed by a length normalization, and with a direct estimation of the parameters of the PLDA, without the EM-ML algorithm, on the development corpus. In the scoring, the total covariance matrix is then estimated by the empirical covariance of the development corpus, and the speaker variability matrix Φ Φᵀ is replaced by P_{1:R} D_{1:R} P_{1:R}ᵀ.
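To make the procedure concrete, here is a minimal numpy sketch of this EM-free estimation, assuming conditioned development i-vectors X (one row each) with speaker labels y; the function and variable names are mine, not the paper's:

    import numpy as np

    def deterministic_plda(X, y, R):
        """Direct (EM-free) estimates of the PLDA covariance matrices."""
        mu = X.mean(axis=0)
        Xc = X - mu
        total_cov = np.cov(Xc, rowvar=False)              # empirical total covariance
        # between-speaker covariance, computed from the class means
        means = np.stack([Xc[y == s].mean(axis=0) for s in np.unique(y)])
        B = np.cov(means, rowvar=False)
        # eigendecomposition of B, eigenvalues sorted in decreasing order
        vals, vecs = np.linalg.eigh(B)
        idx = np.argsort(vals)[::-1][:R]                  # keep the R highest eigenvalues
        P, D = vecs[:, idx], vals[idx]
        speaker_cov = P @ np.diag(D) @ P.T                # stands in for Phi Phi^T
        residual_cov = total_cov - speaker_cov            # stands in for Lambda
        return mu, speaker_cov, residual_cov

These two matrices are then plugged into the log-likelihood-ratio scoring in place of the EM-ML estimates.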
This choice can be justified if we consider a balanced dataset from the development corpus: we can express the speaker and residual factors in terms of these parameters, where the mean vector of speaker s plays the role of the class position. We show in the article that the covariance matrix of the speaker factor is then as desired: it is desirable that the speaker factor be standardized, with mean zero and an identity covariance matrix, and independent of the residual. Remark that only the nullity of the covariance, which is a necessary condition of independence, is achieved. We then carry on to obtain the PLDA scoring.
Length normalization is known to improve the Gaussianity, so we compute the distribution of the norms of the speaker and residual factors of the development corpus, before and after length normalization. The top graphs show the distribution of the squared norms of the standardized factors: on the left the speaker factors, on the right those of the residual. The dashed line is the distribution that the squared norm of the speaker factor must follow: a chi-squared with p degrees of freedom, where p is the dimension of the i-vector space.
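A small sketch of this check, assuming an array F of standardized factors, one row per i-vector of dimension p (scipy provides the reference chi-squared density):

    import numpy as np
    from scipy.stats import chi2

    def norm_check(F):
        """Squared norms of standardized factors vs. the chi^2(p) reference."""
        p = F.shape[1]
        sq_norms = np.sum(F ** 2, axis=1)   # should follow chi^2 with p degrees of freedom
        grid = np.linspace(sq_norms.min(), sq_norms.max(), 200)
        return sq_norms, grid, chi2.pdf(grid, df=p)

Overlaying a histogram of sq_norms with the returned reference density reproduces the kind of comparison shown in the figures.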
We show that, for the development corpus as well as for the evaluation datasets, there is a mismatch between their empirical distribution and the theoretical chi-squared distribution. Remark also the dataset shift between the development and evaluation datasets after length normalization.
We carried out the same experiments with the ML-computed parameters and with the deterministic approach. In both cases, we can see that the mismatch with the theoretical distribution is partially reduced, as well as the shift between the development and evaluation datasets. Remark that the deterministic approach improves the Gaussianity in a manner similar to the ML technique.
Now, the results on the speaker recognition task, in terms of equal error rate and minimum DCF. We evaluated three systems on telephone conditions of the NIST Speaker Recognition Evaluations of 2008, 2010 and 2012, including a noisy environment: a system with length normalization following the standardization by the total covariance matrix, and the W conditioning in two cases, with an ML estimate of the parameters and with a deterministic estimate of the parameters. We can see that the results are the same in terms of equal error rate between the last two techniques. In terms of minimum DCF, the probabilistic approach remains superior. And remark that the W conditioning performs a bit better than the total covariance conditioning, even with the deterministic approach.
So now we consider that maybe the fact that the EM-ML approach doesn't bring the expected improvement of performance is due to the fact that the Gaussian PLDA model is not optimal for the i-vector spherical distribution.
To check this, we compute two series on the development corpus. First, the average log-likelihood of the residue of the observations of a class given the model, that is, given its mean and its covariance, which we consider as a likelihood of the class variability. Then we compare this likelihood to the parameter of position of the class, considered as a measure of probabilistic class position: the posterior likelihood of the speaker factor of the class. And we display the two series: on the horizontal axis the parameter of class position, and on the vertical axis the likelihood of the residue.
The first graph shows the results without length normalization, with the i-vectors as provided by the extractor, and we remark here that no correlation appears between the position of the class and the likelihood of the residue. Each time, we display the coefficient of determination R-squared, which goes from 0 to 1 and indicates how well the data points fit a line. Here, R-squared is equal to 0.04, close to zero.
After length normalization, a significant correlation appears between the likelihoods of the class factors and the likelihoods of the residue: the R-squared values are equal to 0.59 and 0.64.
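The coefficient of determination used here is the usual R-squared of a least-squares line between the two series; a minimal sketch (the argument names are mine):

    import numpy as np

    def r_squared(position_llk, residue_llk):
        """R^2 of the linear fit between class-position likelihoods and
        residue likelihoods (0 = no linear relation, 1 = perfect line)."""
        r = np.corrcoef(position_llk, residue_llk)[0, 1]
        return r ** 2

Applied per configuration, this is the statistic reported as 0.04 before normalization and 0.59 / 0.64 after.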
So there is a dependency between the actual variability matrix of a class and the probabilistic position of this class expressed by the likelihood of its factor, and we can say that this shows a heteroscedasticity of the residue.
We computed the previous results with the whole training set, in which the data are not evenly distributed across speakers, so one could object that the correlations are due to the differing amount of information per speaker. Therefore, we compute the same graphs as before, but only on the speaker training classes with a minimum number of sessions per training speaker. We vary this minimal number of sessions per speaker from 2 to 62 and, each time, for only the segments of the speakers which have more than this minimum, we compute the R-squared score. We see that before length normalization there is no problem, because the two series are independent, and that after normalization, even for the subsets of speaker classes with the maximum number of sessions, the same dependency occurs, with R-squared values which are higher than 0.6.
So we remark that the Gaussian PLDA modeling is a good model, but if we are obliged to project the data on a nonlinear surface, the problem is to be sure that a homoscedastic model, with equality of covariances, will fit the distributions. A way to take this into account would be to replace the overall within-class variability parameter by a class-dependent parameter, taking into account the local position of the class to fit its actual distortions. Such a development is difficult to carry out, because it induces a complex density, passing the within-class variability parameter through a nonlinear function. Another way would be to give up length normalization and favor approaches which preserve the i-vector as it is: attempting to find adequate priors, such as heavy-tailed PLDA, or discriminative classifiers, such as pairwise discriminative training. Or, lastly, to ask why we are obliged to ignore the norm: maybe because it contains unexpected variabilities, which may be related to some acoustic parameters.
Just a last remark: the W conditioning transforms the within-class variability into the identity matrix, and an identity matrix has no principal components; maybe this alleviates the constraint of homoscedasticity.
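This remark can be illustrated with a small numpy sketch (assuming, as before, an array X of i-vectors with labels y; the names are mine): standardizing by the inverse square root of the within-class covariance maps that covariance to the identity.

    import numpy as np

    def within_class_cov(X, y):
        """Pooled covariance of the i-vectors around their own class means."""
        dev = np.concatenate([X[y == s] - X[y == s].mean(axis=0) for s in np.unique(y)])
        return np.cov(dev, rowvar=False)

    def w_condition(X, y):
        """Standardize by the inverse square root of the within-class covariance."""
        vals, vecs = np.linalg.eigh(within_class_cov(X, y))
        T = vecs @ np.diag(vals ** -0.5) @ vecs.T
        return (X - X.mean(axis=0)) @ T.T

After this transform, within_class_cov(w_condition(X, y), y) is close to the identity matrix, which indeed has no privileged principal components.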
Thank you.
Can I mention something here? In the experiments you replaced the probabilistic approach of estimating the parameters with the deterministic one. I think that, in the limit, if your training set has many speakers, these two solutions are exactly the same; the only difference is that you are putting the prior in the one case. So it depends on the number of speakers and the average number of sessions per speaker, and I guess that it matters when you go to a small number of speakers to train the model.
Yes, and the difference is that the deterministic approach is not intended to compete with the EM-ML method, which remains the best way. But I was surprised by the slight gap of performance, and I assume that it may be because the EM-ML cannot be optimal, since there is a problem of sphericity of the data, which the deterministic approach does not have. And that is exactly the topic of the next part, where we try to show whether the norm of the speaker factors follows the right distribution.
Yes, I guess, because you have to treat them as random variables, because they are not simply points: under the PLDA scheme they have a posterior distribution. Okay, so a better way to consider whether they follow the distribution would probably be to add the trace of the posterior covariance matrix when you compute the norm, in order to see the overall distribution rather than just the dot products.
Remark that the same rationale holds with the evaluation or test i-vectors as with the development corpus used as training vectors: the same effect appears, and the difference is that, before length normalization, the R-squared score is not close to zero. The R-squared score of the test i-vectors before length normalization, as provided by the extractor, is equal to 0.3, whereas these test vectors were not used for training the PLDA factor analysis. So there is a shift, not only of the mean, but also of this problem of homoscedasticity.
Just one quick question: I just missed your point when you said, I think you were saying, that trying to make the data spherically distributed you thought was inconsistent with it being Gaussian. Why is that?
It is empirical, but a Gaussian in a high-dimensional space lies close to a sphere. Yes, but here we constrain the speaker factors themselves to a sphere, and it is difficult to assume that the within-class variability will not be affected by the position.
But the prior distribution of the i-vectors is zero mean and identity covariance, and in a high-dimensional space that will be approximately a sphere. That is what happens mathematically in a high-dimensional space, so why is it inconsistent?
Here we actually have a spherical distribution for the i-vectors as well, and applying a model with equality of covariances on this surface is difficult. Maybe what I can say is that length normalization is an ad hoc technique: it projects onto the sphere instead of taking the information into account.
Good, thank you. We must stop the discussion there.