So, I present here our investigations about discriminative training applied to i-vectors that have been previously normalized. Shown here is the system on which we focus: a standard i-vector based speaker recognition system. It first carries out a normalization step, within-class covariance normalization followed by length normalization, then PLDA modeling, which provides the parameters, a mean vector mu and covariance matrices, and finally the LLR score.
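As a minimal sketch of this normalization front-end (the function name and the numpy formulation are mine, not the talk's; I assume the i-vectors are the rows of X and that W is the within-class covariance estimated on the training set):

    import numpy as np

    def normalize_ivectors(X, W):
        """WCCN-style whitening by the within-class covariance, then
        length normalization (projection onto the unit sphere)."""
        L = np.linalg.cholesky(np.linalg.inv(W))   # W^-1 = L L^T
        X = X @ L                                  # whitened: within-class cov becomes I
        return X / np.linalg.norm(X, axis=1, keepdims=True)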
Several works have pointed out one of two ways to optimize the parameters of this PLDA modeling in a discriminative way. These discriminative classifiers use logistic regression, applied either to the score coefficients of the PLDA or to the PLDA parameters themselves.
The goal here is to add a new step, an additional step, to the normalization procedure, one which doesn't modify the distances between i-vectors, and then to constrain the discriminative training accordingly. Once this additional normalization step is carried out, it is possible to train the discriminative classifier with a limited number of coefficients to optimize, of the order of the dimension of the i-vector. Then we carry out the state-of-the-art logistic regression based discriminative training, and also a new approach, a Fisher-based discriminative classifier, which is the novelty here.
First, some notation about the PLDA model: the speaker term, which uses the eigenvoice matrix Phi, is assumed to be statistically independent of the residual term epsilon, and the speaker variability is constrained to lie in a low-rank subspace, the eigenvoice subspace.
This Gaussian PLDA is the most commonly used model in speaker recognition. The LLR score of a trial can then be written as a second-degree polynomial function of the components of the two i-vectors of the trial, w1 and w2, that is, polynomially, with matrices P and Q.
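In symbols (the slide defines the matrices P and Q; the linear term c and the offset k complete the usual form of this polynomial in the discriminative PLDA literature, so their naming here is my assumption):

    s(w_1, w_2) = w_1^{T} P\, w_1 + w_2^{T} P\, w_2 + 2\, w_1^{T} Q\, w_2 + c^{T}(w_1 + w_2) + k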
Recall that the state-of-the-art logistic regression based discriminative classifiers try to optimize coefficients initialized by the PLDA modeling. They use as a loss the probability of correctly classifying the training trials, target trials as target and non-target trials as non-target, the so-called cross-entropy, optimized by gradient descent with respect to some coefficients. The coefficients that can be trained are, first, the PLDA score coefficients, that is, the matrices P and Q of the previous slide.
Following this way, proposed by Burget et al., the score can be written as a dot product between an expanded vector of the trial and a weight vector which is initialized with the PLDA parameters. Alternatively, Borgström and McCree proposed in 2013 to optimize the PLDA parameters themselves, the mean value mu, the eigenvoice subspace matrix Phi and the nuisance variability matrix Lambda, by using this same cross-entropy function.
Discriminative training suffers from two limitations: the risk of overfitting on the development data, and the respect of the PLDA parameter conditions, since the covariance matrices must be positive definite and the score matrices have to satisfy negative or positive definiteness conditions.
So some solutions have been proposed. Constrained discriminative training attempts to train only a small number of parameters, of the order of d, where d is the dimension of the i-vector, instead of d squared. It was proposed, for example by Rohdin et al., to optimize only some coefficients for each dimension of the i-vector. Also, taking into account that the score decomposes as a sum of terms, it is possible to optimize only a few coefficients, one for each of these terms. Alternatively, only the mean vector and the eigenvalues of the PLDA matrices can be trained, or we can optimize only a scaling factor, a unique scalar for each matrix. It is also possible to use a singular value decomposition of P into four parameters, so as to respect the parameter conditions during the discriminative training.
These constrained approaches have shown interesting results when the i-vectors are not normalized, but they struggle to improve speaker detection once the i-vectors have been normalized, whereas this normalization achieves the best performance. So we propose an additional normalization step, remarkable by its simplicity, intended to constrain the discriminative training.
Recall that after within-class covariance normalization, the within-class covariance matrix W is isotropic, and it has been shown that after length normalization it remains almost exactly isotropic, I mean an identity matrix multiplied by a scalar. We propose to add a rotation by the eigenvector basis of the between-class covariance matrix B of the training dataset, computed by eigendecomposition of B: we apply this matrix of eigenvectors of B to each i-vector, training or test. This is a very simple operation which doesn't modify the distances between i-vectors, because the eigenvector basis of B is orthogonal. After this rotation, B is diagonal, W remains almost exactly isotropic and therefore diagonal as well, and we can assume that the PLDA matrices Phi Phi transposed and Lambda become almost diagonal too. As a consequence, the score matrices involved in the LLR score become almost diagonal.
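Here is a minimal numpy sketch of this rotation (the function name and the estimation of B from labeled training i-vectors are my assumptions, not the talk's):

    import numpy as np

    def rotate_by_between_class_basis(X, labels):
        """Rotate i-vectors (rows of X) into the eigenvector basis of the
        between-class covariance matrix B of the training set."""
        mu = X.mean(axis=0)
        B = np.zeros((X.shape[1], X.shape[1]))
        for c in np.unique(labels):
            d = X[labels == c].mean(axis=0) - mu
            B += np.sum(labels == c) * np.outer(d, d)
        B /= len(X)
        _, V = np.linalg.eigh(B)   # V is orthogonal: distances are preserved
        V = V[:, ::-1]             # order the axes by decreasing eigenvalue
        return X @ V, V            # apply the same V to training and test sets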
Moreover, as the solution of LDA is almost exactly equal to the subspace spanned by the first eigenvectors of B, because the within-class covariance is almost exactly equal to the identity up to a multiplicative constant, the first components of the rotated i-vector approximate its projection into the LDA subspace. So the score can be written as a sum of terms, one term for each dimension of the i-vector, plus a residual term which gathers the off-diagonal terms of the initial score matrices, the diagonal terms beyond the LDA dimension, and the offsets. Thus, a substantial proportion of the PLDA score can be concentrated into this sum of independent, per-dimension terms.
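A sketch of this simplified scoring (the per-dimension vectors p and q below stand for the diagonals of the score matrices; the names are mine, and I assume centered i-vectors so that any linear term is folded into the offset):

    import numpy as np

    def diagonal_plda_score(w1, w2, p, q, offset):
        """Per-dimension PLDA score after rotation: the off-diagonal
        residual is dropped, leaving one independent term per dimension."""
        return np.sum(p * (w1**2 + w2**2) + 2.0 * q * w1 * w2) + offset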
Here is an analysis of the PLDA parameters before and after this diagonalization. We measure the diagonality, an entropy-based measure of the matrices: a maximal value of one indicates an exactly diagonal matrix. We can see that, right after the rotation, all the values are close to one, so the PLDA matrices are very close to diagonal, and so are the score matrices. We also measure the closeness between the LDA solution and the subspace of the first eigenvectors of B, by using a projection distance between subspaces, and we see that W is almost exactly isotropic, so the two subspaces nearly coincide. To measure the negligible part of the residual term, we compute, in the last line of the table, the ratio between the variance of the residual term and the variance of the scores, and we can see that, for both the male and female training sets, the values are very close to zero.
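That last ratio is straightforward to reproduce (a sketch with hypothetical names, assuming every trial can be scored both ways):

    import numpy as np

    def residual_variance_ratio(full_scores, diag_scores):
        """Variance of the residual term (full minus simplified score)
        over the variance of the full PLDA score, as in the table."""
        residual = full_scores - diag_scores
        return residual.var() / full_scores.var()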
In terms of performance, we can compare the full PLDA baseline with this simplified scoring in which we have removed the residual term, and we can see that there is no degradation of the speaker detection performance.
So now we can carry out discriminative training applied to these rotated vectors. First, the state-of-the-art logistic regression based approaches. The first approach follows Burget et al.: since only the diagonal coefficients are now of interest, the discriminative training can be performed by optimizing a vector omega; the score is a dot product between an expanded vector of the trial, given its two i-vectors, and omega. Remark that the score can then be written with a vector of the order of d coefficients, instead of the d squared coefficients of the initial discriminative training. The second approach is based on the work of Borgström and McCree: remarking that the PLDA matrices are close to diagonal, hence close to their diagonal matrices of eigenvalues, we perform, following Borgström and McCree, a discriminative training intended to optimize only the diagonal of Phi Phi transposed, the scalar of Lambda and the mean value mu.
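For the first approach, a minimal sketch of what such a trainer could look like (the expansion matches the diagonal score above; initialization from PLDA, trial weighting and a proper optimizer are omitted, and all names are mine):

    import numpy as np

    def expand_trial(w1, w2):
        """Expanded vector of a trial: the score is its dot product
        with the weight vector omega."""
        return np.concatenate([w1**2 + w2**2, 2.0 * w1 * w2, [1.0]])

    def train_omega(trials, labels, omega0, lr=1e-3, n_iter=200):
        """Cross-entropy gradient descent on omega, initialized from the
        diagonalized PLDA coefficients; labels: 1=target, 0=non-target."""
        E = np.array([expand_trial(w1, w2) for w1, w2 in trials])
        y = np.asarray(labels, dtype=float)
        omega = omega0.copy()
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-E @ omega))   # sigmoid of the scores
            omega -= lr * E.T @ (p - y) / len(y)   # cross-entropy gradient
        return omega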
Then we introduce a novel alternative to the logistic regression based discriminative training. We define an expanded vector of the score of the trial, with one component for each dimension of the eigenvoice subspace and a last component which is the residual term. The score is then equal to the dot product of this expanded vector and a vector of ones. The goal here is to replace this unique normal vector by a basis of discriminant axes, extracted by using the Fisher criterion. Since we then extract not one but several vectors, we will have to combine this basis of discriminant axes to find back the unique normal vector needed by speaker detection. So we use the Fisher criterion to extract the discriminant axes in this space of expanded vectors.
Consider a dataset comprised of trials, target and non-target trials. For each one of them, we compute the expanded vector of the trial. On this dataset, we can compute the statistics of the target and non-target trials, the within-class and between-class covariance matrices of this dataset, in this case of a two-class classifier, target versus non-target, and we can extract the axis maximizing the Fisher criterion of equation nine. But here we run into a problem: with two classes, the between-class matrix is of rank one, so we can only extract one non-null eigenvalue. One axis only can be extracted, because we are limited by the number of classes.
But some time ago, a method was proposed in order to extract more axes than classes by using the Fisher criterion, and this method is the basis of our Fisher-based discriminative classifier. It has sometimes been used in face recognition: in the two-thousands, researchers applied it in this area. The idea is the following: given a training corpus of expanded vectors of trial scores, target and non-target trials, we compute the statistics and we extract the vector which maximizes the Fisher criterion. Then we project the dataset onto the orthogonal complement of this vector, that is, onto the hyperplane orthogonal to this vector, and we iterate. This way, we can extract more axes than classes.
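A compact sketch of this deflation scheme (my own formulation: with two classes the Fisher axis reduces to the within-class scatter inverse times the difference of the class means, which makes the rank-one limitation explicit):

    import numpy as np

    def fisher_axes(S, y, n_axes):
        """Iteratively extract discriminant axes from expanded score
        vectors S (rows) with numpy labels y (1=target, 0=non-target)."""
        S = S.astype(float).copy()
        axes = []
        for _ in range(n_axes):
            m0, m1 = S[y == 0].mean(axis=0), S[y == 1].mean(axis=0)
            # pooled within-class scatter of the two classes
            Sw = sum(np.cov(S[y == c].T, bias=True) * np.sum(y == c)
                     for c in (0, 1))
            # two classes: the between-class matrix has rank one, so each
            # pass yields a single Fisher axis, pinv(Sw) @ (m1 - m0)
            v = np.linalg.pinv(Sw) @ (m1 - m0)
            v /= np.linalg.norm(v)
            axes.append(v)
            # deflation: project onto the hyperplane orthogonal to v
            S -= np.outer(S @ v, v)
        return np.array(axes)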
Note that the Fisher criterion is a geometrical approach which doesn't need any Gaussianity assumption for the vectors corresponding to the trial scores, and indeed they are not normally distributed: it can be shown that each component of the expanded score vector, for one dimension, follows a noncentral chi-squared distribution, with distinct parameters for target trials and for non-target trials.
Moreover, if we carry out an experiment using expanded score vectors drawn from these chi-squared distributions, we obtain exactly the same results. We then set this idea aside, because the chi-squared model does not bring new information: with i-vectors of standard normal prior, it only amounts to a multiplicative factor on the score, so the results are the same.
If we use this method to extract the discriminant axes, the remaining question is how to combine this subspace of discriminant axes so as to obtain the unique normal vector needed by speaker detection, since we need only one vector to apply. So we have to find the weights to apply to each orthogonal discriminant vector. We propose weights equal to the norms of the vectors, because this way it can be shown that the variance of the scores along the axes decreases over the iterations. So this method is similar to a singular value decomposition, in which we extract first the most important axes in terms of score variability, then the others, with decreasing variance, and we remark that, at the end, the impact of the last discriminant vectors on the score is negligible.
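In my notation (not the slide's), the proposed combination can be written as

    \omega = \sum_k \lVert \tilde{v}_k \rVert \, u_k

where u_k is the k-th unit discriminant axis and \tilde{v}_k the raw, unnormalized Fisher direction extracted at iteration k, so that the early, high-variance axes dominate the final scoring vector.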
Equation ten shows that, to score a trial, we have to rotate the two i-vectors by the eigenvector basis of B, compute the expanded vector of the trial between the two i-vectors, and take the dot product of this expanded vector with the weighted sum of the Fisher discriminant axes. For the training, even if the dimension of the expanded vector is of the order of d, the amount of data can be huge: we can have more than one hundred million non-target trials, and we have to compute the covariance matrices of a set of trials of this size. Fortunately, these statistics of the training set can be expressed as linear combinations of statistics of subsets, so it is possible to split the task of computing them across the training dataset, which is also convenient for experiments.
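A sketch of this split (hypothetical names; only the zeroth, first and second order sums are needed, and they add up linearly across chunks):

    import numpy as np

    def accumulate_stats(chunks):
        """Mean and covariance of a huge set of expanded trial vectors,
        computed from per-subset sufficient statistics."""
        n, s1, s2 = 0, 0.0, 0.0
        for X in chunks:              # X: one (n_chunk, d) block of trials
            n += len(X)
            s1 = s1 + X.sum(axis=0)   # first-order statistics
            s2 = s2 + X.T @ X         # second-order statistics
        mu = s1 / n
        return mu, s2 / n - np.outer(mu, mu)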
Another remark, which was not made by the authors of this method: the method theoretically needs to project the data onto an orthogonal subspace at each iteration, and with billions of data points this is very long. But it can be shown that it is possible to extract the discriminant axes without any effective projection of the data at each iteration, only by updating the statistics.
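Here is a minimal sketch of that statistics-only update (my formulation): projecting every vector onto the complement of a unit axis v maps the mean to P mu and the covariance to P cov P, with P = I - v v^T, so the deflation never has to touch the data.

    import numpy as np

    def deflate_stats(mu, cov, v):
        """One deflation step applied to the statistics instead of the
        data; class means and scatters update the same way."""
        P = np.eye(len(v)) - np.outer(v, v)
        return P @ mu, P @ cov @ P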
The experiments use condition five of the NIST SRE 2010 evaluation, the telephone extended condition, with i-vectors provided by the Brno University of Technology; thanks to them for the male and female sets. The first line of each table is the baseline PLDA, the next two lines are the two approaches using logistic regression, on the coefficients of the score and on the PLDA parameters, and the fourth line is our Fisher-based discriminative classifier. We can see, first, that the logistic regression based approaches struggle to improve the performance of PLDA. Maybe the weight vector, even constrained, is overfitting on the development data, although I don't know, and the results are not better than PLDA. Maybe also, after length normalization, the i-vectors are more Gaussian: the normalization improves Gaussianity, and thus logistic regression is perhaps unable to improve the performance further.
We remark that the Fisher-based discriminative classifier is able to improve the performance in terms of equal error rate and of DCF, for the male set as well as the female set. Note that, to take into account the distortions of the DET curve in the region of low false alarms, the classifier is able to learn only on the non-target trials providing the highest scores: we retrained it keeping, for the non-target set, only the trials with the highest scores.
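A sketch of that selection (hypothetical names; the baseline scores every non-target trial, and only the hardest ones are kept for retraining):

    import numpy as np

    def hardest_nontarget_subset(nontarget_scores, k):
        """Indices of the k highest-scoring non-target trials, used to
        refocus training on the low false-alarm region of the DET."""
        return np.argsort(nontarget_scores)[-k:]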
What about the recent NIST SRE 2016 evaluation? It is a good way to assess the robustness of an approach, because the conditions are not controlled, with language mismatch, noise, short durations, and male and female trials mixed. We can see that the Fisher-based discriminative classifier is able to slightly improve the performance of PLDA. Note that the results presented here are not those indicated on the official scoreboard: we did not correctly calibrate our scores on the development set, and so the two versions of the results differ.
As future work, we are working on short-duration utterances, for which our method is able to improve slightly, and sometimes more than slightly, over the PLDA baseline, in particular because the speaker variability matrix is not very accurate when it is estimated on short durations. We are also working on i-vector-like representations, following recent works which propose to extract low-dimensional vector representations for speaker diarization by using deep neural networks. We showed that this PLDA framework is able to accept such new representations and to deal with session variability. Thank you.