So, in this talk, I present investigations about discriminative training applied to i-vectors that have been previously normalized.

Here is the system on which we focus. It is a standard i-vector based speaker recognition system: first the normalization, within-class covariance normalization and then length normalization; then the PLDA modeling, which provides the parameters, the mean value μ and the covariance matrices, and the LLR score.

Some works have pointed out that the parameters of this PLDA modeling can be optimized in a discriminative way. These discriminative classifiers use logistic regression maximization, applied either to the score coefficients of PLDA or to the PLDA parameters themselves.

The goal here is to add a new step, an additional step, to the normalization procedure, one which does not modify the distances between i-vectors, and then to constrain the discriminative training. Once this additional normalization step is carried out, it is possible to train the discriminative classifier with a limited number of coefficients to optimize, of the order of the dimension of the i-vector.

Then we carry out the state-of-the-art logistic regression based discriminative training, and also a new approach, an orthonormal discriminative classifier, which is the novelty here.

First, some notation. In the PLDA model, the speaker term Φy is assumed to be statistically independent of the residual term ε, and it is constrained to lie in a subspace, the eigenvoice subspace.

This model is one of the most commonly used models in speaker recognition. The LLR score can be written as a second-degree polynomial function of the components of the two vectors of the trial, w1 and w2, that is, it can be written with matrices P and Q.
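As a toy sketch of this functional form (the matrices P and Q, the linear term c and the offset k are drawn at random here purely for illustration; in PLDA they are derived from the model covariances), the score of a trial (w1, w2) is a second-degree polynomial of the two i-vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # toy i-vector dimension; real systems use several hundred

# Hypothetical score coefficients (random symmetric matrices, not trained values).
A = rng.standard_normal((d, d)); P = (A + A.T) / 2
B = rng.standard_normal((d, d)); Q = (B + B.T) / 2
c = rng.standard_normal(d)      # linear term
k = 0.3                         # constant offset

def llr_score(w1, w2):
    """Second-degree polynomial of the trial: a bilinear cross term
    plus identical per-vector quadratic and linear terms."""
    return (w1 @ P @ w2
            + w1 @ Q @ w1 + w2 @ Q @ w2
            + c @ (w1 + w2) + k)

w1, w2 = rng.standard_normal(d), rng.standard_normal(d)
s = llr_score(w1, w2)
```

Because P is symmetric and both i-vectors enter the quadratic and linear terms identically, the score is symmetric in the two sides of the trial, as an LLR must be.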

Recall that the state-of-the-art logistic regression based discriminative classifiers try to optimize coefficients initialized by the PLDA modeling. They use as a loss the probability of correctly classifying the training trials, target trials as target and non-target trials as non-target, the so-called total cross-entropy, by gradient descent with respect to some coefficients. The coefficients that have to be optimized can be the score coefficients, so the matrices P and Q of the previous slide.

Following this way, it was proposed by Burget et al. that the score can be written as a dot product between an expanded vector of the trial and a parameter vector which is initialized with the PLDA parameters. Borgström and McCree then proposed, in 2013, to optimize the PLDA parameters themselves, the mean value, the eigenvoice subspace matrix Φ and the nuisance variability matrix Λ, by using this total cross-entropy function.
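A minimal numpy check of this expanded-vector identity, with randomly drawn stand-ins for the PLDA-derived coefficients (P, Q, c, k here are assumptions, not trained values): the quadratic score is exactly a dot product between one fixed parameter vector and an expanded vector built from the trial.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# Hypothetical PLDA-derived score coefficients (random, just to verify the identity).
A = rng.standard_normal((d, d)); P = (A + A.T) / 2
B = rng.standard_normal((d, d)); Q = (B + B.T) / 2
c = rng.standard_normal(d)
k = -0.7

def expand(w1, w2):
    """Expanded vector of the trial: the score becomes linear in it."""
    return np.concatenate([np.outer(w1, w2).ravel(),
                           (np.outer(w1, w1) + np.outer(w2, w2)).ravel(),
                           w1 + w2,
                           [1.0]])

# Stack all score coefficients into one parameter vector theta.
theta = np.concatenate([P.ravel(), Q.ravel(), c, [k]])

w1, w2 = rng.standard_normal(d), rng.standard_normal(d)
direct = w1 @ P @ w2 + w1 @ Q @ w1 + w2 @ Q @ w2 + c @ (w1 + w2) + k
as_dot = theta @ expand(w1, w2)
```

Discriminative training then amounts to gradient descent on theta, starting from the PLDA solution; note theta has about 2d² coefficients here, which motivates the constrained variants discussed next.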

Discriminative training suffers from some limitations: the number of coefficients, which is in D², leading to overfitting on the development data; and the respect of the PLDA parameter conditions, since the covariance matrices must be positive definite and the score matrices have to remain semi-definite, negative or positive.

So some solutions have been proposed. Constrained discriminative training attempts to train only a small amount of parameters, of the order of D, where D is the dimension of the i-vector, or addresses the score instead of the model. Solutions proposed, for example, by Rohdin et al. constrain the training to optimize only some coefficients for each dimension of the i-vector.

Also, for approaches in which the score decomposes as a sum of terms, it is possible to optimize a small number of coefficients for each of these terms. Alternatively, only the mean vector and the eigenvalues of the PLDA matrices can be trained, or we can optimize only a scaling factor or an offset, a unique scalar for each matrix. It is possible, too, to use a singular value decomposition of P and train its parameters so as to respect the semi-definiteness conditions.

While discriminative training has provided interesting results when the i-vectors are not normalized, it struggles to improve speaker detection when the i-vectors have first been normalized, whereas this normalization achieves the best performance. Let me now present the additional normalization step we propose, intended to constrain the discriminative training.

Recall that, after length normalization, it has been shown that the within-class covariance matrix W remains almost exactly isotropic, I mean an identity matrix up to a scalar. We propose simply to rotate the i-vectors by the eigenvector basis of the between-class covariance matrix B of the training dataset, computed by eigendecomposition of B: we apply this matrix of eigenvectors of B to each i-vector, training or test.

This is a very simple operation which doesn't modify the distances between i-vectors. After this transformation, the between-class matrix B is diagonal and the within-class matrix remains almost exactly isotropic, and therefore almost diagonal too, because the eigenvector basis of B is orthogonal. The key point is that we assume that the PLDA matrices ΦΦᵀ and Λ become almost diagonal as well. As a consequence, the matrices P and Q involved in the LLR score are almost diagonal.
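The rotation step can be sketched as follows, on a synthetic toy training set (the data, dimensions and noise level are invented for illustration): eigendecompose B, rotate every i-vector by the eigenvector basis, and observe that distances are preserved while B becomes diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_spk, n_per = 6, 40, 10

# Toy training i-vectors: speaker means plus small within-speaker noise.
means = rng.standard_normal((n_spk, d))
X = np.repeat(means, n_per, axis=0) + 0.3 * rng.standard_normal((n_spk * n_per, d))
labels = np.repeat(np.arange(n_spk), n_per)

# Between-class covariance B of the training set.
spk_means = np.stack([X[labels == s].mean(axis=0) for s in range(n_spk)])
Bcov = np.cov(spk_means.T, bias=True)

# Eigendecomposition of B; the eigenvector matrix V is orthogonal,
# so applying it to every i-vector is a pure rotation.
eigval, V = np.linalg.eigh(Bcov)
X_rot = X @ V  # rotate every training (or test) i-vector

# In the rotated basis, the between-class covariance is diagonal.
spk_means_rot = spk_means @ V
B_rot = np.cov(spk_means_rot.T, bias=True)
```

Since V is orthogonal, Euclidean distances and dot products between i-vectors are unchanged, which is why this step does not disturb the preceding normalizations.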

Moreover, as the solution of LDA is almost exactly equal, up to a constant, to the eigenvector basis of B, the first components of the rotated i-vectors approximate their projections onto the LDA subspace.

So the score can be written as a sum: one term for each dimension of the i-vector, plus a residual term ε which gathers the off-diagonal terms of the initial scoring, the diagonal terms beyond the retained dimensions, and the offsets. Thus, a large proportion of the PLDA score is concentrated into this sum of independent terms, one for each dimension.
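A small sketch of this decomposition, assuming nearly diagonal score matrices as obtained after the rotation (the matrices, the dimension and the number of retained dimensions r are toy values): the score splits exactly into one term per retained dimension plus a residual ε.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6
r = 3  # retained dimensions (toy stand-in for the eigenvoice rank)

# Nearly diagonal score matrices, as after the rotation step.
P = np.diag(rng.standard_normal(d)) + 0.01 * rng.standard_normal((d, d))
P = (P + P.T) / 2
Q = np.diag(rng.standard_normal(d)) + 0.01 * rng.standard_normal((d, d))
Q = (Q + Q.T) / 2
c = rng.standard_normal(d)
k = 0.1

w1, w2 = rng.standard_normal(d), rng.standard_normal(d)
full = w1 @ P @ w2 + w1 @ Q @ w1 + w2 @ Q @ w2 + c @ (w1 + w2) + k

# One independent term per retained dimension.
per_dim = (np.diag(P)[:r] * w1[:r] * w2[:r]
           + np.diag(Q)[:r] * (w1[:r] ** 2 + w2[:r] ** 2)
           + c[:r] * (w1[:r] + w2[:r]))

# Residual epsilon: off-diagonal terms, diagonal terms beyond the
# retained dimensions, and the offset.
residual = full - per_dim.sum()
recomposed = per_dim.sum() + residual  # exact by construction
```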

Here is an analysis of the PLDA parameters before and after this diagonalization. We measure the diagonality of the matrices with an entropy-based ratio: a maximal value of one indicates that the matrix is exactly diagonal. We can see that, after the rotation, all the values are close to one, thus the PLDA matrices are very close to diagonal, and so are the score matrices. We also measure, by using distances between projections, how close the LDA subspace is to the first eigenvectors of B, and we verify that the within-class matrix is almost exactly isotropic, so that its non-diagonal part is negligible.

To show that the residual term is negligible, we compute, in the last line of the table, the ratio between the variance of the residual term and the variance of the whole score, and we can see that, for both the male and female training sets, the values are close to zero.

In terms of performance, we compare the full PLDA baseline with the simplified scoring, in which we have removed the residual term, and we can see that removing it causes only a negligible degradation, or none at all, of speaker detection performance.

So we can now carry out discriminative training applied to these vectors. The first approach is the state-of-the-art logistic regression based training, following Burget et al.: remarking that the score can be written as a dot product between an expanded vector of the trial, given the two i-vectors, and a vector ω of the order of D coefficients, instead of the D² coefficients of the initial discriminative training, the training can be performed by optimizing this vector ω.
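A toy sketch of this training, on synthetic expanded trial vectors (the data, dimensions, learning rate and iteration count are all invented for illustration): ω is initialized at the PLDA solution, a vector of ones, and updated by gradient descent on the total cross-entropy.

```python
import numpy as np

rng = np.random.default_rng(4)
dim, n = 5, 2000  # dim = number of per-dimension score components (toy value)

# Synthetic expanded trial vectors: target trials shifted up, non-target down.
G_tar = rng.standard_normal((n, dim)) + 0.8
G_non = rng.standard_normal((n, dim)) - 0.8
G = np.vstack([G_tar, G_non])
y = np.concatenate([np.ones(n), np.zeros(n)])

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# omega initialized with the PLDA solution: a vector of ones.
omega = np.ones(dim)
bias = 0.0
lr = 0.1
for _ in range(200):
    p = sigmoid(G @ omega + bias)
    grad_w = G.T @ (p - y) / len(y)   # gradient of the total cross-entropy
    grad_b = np.mean(p - y)
    omega -= lr * grad_w
    bias -= lr * grad_b

acc = np.mean((sigmoid(G @ omega + bias) > 0.5) == (y == 1))
```

Only dim coefficients (plus a bias) are trained, which is the point of the constraint: the parameter count is of the order of the i-vector dimension, not its square.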

The second approach is based on the works of Borgström and McCree. Remarking that the matrices are close to diagonal, thus close to the diagonal matrix of their eigenvalues, we perform, following Borgström and McCree, a discriminative training intended to optimize only the diagonal of ΦΦᵀ, the scalar of Λ, and the mean value μ.

Then we introduce the novelty, an alternative to the logistic regression based discriminative training. We define an expanded vector of scores of the trial, with one component for each dimension of the eigenvoice subspace and a last component which is the residual term. The score is then equal to the dot product of this vector and a vector of ones.

The goal here is to replace this unique vector of ones by a basis of discriminant axes, extracted by using the Fisher criterion. Then, since we extract not one but several vectors, we have to combine this basis in order to obtain the unique vector needed by speaker detection. So, first, we use the Fisher criterion to extract the discriminant axes in this space of expanded vectors.

Consider a dataset comprised of trials, target and non-target. For each of them, we compute the expanded vector of the trial. On this dataset, we can compute the statistics of the target and non-target trials, the within-class and between-class covariance matrices of this dataset, in this case for a two-class classifier, target versus non-target, and we can extract the axes maximizing the Fisher criterion of equation nine.

Problem: with two classes, the between-class matrix is of rank one, so there is only one non-zero eigenvalue. Only one axis can be extracted, because we are limited by the number of classes.

But, some time ago, an iterative method was proposed in order to extract more axes than classes by using the Fisher criterion; we call the resulting method the orthonormal discriminative classifier. It was used, for example, in face recognition in the two thousands.

The idea is the following: given a training corpus of expanded vectors of trials, target and non-target, we compute the statistics and we extract the vector which maximizes the Fisher criterion; then we project the dataset onto the orthogonal subspace of this vector. So we extract a vector, we project the data onto the hyperplane orthogonal to this vector, and we iterate. In this way, we can extract more axes than classes.
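The iteration can be sketched as follows, on synthetic two-class data (the data and dimensions are toy values, and the small ridge added to the within-class scatter is an implementation convenience of this sketch): extract the Fisher direction, deflate by projecting onto its orthogonal complement, repeat.

```python
import numpy as np

rng = np.random.default_rng(5)
dim, n = 6, 500

# Toy expanded score vectors for target (X1) and non-target (X0) trials.
X1 = rng.standard_normal((n, dim)) + np.array([2.0, 1.5, 1.0, 0.5, 0.2, 0.1])
X0 = rng.standard_normal((n, dim))

def fisher_axis(X0, X1):
    """Two-class Fisher direction: proportional to Sw^{-1} (mu1 - mu0)."""
    Sw = np.cov(X0.T, bias=True) + np.cov(X1.T, bias=True)
    return np.linalg.solve(Sw + 1e-8 * np.eye(X0.shape[1]),
                           X1.mean(0) - X0.mean(0))

def extract_axes(X0, X1, n_axes):
    """Extract several discriminant axes (more than classes - 1):
    after each extraction, project the data onto the orthogonal
    complement of the axis found, then iterate."""
    axes = []
    for _ in range(n_axes):
        a = fisher_axis(X0, X1)
        axes.append(a)
        u = a / np.linalg.norm(a)
        Pperp = np.eye(len(u)) - np.outer(u, u)  # orthogonal projector
        X0, X1 = X0 @ Pperp, X1 @ Pperp
    return axes

axes = extract_axes(X0, X1, n_axes=3)
```

Each new axis lives in the orthogonal complement of the previous ones, so successive axes are mutually orthogonal, and the first axis separates the two classes along the direction of the mean shift.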

An advantage is that the Fisher criterion is a geometrical approach which doesn't need an assumption of Gaussianity, and the components of the expanded score vector are indeed not normally distributed: it can be shown that each component of the expanded score for one dimension follows a noncentral chi-squared distribution, with distinct parameters for target trials and non-target trials.

Moreover, if we carry out an experiment using expanded vectors whose components are drawn from these chi-squared distributions, we obtain exactly the same values, which confirms this analysis for i-vectors with a standard normal prior, and gives a multidimensional form of the score.

If we use this method to extract the discriminant axes, the remaining issue to address is how to combine this subspace of discriminant axes to obtain the unique vector needed by speaker detection. We need only one vector to apply, so we have to find the weights to apply to each orthonormal discriminant vector.

We propose weights equal to the norms of the vectors, because in this way it can be shown that the variance of the scores along the axes decreases with the iterations. This method is thus similar to a singular value decomposition, in which we extract the most important axes in terms of score variability, then the others, with decreasing variance. Remark that, at the end, the impact of the last discriminant vectors on the score is negligible.
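A one-line consequence of this weighting, sketched on toy axes (the axis values and magnitudes are invented): weighting each unit axis by its original norm is the same as simply summing the unnormalized axes, so axes with small norms contribute little to the final scoring vector.

```python
import numpy as np

rng = np.random.default_rng(6)
dim = 5

# Suppose the iterative Fisher extraction returned these unnormalized axes,
# with decreasing magnitude as observed in practice (toy values here).
axes = [rng.standard_normal(dim) * s for s in (3.0, 1.0, 0.3)]

# Normalize each axis and weight it by its original norm: the unique
# scoring vector needed by speaker detection is the weighted sum.
weights = [np.linalg.norm(a) for a in axes]
v = sum(w * (a / np.linalg.norm(a)) for w, a in zip(weights, axes))

# Scoring a trial's expanded vector g against v:
g = rng.standard_normal(dim)
score = g @ v
```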

So equation ten shows that, for a trial, we have to apply the rotation, compute the expanded vector g of the trial between the two i-vectors, and then the score is the dot product of g and this weighted sum of the Fisher criterion axes.

For the training, even if the dimension of the expanded vector is of the order of D, we cannot store the whole trial set: we can have more than one hundred million non-target trials, and we have to compute the covariance matrix of this huge set. So we parameterize the scores with the statistics of the training set: these statistics can be expressed as linear combinations of the statistics of subsets, so it is possible to split the task of computing them over the training dataset.
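A minimal sketch of this splitting, on toy data chunks (the chunk sizes and dimension are invented): per-subset counts, sums and sums of outer products combine linearly into the global mean and covariance, so no pass over the pooled trial set is needed.

```python
import numpy as np

rng = np.random.default_rng(7)
dim = 4
chunks = [rng.standard_normal((n, dim)) for n in (1000, 2500, 500)]

# Per-subset sufficient statistics: count, sum, and sum of outer products.
stats = [(len(c), c.sum(0), c.T @ c) for c in chunks]

# The global mean and covariance are linear combinations of subset statistics.
N = sum(n for n, _, _ in stats)
s1 = sum(s for _, s, _ in stats)
s2 = sum(S for _, _, S in stats)
mu = s1 / N
cov = s2 / N - np.outer(mu, mu)

# Reference: the same quantities computed on the pooled data.
pooled = np.vstack(chunks)
mu_ref = pooled.mean(0)
cov_ref = np.cov(pooled.T, bias=True)
```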

Another remark, which was not made by the authors of this method: the method initially needs to project the data onto the orthogonal subspace at each iteration, which is very long when there are millions of trials. But it is possible to extract the axes without any effective projection of the data at each iteration, only by updating the statistics.
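The trick can be sketched as follows (toy data; the point is the identity, not the values): since projection is linear, projecting every data point and recomputing mean and covariance gives the same result as applying the projector directly to the stored statistics.

```python
import numpy as np

rng = np.random.default_rng(8)
dim, n = 5, 10000  # a stand-in for the (much larger) trial set

X = rng.standard_normal((n, dim)) + 1.0
u = rng.standard_normal(dim)
u /= np.linalg.norm(u)
Pperp = np.eye(dim) - np.outer(u, u)  # projector onto the orthogonal subspace

# Explicit projection of every data point (expensive with huge trial sets):
mu_slow = (X @ Pperp).mean(0)
cov_slow = np.cov((X @ Pperp).T, bias=True)

# Updating only the statistics, with no pass over the data:
mu_fast = X.mean(0) @ Pperp
cov_fast = Pperp @ np.cov(X.T, bias=True) @ Pperp
```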

Our experiments use the telephone extended condition, condition five, of the NIST SRE 2010 evaluation, with i-vectors provided by Brno University of Technology, thanks to our colleagues there, for the male set and the female set.

The first line of each table is the PLDA baseline, the next two lines are the two approaches using logistic regression, on the score coefficients and on the PLDA parameters, and the fourth line is our orthonormal discriminative classifier. We can first see that the logistic regression based approaches struggle to improve the performance of PLDA. Maybe this is because, even if the training is constrained, there is overfitting on the development data. Also, maybe, since the length normalization of the i-vectors improves Gaussianity, logistic regression is unable to further improve the performance.

We remark that the orthonormal discriminative classifier is able to improve the performance, in terms of equal error rate and minimum DCF, for the male set more than the female one. Note that, to take into account the distortions of the DET curve in the region of low false alarms, it is able to learn only on the trials providing the highest non-target scores: here, we trained it with the subset of non-target trials with the highest scores.

We also tested it on the recent Speakers in the Wild evaluation, which is a good way to assess the robustness of an approach, because the conditions are not controlled, with reverberation, noise, short durations, and male/female mixing. We can see that this orthonormal discriminative classifier is able to slightly improve the performance of PLDA. Note that the results presented here are below those of the official scoreboard, whose systems are more suited to these channels, in part because we did not correctly calibrate the scores on the development set.

As future works, we are working on short duration utterances, for which this method is able to slightly, and sometimes more, improve over the PLDA baseline, in particular because the estimation of the speaker variability is not very accurate for short durations. We are also working on i-vector like representations, following recent works which propose to extract low-dimensional factors for speaker diarization by using deep neural networks; we showed that this PLDA framework is able to accept these new representations and to deal with this additional normalization.

Thank you.