First, I will give a quick overview of i-vectors.

After that, I will review some of the methods for handling the uncertainty of i-vector estimates caused by the limited duration of recordings.

Then I will describe a simple preprocessing weighting scheme which uses duration information as a measure of i-vector reliability.

Then I will describe our experiments and the results, followed by concluding remarks.

In theory, each decision should be made dependent on the amount of data available, and the same should also hold in the case of speaker recognition, since we usually have recordings of different lengths.

In practice this is usually not the case, mainly for practical reasons: handling the uncertainty increases the algorithmic and computational complexity, and the gain in performance can be not so significant, especially if the recordings are sufficiently long.

In the case of the i-vector challenge, the i-vectors were extracted from recordings of different lengths, and the duration follows a log-normal distribution. This suggests that we should see some improvement if the duration information is taken into account.

The i-vector is defined as the MAP point estimate of the hidden variable of a factor analysis model, and it serves as a compact representation of a speech utterance.

The posterior covariance encodes the uncertainty of the i-vector estimate, which is caused by the limited duration of the recording.
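For reference, in the standard notation of the i-vector literature (not spelled out in the talk): with total-variability matrix T, UBM covariances Σ, and zeroth- and first-order Baum-Welch statistics N_c and F̃, the posterior of the latent variable is Gaussian with

```latex
w = L^{-1} T^{\top} \Sigma^{-1} \tilde{F},
\qquad
L = I + \sum_{c} N_c \, T_c^{\top} \Sigma_c^{-1} T_c ,
```

so the posterior covariance L^{-1} shrinks as the occupation counts N_c grow: the longer the recording, the smaller the uncertainty.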

Usually, this uncertainty is discarded when comparing i-vectors, for example in the PLDA model.

Nevertheless, some solutions have been proposed for taking the uncertainty into account: for example, PLDA with uncertainty propagation, where an additional noise term which explicitly models the duration variability is added to the model.

Another one is score calibration using duration as a quality measure, and yet another is called i-vector scaling, where the length normalization is modified to account for the uncertainty of i-vectors.

However, those solutions are not directly applicable, or at least not easily applicable, in the context of the i-vector challenge, since the data from which we could reconstruct the posterior covariance is not available, and there is also no development data that could be used for optimizing the calibration parameters.

So, is there another possibility for using duration information as a measure of i-vector reliability?

Prior to comparison, i-vectors are usually preprocessed. Among the more common preprocessing methods are PCA, LDA, and within-class covariance normalization, in which the basic step is to calculate a mean and a covariance matrix.

We implicitly assume in those calculations that all i-vectors are equally reliable.

To account for the differences in reliability of i-vectors, we proposed a simple weighting scheme in which the contribution of each i-vector is multiplied by its corresponding duration.
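In code, the weighting scheme might look like the following minimal sketch (the exact formulation is not spelled out in the talk, and the function name is mine):

```python
import numpy as np

def weighted_stats(ivectors, durations):
    """Duration-weighted mean and covariance of a set of i-vectors.

    ivectors:  (N, D) array, one i-vector per row
    durations: (N,) array of recording durations (e.g. in seconds)
    """
    w = durations / durations.sum()              # normalized weights
    mean = w @ ivectors                          # weighted mean, shape (D,)
    centered = ivectors - mean
    cov = (w[:, None] * centered).T @ centered   # weighted covariance
    return mean, cov

# toy usage: the longer recording dominates the statistics
X = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
d = np.array([10.0, 10.0, 100.0])
m, S = weighted_stats(X, d)
```

With equal durations this reduces exactly to the ordinary sample mean and covariance, which is the sense in which it is a drop-in modification of the standard preprocessing.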

To verify the soundness of the proposed idea, we implemented a baseline system in which we compared the standard PCA with a weighted version of PCA.

The results showed that the weighted version of PCA produced slightly better results than the standard one.

We also wanted to try within-class covariance normalization, but in order to apply it we need labeled data, which was not available in the challenge.
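For context, within-class covariance normalization whitens the i-vector space with the inverse of the average within-class covariance, which is why class labels are required. A sketch, with synthetic data standing in for real i-vectors:

```python
import numpy as np

def wccn_transform(ivectors, labels):
    """WCCN: average the within-class covariance W over all classes and
    return B from the Cholesky factorization B B^T = W^{-1};
    i-vectors are then projected as x -> B^T x."""
    classes = np.unique(labels)
    D = ivectors.shape[1]
    W = np.zeros((D, D))
    for c in classes:
        Xc = ivectors[labels == c]
        Xc = Xc - Xc.mean(axis=0)
        W += Xc.T @ Xc / len(Xc)
    W /= len(classes)
    return np.linalg.cholesky(np.linalg.inv(W))

# synthetic stand-in: 20 "speakers" with 10 recordings each, dimension 5
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.repeat(np.arange(20), 10)
B = wccn_transform(X, y)
Xp = X @ B          # projected i-vectors
```

After the projection, the average within-class covariance of the projected vectors is the identity, so within-speaker directions no longer dominate the comparison.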

So we needed to perform unsupervised clustering. We experimented with different clustering algorithms, and in the end we selected k-means with cosine distance and four thousand clusters.
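K-means with cosine distance can be realized as spherical k-means, where the vectors and the centroids are kept unit-length so that cosine similarity drives the assignments. This implementation, including the farthest-point initialization, is my own illustration, not necessarily the one used in the experiments:

```python
import numpy as np

def spherical_kmeans(X, k, iters=20):
    """K-means under cosine distance: data and centroids are unit-length,
    and each vector goes to the centroid with the highest cosine similarity."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    # farthest-point initialization: start from the first vector, then
    # repeatedly add the vector least similar to the centroids chosen so far
    idx = [0]
    for _ in range(k - 1):
        sims = Xn @ Xn[idx].T
        idx.append(int(np.argmin(sims.max(axis=1))))
    C = Xn[idx].copy()
    for _ in range(iters):
        labels = np.argmax(Xn @ C.T, axis=1)        # cosine assignment
        for j in range(k):
            members = Xn[labels == j]
            if len(members):
                c = members.sum(axis=0)
                C[j] = c / np.linalg.norm(c)        # re-normalized centroid
    return labels, C

# toy usage with two well-separated directions (the challenge used k = 4000)
rng = np.random.default_rng(1)
A = rng.normal(loc=[5.0, 0.0], scale=0.1, size=(50, 2))
B = rng.normal(loc=[0.0, 5.0], scale=0.1, size=(50, 2))
labels, C = spherical_kmeans(np.vstack([A, B]), k=2)
```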

Unfortunately, the results were worse for within-class covariance normalization than for PCA, but at least the weighted version was slightly ahead of the standard one.

We also tried several different classifiers, and the best results were achieved with logistic regression, but only after removing length normalization from the processing pipeline.

In that case, within-class covariance normalization gave better results than PCA, and again the weighted version scored better than the standard one.

We tried to further improve the results by additional fine-tuning: we added the duration as an additional feature of the i-vectors, we excluded clusters with low occupancy, we reversed the roles of target and test i-vectors, and we tuned the hyperparameters of the logistic regression. With this fine-tuning we were able to improve the previous result a little bit more.
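The duration-as-a-feature step amounts to nothing more than appending one extra dimension per i-vector. Whether raw or log duration was used is not stated in the talk; log duration is shown here as one plausible choice, and the array shapes are hypothetical:

```python
import numpy as np

# hypothetical shapes: 100 i-vectors of dimension 600
ivectors = np.zeros((100, 600))
durations = np.linspace(10.0, 300.0, 100)   # seconds

# append log-duration as one extra feature dimension
augmented = np.hstack([ivectors, np.log(durations)[:, None]])
```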

This was also our best submitted result.

To conclude, we presented a simple preprocessing weighting scheme which uses duration information as a measure of i-vector reliability.

We achieved quite reasonable success with clustering in the case of within-class covariance normalization, but had nearly no success with clustering in the case of LDA, which suggests that LDA is more susceptible to labeling errors.

And as a last remark, we found out that length normalization does not help logistic regression.

Thank you.

Okay, these are just empirical results, but maybe somebody can comment on that, I don't know. [unintelligible] but otherwise we got the same results as with logistic regression.

Did you iterate the clustering, or was there just one clustering stage? We tried different things, including iterating the clustering, but it didn't succeed. The choice of four thousand clusters was also experimental, because we didn't get any improvements by changing it.