0:00:15 | first i will give a quick overview of i-vectors |
---|
0:00:19 | after that i will outline some of the methods for handling the uncertainty of the i-vector estimates caused by the limited duration of recordings |
---|
0:00:34 | then i will describe a simple preprocessing weighting scheme which uses duration information as a measure of i-vector reliability |
---|
0:00:49 | then i will describe some experiments and the results |
---|
0:00:54 | followed by concluding remarks |
---|
0:01:00 | in theory each decision should be made dependent on the amount of data available |
---|
0:01:07 | and the same should hold also in the case of speaker recognition since |
---|
0:01:13 | we usually have recordings of different lengths |
---|
0:01:19 | in practice this is usually not the case, mainly for practical reasons, since taking the uncertainty into account increases the complexity of the model and the computational complexity |
---|
0:01:34 | and also the gain in performance can be not so significant, especially if the recordings are sufficiently long |
---|
0:01:49 | in the case of the i-vector challenge, the i-vectors were extracted from recordings of different lengths |
---|
0:02:00 | and the duration follows a log-normal distribution, which suggests that we should see some improvement if the duration information is taken into account |
---|
0:02:18 | the i-vector is defined as the MAP point estimate of the hidden variable of a factor analysis model |
---|
0:02:24 | and it serves as a compact representation of a speech utterance |
---|
0:02:31 | the posterior covariance encodes the uncertainty of the i-vector estimate, which is caused by the limited duration of the recordings |
---|
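For reference, the posterior mentioned here has a standard closed form in the factor-analysis model; the notation below is my own, not from the talk ($T$ is the total-variability matrix, $\Sigma$ the UBM covariance, $N$ and $\tilde{F}$ the zeroth- and centred first-order Baum-Welch statistics):

```latex
\operatorname{Cov}(\phi \mid \mathcal{X}) = \left( I + T^{\top} \Sigma^{-1} N\, T \right)^{-1},
\qquad
\hat{\phi} = \operatorname{Cov}(\phi \mid \mathcal{X})\, T^{\top} \Sigma^{-1} \tilde{F}
```

Since $N$ grows with the amount of speech, shorter recordings give a larger posterior covariance, which is exactly the duration-induced uncertainty the talk refers to.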
0:02:46 | usually the uncertainty is discarded when comparing i-vectors, for example in the PLDA model |
---|
0:02:59 | nevertheless, some solutions have been proposed for taking the uncertainty into account, for example PLDA with uncertainty propagation |
---|
0:03:11 | where an additional noise term, which explicitly models the duration variability, is added to the model |
---|
0:03:22 | another one is score calibration using duration as a quality measure |
---|
0:03:28 | and yet another is called i-vector scaling, where the length normalisation is modified to account for the uncertainty of i-vectors |
---|
0:03:43 | those solutions, however, are not directly applicable, or at least not easily applicable, in the context of the i-vector challenge |
---|
0:03:53 | since the data needed for reconstructing the posterior covariance is not available |
---|
0:04:01 | and also there is no development data that could be used for |
---|
0:04:08 | optimising the calibration parameters |
---|
0:04:12 | so, is there another possibility to use duration information as a measure of i-vector reliability? |
---|
0:04:27 | prior to comparison, the i-vectors are usually preprocessed |
---|
0:04:37 | among the more common preprocessing methods are PCA, LDA and within-class covariance normalization, in which the basic step is to calculate the mean and the covariance matrix |
---|
0:04:55 | in those calculations we implicitly assume that all i-vectors are equally reliable |
---|
0:05:07 | to account for the difference in reliability of the i-vectors, we proposed a simple weighting scheme in which the contribution of each i-vector is multiplied by its corresponding duration |
---|
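The weighting scheme just described can be sketched as follows. This is a minimal sketch, not the authors' actual code; interpreting "contribution multiplied by duration" as duration-weighted mean and covariance estimates is my assumption:

```python
import numpy as np

def weighted_mean_cov(ivecs, durations):
    # ivecs:     (n, d) array, one i-vector per row
    # durations: (n,) array of utterance durations (unit is irrelevant after normalisation)
    w = np.asarray(durations, dtype=float)
    w = w / w.sum()                           # normalise the weights
    mu = w @ ivecs                            # duration-weighted mean
    centred = ivecs - mu
    cov = (w[:, None] * centred).T @ centred  # duration-weighted covariance
    return mu, cov
```

With equal durations this reduces to the ordinary (biased) mean and covariance, so the weighted preprocessing is a strict generalisation of the standard one.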
0:05:29 | so, to verify the soundness of the proposed idea, we implemented the baseline system, in which we compared the standard PCA with the weighted version of PCA |
---|
0:05:49 | the results showed that the weighted version of PCA produced slightly better results than the standard one |
---|
0:06:01 | we also wanted to try within-class covariance normalisation, but in order to apply it we need labeled data, which was not available in the challenge |
---|
0:06:21 | so we needed to perform unsupervised clustering; we experimented with different clustering algorithms, but in the end we selected k-means with cosine distance and four thousand clusters |
---|
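A minimal sketch of k-means with cosine distance (spherical k-means). The farthest-point initialisation here is my own choice for determinism, not necessarily what was used in the experiments:

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=20):
    # normalise to unit length so cosine similarity becomes a dot product
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # farthest-point initialisation: greedily pick points far from those chosen
    idx = [0]
    for _ in range(k - 1):
        sims = X @ X[idx].T
        idx.append(int(sims.max(axis=1).argmin()))
    centroids = X[idx].copy()
    for _ in range(n_iter):
        labels = (X @ centroids.T).argmax(axis=1)  # assign to most similar centroid
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.sum(axis=0)
                centroids[j] = c / np.linalg.norm(c)  # renormalise the mean
    return labels, centroids
```

Renormalising each centroid keeps it on the unit sphere, which is what makes the dot product a valid cosine-similarity assignment in the next iteration.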
0:06:42 | unfortunately, the results were worse for within-class covariance normalization than for PCA, but at least the weighted version was slightly ahead of the standard one |
---|
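Within-class covariance normalization itself can be sketched as below; this is a standard formulation in which the cluster labels from the previous step play the role of speaker labels, and averaging the per-class covariances uniformly is an assumption on my part:

```python
import numpy as np

def wccn(ivecs, labels):
    # returns the WCCN projection matrix B; apply it as x' = B.T @ x
    # (equivalently, rows: X_projected = X @ B)
    classes = np.unique(labels)
    d = ivecs.shape[1]
    W = np.zeros((d, d))
    for c in classes:
        Xc = ivecs[labels == c]
        Xc = Xc - Xc.mean(axis=0)
        W += Xc.T @ Xc / len(Xc)     # per-class (biased) covariance
    W /= len(classes)                # average within-class covariance
    # Cholesky factor B with B @ B.T = W^{-1}
    return np.linalg.cholesky(np.linalg.inv(W))
```

After projection the average within-class covariance becomes the identity, which is the whitening effect WCCN is meant to provide.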
0:07:00 | we also tried several different classifiers, and the best results were achieved with logistic regression, but only after removing the length normalisation from the processing pipeline |
---|
0:07:17 | in that case, within-class covariance normalisation gave better results than PCA, and again the weighted version scored better than the standard one |
---|
0:07:35 | we tried to further improve the results by additional fine-tuning |
---|
0:07:42 | so we added the duration as an additional feature of the i-vectors, and we excluded clusters with a small score |
---|
0:07:52 | we reversed the roles of target and test i-vectors, and we fine-tuned the hyper-parameters of the logistic regression |
---|
0:08:04 | with this fine-tuning we were able to improve the previous result a little bit more, so this was also our best submitted result |
---|
0:08:20 | to conclude, we presented a simple preprocessing weighting scheme which uses duration information as a measure of i-vector reliability |
---|
0:08:33 | we achieved quite reasonable success with the clustering in the case of within-class covariance normalization, but nearly no success with clustering in the case of LDA, which suggests that LDA is more susceptible to labeling errors |
---|
0:08:56 | and as a last remark, we found out that length normalization does not help logistic regression |
---|
0:09:03 | thank you |
---|
0:09:21 | okay |
---|
0:09:31 | these are just empirical results, but maybe somebody can comment on that, i don't know |
---|
0:09:40 | nicole |
---|
0:09:46 | but we got the same results as with logistic regression |
---|
0:10:06 | did you iterate the clustering, or was it just one clustering stage? |
---|
0:10:11 | we tried different things, also to iterate the clustering, but it didn't succeed |
---|
0:10:26 | this was also set experimentally; we chose four thousand clusters because we didn't get any improvements by changing it |
---|