0:00:29 | a university in Spain, speaker recognition
0:01:02 | i-vector speaker recognition
0:01:11 | PLDA
0:01:16 | to get the parameters of the PLDA, we need to compute point estimates of
0:01:23 | the parameters
0:01:24 | by supervised maximum likelihood
0:01:30 | which needs plenty of data
0:01:43 | development data from
0:02:04 | the PLDA considers the i-vector decomposed
0:02:22 | where the prior is Gaussian
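A common form of the decomposition being described, as a sketch (the exact factor structure and notation used in the talk are assumed; y_i is the speaker factor, x_ij the channel factor):

```latex
% i-vector \phi_{ij} of speaker i, session j, decomposed into
% speaker and channel parts, all with Gaussian priors:
\phi_{ij} = \mu + V y_i + U x_{ij} + \epsilon_{ij},
\qquad
y_i \sim \mathcal{N}(0, I), \quad
x_{ij} \sim \mathcal{N}(0, I), \quad
\epsilon_{ij} \sim \mathcal{N}(0, \Sigma)
```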
0:02:34 | to use this model
0:02:41 | a large amount of data
0:02:47 | if we don't have a large amount of data, we are forced to
0:02:54 | use a simplified model with only the speaker vector
0:03:03 | where the prior for y is Gaussian
0:03:09 | Gaussian
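The simplified model with only the speaker vector, in the same assumed notation; the channel factor is dropped and channel variability is absorbed into a full-covariance residual whose precision is taken here to be the channel matrix W mentioned later in the talk:

```latex
\phi_{ij} = \mu + V y_i + \epsilon_{ij},
\qquad
y_i \sim \mathcal{N}(0, I), \quad
\epsilon_{ij} \sim \mathcal{N}(0, W^{-1})
```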
0:03:14 | in this case we need less data
0:03:24 | so if we have for example twenty
0:03:30 | a number of
0:03:36 | dimension of the speaker vector, ninety
0:03:44 | in the Bayesian approach
0:03:59 | for the parameters
0:04:04 | we assume that they are random variables
0:04:09 | and place priors
0:04:13 | on the model parameters
0:04:15 | and then we compute the posterior
0:04:20 | given the i-vectors and
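In symbols, by Bayes' rule, with theta collecting the model parameters and Phi the development i-vectors (notation assumed):

```latex
P(\theta \mid \Phi) \;\propto\; P(\Phi \mid \theta)\, P(\theta)
```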
0:04:25 | so
0:04:27 | methods
0:04:32 | compute the posterior
0:04:37 | prior
0:04:45 | in this case we compute the posterior
0:04:56 | from now on we call this the prior
0:05:06 | and finally we take the point estimates
0:05:13 | by computing their expected values given the target posterior
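Written out as a sketch (the subscript naming the target-domain data is assumed):

```latex
\hat{\theta} \;=\; \mathbb{E}\!\left[\theta \mid \Phi_{\text{target}}\right]
```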
0:05:20 | to get the posterior of the model parameters
0:05:27 | there are no closed-form solutions
0:05:31 | what we do is decompose
0:05:35 | and assume the model parameters are independent
0:05:47 | then we compute in a cyclic fashion
0:05:57 | and finally we approximate
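The cyclic computation described here matches the standard mean-field variational Bayes recipe; a sketch under that assumption, with Y the hidden speaker factors:

```latex
% Assume the joint posterior factorizes, then update each factor
% in turn until convergence:
P(Y, \theta \mid \Phi) \;\approx\; q(Y)\, q(\theta),
\qquad
\ln q(Y) = \mathbb{E}_{q(\theta)}\!\left[\ln P(\Phi, Y, \theta)\right] + \text{const}
```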
0:06:19 | is the number of speakers in the database
0:06:22 | and the posterior for the
0:06:25 | channels
0:06:29 | is the number of segments in the database
0:06:35 | then we can compute
0:06:38 | for the target data set
0:06:47 | from the original data set to the target data set
0:06:54 | we can compute the weight of the prior
0:06:59 | target data
0:07:01 | to do that we should modify the prior distribution
0:07:05 | the weight of the prior depends
0:07:10 | on the number of speakers
0:07:13 | that we have in the large data set
0:07:19 | so we change the parameters
0:07:22 | if we want to multiply the weight of the prior
0:07:29 | we need to modify the alpha
0:07:31 | these two parameters
0:07:42 | but at the same time, they give the same expected values for
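The talk does not spell out which distribution the two parameters belong to; as an illustration only, for a Wishart prior on a precision matrix the degrees of freedom and the scale matrix can be rescaled together so the prior's weight changes while its mean stays fixed:

```latex
% W ~ Wishart(\Phi_0, \nu_0), so E[W] = \nu_0 \Phi_0. Scaling
% \nu_0 by s and \Phi_0 by 1/s multiplies the effective weight
% (number of pseudo-observations) by s without moving the mean:
\nu_0 \to s\,\nu_0, \qquad \Phi_0 \to \Phi_0 / s
\;\;\Longrightarrow\;\;
\mathbb{E}[W] = (s\,\nu_0)(\Phi_0 / s) = \nu_0\,\Phi_0
```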
0:07:49 | we can do the same with the prior of W
0:07:53 | and finally
0:07:59 | for the number of speakers and the number of segments
0:08:03 | effective number of speakers and segments for the Gaussian prior
0:08:10 | we are going to compare our method
0:08:14 | the length normalization is
0:08:20 | one that does centering and whitening
0:08:30 | to make the i-vectors more Gaussian
0:08:32 | fitting the Gaussian assumption
0:08:41 | and projection onto the unitary hypersphere
0:08:49 | to reduce the data set mismatch
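A minimal sketch of the length normalization being compared against (centering, whitening, then projection onto the unit hypersphere); the use of NumPy and the array/function names are assumptions:

```python
import numpy as np

def train_length_norm(dev_ivectors):
    """Estimate centering and whitening from development i-vectors
    (rows = segments, columns = i-vector dimensions)."""
    mu = dev_ivectors.mean(axis=0)
    cov = np.cov(dev_ivectors, rowvar=False)
    # Symmetric (ZCA) whitening transform from the eigendecomposition
    # of the covariance; assumes enough segments for a full-rank cov.
    eigvals, eigvecs = np.linalg.eigh(cov)
    whitener = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return mu, whitener

def length_norm(ivectors, mu, whitener):
    """Center, whiten, and project each i-vector onto the unit hypersphere."""
    w = (ivectors - mu) @ whitener
    return w / np.linalg.norm(w, axis=1, keepdims=True)
```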
0:08:56 | now I explain the data sets
0:09:01 | data set
0:09:04 | this is
0:09:07 | the data set we will use
0:09:13 | similar to the
0:09:18 | telephone channels
0:09:26 | that contains 30 male and 30 female speakers
0:09:29 | the data has similar conditions
0:09:32 | conditions
0:09:40 | two to three minutes
0:09:52 | a data set with a large amount of data
0:09:55 | we use these five
0:10:04 | that contains more than five hundred male and seven hundred female speakers
0:10:12 | and it has a variety of channels
0:10:18 | speaker verification
0:10:24 | we got twenty MFCCs plus deltas and
0:10:36 | we built the system
0:10:50 | we used the normalization too
0:10:53 | the parameters
0:11:02 | and finally we used s-norm score normalization with cohorts from the
0:11:09 | first here
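A minimal sketch of the s-norm score normalization mentioned here, assuming the raw scores of the enrollment and test sides against a cohort are already available (names illustrative):

```python
import numpy as np

def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization: z-normalize the trial score with
    cohort statistics from both sides and average the two."""
    mu_e, sd_e = enroll_cohort_scores.mean(), enroll_cohort_scores.std()
    mu_t, sd_t = test_cohort_scores.mean(), test_cohort_scores.std()
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)
```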
0:11:24 | we compare
0:11:34 | we can see improvement
0:11:50 | we can see that
0:11:58 | the prior distribution
0:12:01 | if we compare, for instance, the first line and the last line: an equal error rate improvement of
0:12:07 | forty percent for males and fourteen percent for females; for minDCF, an improvement
0:12:13 | of twelve percent for males and forty-six percent for females
0:12:17 | here is a table comparing different parameters
0:12:27 | we can see
0:12:31 | improvement
0:12:41 | here we show length normalization with s-norm and without s-norm
0:12:48 | when we use
0:12:57 | improvement using i-vectors, but not as much as
0:13:09 | we can also see that
0:13:11 | in this data set, vector normalization
0:13:23 | better or
0:13:29 | here we show some improvements
0:14:03 | and for females
0:14:28 | finally
0:14:42 | we see that
0:14:49 | we can see that without normalization
0:14:58 | finally, the conclusions: we have developed a method to adapt a PLDA
0:15:03 | i-vector classifier from a domain with a large amount of development data to a domain
0:15:07 | with scarce development data
0:15:09 | we have conducted experiments
0:15:15 | we can see this technique improves the performance of the system
0:15:19 | and this improvement mainly comes from the adaptation of the channel matrix W
0:15:28 | we have compared this method with length normalization
0:15:38 | we have better results
0:15:48 | we have discussed length normalization
0:15:51 | as future work: Bayesian adaptation of the UBM and the i-vector extractor
0:16:22 | no, the i-vector length means the norm of the vector
0:16:31 | not the dimensionality of the i-vector
0:17:40 | maybe we can do the same
0:17:45 | as we have more norm data