0:00:29 | a university in Spain, speaker recognition
0:01:02 | i-vector speaker recognition
0:01:11 | PLDA
0:01:16 | to get the parameters of the PLDA, we need to compute point estimates of
0:01:23 | the parameters
0:01:24 | by supervised maximum likelihood
0:01:30 | which needs plenty of data
0:01:43 | development data from
0:02:04 | the PLDA considers the i-vector decomposed
0:02:22 | where the prior is Gaussian
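A common form of the decomposition being described, as a sketch (the exact factor structure and notation used in the talk are assumed; y_i is the speaker factor, x_ij the channel factor):

```latex
% i-vector \phi_{ij} of speaker i, session j, decomposed into
% speaker and channel parts, all with Gaussian priors:
\phi_{ij} = \mu + V y_i + U x_{ij} + \epsilon_{ij},
\qquad
y_i \sim \mathcal{N}(0, I), \quad
x_{ij} \sim \mathcal{N}(0, I), \quad
\epsilon_{ij} \sim \mathcal{N}(0, \Sigma)
```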
0:02:34 | to use this model
0:02:41 | a large amount of data
0:02:47 | if we don't have a large amount of data, we are forced to
0:02:54 | use a simplified model with only the speaker vector
0:03:03 | where the prior for y is Gaussian
0:03:09 | Gaussian
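The simplified model with only the speaker vector, in the same assumed notation; the channel factor is dropped and channel variability is absorbed into a full-covariance residual whose precision is taken here to be the channel matrix W mentioned later in the talk:

```latex
\phi_{ij} = \mu + V y_i + \epsilon_{ij},
\qquad
y_i \sim \mathcal{N}(0, I), \quad
\epsilon_{ij} \sim \mathcal{N}(0, W^{-1})
```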
0:03:14 | in this case we need less data
0:03:24 | so if we have for example twenty
0:03:30 | a number of
0:03:36 | dimension of the speaker vector, ninety
0:03:44 | in the Bayesian approach
0:03:59 | for the parameters
0:04:04 | we assume that they are random variables
0:04:09 | and place priors
0:04:13 | on the model parameters
0:04:15 | and then we compute the posterior
0:04:20 | given the i-vectors and
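In symbols, by Bayes' rule, with theta collecting the model parameters and Phi the development i-vectors (notation assumed):

```latex
P(\theta \mid \Phi) \;\propto\; P(\Phi \mid \theta)\, P(\theta)
```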
0:04:25 | so
0:04:27 | methods
0:04:32 | compute the posterior
0:04:37 | prior
0:04:45 | in this case we compute the posterior
0:04:56 | from now on we call this the prior
0:05:06 | and finally we take the point estimates
0:05:13 | by computing their expected values given the target posterior
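Written out as a sketch (the subscript naming the target-domain data is assumed):

```latex
\hat{\theta} \;=\; \mathbb{E}\!\left[\theta \mid \Phi_{\text{target}}\right]
```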
0:05:20 | to get the posterior of the model parameters
0:05:27 | there are no closed-form solutions
0:05:31 | what we do is decompose
0:05:35 | and assume the model parameters are independent
0:05:47 | then we compute in a cyclic fashion
0:05:57 | and finally we approximate
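The cyclic computation described here matches the standard mean-field variational Bayes recipe; a sketch under that assumption, with Y the hidden speaker factors:

```latex
% Assume the joint posterior factorizes, then update each factor
% in turn until convergence:
P(Y, \theta \mid \Phi) \;\approx\; q(Y)\, q(\theta),
\qquad
\ln q(Y) = \mathbb{E}_{q(\theta)}\!\left[\ln P(\Phi, Y, \theta)\right] + \text{const}
```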
0:06:19 | is the number of speakers in the database
0:06:22 | and the posterior for the
0:06:25 | channels
0:06:29 | is the number of segments in the database
0:06:35 | then we can compute
0:06:38 | for the target data set
0:06:47 | from the original data set to the target data set
0:06:54 | we can compute the weight of the prior
0:06:59 | target data
0:07:01 | to do that we should modify the prior distribution
0:07:05 | the weight of the prior depends
0:07:10 | on the number of speakers
0:07:13 | that we have in the large data set
0:07:19 | so we change the parameters
0:07:22 | if we want to multiply the weight of the prior
0:07:29 | we need to modify the alpha
0:07:31 | these two parameters
0:07:42 | but at the same time, they give the same expected values for
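The talk does not spell out which distribution the two parameters belong to; as an illustration only, for a Wishart prior on a precision matrix the degrees of freedom and the scale matrix can be rescaled together so the prior's weight changes while its mean stays fixed:

```latex
% W ~ Wishart(\Phi_0, \nu_0), so E[W] = \nu_0 \Phi_0. Scaling
% \nu_0 by s and \Phi_0 by 1/s multiplies the effective weight
% (number of pseudo-observations) by s without moving the mean:
\nu_0 \to s\,\nu_0, \qquad \Phi_0 \to \Phi_0 / s
\;\;\Longrightarrow\;\;
\mathbb{E}[W] = (s\,\nu_0)(\Phi_0 / s) = \nu_0\,\Phi_0
```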
0:07:49 | we can do the same with the prior of W
0:07:53 | and finally
0:07:59 | for the number of speakers and the number of segments
0:08:03 | effective number of speakers and segments for the Gaussian prior
0:08:10 | we are going to compare our method
0:08:14 | the length normalization is
0:08:20 | one that does centering and whitening
0:08:30 | to make the i-vectors more Gaussian
0:08:32 | fitting the Gaussian assumption
0:08:41 | and projection onto the unitary hypersphere
0:08:49 | to reduce the data set mismatch
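A minimal sketch of the length normalization being compared against (centering, whitening, then projection onto the unit hypersphere); the use of NumPy and the array/function names are assumptions:

```python
import numpy as np

def train_length_norm(dev_ivectors):
    """Estimate centering and whitening from development i-vectors
    (rows = segments, columns = i-vector dimensions)."""
    mu = dev_ivectors.mean(axis=0)
    cov = np.cov(dev_ivectors, rowvar=False)
    # Symmetric (ZCA) whitening transform from the eigendecomposition
    # of the covariance; assumes enough segments for a full-rank cov.
    eigvals, eigvecs = np.linalg.eigh(cov)
    whitener = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return mu, whitener

def length_norm(ivectors, mu, whitener):
    """Center, whiten, and project each i-vector onto the unit hypersphere."""
    w = (ivectors - mu) @ whitener
    return w / np.linalg.norm(w, axis=1, keepdims=True)
```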
0:08:56 | now I explain the data sets
0:09:01 | data set
0:09:04 | this is
0:09:07 | the data set we will use
0:09:13 | similar to the
0:09:18 | telephone channels
0:09:26 | that contains 30 male and 30 female speakers
0:09:29 | the data has similar conditions
0:09:32 | conditions
0:09:40 | two to three minutes
0:09:52 | a data set with a large amount of data
0:09:55 | we use these five
0:10:04 | that contains more than five hundred male and seven hundred female speakers
0:10:12 | and it has a variety of channels
0:10:18 | speaker verification
0:10:24 | we got twenty MFCCs plus deltas and
0:10:36 | we built the system
0:10:50 | we used the normalization too
0:10:53 | the parameters
0:11:02 | and finally we used s-norm score normalization with cohorts from the
0:11:09 | first here
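A minimal sketch of the s-norm score normalization mentioned here, assuming the raw scores of the enrollment and test sides against a cohort are already available (names illustrative):

```python
import numpy as np

def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization: z-normalize the trial score with
    cohort statistics from both sides and average the two."""
    mu_e, sd_e = enroll_cohort_scores.mean(), enroll_cohort_scores.std()
    mu_t, sd_t = test_cohort_scores.mean(), test_cohort_scores.std()
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)
```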
0:11:24 | we compare
0:11:34 | we can see improvement
0:11:50 | we can see that
0:11:58 | the prior distribution
0:12:01 | if we compare, for instance, the first line and the last line: an equal error rate improvement of
0:12:07 | forty percent for males and fourteen percent for females; for minDCF, an improvement
0:12:13 | of twelve percent for males and forty-six percent for females
0:12:17 | here is a table comparing different parameters
0:12:27 | we can see
0:12:31 | improvement
0:12:41 | here we show length normalization with s-norm and without s-norm
0:12:48 | when we use
0:12:57 | improvement using i-vectors, but not as much as
0:13:09 | we can also see that
0:13:11 | in this data set, vector normalization
0:13:23 | better or
0:13:29 | here we show some improvements
0:14:03 | and for females
0:14:28 | finally
0:14:42 | we see that
0:14:49 | we can see that without normalization
0:14:58 | finally, the conclusions: we have developed a method to adapt a PLDA
0:15:03 | i-vector classifier from a domain with a large amount of development data to a domain
0:15:07 | with scarce development data
0:15:09 | we have conducted experiments
0:15:15 | we can see this technique improves the performance of the system
0:15:19 | and this improvement mainly comes from the adaptation of the channel matrix W
0:15:28 | we have compared this method with length normalization
0:15:38 | we have better results
0:15:48 | we have discussed length normalization
0:15:51 | as future work: Bayesian adaptation of the UBM and the i-vector extractor
0:16:22 | no, the i-vector length means the norm of the vector
0:16:31 | not the dimensionality of the i-vector
0:17:40 | maybe we can do the same
0:17:45 | as we have more norm data