0:01:00 | This is the outline of my presentation. To start, I will give the motivation, and then I |
---|
0:01:06 | will discuss how we investigate PLDA techniques |
---|
0:01:12 | In the background |
---|
0:01:13 | section, I will talk about what PLDA is and about speaker verification.
---|
0:01:23 | I will then discuss the telephone-speech-only PLDA system under the standard and |
---|
0:01:28 | short utterance evaluation and development data conditions.
---|
0:01:31 | Finally, I will talk about how the telephone-and-microphone speech PLDA system performs |
---|
0:01:38 | on the evaluation data conditions I have been discussing.
---|
0:01:45 | There are two factors |
---|
0:01:46 | that are considered when deploying speaker verification in practical applications. The first is compensating for channel |
---|
0:01:53 | mismatch, which helps improve speaker verification performance.
---|
0:02:00 | The second is easing the speech requirements, |
---|
0:02:07 | which is really what affords users convenience. In this paper we will analyze the |
---|
0:02:13 | short utterance |
---|
0:02:16 | speech requirements. Previous researchers investigated other techniques, such as Joint Factor Analysis, SVMs |
---|
0:02:23 | and i-vectors with short utterances. However, no single system has been found that solves |
---|
0:02:29 | the short utterance issue.
---|
0:02:31 | So, in this paper, we just |
---|
0:02:35 | do some experiments with the PLDA system to analyze its own performance.
---|
0:02:43 | In telephone-only conditions, the PLDA system has shown better speaker verification performance than total variability scoring and joint factor analysis.
---|
0:02:51 | However, there has been no detailed investigation of how the PLDA system performs with limited |
---|
0:02:55 | enrollment or development data conditions. |
---|
0:03:05 | Some questions arose when I started investigating the PLDA system with limited development data conditions. The first |
---|
0:03:11 | one is whether speaker verification performance improves |
---|
0:03:15 | when score normalization is trained with matched-length utterances,
---|
0:03:19 | that is, when the score normalization utterance length is matched with the evaluation utterance length. The second is whether speaker |
---|
0:03:25 | verification performance improves when PLDA is modelled with matched utterance length.
---|
0:03:30 | In this case, the PLDA modelling data utterance length is matched with the evaluation data.
---|
0:03:40 | In the background section, I will be talking about the implementation of PLDA-based speaker verification systems.
---|
0:03:47 | For i-vector feature extraction, in the JFA approach |
---|
0:03:53 | it is believed that the channel space contains some speaker information.
---|
0:03:57 | We can use this to discriminate between speakers.
---|
0:04:02 | This motivated i-vectors, which are based on one total variability space instead of separate speaker and channel spaces.
---|
0:04:09 | It has also been found that the raw i-vector feature distribution is heavy-tailed, |
---|
0:04:14 | which motivates heavy-tailed PLDA modelling.
---|
0:04:23 | In the GMM framework, the supervector M is modeled in a single subspace to reduce the dimensionality, which |
---|
0:04:29 | is the total variability space.
---|
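The total variability model just described can be sketched as follows. This is a toy illustration, not the talk's implementation: the supervector dimension and the matrices are random stand-ins (the real system uses 500-dimensional i-vectors and a trained T matrix).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: the real supervector dimension is UBM components
# times feature dimension; 500 matches the i-vector size in the talk.
D_super = 1000   # supervector dimension (made up for the sketch)
D_ivec = 500     # i-vector dimension

m = rng.normal(size=D_super)            # UBM mean supervector
T = rng.normal(size=(D_super, D_ivec))  # total variability matrix (stand-in)
w = rng.normal(size=D_ivec)             # i-vector, standard normal prior

# Total variability model: M = m + T w, so all speaker and channel
# variability is captured in the single low-dimensional space.
M = m + T @ w
```

The i-vector w for a given utterance is the posterior mean of this latent variable; here we only show the generative direction.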
0:04:33 | The total variability space is trained using the same process as eigenvoice MAP training, with |
---|
0:04:39 | one difference: all the recordings of a given speaker are treated as coming from different speakers.
---|
0:04:48 | Previously, we talked about i-vector feature extraction; now we will move on to PLDA modeling.
---|
0:04:57 | The PLDA generative model was originally proposed for face recognition and later adapted to |
---|
0:05:02 | i-vector by Kenny. |
---|
0:05:07 | This approach is similar to JFA, but it models i-vectors rather than supervectors.
---|
0:05:20 | The i-vectors are decomposed into a speaker part and a channel part, and Gaussian distributions can be |
---|
0:05:26 | used to model them. For our experiments, we assume the |
---|
0:05:29 | precision matrix is full rank and remove the eigenchannel part, U2; this helps reduce |
---|
0:05:34 | the computational complexity. |
---|
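A minimal sketch of the simplified GPLDA generative model just described: an i-vector is a mean plus a speaker part (eigenvoice matrix times speaker factor) plus a full-covariance residual, with the eigenchannel term removed. All sizes and matrices here are illustrative stand-ins, not trained values.

```python
import numpy as np

rng = np.random.default_rng(1)

D = 500   # i-vector dimension (as in the talk)
Q = 120   # number of eigenvoices (made up for the sketch)

eta_bar = rng.normal(size=D)     # mean i-vector
F = rng.normal(size=(D, Q))      # eigenvoice (speaker) matrix, stand-in
Lambda_inv = 0.1 * np.eye(D)     # full-rank residual covariance (inverse precision)

def sample_ivector(h):
    """Generate an i-vector under the simplified GPLDA model:
    eta = eta_bar + F h + eps, with the eigenchannel term removed
    and a full-rank residual precision matrix."""
    eps = rng.multivariate_normal(np.zeros(D), Lambda_inv)
    return eta_bar + F @ h + eps

h = rng.normal(size=Q)        # speaker factor, standard normal prior
eta1 = sample_ivector(h)      # two sessions of the same speaker
eta2 = sample_ivector(h)      # share the same speaker factor h
```

Two recordings of one speaker share h and differ only in the residual, which is exactly what the scoring step later exploits.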
0:05:43 | We investigated two types of PLDA: Gaussian PLDA (GPLDA) and heavy-tailed PLDA (HTPLDA).
---|
0:05:48 | In the Gaussian case, the speaker factors have a |
---|
0:05:54 | standard normal distribution,
---|
0:05:56 | and the residual factors |
---|
0:05:58 | have a normal distribution with zero mean.
---|
0:06:05 | The model parameters, |
---|
0:06:09 | the mean and the eigenvoice matrix, are estimated using maximum likelihood and minimum divergence criteria.
---|
0:06:18 | Because of outliers in the i-vector space, the choice of Gaussian modeling |
---|
0:06:23 | is not optimal, |
---|
0:06:24 | and this led Kenny to later investigate HTPLDA.
---|
0:06:39 | HTPLDA uses Student's t-distributions for modeling the speaker and residual factors, which makes it more robust to outliers than GPLDA.
---|
0:06:46 | For each i-vector, a hidden scale factor is also estimated, which produces the heavy-tailed behaviour.
---|
0:06:56 | The model parameters are again estimated using maximum likelihood and minimum divergence.
---|
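The heavy-tailed prior can be illustrated with the standard construction of a Student's t variable: a Gaussian divided by the square root of a Gamma-distributed hidden scale, which is the same role the hidden scale factors play in HTPLDA. The degrees-of-freedom value and sample counts below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_heavy_tailed(dim, dof):
    """Draw a factor with a Student's t prior via the scale-mixture view:
    z / sqrt(u) with z ~ N(0, I) and u ~ Gamma(dof/2, scale=2/dof).
    Smaller dof gives heavier tails; dof -> infinity recovers a Gaussian."""
    u = rng.gamma(shape=dof / 2.0, scale=2.0 / dof)   # hidden scale factor
    return rng.normal(size=dim) / np.sqrt(u)

# With few degrees of freedom the samples show far more extreme values
# than a plain Gaussian, matching the outliers seen in i-vector space.
samples = np.array([sample_heavy_tailed(1, dof=3)[0] for _ in range(5000)])
gauss = rng.normal(size=5000)
```

Comparing the extremes of `samples` and `gauss` makes the heavier tails visible directly.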
0:07:06 | This is a simplified version of PLDA based speaker verification systems. |
---|
0:07:11 | It consists of three phases: development, enrollment and verification.
---|
0:07:18 | In the development phase, |
---|
0:07:20 | the mean and the eigenvoice matrix are estimated using Gaussian or heavy-tailed PLDA.
---|
0:07:27 | In the enrollment and verification phases, target and test i-vectors are extracted. The speaker and channel variables |
---|
0:07:36 | are estimated by PLDA. Finally, the score is calculated using the batch likelihood |
---|
0:07:42 | ratio. |
---|
0:07:50 | Scoring is calculated by batch likelihood ratio. |
---|
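The batch likelihood ratio for a Gaussian PLDA model can be sketched as the log-ratio of two joint Gaussian likelihoods: under the same-speaker hypothesis the pair of i-vectors shares a speaker factor, so the joint covariance has the between-speaker covariance in its off-diagonal blocks. This is a toy zero-mean two-covariance formulation with made-up dimensions, not the exact implementation used in the talk.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy dimensions for readability; the real system uses 500-dim i-vectors.
D, Q = 4, 2
F = rng.normal(size=(D, Q))   # eigenvoice matrix (random stand-in)
B = F @ F.T                   # between-speaker covariance
W = 0.5 * np.eye(D)           # within-speaker (residual) covariance
Tc = B + W                    # total covariance of one i-vector

def gauss_logpdf(x, cov):
    """Log-density of a zero-mean multivariate Gaussian."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet
                   + x @ np.linalg.solve(cov, x))

def batch_likelihood_ratio(eta1, eta2):
    """log P(eta1, eta2 | same speaker) - log P(eta1, eta2 | different).
    Same-speaker: joint covariance [[Tc, B], [B, Tc]] (shared factor).
    Different: the two i-vectors are independent, each with covariance Tc."""
    x = np.concatenate([eta1, eta2])
    joint = np.block([[Tc, B], [B, Tc]])
    return (gauss_logpdf(x, joint)
            - gauss_logpdf(eta1, Tc) - gauss_logpdf(eta2, Tc))
```

Averaged over many trials, genuine pairs (shared speaker factor) score higher than impostor pairs, which is the decision statistic used at verification time.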
0:07:58 | First, we investigated the telephone-speech-only PLDA system under standard conditions, and then progressively investigated |
---|
0:08:04 | short utterance verification. After that, we also investigated short utterance development data.
---|
0:08:12 | Finally, we investigated the telephone-and-microphone speech PLDA system, which seeks |
---|
0:08:24 | to exploit all the speaker variability information. |
---|
0:08:28 | such as the pooled and concatenated total variability approaches.
---|
0:08:35 | i-vector features of dimension 500 were extracted using the UBM, with 26-dimensional MFCC features including |
---|
0:08:42 | delta. |
---|
0:08:45 | The UBM was trained using NIST 2004 telephone utterances. For the telephone-speech-only PLDA |
---|
0:08:51 | speaker verification system, |
---|
0:08:53 | the total variability space and PLDA were trained using telephone utterances from |
---|
0:08:57 | NIST 2004, 2005, 2006 and Switchboard.
---|
0:09:02 | For the telephone-and-microphone PLDA system, telephone and microphone utterances |
---|
0:09:09 | from NIST 2004 and 2005 as well as Switchboard were used.
---|
0:09:18 | For telephone-only PLDA, score normalization was trained on telephone utterances |
---|
0:09:24 | from NIST 2004 and 2005.
---|
0:09:26 | For the telephone-and-microphone system, |
---|
0:09:29 | telephone and microphone utterances from NIST 2002, 2004, 2005 and |
---|
0:09:34 | 2006 were used.
---|
0:09:36 | All the experiments were conducted using the short2-short3 and 10sec-10sec evaluation conditions.
---|
0:09:42 | Short |
---|
0:09:43 | utterances were obtained by truncating the |
---|
0:09:45 | short2-short3 evaluation condition utterances.
---|
0:09:48 | Before truncation, the first twenty seconds of active speech was removed |
---|
0:09:52 | from all utterances to avoid capturing similar introductory statements across multiple utterances.
---|
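The truncation protocol just described (skip the first twenty seconds of active speech, then keep the target duration) can be sketched over an array of active-speech frames. The frame rate, function name, and input layout are assumptions for the sketch, not details from the paper.

```python
import numpy as np

FRAME_RATE = 100  # frames per second (10 ms frames, illustrative)

def truncate_active_speech(frames, keep_seconds, skip_seconds=20):
    """Simulate the truncation protocol: discard the first `skip_seconds`
    of active speech (to avoid similar introductory statements appearing
    in every truncated utterance), then keep the next `keep_seconds`.
    `frames` is assumed to contain only active-speech frames, i.e. the
    output of voice activity detection."""
    skip = skip_seconds * FRAME_RATE
    keep = keep_seconds * FRAME_RATE
    return frames[skip:skip + keep]

# e.g. a 120-second utterance truncated for a 10-second condition
utterance = np.arange(120 * FRAME_RATE)
short = truncate_active_speech(utterance, keep_seconds=10)
```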
0:10:04 | For the telephone-speech-only PLDA system, the total variability subspace was trained on telephone |
---|
0:10:10 | speech. |
---|
0:10:12 | Telephone and microphone based PLDA system |
---|
0:10:18 | McLaren investigated different total variability approaches and found the pooled |
---|
0:10:25 | total variability approach, which combines telephone and microphone speech, to be successful. So we have |
---|
0:10:32 | investigated the pooled approach to find out whether it also benefits |
---|
0:10:36 | the PLDA system.
---|
0:10:47 | In this section, I will be discussing telephone-speech-only results on the standard conditions to see |
---|
0:10:53 | how our system performs |
---|
0:11:00 | We compared the performance of GPLDA and HTPLDA, with and without |
---|
0:11:05 | s-norm, on the NIST 2010 standard and 10sec-10sec conditions.
---|
0:11:12 | As previously shown by Kenny, we confirmed that HTPLDA performs better than GPLDA.
---|
0:11:19 | Similarly, s-norm improved the performance of the GPLDA system only.
---|
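S-norm itself can be sketched as the symmetric average of two z-score normalizations against a cohort: one using scores of the enrolment i-vector against the cohort, one using scores of the test i-vector against the same cohort. The function name and inputs are illustrative.

```python
import numpy as np

def s_norm(raw_score, enrol_cohort_scores, test_cohort_scores):
    """Symmetric score normalization (s-norm): normalize the raw trial
    score by the mean and standard deviation of the enrolment side's
    cohort scores and of the test side's cohort scores, then average
    the two normalized values."""
    z = (raw_score - np.mean(enrol_cohort_scores)) / np.std(enrol_cohort_scores)
    t = (raw_score - np.mean(test_cohort_scores)) / np.std(test_cohort_scores)
    return 0.5 * (z + t)
```

Under this view, the matched-length score normalization studied later simply draws the cohort scores from truncated utterances so their statistics match the evaluation utterance length.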
0:11:28 | Now we will move on to the short utterance investigations.
---|
0:11:38 | Previous studies and our experiments have found that when sufficient speech is available then PLDA |
---|
0:11:43 | achieved significant improvement. However, the robustness of PLDA with limited resources in enrolment and verification |
---|
0:11:52 | is an important issue that has not been investigated |
---|
0:11:57 | previously. |
---|
0:12:04 | For this experiment, we evaluated the GPLDA and HTPLDA systems on truncated evaluation data, as shown |
---|
0:12:10 | in Figure 1 and Figure 2.
---|
0:12:17 | HTPLDA continues to achieve better performance than GPLDA for all truncated conditions, although |
---|
0:12:23 | the difference narrows |
---|
0:12:29 | at the equal error rate. Overall, the research shows that as utterance length decreases, performance degrades |
---|
0:12:37 | at an increasing rate rather than in proportion.
---|
0:12:48 | Now we move on to the short utterance development data conditions. In typical speaker verification, full-length |
---|
0:12:55 | utterances are used for score normalization training. When speaker verification is performed with short utterance |
---|
0:13:02 | evaluation data, |
---|
0:13:05 | matching the score normalization utterance length to the evaluation utterance length could provide an improvement.
---|
0:13:15 | To test this hypothesis, we evaluated the GPLDA and HTPLDA systems with |
---|
0:13:20 | short utterance and full-length utterance score normalization data.
---|
0:13:27 | We then compared the performance of the GPLDA and HTPLDA systems with |
---|
0:13:34 | full-length score normalization and matched-length score normalization.
---|
0:13:41 | We found that matched-length score normalization improves equal error rate performance on both systems across |
---|
0:13:48 | most of the |
---|
0:13:50 | truncated conditions. |
---|
0:13:54 | However, it does not show the same improvement on the DCF measure.
---|
0:14:11 | Next, short utterance development data for PLDA modeling. Normally, PLDA is modeled on full-length utterances.
---|
0:14:18 | However, when PLDA is modeled with |
---|
0:14:23 | matched utterance length it could provide an improvement, since evaluation i-vector distribution behaviour is matched |
---|
0:14:29 | with development data i-vector distribution behaviour. |
---|
0:14:38 | To test this hypothesis, we |
---|
0:14:44 | evaluated GPLDA with full-length and matched utterance lengths.
---|
0:14:49 | The above chart compares the results of the GPLDA system |
---|
0:14:52 | when PLDA is trained on full-length and matched-length utterances.
---|
0:14:59 | For matched utterance length, as previously explained, the development data utterance length is matched |
---|
0:15:04 | with the evaluation utterance length. The chart suggests that when PLDA is modeled with matched length, |
---|
0:15:09 | it achieves a useful improvement over full-length-based PLDA modeling.
---|
0:15:21 | As for HTPLDA, |
---|
0:15:23 | we tried to model it with matched utterance length, but |
---|
0:15:28 | we were unable to train it.
---|
0:15:31 | We believe that the short utterance development i-vector distribution has fewer outliers |
---|
0:15:37 | than the full-length one. So, we evaluated HTPLDA with mixed lengths, |
---|
0:15:44 | full-length utterances and matched-length utterances pooled together.
---|
0:15:52 | We can see from both charts that mixed-length HTPLDA improves performance over |
---|
0:15:59 | full length HTPLDA. |
---|
0:16:09 | So far, the PLDA approaches have been investigated with only telephone speech in the evaluation and development data conditions.
---|
0:16:15 | So in this section, I will discuss the telephone-and-microphone speech |
---|
0:16:21 | PLDA system. |
---|
0:16:27 | We have analyzed two different kinds of |
---|
0:16:31 | total variability representations. |
---|
0:16:33 | namely pooled and concatenated, under short utterance evaluation conditions.
---|
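The two representations just named can be contrasted in a toy sketch: a pooled approach trains one total variability space on telephone and microphone data together, while a concatenated approach trains separate spaces and stacks each utterance's two projections. Here `train_T` is a PCA stand-in for the real eigenvoice-style subspace training, and all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(4)

def train_T(supervectors, dim):
    """Stand-in for total variability training: top principal directions
    of the centred supervectors (a real system would use an EM-based,
    eigenvoice-style estimation instead of this SVD shortcut)."""
    X = supervectors - supervectors.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:dim].T                        # (supervector_dim, dim)

def extract(T, sv):
    """Simplistic i-vector extraction: project onto the subspace."""
    return T.T @ sv

tel = rng.normal(size=(200, 50))   # telephone supervectors (toy sizes)
mic = rng.normal(size=(200, 50))   # microphone supervectors

# Pooled: a single space trained on telephone + microphone data together.
T_pooled = train_T(np.vstack([tel, mic]), dim=10)
w_pooled = extract(T_pooled, tel[0])

# Concatenated: separate telephone and microphone spaces; each utterance's
# two projections are stacked into one longer i-vector.
T_tel = train_T(tel, dim=10)
T_mic = train_T(mic, dim=10)
w_concat = np.concatenate([extract(T_tel, tel[0]), extract(T_mic, tel[0])])
```

The concatenated i-vector is twice as long, so the downstream PLDA model sees a different feature dimensionality in the two approaches.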
0:16:41 | These figures show the equal error rate performance of the pooled and concatenated total variability approaches for the GPLDA and HTPLDA systems.
---|
0:16:49 | All the results were presented with s-norm applied.
---|
0:16:59 | In these figures we can see that the |
---|
0:17:03 | pooled total variability approach provided an improved performance for both GPLDA and HTPLDA. |
---|
0:17:09 | across all the different utterance-length |
---|
0:17:12 | conditions. |
---|
0:17:21 | When we looked specifically at the |
---|
0:17:24 | pooled total variability approach, |
---|
0:17:32 | for both the GPLDA and HTPLDA systems it achieved considerable improvement over the concatenated total |
---|
0:17:40 | variability approach. |
---|
0:17:41 | In particular, the pooled total variability approach with GPLDA |
---|
0:17:44 | enjoyed the largest improvement.
---|
0:17:54 | To summarize: we have discussed the PLDA system with short utterances.
---|
0:17:58 | We found from our experiments that HTPLDA continued to achieve better performance than GPLDA under short utterance |
---|
0:18:04 | evaluation conditions. |
---|
0:18:09 | The advantages of including short utterances in the development data for score normalization and |
---|
0:18:15 | PLDA modeling were also found. |
---|
0:18:18 | Finally, we have investigated |
---|
0:18:21 | telephone and microphone PLDA system with |
---|
0:18:24 | different total variability solutions. |
---|
0:18:32 | For future work, we have been working on length-normalized |
---|
0:18:34 | i-vector features with the GPLDA system, |
---|
0:18:37 | since it is computationally more efficient than HTPLDA, |
---|
0:18:41 | and it has been seen to provide similar improvements.
---|
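Length normalization itself is a one-line operation: project each i-vector onto the unit sphere, which makes the i-vector distribution more Gaussian and lets the cheaper GPLDA model approach heavy-tailed PLDA performance. A minimal sketch:

```python
import numpy as np

def length_normalize(w):
    """Scale an i-vector to unit Euclidean length. This simple step
    Gaussianizes the i-vector distribution, so Gaussian PLDA can be
    used in place of the more expensive heavy-tailed PLDA."""
    return w / np.linalg.norm(w)

w = np.array([3.0, 4.0])
w_norm = length_normalize(w)   # -> [0.6, 0.8], a unit-length vector
```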
0:18:47 | The GPLDA system will also be analyzed |
---|
0:18:49 | with short utterance |
---|
0:18:52 | and development data. |
---|
0:19:59 | we were trying to find the |
---|
0:20:01 | full utterance i-vector feature and short utterance i-vector feature. |
---|
0:20:09 | Yeah, it is. |
---|
0:20:16 | Because of that it is |
---|