0:01:00 This is the outline of my presentation. To start, I will give the motivation, and then discuss how we investigated PLDA techniques.
0:01:12 In the background section, I will talk about what PLDA is and how it is used for speaker verification.
0:01:23 I will then discuss the telephone-speech-only PLDA system under the standard and short utterance evaluation and development data conditions.
0:01:31 Finally, I will talk about how the telephone and microphone speech PLDA system performs under the evaluation and development data conditions I have been discussing.
0:01:45 There are two factors to consider when deploying speaker verification in practical applications. The first is compensating for channel mismatch, which helps improve speaker verification performance.
0:02:00 The second is easing the speech requirement, which is really what affords users convenience. In this paper we analyse the short utterance speech requirement.
0:02:16 Previous researchers have investigated other techniques, such as joint factor analysis, SVMs and i-vectors, with short utterances. However, no one has found a single system that solves the short utterance issue.
0:02:31 So in this paper, we run experiments to study how the PLDA system itself performs with short utterances.
0:02:43 In telephone-only conditions, the PLDA system has been shown to outperform total variability cosine scoring and joint factor analysis for speaker verification. However, there has been no detailed investigation of how the PLDA system performs under limited enrolment or development data conditions.
0:03:05 Two questions came up when I started investigating the PLDA system under these development data conditions. The first is whether speaker verification performance improves when score normalization is trained with matched utterances,
0:03:19 that is, when the score normalization utterance length is matched with the evaluation utterance length. The second is whether speaker verification performance improves when PLDA is modelled with matched utterance length. In this case, the PLDA modelling data utterance length is matched with the evaluation data utterance length.
0:03:40 In the background section, I will talk about the implementation of PLDA-based speaker verification systems.
0:03:47 First, i-vector feature extraction. In the JFA approach, it was found that the channel space contains some residual speaker information, which can be used to discriminate speakers.
0:04:02 i-vectors were therefore proposed: they are based on one total variability space instead of separate speaker and channel spaces.
0:04:09 It has also been found that the i-vector feature distribution is heavy-tailed, and since i-vectors are small in dimension, this makes it feasible to train a heavy-tailed PLDA model.
0:04:23 In the i-vector approach, the GMM supervector M is modelled with a single variability space to reduce the dimensionality; this is the total variability space, M = m + Tw.
0:04:33 The total variability space is trained using a single-pass eigenvoice MAP training process; the one difference is that all the recordings of a given speaker are treated as coming from different speakers.
0:04:48 Having covered i-vector feature extraction, we now move to PLDA modelling.
0:04:57 The PLDA generative model was originally proposed for face recognition, and it was later adapted to i-vectors by Kenny.
0:05:07 The approach is similar to JFA, but applied to i-vectors instead of supervectors.
0:05:20 The i-vectors are decomposed into a speaker part and a channel part, and Gaussian distributions can be used to model them. For our experiments, the precision matrix is full rank and we removed the eigenchannel part, U2; this helps reduce the computational complexity.
0:05:43 We investigated two types of PLDA: GPLDA and HTPLDA.
0:05:48 For the Gaussian case, the speaker factor has a
0:05:54 standard normal distribution,
0:05:56 and the residual factor
0:05:58 has a normal distribution with zero mean.
0:06:05 The model parameters, the mean and the
0:06:09 eigenvoice matrix, are estimated using maximum likelihood and minimum divergence.
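As a rough illustration (a sketch under assumed dimensions, not the authors' implementation), the GPLDA generative model just described, with a standard normal speaker factor and a zero-mean Gaussian residual with full-rank covariance, can be simulated as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the talk)
ivec_dim, spk_dim = 50, 10
n_speakers, sessions = 8, 4

# GPLDA parameters: global mean, eigenvoice matrix F, full-rank residual covariance
mu = rng.normal(size=ivec_dim)
F = rng.normal(scale=0.3, size=(ivec_dim, spk_dim))
Sigma = 0.01 * np.eye(ivec_dim)

ivectors = []
for _ in range(n_speakers):
    h = rng.standard_normal(spk_dim)                              # speaker factor ~ N(0, I)
    for _ in range(sessions):
        eps = rng.multivariate_normal(np.zeros(ivec_dim), Sigma)  # residual ~ N(0, Sigma)
        ivectors.append(mu + F @ h + eps)                         # w = mu + F h + eps
ivectors = np.asarray(ivectors)                                   # simulated development i-vectors
```

In real training the parameters `mu`, `F` and `Sigma` would of course be estimated from development i-vectors rather than sampled.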
0:06:18 Because of outliers in the i-vector space, the choice of Gaussian modelling is
0:06:23 not optimal,
0:06:24 and heavy-tailed PLDA was later investigated by Kenny.
0:06:39 HTPLDA uses the Student's t distribution for modelling the speaker factor, which is heavier tailed than the Gaussian used in GPLDA.
0:06:46 The speaker factor is estimated together with hidden scale variables,
0:06:56 and the model parameters are again estimated using maximum likelihood and minimum divergence.
0:07:06 This is a simplified view of a PLDA-based speaker verification system.
0:07:11 It consists of three phases: development, enrolment and verification.
0:07:18 In the development phase,
0:07:20 the eigenvoice matrix and the mean are estimated using Gaussian or heavy-tailed PLDA.
0:07:27 In the enrolment and verification phases, target and test i-vectors are extracted, and the speaker and channel variables
0:07:36 are estimated with PLDA. Finally, the score is calculated using the batch likelihood
0:07:42 ratio.
0:07:50 Scoring is calculated with the batch likelihood ratio.
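As a sketch of the scoring step (assuming the standard two-covariance formulation of GPLDA, not code from the paper), the batch likelihood ratio compares the same-speaker and different-speaker hypotheses for a pair of mean-subtracted i-vectors:

```python
import numpy as np

def _logpdf_zero_mean(x, C):
    """Log-density of N(0, C) evaluated at x."""
    d = len(x)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(C, x))

def batch_llr(w1, w2, F, Sigma):
    """Batch likelihood ratio score for GPLDA:
    log p(w1, w2 | same speaker) - log p(w1, w2 | different speakers)."""
    Sac = F @ F.T                              # across-class (speaker) covariance
    Stot = Sac + Sigma                         # total covariance of one i-vector
    x = np.concatenate([w1, w2])
    d = len(w1)
    zeros = np.zeros((d, d))
    C_same = np.block([[Stot, Sac], [Sac, Stot]])
    C_diff = np.block([[Stot, zeros], [zeros, Stot]])
    return _logpdf_zero_mean(x, C_same) - _logpdf_zero_mean(x, C_diff)
```

Positive scores favour the same-speaker hypothesis; in practice this raw score is then score-normalized.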
0:07:58 We first investigated the telephone-speech-only PLDA system under standard conditions, and then progressively investigated
0:08:04 short utterance evaluation. After that, we also investigated short utterance development data.
0:08:12 Finally, we investigated the telephone and microphone speech PLDA system, looking at how
0:08:24 to exploit all the speaker variability information
0:08:28 through pooled or concatenated total variability approaches.
0:08:35 i-vector features of dimension five hundred were extracted using the total variability space over the UBM, with twenty-six-dimensional MFCC-plus-delta
0:08:42 features.
0:08:45 The UBM was trained mostly on NIST 2004 telephone utterances.
0:08:53 For the telephone-speech-only PLDA system, the total variability space and PLDA were trained using telephone utterances from
0:08:57 NIST 2004, 2005 and 2006 and from Switchboard.
0:09:02 For the telephone and microphone based PLDA system, telephone and microphone utterances from
0:09:09 NIST 2004 and 2005 as well as Switchboard were used.
0:09:18 For the telephone-only PLDA system, score normalization was trained on telephone utterances
0:09:24 from NIST 2004 and 2005.
0:09:26 For the telephone and microphone system,
0:09:29 telephone and microphone utterances from NIST 2004, 2005 and
0:09:34 2006 were used.
0:09:36 All the experiments were conducted on the short2-short3 and 10sec-10sec evaluation conditions.
0:09:42 Short
0:09:43 utterances were obtained by truncating the
0:09:45 short2-short3 evaluation condition utterances.
0:09:48 Before truncation, twenty seconds of active speech was removed
0:09:52 from all utterances, to avoid capturing similar introductory statements across multiple utterances.
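A minimal sketch of that truncation procedure (the function name and the 10 ms frame rate are assumptions, not from the paper): keep only VAD-selected frames, discard the first twenty seconds of active speech, then truncate to the target length:

```python
import numpy as np

FRAMES_PER_SEC = 100  # assuming 10 ms frames

def truncate_active_speech(frames, vad_mask, skip_sec=20, keep_sec=10):
    """Drop the first `skip_sec` seconds of active speech (to avoid the
    similar introductory statements), then keep `keep_sec` seconds."""
    active = frames[vad_mask]                       # active-speech frames only
    start = skip_sec * FRAMES_PER_SEC
    return active[start:start + keep_sec * FRAMES_PER_SEC]
```

Here `frames` would be the per-frame acoustic features and `vad_mask` a boolean voice-activity decision per frame.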
0:10:04 For the telephone-only speech PLDA system, the total variability subspace was trained on telephone
0:10:10 speech only.
0:10:12 For the telephone and microphone based PLDA system:
0:10:18 McLaren investigated different total variability representations and found that a pooled
0:10:25 total variability approach that combines telephone and microphone speech was successful. So we have
0:10:32 investigated the pooled approach, to find out whether it is
0:10:36 also the best choice for the PLDA system.
0:10:47 First, I will discuss the telephone-speech-only system on the standard condition, to see
0:10:53 how our system performs.
0:11:00 We compared the performance of GPLDA and HTPLDA, with and without
0:11:05 s-norm, on the NIST 2010 standard and 10sec-10sec conditions.
0:11:12 As previously shown by Kenny, we confirmed that HTPLDA achieves more improvement than GPLDA.
0:11:19 Similarly, s-norm improved the performance of the GPLDA system only.
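As a sketch of that normalization (the standard symmetric s-norm recipe; cohort handling is simplified here), the raw score is z-normalized against cohort scores from both sides of the trial and the two results are averaged:

```python
import numpy as np

def s_norm(raw_score, enrol_cohort_scores, test_cohort_scores):
    """Symmetric score normalization: z-normalize the raw trial score
    against each side's cohort score distribution, then average."""
    z_e = (raw_score - enrol_cohort_scores.mean()) / enrol_cohort_scores.std()
    z_t = (raw_score - test_cohort_scores.mean()) / test_cohort_scores.std()
    return 0.5 * (z_e + z_t)
```

Here `enrol_cohort_scores` would be the PLDA scores of the enrolment i-vector against a cohort of impostor i-vectors, and `test_cohort_scores` the same for the test i-vector.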
0:11:28 Now we move on to the short utterance investigations.
0:11:36 Previous studies and our experiments have found that when sufficient speech is available, PLDA
0:11:43 achieves significant improvement. However, the robustness of PLDA with limited speech in enrolment and verification
0:11:52 is an important issue that has not been investigated
0:11:57 previously.
0:12:04 In our experiments we evaluated the GPLDA and HTPLDA systems on the truncated evaluation data, as shown
0:12:10 in Figure 1 and Figure 2.
0:12:17 HTPLDA continues to achieve better performance than GPLDA for all truncated conditions, although
0:12:23 the difference is not significant
0:12:29 at the equal error rate. Overall, the results show that as utterance length decreases, performance degrades
0:12:37 at an increasing rate rather than in proportion.
0:12:48 Now we move on to the short utterance development data conditions. In typical speaker verification, full
0:12:55 utterances are used for score normalization training. When speaker verification is performed with short utterance
0:13:02 evaluation data, we hypothesized that
0:13:05 matching the score normalization utterance length with the evaluation utterance length could provide an improvement.
0:13:15 To test this hypothesis, we evaluated the GPLDA and HTPLDA systems with
0:13:20 short utterance and full utterance score normalization data.
0:13:27 That is, we compared the performance of the GPLDA and HTPLDA systems with
0:13:34 full-length score normalization and with matched-length score normalization.
0:13:41 We found that matched-length score normalization improves equal error rate performance for both systems on
0:13:48 almost all
0:13:50 truncated conditions.
0:13:54 However, it does not improve consistently at the minimum DCF operating point.
0:14:11 Next, short utterance development data for PLDA modelling. Normally, PLDA is modelled with full-length utterances.
0:14:18 We hypothesized that when PLDA is modelled with
0:14:23 matched utterance length it could provide an improvement, since the evaluation i-vector distribution behaviour is then matched
0:14:29 with the development data i-vector distribution behaviour.
0:14:38 To test this hypothesis, we
0:14:44 evaluated GPLDA with full-length and with matched utterance length development data.
0:14:49 The above chart compares the results of the GPLDA system
0:14:52 when PLDA is trained with full utterance length and with matched utterance length.
0:14:59 In the matched case, as previously explained, the development data utterance length is matched with the evaluation
0:15:04 utterance length. The chart suggests that when PLDA is modelled with matched length,
0:15:09 it achieves a useful improvement over full-length based PLDA modelling.
0:15:21 As for HTPLDA,
0:15:23 we tried to model HTPLDA with matched utterance length, but
0:15:28 we were unable to train it.
0:15:31 We believe that the short utterance development i-vector distribution has fewer outliers
0:15:37 than the full-length one. So, we evaluated HTPLDA with mixed-length data,
0:15:44 where full-length utterances and matched-length utterances are pooled together.
0:15:52 We can see from both charts that mixed-length HTPLDA achieved improved performance over
0:15:59 full-length HTPLDA.
0:16:09 So far, the PLDA approaches have been investigated with only telephone speech as the evaluation and development data.
0:16:15 So in this section, I will discuss the telephone and microphone speech
0:16:21 PLDA system.
0:16:27 We have analysed two different kinds of
0:16:31 total variability representations,
0:16:33 namely the pooled and concatenated approaches, under short utterance evaluation conditions.
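The two representations can be sketched as follows (a hypothetical `train_subspace` trainer stands in for the actual eigenvoice MAP training, and the dimensions are illustrative): the pooled approach trains one total variability space on telephone and microphone data together, while the concatenated approach trains a subspace per source and joins them column-wise:

```python
import numpy as np

def pooled_subspace(train_subspace, tel_data, mic_data, dim):
    # Pooled: one space trained on telephone and microphone data together
    return train_subspace(np.vstack([tel_data, mic_data]), dim)

def concatenated_subspace(train_subspace, tel_data, mic_data, dim):
    # Concatenated: separate telephone and microphone spaces,
    # joined column-wise into a single projection of the same total dimension
    T_tel = train_subspace(tel_data, dim // 2)
    T_mic = train_subspace(mic_data, dim // 2)
    return np.hstack([T_tel, T_mic])
```

Either way the result is one projection matrix of the same shape, so the downstream PLDA system is unchanged; only the training data organisation differs.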
0:16:41 The table shows the equal error rate performance of the pooled and concatenated total variability approaches for the GPLDA and HTPLDA systems.
0:16:49 All the results were presented with s-norm applied.
0:16:59 In these figures we can see that
0:17:03 the pooled total variability approach provided improved performance for both GPLDA and HTPLDA
0:17:09 across all the utterance length
0:17:12 conditions.
0:17:21 When we looked specifically at the
0:17:24 pooled total variability approach,
0:17:32 for both the GPLDA and HTPLDA systems it achieved a considerable improvement over the concatenated total
0:17:40 variability approach.
0:17:41 However, with the pooled total variability approach, it is GPLDA
0:17:44 that enjoyed the greater improvement.
0:17:54 To conclude: we have discussed the PLDA system with short utterances.
0:17:58 We found from our experiments that HTPLDA continued to achieve better performance than GPLDA under short utterance
0:18:04 evaluation conditions.
0:18:09 The advantages of including short utterances in the development data for score normalization and
0:18:15 PLDA modelling were also found.
0:18:18 Finally, we investigated the
0:18:21 telephone and microphone PLDA system with
0:18:24 different total variability representations.
0:18:32 For future work, we have been working on length-normalized
0:18:34 i-vector features with the GPLDA system,
0:18:37 since it is more computationally efficient than HTPLDA
0:18:41 and length normalization has been seen to provide a similar improvement.
0:18:47 The GPLDA system will also be analysed
0:18:49 with short utterance evaluation
0:18:52 and development data.
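A minimal sketch of the usual length normalization recipe assumed here (whiten with development-set statistics, then project to the unit sphere), which makes the i-vector distribution more Gaussian so the cheaper GPLDA can stand in for HTPLDA:

```python
import numpy as np

def length_normalize(ivectors, dev_mean, dev_cov):
    """Whiten i-vectors with development-set statistics, then scale
    each one to unit Euclidean length."""
    evals, evecs = np.linalg.eigh(dev_cov)           # dev_cov must be positive definite
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T     # ZCA whitening transform
    x = (ivectors - dev_mean) @ W.T
    return x / np.linalg.norm(x, axis=1, keepdims=True)
```

`dev_mean` and `dev_cov` would be the mean and covariance of the development i-vectors; after this transform every i-vector lies on the unit sphere.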
0:19:59 We were trying to compare the
0:20:01 full utterance i-vector features and the short utterance i-vector features.
0:20:09Yeah, it is.
0:20:16Because of that it is