This is the outline of my presentation. To start, I will give the motivation and I
will discuss how we investigate PLDA techniques. In the background section,
I will talk about what PLDA is and about speaker verification.
I will discuss the telephone-speech-only PLDA system with standard and
short utterance evaluation and development data conditions.
Finally I will talk about how the telephone and microphone speech PLDA system performs
on the evaluation data I've been discussing.
There are two factors that are considered when deploying speaker verification in practical
applications. The first one is compensating for channel mismatch, which helps improve
speaker verification.
The second one is reducing the speech requirement, which is really what affords users
convenience. In this paper we will analyze the short utterance speech requirements.
Previous researchers investigated some other techniques, such as joint factor analysis, SVM
and i-vectors, with short utterances. However, no one has found any single system that solves
the short utterance issue.
So, in this paper, we do some experiments with the PLDA system itself.
telephone only when PLDA system. Speaker verification PLDA total variability score joint factor analysis.
However, there has not been a detailed investigation of how this PLDA system performs with
limited enrollment or development data conditions.
I had some questions when I started investigating the PLDA system with short utterance
development data conditions. The first one is how speaker verification performance improves
when score normalization is trained with matched utterances,
that is, when the score normalization utterance length is matched with the evaluation
utterance length. The second one is how speaker verification performance improves when PLDA
is modelled with matched utterance length.
In this case also, the PLDA modelling data utterance length is matched with the evaluation data.
In the background section, I will be talking about the implementation of PLDA based speaker
verification systems.
For i-vector feature extraction: in the JFA approach, it is believed that the channel space
has some speaker information, which can be used to discriminate speakers.
They have proposed i-vectors, which are based on one variability space instead of separate spaces.
It was also found that the original i-vector feature behaviour is heavy tailed, and the i-vector
is small in size, so it is easy to train the heavy-tailed PLDA.
In the i-vector approach, the GMM super-vector M is modeled in one variability space to reduce
the dimensionality, which is the total variability space.
The total variability space is trained using the same process as eigenvoice MAP training, with
one difference: all the recordings of a given speaker are considered to be from different speakers.
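As a rough sketch of the total variability model mentioned above, M = m + T w, the i-vector w of one utterance can be taken as the posterior mean of w given that utterance's Baum-Welch statistics. The function name, the argument layout and the diagonal-covariance UBM are assumptions made for illustration; this is not necessarily how the system in the talk was implemented.

```python
import numpy as np

def extract_ivector(N, F, T, sigma):
    """Posterior mean of w in M = m + T w (illustrative sketch).

    N:     (C,)     zero-order statistics, one count per UBM component
    F:     (C*D,)   first-order statistics, already centred on the UBM means
    T:     (C*D, R) total variability matrix
    sigma: (C*D,)   diagonal of the UBM covariance supervector
    """
    C = N.shape[0]
    D = F.shape[0] // C
    R = T.shape[1]
    N_exp = np.repeat(N, D)                # expand counts to supervector size
    T_scaled = T / sigma[:, None]          # Sigma^{-1} T
    # Posterior precision: I + T' Sigma^{-1} N T
    precision = np.eye(R) + T.T @ (N_exp[:, None] * T_scaled)
    # Posterior mean: precision^{-1} T' Sigma^{-1} F
    return np.linalg.solve(precision, T_scaled.T @ F)
```

With a lot of speech the counts in N dominate the identity term and the estimate approaches the least-squares solution; with a short utterance the prior pulls w towards zero, which is one way to see why short utterances give less reliable i-vectors.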
Previously, we talked about i-vector feature extraction; now we will move on to PLDA modeling.
The PLDA generative model was originally proposed in face recognition, and later it was
adapted to i-vectors by Kenny.
This approach is similar to the JFA approach, but with i-vectors in place of supervectors.
The i-vectors are modeled with a speaker part and a channel part, and Gaussian distributions
can be used to model them.
The precision matrix is full rank and we removed the eigenchannel part, U two; it helps reduce
the computational complexity.
For our experiments we have investigated two types of PLDA, GPLDA and HTPLDA.
For the Gaussian case, the speaker factors have a standard normal distribution,
and the residual factors have a normal distribution with zero mean.
The model parameters, such as the mean and the eigenvoice matrix, are estimated using
maximum likelihood and minimum divergence.
Because of outliers in the i-vector space, the choice of Gaussian modeling is not optimal,
and this was later investigated with HTPLDA by Kenny.
The Student's t distribution is used for modeling the speaker in HTPLDA, which has heavier
tails than the Gaussian used in GPLDA.
yes speaker x. it's one user is estimated to actually in scale space
also model parameters are using that be in the and
This is a simplified version of a PLDA based speaker verification system.
It consists of three phases: development, enrollment and verification.
In the development phase, the eigenvoice matrix, the mean and the precision matrix are
estimated using Gaussian or heavy-tailed PLDA.
In the enrollment and verification phases, target and test i-vectors are extracted. The speaker
and channel variability are estimated in PLDA. Finally, the score is calculated using the batch
likelihood ratio.
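To make the batch likelihood ratio concrete for the Gaussian case, here is a minimal sketch. It assumes the simplified model w = mean + U1 x + residual, with a standard normal speaker factor x and a Gaussian residual whose covariance is the inverse of the precision matrix. The function names and arguments are hypothetical, and the heavy-tailed case needs a different (variational) computation that is not shown.

```python
import numpy as np

def gauss_logpdf(x, cov):
    """Log density of a zero-mean Gaussian with covariance `cov`, evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + quad)

def gplda_llr(w_target, w_test, mean, U1, Lambda_inv):
    """Log-likelihood ratio: same-speaker versus different-speaker hypothesis."""
    B = U1 @ U1.T                      # between-speaker covariance
    total = B + Lambda_inv             # covariance of a single i-vector
    x = np.concatenate([w_target - mean, w_test - mean])
    zeros = np.zeros_like(B)
    # Same speaker: the two i-vectors share x, so they are correlated through B.
    cov_same = np.block([[total, B], [B, total]])
    # Different speakers: the two i-vectors are independent.
    cov_diff = np.block([[total, zeros], [zeros, total]])
    return gauss_logpdf(x, cov_same) - gauss_logpdf(x, cov_diff)
```

A positive score favours the same-speaker hypothesis. Building the joint covariance explicitly is slow for five-hundred-dimensional i-vectors, so a practical system would precompute the required inverses once; the explicit form is used here only because it is easy to check.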
First we have investigated the telephone-speech-only PLDA system under standard conditions, and
then progressively investigated short utterance verification. After that, we have also
investigated short utterance development data.
Finally we have investigated the telephone and microphone speech PLDA system, with approaches
to exploit all the speaker variability information,
such as the pooled or concatenated approaches.
I-vector features with a dimension of five hundred were extracted from the total variability
space, using UBM components and twenty-six MFCC and delta features.
The UBM mostly used two thousand four telephone utterances. For the telephone-speech-only PLDA
system, the total variability and PLDA were trained using telephone utterances from
two thousand four, two thousand five and two thousand six, and Switchboard.
For the telephone and microphone PLDA system, telephone and microphone utterances from
two thousand four and two thousand five, as well as Switchboard, were used.
For telephone-only PLDA, score normalization was trained on telephone utterances
from two thousand four and two thousand five.
For telephone and microphone PLDA, it was trained on telephone and microphone utterances from
two thousand two, two thousand four, two thousand five and two thousand six.
All the experiments were conducted using the short two-short three and ten second-ten second
evaluation conditions.
Short utterances were obtained by truncating the short two-short three evaluation condition.
Prior to truncation, twenty seconds of active speech was removed
from all utterances to avoid capturing similar introductory statements across multiple utterances.
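The truncation step can be sketched as follows. This assumes that the removed twenty seconds are the first twenty seconds of active speech, that a voice activity mask is already available, and that there are 100 frames per second; the function name and parameters are hypothetical.

```python
def truncate_active_speech(frames, vad_mask, keep_seconds,
                           skip_seconds=20.0, frame_rate=100):
    """Return `keep_seconds` of active speech after skipping `skip_seconds`.

    frames:     sequence of per-frame feature vectors
    vad_mask:   sequence of booleans, True where a frame is active speech
    frame_rate: frames per second (assumed 100, i.e. a 10 ms frame shift)
    """
    if len(frames) != len(vad_mask):
        raise ValueError("frames and vad_mask must have the same length")
    # Keep only the frames that the voice activity detector marked as speech.
    active = [f for f, is_speech in zip(frames, vad_mask) if is_speech]
    start = int(round(skip_seconds * frame_rate))
    stop = start + int(round(keep_seconds * frame_rate))
    # Slicing past the end simply returns fewer frames for short recordings.
    return active[start:stop]
```

For example, `truncate_active_speech(frames, vad_mask, keep_seconds=10)` would give a ten second condition measured in active speech rather than in wall-clock time.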
For the telephone-only speech PLDA system, the speaker verification total variability subspace
was trained on telephone speech.
For the telephone and microphone based PLDA system,
McLaren investigated different types of total variability representations, such as pooled
total variability approaches that combine telephone and microphone speech. So here we have
investigated the pooled approach to find out whether it suits
the PLDA system.
In this section I will be discussing the telephone-speech-only PLDA system on the standard
conditions to see how our system performs.
We compared the performance of GPLDA and HTPLDA with
s-norm on NIST 2010 conditions, including ten second-ten second.
As previously shown by Kenny, we have confirmed that HTPLDA performs better than GPLDA.
Similarly, s-norm improved the GPLDA system performance only.
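For reference, a common formulation of s-norm (symmetric score normalization) can be sketched as below. The raw score is normalized once with the statistics of the target i-vector scored against a cohort and once with those of the test i-vector scored against the same cohort. The function and argument names are hypothetical, and some formulations sum the two terms rather than average them, which only rescales the scores.

```python
from statistics import mean, pstdev

def s_norm(raw_score, target_cohort_scores, test_cohort_scores):
    """Symmetric score normalization of one trial score (illustrative sketch).

    target_cohort_scores: scores of the target i-vector against the cohort
    test_cohort_scores:   scores of the test i-vector against the cohort
    """
    z = (raw_score - mean(target_cohort_scores)) / pstdev(target_cohort_scores)
    t = (raw_score - mean(test_cohort_scores)) / pstdev(test_cohort_scores)
    return 0.5 * (z + t)
```

In a matched-length setup, the cohort utterances would be truncated to the evaluation utterance length before the cohort scores are computed.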
And now we will move on to the short utterance investigations.
Previous studies and our experiments have found that when sufficient speech is available, PLDA
achieves significant improvement. However, the robustness of PLDA with limited resources in
enrolment and verification is an important issue that has not been investigated
previously.
For this experiment we evaluated the GPLDA and HTPLDA systems on truncated evaluation data, as
shown in figure one and figure two.
HTPLDA continues to achieve better performance than GPLDA for all truncated conditions, although
that difference is not significant
at the equal error rate. Overall, the results show that when utterance length decreases,
performance degrades at an increasing rate rather than in proportion.
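Since the results are reported at the equal error rate, here is a minimal sketch of how that figure can be computed from trial scores. It is a simple threshold sweep written for readability, not the scoring tool used for the results in the talk, and the function name is hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Error rate at the threshold where miss and false-alarm rates are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # A trial is accepted when its score is at or above the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, 0.5 * (miss + false_alarm)
    return eer
```

The sweep is quadratic in the number of trials, which is fine for a small check; a full evaluation would sort the scores once and accumulate the error counts.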
Now we move on to short utterance development data conditions. In typical speaker verification,
the full-length utterances are used for score normalization training. When speaker verification
is performed with short utterance evaluation data, we hypothesized that
matching the score normalization utterance length to the evaluation utterance length could
provide an improvement.
To test this hypothesis, we evaluated the GPLDA and HTPLDA systems with
short utterance and full-length utterance score normalization data.
Here we compare the performance of the GPLDA and HTPLDA systems with
full-length score normalization and matched-length score normalization.
Matched-length score normalization improves equal error rate performance on both systems across
most of the truncated conditions,
but it doesn't show a consistent improvement in DCF.
Now, short utterance development data for PLDA modeling. Normally PLDA is modeled on full-length
utterances. We hypothesized that when PLDA is modeled with
matched utterance length it could provide an improvement, since the evaluation i-vector
distribution behaviour is matched with the development data i-vector distribution behaviour.
To test this hypothesis, we evaluated GPLDA with full-length and matched utterance lengths.
The above chart compares the results of the GPLDA system
when PLDA is trained on full-length utterances and on matched-length utterances.
Matched utterance length, as previously explained, means that the development data utterance
length is matched with the evaluation utterance length. The chart suggests that when PLDA is
modeled with matched length,
it achieves a useful improvement over full-length based PLDA modeling.
As for HTPLDA,
we tried to model HTPLDA with matched utterance length, but
we were unable to train it.
We believe that the short utterance development i-vector distribution has fewer outliers
than the full-length one. So, we evaluated HTPLDA with mixed length, that is,
full-length utterances and matched-length utterances pooled together.
We can see from both charts that mixed-length HTPLDA improved performance over
full-length HTPLDA.
So far, the PLDA approaches were investigated with only telephone speech as evaluation and
development data conditions.
So in this section, I will discuss the telephone and microphone speech based
PLDA system.
We have analyzed two different kinds of
total variability representations,
pooled and concatenated, with short utterance evaluation.
This is the equal error rate performance of the pooled and concatenated total variability
approaches for the GPLDA and HTPLDA systems.
It's only all the results were presented with applied.
In these pictures we can see that the
pooled total variability approach provided improved performance for both GPLDA and HTPLDA
across all the different utterance length
conditions.
When we specifically looked at the
pooled total variability approach
for the GPLDA and HTPLDA systems, the pooled total variability approach achieved considerable
improvement over the concatenated total variability approach.
However, the pooled total variability approach is GPLDA.
Enjoy the immediate improvement.
As previously discussed, we investigated the PLDA system with short utterances.
We found from experiments that HTPLDA continued to achieve better performance than GPLDA in
short utterance evaluation conditions.
The advantages of including short utterances in development data for score normalization and
PLDA modeling were also found.
Finally, we have investigated the
telephone and microphone PLDA system with
different total variability representations.
We have been working on length-normalized
i-vector features with the GPLDA system,
since it is more efficient than HTPLDA
and it is seen to provide an improvement.
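Length normalization itself is a small operation: each i-vector is scaled to unit Euclidean length before GPLDA modeling. The sketch below shows only that scaling step, with a hypothetical function name; centring and whitening, which are often applied first, are omitted.

```python
import math

def length_normalize(ivector):
    """Scale an i-vector to unit Euclidean length (illustrative sketch)."""
    norm = math.sqrt(sum(x * x for x in ivector))
    if norm == 0.0:
        raise ValueError("cannot length-normalize a zero vector")
    return [x / norm for x in ivector]
```

For instance, `length_normalize([3.0, 4.0])` gives a vector proportional to the input with norm one.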
The GPLDA system will also be analyzed
with short utterances
and development data.
we were trying to find the
full utterance i-vector feature and short utterance i-vector feature.
Yeah, it is.
Because of that it is