Speech Transcript - PLDA based Speaker Recognition on Short Utterances

This is the outline of my presentation. To start, I will give the motivation and discuss how we investigate PLDA techniques.

In the background section, I will talk about what PLDA is and about speaker verification.

I will then discuss the telephone-speech-only PLDA system under standard conditions and under short utterance evaluation and development data conditions.

Finally, I will talk about how the telephone and microphone speech PLDA system performs on the evaluation data I have been discussing.

There are two factors that are considered when deploying speaker verification in practical applications.

The first is compensating for channel mismatch, which helps improve speaker verification performance.

The second is reducing the amount of speech required, which is what affords users convenience. In this paper we analyze the short utterance speech requirement.

Previous researchers have investigated other techniques with short utterances, such as Joint Factor Analysis, SVM and i-vectors.

However, no one has found a single system that solves the short utterance issue.

So, in this paper, we just

do some experiments with PLDA system, it's own preferences.

telephone only when PLDA system. Speaker verification PLDA total variability score joint factor analysis.

However, there has been no detailed investigation of how this PLDA system performs with limited speech in enrollment or development data conditions.

I had some questions when I started investigating the PLDA system with short utterance development data conditions.

The first is how speaker verification performance improves when score normalization is trained with matched utterance length, that is, when the score normalization utterance length is matched with the evaluation utterance length.

The second is how speaker verification performance improves when PLDA is modelled with matched utterance length.

In this case also, the utterance length of the PLDA modelling data is matched with the evaluation data.

In the background section, I will talk about the implementation of PLDA based speaker verification systems.

First, i-vector feature extraction. In the JFA approach, it is believed that the channel space contains some speaker information, which can be used to discriminate speakers.

For this reason, i-vectors were proposed, which are based on one variability space instead of separate speaker and channel spaces.

It has also been found that the original i-vector feature behaviour is heavy-tailed, and that i-vectors are small in size, so it is easy to train the heavy-tailed PLDA on them.

In the i-vector approach, the GMM supervector M is modelled in a single variability space to reduce the dimensionality, which is the total variability space.

The total variability space is trained using the same process as eigenvoice MAP training, with one difference: all the recordings of a given speaker are considered to come from different speakers.
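
The total variability model just described is commonly written in the form below. This is the standard form from the i-vector literature rather than a formula quoted in the talk; apart from M, the symbols m, T and w are notation assumed here for illustration.

```latex
% M : speaker- and session-dependent GMM supervector
% m : speaker- and session-independent (UBM) supervector   (assumed symbol)
% T : low-rank total variability matrix                    (assumed symbol)
% w : i-vector, given a standard normal prior              (assumed symbol)
M = m + T w, \qquad w \sim \mathcal{N}(0, I)
```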

So far we have talked about i-vector feature extraction; now we move on to PLDA modelling.

The PLDA generative model was originally proposed for face recognition and was later adapted to i-vectors by Kenny.

This approach is similar to the JFA approach, but i-vectors are used as the features instead of supervectors.

The i-vectors are modelled as a speaker part and a channel part, and Gaussian distributions can be used to model them.

For our experiments, we took the precision matrix to be full rank and removed the eigenchannel part, U two, which helps reduce the computational complexity.

We investigated two types of PLDA: GPLDA and HTPLDA.

For the Gaussian case, the speaker factors have a standard normal distribution, and the residual factors have a normal distribution with zero mean.

The model parameters, including the mean and the eigenvoice matrix, are estimated using maximum likelihood and minimum divergence.
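
As a sketch of the Gaussian case, the simplified model with the eigenchannel part removed can be written as below. The talk names only U two; every other symbol is notation assumed here, following the usual PLDA convention.

```latex
% w_r           : i-vector of the r-th recording of a speaker   (assumed symbol)
% \bar{w}       : global mean of the i-vectors                  (assumed symbol)
% U_1           : eigenvoice matrix                             (assumed symbol)
% x_1           : speaker factors, standard normal              (assumed symbol)
% \varepsilon_r : residual, zero mean, full precision matrix \Lambda
% The eigenchannel term U_2 x_{2r} is omitted, as described in the talk.
w_r = \bar{w} + U_1 x_1 + \varepsilon_r, \qquad
x_1 \sim \mathcal{N}(0, I), \qquad
\varepsilon_r \sim \mathcal{N}(0, \Lambda^{-1})
```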

Because of outliers in the i-vector space, the choice of Gaussian modelling is not optimal, and this led Kenny to investigate HTPLDA.

HTPLDA uses the Student's t distribution for modelling the speaker factors, which has heavier tails than the Gaussian distribution used in GPLDA.

yes speaker x. it's one user is estimated to actually in scale space

also model parameters are using that be in the and

This is a simplified diagram of a PLDA based speaker verification system.

It consists of three phases: development, enrollment and verification.

In the development phase, the model parameters (the eigenvoice matrix, the mean and the precision matrix) are estimated using Gaussian or heavy-tailed PLDA.

In the enrollment and verification phases, the target and test i-vectors are extracted, and the speaker and channel variables are estimated in PLDA.

Finally, the score is calculated using the batch likelihood ratio.
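
For reference, the batch likelihood ratio for a target and test i-vector pair is usually written as below. The subscripted symbols and the hypothesis names H_1 (same speaker) and H_0 (different speakers) are notation assumed here, not taken from the talk.

```latex
% H_1 : target and test i-vectors come from the same speaker
% H_0 : target and test i-vectors come from different speakers
\text{score} = \ln
  \frac{P(w_{\text{target}}, w_{\text{test}} \mid H_1)}
       {P(w_{\text{target}} \mid H_0)\, P(w_{\text{test}} \mid H_0)}
```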

First, we investigated the telephone-speech-only PLDA system under standard conditions, and then progressively investigated short utterance verification.

After that, we also investigated short utterance development data.

Finally, we investigated the telephone and microphone speech PLDA system, looking at total variability approaches such as pooled and concatenated, to exploit all the speaker variability information.

The i-vector features, of dimension five hundred, were extracted using total variability, the UBM components, and twenty six MFCC and delta features.

The UBM was trained mostly on two thousand four telephone utterances.

For the telephone-speech-only PLDA system, total variability and PLDA were trained using telephone utterances from two thousand four, two thousand five and two thousand six, and Switchboard.

For the telephone and microphone PLDA system, they were trained using telephone and microphone utterances from two thousand four and two thousand five, as well as Switchboard.

For telephone-only PLDA, score normalization was trained on telephone utterances from two thousand four and two thousand five.

For telephone and microphone PLDA, it was trained on telephone and microphone utterances from two thousand two, two thousand four, two thousand five and two thousand six.

All the experiments were conducted using the short2-short3 and ten second-ten second evaluation conditions.

Short utterances were obtained by truncating the short2-short3 evaluation condition.

Before truncation, the first twenty seconds of active speech were removed from all utterances, to avoid capturing similar introductory statements across multiple utterances.
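
The truncation procedure can be sketched as follows. This is a minimal illustration, not the code used for the experiments: the function name, the 10 ms frame shift and the per-frame speech flags (as produced by some voice activity detector) are all assumptions made here.

```python
from typing import List, Sequence

FRAME_SHIFT_S = 0.01  # assumed 10 ms frame shift; not stated in the talk


def truncate_active_speech(
    frames: Sequence[Sequence[float]],
    is_speech: Sequence[bool],
    keep_seconds: float,
    skip_seconds: float = 20.0,
    frame_shift_s: float = FRAME_SHIFT_S,
) -> List[Sequence[float]]:
    """Drop the first `skip_seconds` of active speech, then keep `keep_seconds`."""
    if len(frames) != len(is_speech):
        raise ValueError("frames and is_speech must have the same length")
    # Keep only the frames flagged as speech by the (hypothetical) detector.
    active = [frame for frame, flag in zip(frames, is_speech) if flag]
    skip = round(skip_seconds / frame_shift_s)
    keep = round(keep_seconds / frame_shift_s)
    # Skip the introductory speech, then take the requested duration.
    return active[skip:skip + keep]


# Usage: 60 s of dummy one-dimensional features, every second frame is speech.
dummy_frames = [[float(i)] for i in range(6000)]
dummy_flags = [i % 2 == 0 for i in range(6000)]
five_seconds = truncate_active_speech(dummy_frames, dummy_flags, keep_seconds=5.0)
print(len(five_seconds))
```

The same helper could produce matched-length development data by calling it with the evaluation duration.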

For the telephone-only speech PLDA system, the total variability subspace was trained on telephone speech.

For the telephone and microphone based PLDA system, McLaren investigated different types of total variability representations, including the pooled total variability approach, which combines telephone and microphone speech.

So here we have investigated the pooled and concatenated approaches to find which one suits the PLDA system.

Now I will discuss the telephone-speech-only system on the standard conditions, to see how our system performs.

We compared the performance of GPLDA and HTPLDA with s-norm on the NIST 2010 short2-short3 and ten second-ten second conditions.

As previously shown by Kenny, we have confirmed that HTPLDA performs better than GPLDA.

Similarly, s-norm improved the GPLDA system performance only.
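
A minimal sketch of s-norm, assuming the usual symmetric formulation: the raw score is normalized once with statistics from scoring the enrollment i-vector against a cohort, once with statistics from scoring the test i-vector against the same cohort, and the two results are combined. The function and argument names are hypothetical, and the factor of one half is a convention chosen here (some formulations use the plain sum, which only rescales the score).

```python
from statistics import mean, pstdev
from typing import Sequence


def s_norm(
    raw_score: float,
    enrol_cohort_scores: Sequence[float],
    test_cohort_scores: Sequence[float],
) -> float:
    """Symmetric score normalization of one trial score."""
    mu_e, sd_e = mean(enrol_cohort_scores), pstdev(enrol_cohort_scores)
    mu_t, sd_t = mean(test_cohort_scores), pstdev(test_cohort_scores)
    if sd_e == 0.0 or sd_t == 0.0:
        raise ValueError("cohort scores must not all be identical")
    # Average of the enrollment-side and test-side normalized scores.
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)


# Usage with made-up cohort scores.
print(s_norm(2.0, [0.0, 2.0], [1.0, 3.0]))
```

For matched-length score normalization, the cohort utterances would be truncated to the evaluation utterance length before the cohort scores are computed.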

Now we move on to the short utterance investigations.

Previous studies and our experiments have found that when sufficient speech is available, PLDA achieves significant improvement.

However, the robustness of PLDA with limited resources in enrolment and verification is an important issue that has not been investigated previously.

For this experiment we evaluated the GPLDA and HTPLDA systems on truncated evaluation data, as shown in figure one and figure two.

HTPLDA continues to achieve better performance than GPLDA for all truncated conditions, although the difference is less apparent at the equal error rate.

Overall, the results show that as utterance length decreases, performance degrades at an increasing rate rather than in proportion.
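
The equal error rate mentioned here is the operating point where the false acceptance rate equals the false rejection rate. The sketch below is one simple way to estimate it from lists of target and non-target trial scores; the function name and the threshold sweep are assumptions made for illustration, not the evaluation tooling used in the experiments.

```python
from typing import Optional, Sequence


def equal_error_rate(
    target_scores: Sequence[float],
    nontarget_scores: Sequence[float],
) -> float:
    """Estimate the EER by sweeping a threshold over all observed scores."""
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap = float("inf")
    eer: Optional[float] = None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # A trial is accepted when its score is at or above the threshold.
        false_reject = sum(s < threshold for s in target_scores) / len(target_scores)
        false_accept = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(false_accept - false_reject)
        if gap < best_gap:
            # Report the midpoint of the two rates at the closest crossing.
            best_gap = gap
            eer = (false_accept + false_reject) / 2.0
    assert eer is not None
    return eer


# Usage with made-up scores.
print(equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.5, 3.5]))
```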

Now we move on to the short utterance development data conditions.

In typical speaker verification, full-length utterances are used for score normalization training.

When speaker verification is performed with short utterance evaluation data, we hypothesized that matching the score normalization utterance length to the evaluation utterance length could provide an improvement.

To test this hypothesis, we evaluated the GPLDA and HTPLDA systems with short utterance and full-length utterance score normalization data.

These charts compare the performance of the GPLDA and HTPLDA systems with full-length score normalization and matched-length score normalization.

They show that matched-length score normalization improves equal error rate performance on both systems across most of the truncated conditions.

But it does not show the same improvement in DCF.

Next is short utterance development data for PLDA modelling. Normally, PLDA is modelled on full-length utterances.

We hypothesized that when PLDA is modelled with matched utterance length, it could provide an improvement, since the evaluation i-vector distribution behaviour is then matched with the development data i-vector distribution behaviour.

To test this hypothesis, we evaluated GPLDA with full-length and matched utterance length.

The above chart compares the results of the GPLDA system when PLDA is trained on full-length utterances and on matched utterance length.

Matched utterance length, as previously explained, means that the development data utterance length is matched with the evaluation utterance length.

The chart suggests that when PLDA is modelled with matched length, it achieves a useful improvement over full-length PLDA modelling.

As for HTPLDA, we tried to model HTPLDA with matched utterance length, but we were unable to train it.

We believe that the short utterance development i-vector distribution has fewer outliers than the full-length one.

So we evaluated HTPLDA with mixed length, that is, full-length utterances and matched-length utterances pooled together.

We can see from both charts that mixed-length HTPLDA improved performance over full-length HTPLDA.

So far, the PLDA approaches were investigated with only telephone speech in the evaluation and development data conditions.

In this section, I will discuss the telephone and microphone speech based PLDA system.

We analyzed two different kinds of total variability representations, pooled and concatenated, under short utterance evaluation conditions.

These charts show the equal error rate performance of the pooled and concatenated total variability approaches for the GPLDA and HTPLDA systems.

It's only all the results were presented with applied.

In these charts we can see that the pooled total variability approach provided improved performance for both GPLDA and HTPLDA across all the different utterance length conditions.

When we looked specifically at the GPLDA and HTPLDA systems, the pooled total variability approach achieved a considerable improvement over the concatenated total variability approach.

However, the pooled total variability approach is GPLDA.

Enjoy the immediate improvement.

In summary, we have discussed the PLDA system with short utterances.

We found from experiments that HTPLDA continued to achieve better performance than GPLDA in short utterance evaluation conditions.

The advantages of including short utterances in the development data for score normalization and PLDA modelling were also found.

Finally, we investigated the telephone and microphone PLDA system with different total variability representations.

We have been working on length-normalized i-vector features with the GPLDA system, since it is more efficient than HTPLDA, and it has been seen to provide an improvement.
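
As a rough sketch, length normalization scales each i-vector to unit Euclidean length before GPLDA modelling. The function name is hypothetical, and only the scaling step is shown; any centering or whitening applied beforehand is not described in the talk and is left out here.

```python
import math
from typing import List, Sequence


def length_normalize(ivector: Sequence[float]) -> List[float]:
    """Scale an i-vector so that its Euclidean norm is one."""
    norm = math.sqrt(sum(x * x for x in ivector))
    if norm == 0.0:
        raise ValueError("cannot length-normalize a zero vector")
    return [x / norm for x in ivector]


# Usage with a toy two-dimensional vector.
print(length_normalize([3.0, 4.0]))
```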

This GPLDA system will also be analyzed with short utterance evaluation and development data.

we were trying to find the

full utterance i-vector feature and short utterance i-vector feature.

Yeah, it is.

Because of that it is

PLDA based Speaker Recognition on Short Utterances

SESSION 02: Speaker Recognition - Generative modeling

Ahilan Kanagasundaram