okay, so my talk is about generative pairwise models for speaker recognition.
as some of you may know, i have been working quite a lot with discriminative models for i-vector classification, and in particular i have been working mostly with discriminative models able to directly classify pairs of i-vectors, that is i-vector trials, as belonging to the same-speaker or different-speaker class.
these discriminative models were first introduced as a way to discriminatively train PLDA parameters, and later we gave an interpretation of these models as discriminative training of all the model parameters of a second-order Taylor expansion of the log-likelihood ratio.
so i have been working mostly in trial space. here the idea was to go back from discriminative to generative, but remaining in trial space, so the question was whether it would be possible to train a generative model in trial space, and how well it would behave.
it turns out that it is very easy to do in practice, and it works pretty well, i would say more or less like all the other state-of-the-art models.
so in this talk i will show you how we define this model, which is a very simple model that employs two gaussian distributions to model trials; then i will show the relationship of this model with PLDA and with the discriminative PLDA, the pairwise SVM approach; and then i will also show how this model can be very easily extended to handle more complicated distributions, in particular heavy-tailed distributions, following the work from Patrick Kenny on heavy-tailed PLDA.
so, the trial space: to define a trial we take two i-vectors, we stack them together, and that is our definition of a trial.
here i have a couple of pictures which show what would happen if we were working with one-dimensional i-vectors. on the left i have the one-dimensional i-vectors, which are the black dots, and on the right, taking all cross pairs of i-vectors, we can see that there is a well-defined region where the pairs of i-vectors belonging to the same speaker lie, and it is quite well separated from the region where the pairs coming from different speakers are.
with the discriminative training we tried to discriminatively train separation surfaces to separate these regions; now i am going to try to build a generative model to describe these two sets of points.
so, the easiest generative model we can think of: we have a two-class problem, so it is a binary problem, and we can assume that the trials are the observations and that they can be modeled by gaussian distributions. we would have one gaussian distribution describing the trials which belong to the same-speaker class and one for the trials which belong to the different-speaker class, each of them with its own parameters, and for symmetry reasons we will assume that the mean of the two distributions is the same.
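roughly, in formulas, and this is only a sketch in my own notation rather than the slides': a trial stacks an enrollment and a test i-vector, and each class gets its own gaussian with a shared mean,

```latex
\varphi = \begin{pmatrix} \phi_e \\ \phi_t \end{pmatrix}, \qquad
\varphi \mid \mathcal{H}_s \sim \mathcal{N}(\mu, \Sigma_s), \qquad
\varphi \mid \mathcal{H}_d \sim \mathcal{N}(\mu, \Sigma_d)
```

where the two hypotheses are the same-speaker and different-speaker classes.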
the reasoning is about the symmetry of the trial, that is: if we take a pair of i-vectors we can stack them in two ways, we can take enrollment first and test second or vice versa, but we do not want to give any particular order to the vectors, so we want a generative model which treats both versions of the trial in the same way. this imposes some constraints on the covariance matrices, which are described here: the two diagonal blocks have to be the same, as well as the two off-diagonal blocks, and the same holds for the other distribution. in practice, when working with all pairs from a single i-vector dataset, we do not even need to impose this constraint, because it arises naturally during training.
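written in block form, again in my own notation, the constraint would look like this, with the same structure for both classes:

```latex
\Sigma_s = \begin{pmatrix} A_s & B_s \\ B_s & A_s \end{pmatrix}, \qquad
\Sigma_d = \begin{pmatrix} A_d & B_d \\ B_d & A_d \end{pmatrix}
```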
so how can we train this model? the simplest thing we can think of: we did it by maximum likelihood, assuming that i-vector trials are independent. of course i-vector trials are not independent, because they are all the pairs that we can build from a single i-vector set; however, in practice this does not really affect our results, even though the assumption is not very accurate.
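as a minimal sketch of what this maximum likelihood training could look like, under the assumptions above (all pairs, shared mean, independent trials), and as my own illustration rather than the actual code: building all pairs explicitly and taking class-wise covariances is enough.

```python
import numpy as np

def train_two_gaussian(X, spk):
    """X: (n, d) i-vectors, spk: (n,) speaker labels."""
    n, d = X.shape
    same, diff = [], []
    # build all cross pairs of distinct i-vectors as stacked 2d-dimensional trials
    for i in range(n):
        for j in range(n):
            if i != j:
                trial = np.concatenate([X[i], X[j]])
                (same if spk[i] == spk[j] else diff).append(trial)
    same, diff = np.array(same), np.array(diff)
    # shared mean (the symmetry assumption), class-dependent covariances
    mu = np.tile(X.mean(axis=0), 2)
    C_s = (same - mu).T @ (same - mu) / len(same)
    C_d = (diff - mu).T @ (diff - mu) / len(diff)
    return mu, C_s, C_d

def llr(trial, mu, C_s, C_d):
    # log-likelihood ratio between the two gaussians: a quadratic form in the trial
    x = trial - mu
    _, ld_s = np.linalg.slogdet(C_s)
    _, ld_d = np.linalg.slogdet(C_d)
    return 0.5 * (x @ np.linalg.solve(C_d, x) - x @ np.linalg.solve(C_s, x)
                  + ld_d - ld_s)
```

note that the double loop is quadratic in the number of i-vectors, so this is for illustration only.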
this is a representation of what would happen if we were working in a one-dimensional space. i am assuming that the mean is zero for the two distributions, which is essentially what we would recover if we center the i-vectors. we end up with a log-likelihood ratio which is just the ratio between two gaussian distributions, and which is quadratic in the i-vector trial space. you can see two plots for two different sets of synthetic i-vectors, showing the level sets of the log-likelihood ratio as a function of the trial, and you can notice that essentially we are separating, with quadratic surfaces, the same-speaker area, which is this diagonal, from the rest of the points.
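written out, and assuming centered i-vectors as in the plots, the score is just

```latex
s(\varphi) = \log\frac{\mathcal{N}(\varphi;\, 0, \Sigma_s)}{\mathcal{N}(\varphi;\, 0, \Sigma_d)}
           = \tfrac{1}{2}\,\varphi^{T}\!\left(\Sigma_d^{-1}-\Sigma_s^{-1}\right)\varphi
             + \tfrac{1}{2}\log\frac{|\Sigma_d|}{|\Sigma_s|}
```

which gives the quadratic separation surfaces you can see in the level sets.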
this works nicely, and i will show you the results in a moment, but first i want to show you the relationship between this model and the other state-of-the-art approaches, like PLDA and the discriminative PLDA. this is the classical PLDA approach, in the simplified version where we have full-rank channel factors merged together with the residual noise, and we have a subspace for the speaker space.
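this simplified PLDA model, written as a sketch in my own symbols, is

```latex
\phi = \mu + V y + \epsilon, \qquad
y \sim \mathcal{N}(0, I), \qquad \epsilon \sim \mathcal{N}(0, \Sigma)
```

where V spans the speaker subspace, y is the speaker latent variable, and the residual term collects the full-rank channel and noise variability.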
if we take this model and try to jointly model the distribution of a pair of i-vectors, we can consider separately the case when the two i-vectors are from the same speaker and the case when they are from different speakers. in the first case the latent variable for the speaker is shared, so we have only one speaker variable and we get this expression for the trial, while in the case of a different-speaker trial we have one speaker latent variable for each of the two i-vectors. now, with standard PLDA all these variables are gaussian distributed, so we can integrate over the speaker latent variables, and if we integrate we end up with a distribution for same-speaker pairs and a distribution for different-speaker pairs which are again gaussian and which have this form. so again we see that they share the mean, and the two covariance matrices have a very similar structure to what i was showing before.
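carrying out that integration for the PLDA model above, the two pair distributions should come out as (again my notation)

```latex
\varphi \mid \mathcal{H}_s \sim
  \mathcal{N}\!\left(\begin{pmatrix}\mu\\ \mu\end{pmatrix},
  \begin{pmatrix} VV^{T}+\Sigma & VV^{T}\\ VV^{T} & VV^{T}+\Sigma\end{pmatrix}\right),
\qquad
\varphi \mid \mathcal{H}_d \sim
  \mathcal{N}\!\left(\begin{pmatrix}\mu\\ \mu\end{pmatrix},
  \begin{pmatrix} VV^{T}+\Sigma & 0\\ 0 & VV^{T}+\Sigma\end{pmatrix}\right)
```

that is, a shared mean and exactly the symmetric block structure of the two-gaussian model.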
so in practice what PLDA is telling us here is that it is estimating a model which is coherent with our two-gaussian model assumptions, and it essentially differs from our model just in the objective function that is optimised: here we are optimising the i-vector likelihood, while in our two-gaussian model we are optimising the trial likelihood. so again, when we compute the log-likelihood ratio we end up with very similar separation surfaces to those of our two-gaussian model in this one-dimensional i-vector space, and we will see that this is also reflected in the real i-vector space, in the sense that the two models perform pretty much the same.
moving to the relationship with the discriminative approach: this is the scoring function we used for the pairwise SVM, that is, the scoring function used to compute the loss of the SVM, and this scoring function is actually formally equivalent to the log-likelihood ratio scoring function we have seen for our two-gaussian model. of course it is also equivalent to the PLDA scoring function, as it was originally derived from that approach. so we can think about the SVM as a way to discriminatively train this matrix, which, if we think about it in terms of the two-gaussian model, is nothing else than the difference between the precision matrices of the two distributions. so again we have a model which produces the same kind of separation surfaces, and again the only difference is the objective function we are optimising.
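if i recall correctly, the pairwise scoring function referred to here has the usual quadratic form in the two i-vectors, something like

```latex
s(\phi_1,\phi_2) = \phi_1^{T}\Lambda\,\phi_2 + \phi_2^{T}\Lambda\,\phi_1
                 + \phi_1^{T}\Gamma\,\phi_1 + \phi_2^{T}\Gamma\,\phi_2
                 + (\phi_1+\phi_2)^{T} c + k
```

and in the two-gaussian view the matrix collecting these quadratic terms corresponds, up to constants, to the difference of the two precision matrices.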
so, to see some results for this first part: the evaluation was done on the NIST SRE 2010 telephone condition, and i am comparing essentially PLDA with this two-gaussian model. the first line is PLDA without dimensionality reduction, which is also known as the two-covariance model; essentially it means that i am taking a full-rank speaker space. in both cases i am doing length normalisation, and these two lines are the results of PLDA with a full-rank speaker space and of the two-gaussian model trained by maximum likelihood in the trial space. as you can see, they perform pretty much the same; of course the two-covariance model is fast to train, this two-gaussian model is even faster, and at test time they have the same computational requirements.
the problem is when we move to PLDA with a low-rank speaker subspace; in this case i used a hundred and twenty dimensional speaker subspace, while the i-vectors were four hundred dimensional. we cannot directly apply this dimensionality reduction to the two-gaussian model, so we replaced it with a dimensionality reduction done by LDA projection, and that is good enough. so here we have PLDA with the reduced speaker subspace and the two-covariance model where the dimensionality reduction is done by LDA; they perform, i would say, the same. and then, in this reduced hundred and twenty dimensional i-vector space, we trained our gaussian model on trials, and it performs again pretty much the same as the PLDA model.
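as a minimal sketch of this pipeline, my own illustration only, reusing the hypothetical train_two_gaussian function from before and assuming the dimensions quoted here:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def reduce_and_train(X, spk, dim=120):
    """LDA-project 400-dim i-vectors down to 120 dims, then train the trial model."""
    lda = LinearDiscriminantAnalysis(n_components=dim).fit(X, spk)
    X_red = lda.transform(X)
    # length normalisation, as used for the gaussian models in these experiments
    X_red /= np.linalg.norm(X_red, axis=1, keepdims=True)
    return lda, train_two_gaussian(X_red, spk)
```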
for comparison, these are the results we had with the discriminative model; the difference with respect to all these models is that the discriminative model did not require length normalisation. so this means that we can build our generative model in trial space, it is actually very easy to do, and it works very well. so let us see, if we make things a little more complicated, how hard training and testing become. to complicate things we did something similar to what Patrick Kenny did with heavy-tailed PLDA: we said, okay, let us replace the gaussian distributions with t distributions and see what happens. it turns out that training can still be done using an EM algorithm, although it is not that fast; it becomes more or less as computationally expensive as the discriminative approach.
but the good thing is that at test time we can use closed-form integration, and our log-likelihood ratio becomes simply the ratio between two student's t distributions, so at testing time this is as fast as PLDA or the two-gaussian model i have shown before.
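a minimal sketch of that scoring, assuming the two trial distributions are multivariate student's t (my illustration; the parameter names are mine):

```python
from scipy.stats import multivariate_t

def llr_heavy_tailed(trial, mu, shape_s, df_s, shape_d, df_d):
    # the log-likelihood ratio is simply the ratio of the two student's t densities
    return (multivariate_t(loc=mu, shape=shape_s, df=df_s).logpdf(trial)
            - multivariate_t(loc=mu, shape=shape_d, df=df_d).logpdf(trial))
```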
as with heavy-tailed PLDA, we do not need length normalisation if we use these heavy-tailed distributions. of course the separation surfaces are slightly more complex, because we no longer have quadratic separation surfaces, but we get this kind of shapes.
and for the results, what happens is that we managed to get more or less the same results as the gaussian model, but without length normalisation, which is, i would say, aligned with the findings about heavy-tailed PLDA. what is different with respect to heavy-tailed PLDA is that this model is more expensive in training, but in testing it is as fast as all the others.
so, to summarise what we get here: we can use a very simple gaussian classifier in the trial space which can be trained very easily, and despite the fact that we make an incorrect assumption about trial independence, it still works very well. and it turns out that this model is quite easy to extend to handle more complicated distributions: while with PLDA, for example, just moving to heavy-tailed distributions makes the model very difficult to train and to test, here we can use, for example, student's t distributions with almost no hassle. so from here we hope to be able to find some better ways to model the trial distribution in trial space, which would still allow us to have fast solutions for scoring without incurring too big problems for training.
and that was it, thanks.
the first question: what was the degrees of freedom in that case?
yes,
i do not remember exactly, but it was something like five or six, maybe, something like that. i remember we had that also in the workshop, but we had a bug then; it worked once it was fixed.
was this telephone speech rather than microphone?
yes, just telephone; i did not really try microphone. well, i tried something on microphone: it works slightly worse than PLDA, but it is not that different. anyway, i did not try the heavy-tailed version yet; i think it might run into problems without length normalisation, that was my expectation. i did not really try it, maybe just one run on the microphone data.
i have a comment which may be relevant to that. i also used an EM algorithm to estimate the heavy-tailed parameters; for example, in the paper that i presented on monday i was using a t distribution in score space, and an EM algorithm to help me estimate the parameters. i would generate synthetic data where i knew what the degrees of freedom would be, and then i tried to recover that using the EM algorithm, and that was very frustrating: i just could not recover the same degrees of freedom. then i switched from using an EM algorithm to using direct optimisation, i think it is BFGS, of the likelihood, and that was much better at recovering the degrees of freedom.
okay, well, for the synthetic data here i was generating it with the heavy-tailed distribution, and i was getting more or less the same estimates for this. but i had a similar problem when i was trying to do something similar to what you did for calibration, with non-gaussian distributions, skewed distributions and those kinds of things, and there i realised that EM was not that good; i was doing it numerically and that was working. so maybe i was just lucky with the t distributions.
i think if the data is really heavy-tailed it works, but if it is not that heavy-tailed it does not; so probably if the degrees of freedom is low you can recover it, but if it is around ten or twenty then you cannot recover it anymore.
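as a small sketch of the direct optimisation mentioned in this exchange, fitting the degrees of freedom of a univariate student's t by BFGS on the log-likelihood instead of EM (my own illustration, with made-up values):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import t as student_t

def fit_dof(x):
    # optimise log(df) so the degrees of freedom stay positive
    nll = lambda p: -student_t(df=np.exp(p[0])).logpdf(x).sum()
    res = minimize(nll, x0=np.array([np.log(10.0)]), method="BFGS")
    return float(np.exp(res.x[0]))

# generate synthetic data with known degrees of freedom and try to recover it
x = student_t(df=5.0).rvs(size=10000, random_state=0)
print(fit_dof(x))
```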
any more questions?
okay, let's thank the speaker again.