All right. So I'm going to present something we have been working on during the last CLSP workshop at Hopkins: trying to explore whether there is any useful information in the GMM weights, because the i-vectors, as you probably know, only try to adapt the means.
As you probably all know by now, the i-vector is related to adapting the means of the GMM, and it has been very successfully applied for speaker, language, dialect and many other applications.
The story behind only adapting the means goes back to GMM MAP adaptation with the UBM, the universal background model, as a basis, where usually only the means are adapted. So we wanted to revisit whether, beyond what the i-vector captures, there is useful information in the weights, or even in the variances; Patrick probably already tried the variances for JFA.
So here, in this work, we try to do something with the weights. There have already been a lot of techniques proposed for the weights, and we tried to build a new one called non-negative factor analysis, which was actually done with Hasan, who was a student in Belgium and was visiting me at MIT. We first tried it for language ID, where we actually had some success with it.
The reason is that, for language ID, when you have a UBM, your Gaussians are supposedly something like phonemes; so if for some language a phoneme does not appear, the corresponding counts can be close to zero, and the weights of those Gaussians can carry useful information. That is what we found out, and that is what motivated us to check, for speakers, whether there is also information in the GMM weights that can be used for speaker recognition. That is ultimately the topic of this work.
We also compared this non-negative factor analysis, NFA, to an already existing technique that was proposed at BUT, the subspace multinomial model, and essentially this presentation is a comparison between the two in the case of GMM weight adaptation.
so
For adapting the GMM means there have already been a lot of techniques: maximum a posteriori, maximum likelihood linear regression, eigenvoices, which were the starting point of all the newer technology like JFA and i-vectors. There have also been a number of weight adaptation techniques, like for example maximum likelihood, non-negative matrix factorization and the subspace multinomial model, and then the one we propose, non-negative factor analysis.
so
The idea behind the i-vector concept, and I don't want to bore you with this, is that for a given utterance there is a UBM, which is a prior over all the sounds, over what the sounds look like, and the i-vector tries to model the shift from this UBM to a given recording. That shift can be modeled by a low-rank matrix, and the coordinates of the recording in this low-dimensional space are what we call the i-vector.
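In my own notation (not taken from the slides), this is the standard total variability model, where the utterance-dependent mean supervector is a low-rank shift from the UBM means:

```latex
% M: utterance mean supervector, m: UBM mean supervector,
% T: low-rank total variability matrix, w: the i-vector of the recording
M = m + T w
```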
So we tried to use the same concept, which was done for the means, to do the same thing with the weights. The only difference we were facing is that the weights should all be positive and they should sum to one.
I can come back to that later. So, in order to model the weights, the first thing is that when you have a UBM, a universal background model, and a sequence of features, you can compute some counts, which are the posterior probabilities of occupation of each Gaussian given each frame, as given here in the equation.
The objective function in the weight case is of this form: it is essentially the Kullback-Leibler divergence between the counts and the weights that you want to model, which we try to minimize. And if you take these counts and normalize them by the length of your utterance, you get the maximum likelihood estimate of the weights, which is easy to do.
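A minimal sketch (not the authors' code) of these zero-order statistics and the maximum-likelihood weight estimate, assuming the UBM is available as a fitted scikit-learn GaussianMixture:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def ml_weights(ubm: GaussianMixture, features: np.ndarray) -> np.ndarray:
    """Maximum likelihood estimate of the per-utterance GMM weights."""
    # gamma[t, c] = posterior probability of Gaussian c given frame t, under the UBM
    gamma = ubm.predict_proba(features)
    # zero-order statistics: soft counts per Gaussian
    counts = gamma.sum(axis=0)
    # normalizing by the utterance length gives the ML weights
    return counts / counts.sum()
```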
So, for example, one weight adaptation technique, which unfortunately we could not compare with for this paper, is non-negative matrix factorization: you take the weights and you say that this weight matrix can be split into two non-negative matrices, where the first one is the basis of your space and the second one holds the coordinates in that space, and this decomposition is found by optimizing an auxiliary function.
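A hedged sketch of that idea using scikit-learn's off-the-shelf NMF; the matrix and variable names here are illustrative, not from the paper:

```python
import numpy as np
from sklearn.decomposition import NMF

def nmf_weight_subspace(W: np.ndarray, n_components: int = 50):
    """Factor a non-negative matrix of per-utterance weights, W ~ coords @ basis."""
    model = NMF(n_components=n_components, init="nndsvda", max_iter=500)
    coords = model.fit_transform(W)   # low-dimensional, non-negative coordinates
    basis = model.components_         # non-negative basis of the weight space
    return coords, basis
```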
Okay, so that is the first approach, and we did not have time to do a comparison with it. What we did compare with is the subspace multinomial model, because that is what BUT actually did, so we tried to compare against it.
The idea behind the subspace multinomial model is that you have the counts here, and you try to find a multinomial distribution that fits this distribution. It is defined by a low-rank matrix, an i-vector-like subspace, on top of the UBM weights, and it is normalized so that the weights sum to one. They have several papers on how to do the optimization; they have a Hessian-based solution for that.
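A minimal sketch of the SMM parameterization as described here: the per-recording weights are a softmax of the UBM log-weights plus a low-rank shift. The symbols m, T and r are my notation, not from the slides:

```python
import numpy as np

def smm_weights(m: np.ndarray, T: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Per-recording weights under an SMM-style model.

    m: (C,) offsets (roughly the UBM log-weights), T: (C, R) subspace, r: (R,) latent vector.
    """
    z = m + T @ r
    z = z - z.max()        # numerical stability
    w = np.exp(z)
    return w / w.sum()     # positive and sums to one by construction
```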
So, for example, for the SMM, suppose you have two Gaussians, and each point here is the maximum likelihood estimate of the weights for a given recording. For this example the points were actually generated from the subspace multinomial distribution, so we generated them from that model, because I tend to believe that in a high-dimensional space the data should be distributed like this, not spread all over the place: if you take a lot of data and train only two Gaussians, the data would be everywhere, but not in a high-dimensional space. So I tried to simulate that, to simulate a high-dimensional GMM with two Gaussians; that is what we did, similar to what the people at BUT did. We generated data from this model, and we show the difference between this model and non-negative factor analysis.
So for non-negative factor analysis, what we say is essentially the same as for the i-vectors: we suppose that we have a UBM, and for each recording the weights can be explained by a shift from the UBM in the direction of the data. This is the same as the i-vector: the matrix can be low rank, and r plays the role of a new i-vector in this new space. The only problem we were facing is that the weights for each recording should always be positive and should sum to one.
So here we developed a kind of EM-like algorithm: we first fix L, we accumulate some statistics, and we run a gradient ascent to estimate the r for each utterance; then, once we have the r's, we update L with a projected gradient ascent, where the projection we use enforces the constraints that the resulting weights always sum to one and always stay positive.
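A rough sketch, under my own assumptions, of the per-utterance step: gradient ascent on r for a fixed L, followed by a crude projection that keeps the implied weights non-negative and summing to one. The projection used in the published work may differ:

```python
import numpy as np

def estimate_r(n, b, L, n_iter=5, lr=1e-3):
    """Gradient ascent on r for fixed L, with a crude feasibility projection.

    n: (C,) soft counts, b: (C,) UBM weights (assumed > 0), L: (C, R) weight subspace.
    """
    R = L.shape[1]
    r = np.zeros(R)
    for _ in range(n_iter):
        w = b + L @ r
        grad = L.T @ (n / w)                 # gradient of sum_c n_c * log(w_c) w.r.t. r
        r = r + lr * grad
        # crude projection: clip the implied weights, renormalise to the simplex,
        # then map back to r by least squares (an assumption of this sketch,
        # not necessarily the published projection)
        w = np.clip(b + L @ r, 1e-8, None)
        w = w / w.sum()
        r, *_ = np.linalg.lstsq(L, w - b, rcond=None)
    return r
```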
That is what we actually did; if you want more explanation I don't have time for it here, but you can find the details in the paper. So remember: these are the soft counts, this is the auxiliary function for the GMM weight case, these are our weights, and we would like to estimate these parameters subject to the constraint that the weights sum to one. So what we did is simply multiply by a vector of ones, so that they must sum to one, and they should all be positive.
Okay.
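Written out in my own notation, the constrained problem described here is roughly:

```latex
% n_c: soft counts, b: UBM weights, L: weight subspace, r: per-utterance factor
\max_{L,\,r}\; \sum_{c=1}^{C} n_c \log\big(b_c + (Lr)_c\big)
\quad \text{s.t.} \quad \mathbf{1}^{\top}(b + Lr) = 1,
\qquad b_c + (Lr)_c \ge 0 \;\; \forall c
```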
So these are the two constraints that allow us to keep the weights summing to one and positive. Now, if you compare what the non-negative factor analysis does against the subspace multinomial model, and what each model is doing: in this case, for example, the SMM is definitely fitting the data well, because the data was generated from it, while the NFA gives an approximation of the data. That has a benefit and a disadvantage: the SMM has a tendency to overfit the data, because it models the distribution of the training data really well, but when you go to the LID task it sometimes does not generalize well.
What BUT did to control this overfitting is to use regularization: they have a regularization term that you have to tune. In our case we do not suffer too much from this; we do not fit the training data very well, but we approximate it and sometimes generalize better than the SMM. Honestly it depends on the application: we compared them for several applications, and sometimes one is a bit better, sometimes the opposite. But anyway, the difference is that the SMM can fit the training data really well but can have an overfitting problem that you need to control with regularization, while the NFA approximates the data and sometimes generalizes better.
So this is the setup we experimented with. We first trained i-vectors on all the data that we have, and we tested on the telephone condition of NIST 2010. We have a UBM of 2048 Gaussians, nothing special technically; we extract i-vectors and we use the LDA, length normalization, PLDA scheme that everybody uses. Then we take the i-vector for the means and the weight vectors from the SMM and from the NFA.
And we tried fusion, to see how we can combine them. A simple score fusion did not help at all, so we dropped it and kept the i-vector-level fusion, which seems to be a little bit better, but not by much for speaker, which was a little disappointing; for language ID, on the other hand, it was helping a lot.
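A hedged sketch of what the i-vector-level fusion could look like, assuming it means concatenating the mean i-vector with the weight-subspace vector before the LDA / length-normalization / PLDA backend; the helper names are illustrative:

```python
import numpy as np

def fuse_vectors(mean_ivec: np.ndarray, weight_vec: np.ndarray) -> np.ndarray:
    """Concatenate the two per-utterance representations before the backend."""
    return np.concatenate([mean_ivec, weight_vec])

def length_normalize(x: np.ndarray) -> np.ndarray:
    """Project to the unit sphere (the usual pre-PLDA length normalization)."""
    return x / np.linalg.norm(x)
```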
So, for example, I also tried to see the effect of dimensionality, how this new weight adaptation behaves compared to, for example, the i-vectors. I took the non-negative factor analysis and trained it with five hundred, one thousand and one thousand five hundred dimensions; remember that the starting UBM was 2048 Gaussians. Then we do LDA first for dimensionality reduction before length normalization. And you see that the differences are not really big when varying the LDA dimension, and even if you compare between five hundred and a thousand dimensions the difference is not really big. We were a little surprised by that, especially for the NFA, and we have seen the same behaviour for the SMM as well. Sometimes the SMM needs to be more low-dimensional, while the non-negative factor analysis tends to work better with a higher dimension compared to the other one.
So here, for example, we compare the best result that we obtained from non-negative factor analysis with the best one from the subspace multinomial model, for the core condition, male and female, and for the eight-conversation condition. You can see that there is actually not too much difference; sometimes the NFA is a little better, sometimes a little worse than the SMM. But you can see that, for the eight-conversation condition, you can get very nice results even without using the GMM means, just the weights.
Now compare with the i-vectors. Here I also show the maximum likelihood estimate of the weights: we take the maximum likelihood weights, take the log, and feed that to LDA; maybe that is not the best way to do it, maybe you can do something cleverer, but it seems that the maximum likelihood weights were worse compared to the SMM and the NFA weights, for all the conditions: eight conversations, male, female, and the core condition as well.
Now we remove the maximum likelihood from the loop and put the i-vectors here, and we can see that the i-vectors are usually about twice as good as the weight vectors. So the i-vectors are definitely much better than the weights. But the gap is not that large if you go to the eight-conversation condition; there it is actually pretty good, because the error rate is very low. So even when you have a lot of recordings from a speaker, the weights can give you almost as much useful information as the i-vector can. That was sort of surprising to us, for that reason.
So here we show the EER and the minimum DCFs, the old one and the new one, and you have the baseline, which is the i-vectors, for female and male. Then we fuse the i-vectors with the weights, using the i-vector-level fusion. This is the NFA: when we add the NFA we win a little bit here, we gain a little in EER, but not too much. For female, for example, when we fuse with the SMM, we get a small gain again for the new DCF, at that operating point, and even in the EER. So for female the SMM was the best to fuse with, and for male you can see that the NFA was better for most of these, but not really in the new minimum DCF. So the fusion was not really exciting, to be honest; it gave a little improvement, but really small compared to what we have seen for language ID.
Now, since the i-vectors are tied to the dimensionality of the supervector, we cannot really keep increasing the UBM size; for the GMM weights, the dimensionality is only related to how many Gaussians you have. So we tried to increase and decrease the UBM size and see what happens; we did that only for the non-negative factor analysis. You can see that if we increase the number of Gaussians in the UBM we get a very nice improvement for both male and female, especially in the EER and the new minDCF. So since the weight representation is not tied to the size of the supervector, you can increase the number of Gaussians in the UBM; you could even think about using a speech recognizer and trying its senones if you want.
So what we also did here: we took the baseline, sorry, the standard i-vectors, and we tried to fuse them with the weight vectors obtained from different UBM sizes. And you can see that the conclusions are not really consistent: even though you get better results with more Gaussians when the weights are used alone, the fusion, for example for female, did not help much, and to be honest was actually worse; for male it was a little bit worse as well, particularly for the core condition. So getting better results with the weights alone does not mean the fusion will improve over using only the i-vectors.
So, as a conclusion: we tried to use the weights, and to see whether it is worth finding a better way of using and updating the weights as well, not only the means, which is what the i-vector is doing. We have seen some slight improvements when combining them; maybe we need to find a better way to combine them, for example something similar to what subspace GMMs are doing for speech recognition. I don't know; we are working on that, and hopefully we will make some progress. I also tried doing it iteratively: you estimate the GMM weights, you update the GMM weights of the UBM, and then you extract the Baum-Welch statistics again and the i-vectors. It did not help for speaker, to be honest; I tried it and it gave the same results, no improvement. I have not tried it for language ID, only for speaker.
Thank you.
So, you will have plenty of time to understand my question. You know, we worked a lot on the weights, in Avignon mainly, and we are also looking at the weights with other approaches, and Michel has some results as well. It has seemed to me since the beginning, maybe it was a gut feeling, that the weights are a very interesting, very nice source of information; but in fact it is binary information.
Why? If you come back to GMM-UBM, and go back to Doug's results when he proposed the top-Gaussian scoring approximation: you are using only the top Gaussian, putting a one on that one and a zero on all the others, and the loss of performance was quite small. After that, if you look at later results where people did a lot of things very close to what you presented, in the end the best solution was to use a rank-based normalization, and the rank-based approach is very close to putting a one on some Gaussians and a zero on all the others, in terms of weights and counts.
And now, if you look at more recent results, it seems that most of the time, using just the zero-and-one information from the weights, we are able to find the same thing. So, according to me, the way the weights represent information is binary, the information is there or not, yes or no, and not a continuous quantity like the one you are trying to model.
So, that is a good point, because when I started working with non-negative factor analysis, my first thought was exactly about that kind of work: I wanted to put sparsity into the weights. That is not what we are able to do with what we are doing now. Because I agree: with top-one or top-five scoring, what wins is the top five; so I would like to have some sparsity in the weights, meaning most of them go to zero and you keep only the top five, for example, or something like that. But for this system, for this model that we have, we are not doing that. That was actually my first comment when we started: how can we make it sparse, because of exactly what you are saying.
Extract the i-vectors adaptively: you adapt the UBM before you extract, and then for each frame there are very few Gaussians active; that is what happens. I am not claiming it is the solution to your problem, but you will get sparsity that way.
Okay, thanks.
So this kind of follows up on Patrick's question. You are doing sequential estimation for the L and the r's; how many iterations do you go through to get that?
Around ten of the EM-style iterations, and inside each one there is a gradient ascent: I think it is five for the r and three for the L.
I am asking this because to me it is interesting to see the rate of convergence you actually hit, and I know it is extra work. In your evaluations, I believe you evaluate once you believe you have converged; did you run any of the earlier systems? Let's say that before hitting five iterations you try it, just to see where you actually are; maybe there are certain dimensions of the vector that get active earlier. You might actually see something; there might be some insight there.
I tried this, but not in this context, not with these constraints enforced. The thing is, it is a bit sensitive: if you iterate more, sometimes, when you go to something like fifteen iterations, you see the results start to degrade; after some point the degradation becomes visible. Usually between five and eight iterations you are already saturated. Yes, we need to control that a little bit. If we let it go further... actually the SMM is sometimes better, especially for sparsity the SMM is much better, because it will really fit the data exactly, while the NFA will not do that, because it is an approximation. So that is my issue with the NFA: the SMM would definitely get some sparsity if you know how to control it, because otherwise you might overfit.
Probably Marcel can answer that better than me; Marcel, you probably know more about it than I do, because you were doing this, right?
Actually, when we did this work we tried different optimization algorithms. With the approximate Hessian it converges in a few iterations quite well, and also, like the question before, we saw that even after a few iterations you already get quite good results, and if you keep iterating you get some degradation, so it looks like it starts overfitting the model. So I guess it is all similar to...