So hi everyone, I'll present our work on iterative Bayesian and MMSE-based noise compensation techniques for speaker recognition in the i-vector space.
So let's start by setting up the problem.
Here we are working on noise; noise is one of the biggest problems in speaker recognition, and a lot of techniques have been proposed in the past years to deal with it in different domains, such as speech enhancement techniques, feature compensation, model compensation, and robust scoring, and in the last years DNN-based techniques for robust feature extraction, robust computation of statistics, or i-vector-like representations of speech.
So what we are proposing here is a combination of two algorithms in order to clean up noisy i-vectors. We are using a clean front end, so a system trained using clean data, and a clean back end, so a clean scoring model.
So, the first algorithm: in previous work we presented I-MAP. It's an additive noise model operating in the i-vector space, based on two hypotheses: the Gaussianity of the i-vector distribution and the Gaussianity of the noise distribution in the i-vector space. Here I'm not saying that noise is additive in the i-vector space; we just use this model to represent the relationship between clean and noisy i-vectors, just to be clear.
So using the MAP criterion we can derive this equation, and we end up with a model where, given a noisy i-vector y0, we can denoise it, clean it up, using the clean i-vector distribution hyperparameters and the noise distribution hyperparameters.
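To make the form of this concrete, here is a minimal sketch of the MAP estimate under the stated Gaussian hypotheses, assuming the additive model y0 = x + n with x ~ N(mu_x, cov_x) and n ~ N(mu_n, cov_n); the function and variable names are illustrative, not from the paper:

```python
import numpy as np

def imap_denoise(y0, mu_x, cov_x, mu_n, cov_n):
    """MAP estimate of the clean i-vector x given the noisy i-vector
    y0 = x + n, under Gaussian hypotheses x ~ N(mu_x, cov_x) and
    n ~ N(mu_n, cov_n). A sketch of the standard Gaussian posterior
    mean, not necessarily the authors' exact implementation."""
    prec_x = np.linalg.inv(cov_x)   # precision of the clean i-vector prior
    prec_n = np.linalg.inv(cov_n)   # precision of the noise distribution
    post_cov = np.linalg.inv(prec_x + prec_n)
    # Blend the prior mean with the noise-mean-compensated observation.
    return post_cov @ (prec_x @ mu_x + prec_n @ (y0 - mu_n))
```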
In practice this algorithm is implemented like this. Given a test segment, we start by checking its SNR level; if the segment is clean, we are okay. If it's not, we extract the noisy version of the i-vector, y0, and then, using a voice activity detection system, we extract noise from the signal using the silence intervals. Then we inject this noise into clean training utterances. This way we have clean i-vectors and their noisy versions using the test noise, so we can build the noise model using a Gaussian distribution, and then we can use the previous equation to clean up the noisy i-vectors.
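For the noise-model step, a minimal sketch, assuming we already have the clean training i-vectors and their noise-injected counterparts as paired rows (the names are mine, for illustration):

```python
import numpy as np

def estimate_noise_model(clean_ivecs, noisy_ivecs):
    """Fit the Gaussian noise distribution in the i-vector space from
    paired clean / noise-injected i-vectors (one i-vector per row).
    Sketch only; the paper's estimator may differ in detail."""
    offsets = noisy_ivecs - clean_ivecs      # per-utterance noise offsets
    mu_n = offsets.mean(axis=0)              # noise mean in i-vector space
    cov_n = np.cov(offsets, rowvar=False)    # noise covariance
    return mu_n, cov_n
```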
So, the novelty of this paper is: how can we improve I-MAP? The problem is that we cannot apply I-MAP many times successively, iteratively, because we cannot guarantee the Gaussian hypothesis on the residual noise. So the solution that we came up with is to use another algorithm and to alternate iteratively between these two algorithms in order to achieve better denoising of the i-vectors.
This second algorithm is called the Kabsch algorithm; it's used mainly in chemistry to align different molecules. Here we're applying it to i-vectors: we're starting from noisy i-vectors, and we want to estimate the best translation and rotation matrix in order to go to the clean version.
So formally, the formulation of the problem is known as the Procrustes problem. It starts with two data matrices: the noisy i-vectors represented as a matrix, and the clean version. This way we can estimate the best rotation matrix R that relates the two.
So in training, as we said, we are estimating a translation vector and a rotation matrix. To get rid of the translation we start by centering the data: we compute the centroid of the clean data and of the noisy data, and then we center the clean and noisy i-vectors. Then we can compute the best rotation matrix between the noisy i-vectors and their clean versions using an SVD decomposition.
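A minimal sketch of this training step, assuming paired noisy and clean i-vectors stacked as rows; the sign-correction step is the standard Kabsch detail that keeps the result a proper rotation:

```python
import numpy as np

def kabsch_train(noisy_ivecs, clean_ivecs):
    """Estimate the centroids (translation) and the rotation matrix R
    mapping centered noisy i-vectors onto their clean versions,
    via the Kabsch / SVD solution. One paired i-vector per row."""
    mu_noisy = noisy_ivecs.mean(axis=0)
    mu_clean = clean_ivecs.mean(axis=0)
    Yc = noisy_ivecs - mu_noisy              # centered noisy data
    Xc = clean_ivecs - mu_clean              # centered clean data
    H = Yc.T @ Xc                            # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1 so R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag(np.append(np.ones(len(mu_noisy) - 1), d))
    R = Vt.T @ D @ U.T
    return mu_noisy, mu_clean, R
```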
Once we've done this, when we have the best translation and rotation for a given noise, at test time we extract the test i-vector, and we start by applying the first translation: here we subtract the centroid of the noisy i-vectors. Then we apply the rotation, and then the other translation, to end up with its clean version.
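And the corresponding test-time application, plus a hypothetical sketch of the alternation between the two algorithms; it relies on imap_denoise and kabsch_train defined above, and n_iters and the parameter packing are illustrative:

```python
def kabsch_apply(y, mu_noisy, mu_clean, R):
    """Subtract the noisy centroid, rotate, then translate to the
    clean centroid: the three steps described above."""
    return R @ (y - mu_noisy) + mu_clean

def iterative_denoise(y0, imap_params, kabsch_params, n_iters=2):
    """Sketch of alternating the two algorithms on a noisy i-vector.
    imap_params = (mu_x, cov_x, mu_n, cov_n);
    kabsch_params = (mu_noisy, mu_clean, R)."""
    y = y0
    for _ in range(n_iters):
        y = imap_denoise(y, *imap_params)      # Bayesian cleanup
        y = kabsch_apply(y, *kabsch_params)    # geometric re-alignment
    return y
```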
So we use NIST and Switchboard data for training, and NIST 2008 for test, the det7 condition. We are using 19 MFCC coefficients plus energy plus their first and second derivatives, a 512-component GMM, our i-vectors have 400 dimensions, and we are using two-covariance scoring.
So here we are applying each algorithm independently and then combining the two. With the first algorithm, I-MAP, we can achieve from 40% to 60% relative equal error rate improvement for each noise. With the second algorithm we achieved up to 45% equal error rate improvement. But when we combine the two, for one or two iterations, we can end up with up to 85% equal error rate improvement. Here I presented the results for male data; for female data the error rates are a little bit higher, but it's efficient for both.
And here we compare the two algorithms and their combination on a heterogeneous setup, that is, when we use a lot of data, noisy and clean, for enrollment and test, with different SNR levels on the enrollment and test sides, and we can see that it remains efficient in this context.
So as a summary: using I-MAP or the Kabsch algorithm we can improve the equal error rate by 40% to 60%, but the interesting part is that combining the two can achieve far better gains. Thank you.
So, do we have questions?
Is the rotation matrix noise-independent, or is it noise-dependent? Yes, it's different for each noise, sorry. Yes, here we're estimating, for each different noise, a different translation and rotation matrix.
We just want to show the efficiency of this technique, but in the future, in another paper that will be published at Interspeech, I guess, well, it's accepted, we propose another approach that does not assume a certain model of noise in the i-vector space, and that can be trained using many noises and used efficiently on test data with different noises. So here it's just to show how far we can go in the best-case scenario, but in the other paper we show how we can extend this to deal with many noises.
Nice presentation. So, if you go back many years ago, Lim and Oppenheim had a sequential MAP estimation that was used for speech enhancement; it iterated back and forth between noise suppression filters and speech parameterization, so you're iterating back and forth between two algorithms here. You showed results with one iteration, two iterations. So, well, maybe two questions here: is there any way to come up with some form of convergence criterion that you can assess? And second, is there any way to look at the i-vectors as you go through the two iterations to see which i-vectors are actually changing the most? That might tell you a little bit more about which vectors are more sensitive to the type of noise.
So the first question was: is there any way to look at a convergence criterion, because when you iterate you need to know whether you have converged, okay. Well, here what we did is just to iterate many times and see at which level we start making the results worse. So it's not really, we haven't gone there yet.
So if you look at the two noise types, you had fan noise and I think you had car noise, so both are low-frequency type noises. Can you see if you have similar changes in the i-vectors in both those noise types?
Yes, maybe I can't comment on that because I haven't done the full analysis, but what I can tell you for sure is that the efficiency depends on which noise you're dealing with. It's efficient for all of them, but it can vary, in a way that makes it more efficient if we have different noises between enrollment and test.
Thank you for the nice presentation. A while ago I tried to read the original I-MAP paper, so if you don't mind I'll just ask a question about the original I-MAP, not the iterative one. Sorry, I didn't understand: the original I-MAP? Yes, not the iterative one. Okay, so, I mean, can you go back to the block diagram of this, of I-MAP? Yes.
So you're extracting noise from the signal, or somehow estimating the noise in the signal. And then you go up to the very noisy end of 0 dB, where the speech and noise are of similar or the same strength. Can you tell us how you are extracting noise from the signal at 0 dB?
So here we're using an energy-based voice activity detection system, but we're making the threshold more strict in order to avoid ending up with speech confused as noise. So it's not that we developed a sophisticated voice activity detection system specifically for this task; we're avoiding, as much as possible, ending up with speech, by using a very strict threshold on the energy.
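As a rough illustration of what a deliberately strict energy threshold looks like (the frame sizes and the 15 dB margin are assumptions, not values from the talk):

```python
import numpy as np

def noise_frames(signal, frame_len=400, hop=160, margin_db=15.0):
    """Keep only very low-energy frames as 'noise': an energy-based
    selection with a strict threshold, so speech is unlikely to be
    mistaken for noise. Illustrative sketch only."""
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame_len)[::hop]
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    threshold = energy_db.max() - margin_db   # well below the loudest frames
    return frames[energy_db < threshold]
```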
It's just quite amazing, the level of improvement you gain, from twenty-something to eight percent; it is quite something. It feels that you have a very good model of the noise here, and if you have such a thing, then it would make sense also to just check with speech enhancement, I mean, an MMSE-based approach like Wiener filtering. If you have a good model to counteract the noise, then it would be good to also compare with that, to do feature enhancement, noise reduction, and compare with that as well. Just a comment. Yes, okay.
Okay, there don't seem to be any more questions, so let's thank the speaker.