0:00:15 We will have five papers in this session. The first one is "Variance-spectra based normalization for i-vector standard and probabilistic linear discriminant analysis". The authors are here, so please go ahead and present the paper.
0:00:54 Yes, thank you. As was just mentioned, this is a collaborative work. The work was started some time ago, so I want to begin with some analysis of what we did before, and then try to improve on that previous work. The talk is about i-vectors, and I will start with a brief description of our system, which is based on a classical i-vector framework.
0:01:40 I will talk mostly about the post-processing of the i-vectors, between the i-vector extraction and the PLDA. This is the part of the system where we try to improve the discriminancy, usually with LDA-like approaches, and also to compensate for the session variability; one way to do that is length normalization. There are plenty of ways to do this, but I will focus on these two. And since the discriminancy is related to the variance of the data, we will look at the between-class and within-class variability.
0:02:19 We start with the description of the system. It is a classical UBM/i-vector system, and everything is gender dependent from the beginning to the end. We extract MFCCs of sixty dimensions. The UBM training is very classical, using a large amount of data from the NIST SRE 2004, 2005 and 2006 sets and from Switchboard.
0:02:55 The i-vector extractor is also gender dependent, and we used only telephone data, from NIST SRE 2004, 2005 and the Switchboard data; I think this is quite standard, state-of-the-art practice. This gives a rough idea of the number of sessions. For the normalization and classification training, which includes both the G-PLDA training and the LDA training and everything we will see in the following, we used gender-dependent subsets of the same sets of data, from NIST SRE 2004, 2005, 2006 and Switchboard, chosen because of the number of sessions they provide. We also restricted the development set to segments whose nominal duration is higher than one hundred eighty seconds.
0:03:46 Now let us look at some tools that are useful when we talk about variability. First I would just like to recall discriminancy and covariances. We commonly use the covariance matrices: the total covariance, the between-class covariance and the within-class covariance. But it is also very common in speaker verification, instead of the between- and within-class covariance matrices, to use the scatter matrices. The definitions are roughly similar, and either family can be used for several applications. The main difference is that the scatter matrices do not take into account the number of sessions per speaker, so the weight of a speaker does not depend on its number of sessions. Since both are commonly used, we ran a few experiments to see which one is the more efficient in our system.
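A minimal sketch of the two families of statistics being compared, under one common convention: the "covariance" definitions weight each speaker by its number of sessions, whereas the "scatter" definitions give each speaker equal weight. The array `X` (one i-vector per row) and the label vector `y` are illustrative names, not the authors' code.

    import numpy as np

    def covariance_and_scatter(X, y):
        """Between/within-class covariance (session-weighted) versus
        scatter matrices (one equal weight per speaker)."""
        mu = X.mean(axis=0)                          # global mean
        d = X.shape[1]
        B_cov = np.zeros((d, d)); W_cov = np.zeros((d, d))
        B_sct = np.zeros((d, d)); W_sct = np.zeros((d, d))
        speakers = np.unique(y)
        for s in speakers:
            Xs = X[y == s]
            mu_s = Xs.mean(axis=0)
            dev = mu_s - mu
            # covariance definitions: each session contributes one count
            B_cov += len(Xs) * np.outer(dev, dev)
            W_cov += (Xs - mu_s).T @ (Xs - mu_s)
            # scatter definitions: each speaker contributes one count
            B_sct += np.outer(dev, dev)
            W_sct += (Xs - mu_s).T @ (Xs - mu_s) / len(Xs)
        n, S = len(X), len(speakers)
        return B_cov / n, W_cov / n, B_sct / S, W_sct / S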
0:04:52 When talking about classification, what we are interested in is to maximise the between-speaker variability and reduce the within-speaker variability, and one way to study this is to look at the covariances. The tool we use is the variance spectrum, which is very common. On this graph you can see three plots which are, from the top: the total variance, the between-class variance and the within-class variance, that is, the speaker and session variability. We compute the between-class covariance matrix B, then rotate all the data of the development set into the eigenvector basis of B, compute the covariance matrices in this basis, and plot the diagonal of each matrix. You can see that the variability in the first dimensions is higher, for the speaker variability and also for the session variability.
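A sketch of how such a variance spectrum could be produced, assuming `B`, `W` and `T` are the between-class, within-class and total covariance matrices computed as above; matplotlib and the function name are assumptions for illustration.

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_variance_spectrum(B, W, T):
        """Rotate into the eigenvector basis of B and plot the diagonals."""
        vals, vecs = np.linalg.eigh(B)
        V = vecs[:, np.argsort(vals)[::-1]]          # sort by decreasing eigenvalue
        for M, label in ((T, "total"), (B, "between-class"), (W, "within-class")):
            spectrum = np.diag(V.T @ M @ V)          # variance along each direction
            plt.plot(spectrum, label=label)
        plt.xlabel("dimension (eigenvector basis of B)")
        plt.ylabel("variance")
        plt.legend()
        plt.show()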
0:06:01 Now, a common way to maximize this ratio is to use the very well known LDA, which simply maximizes the Rayleigh coefficient. The Rayleigh coefficient can be defined using the within- and between-class covariance matrices, or using the scatter matrices. In this work, LDA is used to reduce the i-vector dimension from six hundred down to a smaller dimension, which is kept constant for all the experiments.
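For reference, the standard textbook form of the Rayleigh coefficient maximized by LDA, written with the between-class matrix B and the within-class matrix W (the scatter-matrix variant simply substitutes the scatter definitions); the LDA projection keeps the leading generalized eigenvectors:

    J(v) = \frac{v^{\top} B\, v}{v^{\top} W\, v},
    \qquad B\, v = \lambda\, W\, v .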
0:06:29 To complete the system description, we tried two scorings. The first one is based on the two-covariance model that was introduced a couple of years ago. The second one is based on the PLDA under the Gaussian assumption. The configuration we used keeps the eigenchannel matrix of the PLDA, but at full rank, whereas the original implementation used a diagonal term. The number of speaker factors in the PLDA is set to be consistent with the LDA dimension, and the number of channel factors to six hundred, which is the way to compensate for the diagonal term.
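As a reminder, one common form of the Gaussian PLDA model referred to here, with a speaker-factor (eigenvoice) matrix, an eigenchannel matrix and a residual; the exact parameterization below is the standard textbook one, not necessarily the authors', and the ranks mentioned in the talk are the LDA dimension for the speaker factors and full rank for the channel factors:

    w = \mu + \Phi\, y + \Gamma\, x + \epsilon,
    \qquad y \sim \mathcal{N}(0, I),\;
           x \sim \mathcal{N}(0, I),\;
           \epsilon \sim \mathcal{N}(0, \Sigma).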
0:07:24 The problem with all these scorings, including the two-covariance model, is that everything is based on the Gaussian assumption. For those who work with DET curves, we know very well that what matters is the tails of the distributions, and it is now commonly admitted in the community that i-vectors do not follow a Gaussian distribution but something a bit more heavy-tailed, like a Student's t distribution.
0:07:59 So what we do is take these i-vectors and try to make their distribution more Gaussian. One way to do this was proposed, initially by two groups at the same time, among them Garcia-Romero: the idea is to normalize the magnitude of the i-vectors, using a formula like this one; we center the data and then normalize them to unit length. Using this method the distribution becomes a bit more Gaussian, and we can see that the effect is very efficient.
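Written out, the normalization just described is simply centering followed by projection onto the unit sphere; here m stands for the development-set mean used in the centering step (the symbol is ours):

    \hat{w} = \frac{w - m}{\lVert w - m \rVert}.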
0:08:42 Just using the two-covariance model, we can see a gain in both equal error rate and minimum DCF on the NIST evaluation data; this is a simple illustration.
0:09:01 Everything until now is very common. Going back to the tool introduced previously, we would like to show the effect of length normalization on the variance spectra. As you can clearly see, the curves are exactly the same except for the range of the values, because we are normalizing the magnitude: the values on the right side are smaller, but the shape does not change much.
0:09:35 Importantly, in the initial paper the length normalization was introduced together with whitening: it has to be done after whitening of the data. So there are several steps in this algorithm: the whitening uses the total covariance matrix of the development i-vectors, and then we apply the length normalization. At the same time, we introduced the Eigen Factor Radial normalization, which is just whitening plus length normalization, but applied iteratively. The interest of this method is that it converges very fast, and it brings some properties that we can use further.
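A minimal sketch of this iterative whitening-plus-length-normalization, assuming `X` is a NumPy array of development i-vectors of shape (n, dim); the function name and `n_iter` are illustrative.

    import numpy as np

    def eigen_factor_radial(X, n_iter=3):
        """Iteratively center, whiten with the total covariance, and
        length-normalize; one iteration is whitening + length normalization."""
        params = []
        for _ in range(n_iter):
            mu = X.mean(axis=0)
            Sigma = np.cov(X, rowvar=False)                      # total covariance
            vals, vecs = np.linalg.eigh(Sigma)
            W_half_inv = vecs @ np.diag(np.maximum(vals, 1e-12) ** -0.5) @ vecs.T
            X = (X - mu) @ W_half_inv                            # whitening
            X = X / np.linalg.norm(X, axis=1, keepdims=True)     # length normalization
            params.append((mu, W_half_inv))                      # reuse on test data
        return X, params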
0:10:20 The properties are that the mean of the development set converges to zero very fast, the total covariance matrix becomes the identity matrix, and, following from this, all the eigenvectors of the between-class covariance matrix become also eigenvectors of the within-class covariance matrix. Putting these properties together, it turns out that the eigenvectors of the between- and within-class covariance matrices are now solutions of the LDA optimization. That means that, after all these iterations, LDA no longer brings any improvement.
0:11:08 That was one of the conclusions of our first paper. Here we can see the effect of this normalization on the variance spectra. Before any treatment, the i-vectors give this spectrum; after one iteration, which is exactly what Garcia-Romero proposed, the total covariance spectrum becomes flatter. After two iterations it is even better, and after three it is almost perfectly flat, at least to the human eye.
0:11:50 You can see that the big advantage of this process is that the first dimensions of the data no longer contain the major portion of the variability, nor the major portion of the session variability. So after this treatment the i-vectors become optimal with respect to the Rayleigh coefficient, which means this should be the optimum for the LDA.
0:12:19 To illustrate this, here are some results using the LDA followed by the two-covariance model for scoring. The baseline is just the length normalization; when I say length normalization without any whitening, I mean just the magnitude normalization. You can see that using the Eigen Factor Radial normalization does not improve the performance after one iteration if we use the scatter matrices to compute the LDA. But when we compute the LDA using the between- and within-class covariance matrices, we can see that, for the female trials at least, it improves the performance. After two iterations the conclusion is the same: using the scatter matrices seems not optimal, so it is better to use the between- and within-class covariance matrices in their initial definition.
0:13:21 After this result, we tried to apply the same treatment before the PLDA, which is maybe more robust than the two-covariance model. This is the baseline using only length normalization, and when we apply two iterations of the Eigen Factor Radial normalization, which was optimal in the previous case, we see that the data is not well adapted for the PLDA: the performance stays the same or even gets worse.
0:13:51 So we extended this work, still looking at the covariances, but considering that after the length normalization everything lies on a sphere. On such a spherical surface it is very difficult to estimate the covariance matrices, because when you look at each speaker, from one side of the sphere to the other, the within-speaker variability will be very different. If we just take the average of these to estimate the within-class covariance matrix on the development set, it does not make sense anymore, because the matrix will be representative for some speakers but obviously not for others.
0:14:35 So what we propose in this paper is to keep the i-vectors on the spherical surface, because it is now commonly admitted that length normalization is really useful for session compensation, but to align the principal directions of the nuisance variability with the decision boundaries. That means we want the within-class covariance matrix to become diagonal, or even better, just the identity matrix up to a constant. So we decided to apply exactly the same algorithm as before, the same iterative process, except that we replace the total covariance matrix by the within-class covariance matrix.
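A sketch of the modification just described: the same iterative procedure as above, but whitening with the within-class covariance instead of the total covariance. The helper `within_class_cov` is assumed to return the session-weighted within-class covariance of the labeled development set (for example `W_cov` from the earlier sketch); the rest is illustrative.

    import numpy as np

    def spherical_nuisance_normalization(X, y, n_iter=2):
        """Iteratively whiten with the within-class (nuisance) covariance
        and re-project the i-vectors onto the unit sphere."""
        for _ in range(n_iter):
            mu = X.mean(axis=0)
            W = within_class_cov(X, y)                           # nuisance covariance
            vals, vecs = np.linalg.eigh(W)
            W_half_inv = vecs @ np.diag(np.maximum(vals, 1e-12) ** -0.5) @ vecs.T
            X = (X - mu) @ W_half_inv                            # whiten the nuisance
            X = X / np.linalg.norm(X, axis=1, keepdims=True)     # back onto the sphere
        return X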
0:15:22 By doing this, we can see on the variance spectra of the same development set that one iteration makes the within-class spectrum flatten very fast: this is the session variability, and it is almost evenly spread over the dimensions. After two iterations, from the human point of view it looks exactly the same, but it does improve the performance, and the spectrum is completely flat. What is the effect when we use this for scoring? That is what I am going to show in a few minutes.
0:16:01 But before that, I just want to point out that this process can also be used to initialize the PLDA matrices. Most of us use a PCA to initialize the PLDA matrices, because it provides a first estimate of the most informative subspace, which is a very good starting point. What we propose here is to use this process instead: we rotate all the i-vectors into the eigenvector basis of B, and we initialize the eigenvoice, that is the speaker-factor matrix, using the first eigenvectors of B. Then, for the eigenchannel matrix, we use the Cholesky decomposition of the within-class covariance matrix. And actually, if you use a full-rank eigenchannel matrix, you can initialize the residual covariance with the same process, I think.
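A hedged sketch of that initialization, assuming `B` and `W` are the between- and within-class covariance matrices of the normalized development i-vectors and `rank_F` is the number of speaker factors; the eigenvalue scaling of the eigenvoice matrix is one common choice, not something stated in the talk.

    import numpy as np

    def init_plda(B, W, rank_F):
        """Initialize PLDA matrices from B and W instead of PCA:
        speaker-factor matrix from the leading eigenvectors of B,
        eigenchannel matrix from the Cholesky factor of W."""
        vals, vecs = np.linalg.eigh(B)
        order = np.argsort(vals)[::-1][:rank_F]
        # eigenvoice matrix: leading eigenvectors of B, scaled by sqrt eigenvalues
        F = vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
        # eigenchannel matrix: Cholesky decomposition of the within-class covariance
        G = np.linalg.cholesky(W)
        return F, G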
0:17:12 Here are some results; as before, we use just the i-vectors plus the normalization process. I want to mention that, with a random initialization of the PLDA, the performance can vary depending on the initialization point, so we performed the experiments with different initializations and then averaged the results. You can see the baseline that I presented previously, and also the Eigen Factor Radial method, which is not efficient in this case. And you can see that using the spherical nuisance normalization, as we call it, the performance improves in every case.
0:18:01 Now, if we use the initialization process that I just described, we can see that the performance is comparable or better. Where the performance looks worse, keep in mind that the numbers obtained with this initialization are just the lower bound of what we obtained by averaging over random initializations. So it may not always be better, but it guarantees a certain level of performance.
0:18:38 To conclude this presentation, I want to underline the fact that we used the variance spectrum, a tool that is very well known but perhaps not used that often in our community. Here it was used to analyze the performance of the system, but it can also be used right after extracting the i-vectors: it is a very good indicator of the quality of the i-vector extractor, because just by looking at the spectrum you can get a rough idea of the performance you will obtain in the end. A colleague is running experiments on this at the moment and will present them in his thesis, I think, very soon. So I hope this tool can be useful for analysis purposes.
0:19:34 Second, coming back to our previous paper, we showed that the iterative normalization and whitening process improves the performance slightly; it is not a huge improvement, but since it works once, why not do it twice or three times? We also showed that the covariance matrices, in our case, perform better than the scatter matrices.
0:20:00 Then, to end this talk, just remember that the spherical nuisance normalization improves the performance in the case of PLDA scoring, and also, as I mentioned before, that when you use this type of process to initialize the PLDA matrices, you do not need to perform as many EM iterations. For the case I presented, we obtained the best performance with one hundred EM iterations under random initialization, whereas with this initialization process we only need about ten iterations. So if the PLDA training time is an issue, this is one way to reduce it.
0:20:49 So now, if you have any questions?
0:20:59 (inaudible audience question)
0:21:51 Yeah, actually, if you ask me, I don't really like the length normalization, because it is a non-linear process that just happens to work right now. I think we need to find a way to address this issue by finding something more consistent.