0:00:15 Given an i-vector w, the vector can be decomposed into a speaker part and a residual part: a mean vector, plus a matrix V, whose columns contain the basis of the eigenvoice subspace, multiplied by a vector y, which is the speaker factor, normally distributed with zero mean and identity covariance, plus a residual whose covariance matrix is full. This is the most commonly used PLDA system for i-vectors, in which the channel effect is kept full.
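In symbols, this is the usual Gaussian PLDA decomposition; the notation below (w, mu, V, y, epsilon, Lambda) is my transcription of the spoken names, which are hard to hear in the recording:

```latex
w = \mu + V y + \varepsilon,
\qquad y \sim \mathcal{N}(0, I),
\qquad \varepsilon \sim \mathcal{N}(0, \Lambda)
```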
0:00:59 The decision score, proposed by Simon Prince, is a log-likelihood ratio, in which we can see that computing the score depends only on the matrix VVᵀ of speaker factor variability and on VVᵀ + Λ, which corresponds to the total variability.
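A standard way to write this verification log-likelihood ratio under the Gaussian PLDA model, with Σb = VVᵀ and Σtot = VVᵀ + Λ (my notation, matching the dependence on the two matrices mentioned above):

```latex
\ell(w_1, w_2) =
\log \mathcal{N}\!\left(
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix};
\begin{bmatrix} \mu \\ \mu \end{bmatrix},
\begin{bmatrix} \Sigma_{\text{tot}} & \Sigma_b \\ \Sigma_b & \Sigma_{\text{tot}} \end{bmatrix}
\right)
-
\log \mathcal{N}\!\left(
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix};
\begin{bmatrix} \mu \\ \mu \end{bmatrix},
\begin{bmatrix} \Sigma_{\text{tot}} & 0 \\ 0 & \Sigma_{\text{tot}} \end{bmatrix}
\right)
```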
0:01:30 The Gaussian PLDA modeling can provide good performance, but it has been shown that the best performances are achieved only if a conditioning procedure follows the extraction of the i-vectors. All these conditioning procedures can be summarized by the most commonly used one, whitening: whitening is a standardization followed by a length normalization. The matrix of variability chosen for the standardization can be the total covariance matrix or the within-speaker covariance matrix, and eventually we iterate this process. The parameters are computed on the i-vectors present in the training corpus and applied to the test i-vectors.
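A minimal numpy sketch of this conditioning, assuming the total-covariance variant (substituting the within-speaker covariance gives the other variant discussed later); the function and variable names are mine:

```python
import numpy as np

def whiten_and_length_norm(X_train, X_test, n_iter=1):
    """Standardize with the training total covariance, then length-normalize;
    the whole process may be iterated, as mentioned in the talk."""
    for _ in range(n_iter):
        mu = X_train.mean(axis=0)              # training mean
        cov = np.cov(X_train, rowvar=False)    # total covariance matrix
        vals, vecs = np.linalg.eigh(cov)       # symmetric eigendecomposition
        W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # inverse square root
        X_train = (X_train - mu) @ W
        X_test = (X_test - mu) @ W
        # length normalization: project every i-vector onto the unit sphere
        X_train = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
        X_test = X_test / np.linalg.norm(X_test, axis=1, keepdims=True)
    return X_train, X_test
```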
0:02:22 The assumptions of the Gaussian PLDA are, firstly, the Gaussianity and the linearity of the eigenvoices: it means that the speaker part can be constrained to a linear subspace. Then the homoscedasticity of the residual: it means that the model assumes that all the speaker classes share the same statistics, so that channel effects can be modeled in a speaker-independent way and the distributions share a common covariance matrix. Then the independence between the residual and the speaker factor. And the equality of covariance: it means that the residuals between the actual mean of a class and the model parameter computed by the PLDA are assumed to be uncorrelated, normally distributed, explained by the sampling of the development corpus, so random, and not varying with the effects being modeled.
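Compactly, with the notation introduced earlier, these assumptions read (my formalization):

```latex
y_s \sim \mathcal{N}(0, I), \qquad
\varepsilon_{s,i} \sim \mathcal{N}(0, \Lambda)\ \text{for every speaker } s
\ \ (\text{homoscedasticity}), \qquad
y_s \ \text{independent of}\ \varepsilon_{s,i}.
```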
0:03:38 On the left, the graph shows a simple depiction of the PLDA model, with a speaker factor of one dimension, a one-dimensional subspace, a standard normal prior for the speaker factor, and some classes with the same variability matrix.
0:04:01 Our remark is that i-vectors now lie on a nonlinear and finite connected subset of the i-vector space, so the distribution of the i-vectors is what is referred to as a spherical distribution. We think that perhaps the assumption that there exists a unique, speaker-independent parameter of within-speaker variability is questionable, that is, that channel effects can be modeled in a speaker-independent way.
0:04:31 It is difficult to show that something is right or wrong, but for example, if we find a significant correlation between the position of a class and its within-class parameter, this effect would invalidate modeling the residual as a speaker-independent random variable.
0:04:54 First we present the deterministic approach. Why propose a deterministic approach to compute the PLDA parameters? Because, since the Gaussian model is perhaps not optimal for the i-vector spherical distribution, we wonder whether we can replace the sophistication of the expectation-maximization maximum-likelihood estimation of the parameters by a simple and straightforward deterministic approach. So we want to know if the application of the maximum-likelihood approach to compute the parameters of the PLDA brings a significant improvement of performance.
0:05:54 We first carry out an eigenvalue decomposition of the between-speaker covariance matrix computed on our development corpus. A singular value decomposition of the between-speaker covariance matrix gives a matrix P, whose columns are the eigenvectors of the between-speaker variability, and a diagonal matrix D of eigenvalues sorted in decreasing order. Given a rank r less than p, we can compute the rank-r principal between-speaker variability and summarize it in a symmetric p-by-p matrix defined by equation (4): the first matrix, P1:r, is the p-by-r matrix composed of the first r columns of P, and the diagonal matrix D1:r only comprises the r highest eigenvalues.
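A sketch of this construction in numpy; the names B, P, D, r follow the slide notation as I hear it, and equation (4) is read as the product P1:r D1:r P1:rᵀ:

```python
import numpy as np

def rank_r_between(B, r):
    """Rank-r summary of the between-speaker covariance B:
    keep the r leading eigenpairs, i.e. P_{1:r} D_{1:r} P_{1:r}^T."""
    vals, vecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]       # re-sort in decreasing order
    P = vecs[:, order[:r]]               # first r eigenvectors (p x r)
    D = np.diag(vals[order[:r]])         # the r highest eigenvalues
    return P @ D @ P.T                   # the matrix called B_{1:r} here
```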
0:07:03 And so we propose to carry out experiments with only the W conditioning, that is, the standardization according to the within-class covariance matrix followed by length normalization, and with the direct estimation of the parameters of the PLDA, without EM maximum-likelihood estimation, on the development corpus. In the scoring, the total covariance matrix is replaced by its empirical estimate computed on the development corpus, and the speaker variability matrix VVᵀ by B1:r.
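Combining the two previous pieces, a hedged sketch of what this deterministic scoring could look like (my reading of the talk: Σtot is the empirical total covariance, Σb the rank-r matrix B1:r from above; this is not the authors' exact code):

```python
import numpy as np
from scipy.stats import multivariate_normal

def deterministic_llr(w1, w2, mu, Sigma_b, Sigma_tot):
    """Two-covariance LLR score with Sigma_b = B_{1:r} and Sigma_tot
    the empirical total covariance of the development corpus."""
    p = mu.shape[0]
    x = np.concatenate([w1 - mu, w2 - mu])
    zero = np.zeros((p, p))
    # same-speaker hypothesis: the two i-vectors share the speaker part
    S_same = np.block([[Sigma_tot, Sigma_b], [Sigma_b, Sigma_tot]])
    # different-speaker hypothesis: independent i-vectors
    S_diff = np.block([[Sigma_tot, zero], [zero, Sigma_tot]])
    return (multivariate_normal.logpdf(x, mean=np.zeros(2 * p), cov=S_same)
            - multivariate_normal.logpdf(x, mean=np.zeros(2 * p), cov=S_diff))
```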
0:07:55 This proposal can be justified: if we consider the data of the development corpus, we can express the i-vectors as a function of the parameters and of the speaker and residual factors, where ȳs is the mean vector of speaker s. We show in the article that the covariance matrix is equal to B1:r, as desired, that the speaker factor is standardized, with mean zero and an identity matrix of variability, and the independence between the random variables. Remark that only the nullity of the covariance, which is a necessary condition of independence, is achieved, and then we are entitled to apply the PLDA scoring.
0:08:48 Length normalization is known to improve the Gaussianity, so we compute the density of the speaker and residual factors of the development corpus, before and after length normalization. The top graphs show the distribution of the squared norms of the standardized factors, on the left for the speaker factors and on the right for the residuals. The dashed line is the chi-squared distribution that the squared norm of the speaker factor must follow: a chi-squared with p degrees of freedom, where p is the dimension of the i-vector space. We show that for the development data, and also for the evaluation datasets, there is a mismatch between their empirical distribution and this theoretical chi-squared distribution.
0:10:01 Remark also the dataset shift between the development and the evaluation datasets that remains after length normalization.
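A small numpy/scipy sketch of this diagnostic, assuming the rows of Y are the standardized factors (hypothetical names; the talk's figures plot the full densities rather than a single test statistic):

```python
import numpy as np
from scipy.stats import chi2, kstest

def chi2_norm_check(Y):
    """Compare the squared norms of standardized factors with the
    chi-squared law (p degrees of freedom) they should follow
    under the Gaussian assumption."""
    n, p = Y.shape
    sq_norms = np.sum(Y ** 2, axis=1)        # squared norm of each factor
    print(f"empirical mean {sq_norms.mean():.1f} vs chi2 mean {p}")
    print(f"empirical var  {sq_norms.var():.1f} vs chi2 var  {2 * p}")
    # a Kolmogorov-Smirnov distance quantifies the mismatch seen on the graphs
    return kstest(sq_norms, chi2(df=p).cdf)
```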
0:10:14 We carried out experiments with the ML-computed parameters and with the deterministic approach. In both cases we can see that the mismatch with the chi-squared distribution is partially reduced, as well as the shift between the development and evaluation datasets. Remark that the deterministic approach improves the Gaussianity in a manner similar to the ML technique.
0:10:47 Here are the results on the NIST speaker recognition task. We use three systems, evaluated on the telephone conditions of the NIST speaker recognition evaluations of 2008, 2010, and 2012, the latter with its noisy environment: length normalization following the Σ (total covariance) conditioning, and length normalization following the W conditioning, the latter in two cases, with the ML estimation of the parameters and with the deterministic estimation of the parameters.

0:11:32 We can see that the results are the same in terms of error rate between the last two techniques; in terms of DCF the probabilistic approach remains superior. And we remark that the W conditioning performs better than the Σ conditioning, even with the deterministic approach.
0:12:04 So now we consider that the fact that the EM ML approach doesn't bring the expected improvement of performance is maybe due to the fact that the G-PLDA model is not optimal for the i-vector spherical distribution.
0:12:26 So we compute two series on the development corpus: first, the average log-likelihood of the residues of the observations of a class given the model, which we consider as the likelihood of the class, or likelihood of the class given the model. Then we compare this likelihood to the parameter of position of the class, considered as a marker of probabilistic class position: the prior likelihood of the speaker factor of the class. And we display the two series, with on the horizontal axis the parameter of class position and on the vertical axis the likelihood of the residue according to the model.
0:13:35 The first graph shows the results without length normalization, with the i-vectors as provided by the extractor, and we remark here that no correlation occurs between the position of the class and the likelihood of the residue. Each time we display the coefficient of determination R², which goes from zero to one and indicates how well the data points fit a line; here R² is equal to 0.04, close to zero.
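For reference, a one-function numpy sketch of this statistic; for a simple regression of one series on the other, R² is the squared Pearson correlation:

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination of the least-squares line of y on x;
    0 means no linear fit, 1 means the points lie exactly on a line."""
    r = np.corrcoef(np.asarray(x), np.asarray(y))[0, 1]   # Pearson correlation
    return r ** 2
```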
0:14:16 After length normalization, a significant correlation appears between the likelihoods of the class factors and the likelihoods of the residues: R² equal to 0.59 and 0.64. So there is a dependency between the actual variability matrix of a class and the probabilistic position of this class, expressed by the likelihood of its factor. So we can say that length normalization induces a heteroscedasticity of the residue.
0:14:58 We computed the previous results with the whole training set, in which the data are not evenly distributed across the speakers, so one can object that the correlations are due to the differing amounts of information per speaker. Therefore we compute the same graphs as before, but only for the speaker training classes with a minimum number of sessions per training speaker. On the horizontal axis, this minimal number of sessions per speaker goes from 2 to 62, and each time, for only the segments of the speakers which have more than this minimum, we compute the R² score. We see that before length normalization there is no problem, because the two series are independent, and after length normalization we see that, even for the subsets of speaker classes with the maximum number of sessions, the same phenomenon occurs, with R² values higher than 0.6.
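A hedged sketch of this control experiment, assuming one (position, residual-likelihood) pair and a session count per training class; all names are hypothetical:

```python
import numpy as np

def r2_by_min_sessions(position, residual_llk, n_sessions, thresholds):
    """Recompute the R^2 between class position and residual likelihood,
    keeping only the classes with at least t sessions, for each t."""
    position = np.asarray(position)
    residual_llk = np.asarray(residual_llk)
    n_sessions = np.asarray(n_sessions)
    scores = {}
    for t in thresholds:
        keep = n_sessions >= t            # classes with enough sessions
        r = np.corrcoef(position[keep], residual_llk[keep])[0, 1]
        scores[t] = r ** 2
    return scores
```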
0:16:24 So we remark that the G-PLDA modeling is a good model, but if we are obliged to project the data on a nonlinear subset, the problem is to be sure that the homoscedastic model, with its equality of covariance, will fit the sample. An idea would be to replace the overall within-class variability parameter by a class-dependent parameter, taking into account the local position of the class to fit its actual distortions.
0:17:05 Such an adaptation is difficult to carry out, because it induces a complex density, passing the within-class variability parameters through a nonlinear function. Other options are giving up length normalization and using approaches which deal with the i-vectors as they are, attempting to find a more adequate prior, such as the heavy-tailed PLDA, or discriminative classifiers such as pairwise discriminative training. Or just to understand why we are obliged to ignore the norm: maybe because the norm contains unexpected variabilities, maybe related to some acoustic parameters.
0:17:51 Just one remark: the W conditioning transforms the within-class variability into the identity matrix, and an identity matrix has no principal components; maybe this alleviates the constraint of homoscedasticity.
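A minimal sketch of the W conditioning in numpy, assuming labeled development i-vectors; the names are mine:

```python
import numpy as np

def w_conditioning(X, spk):
    """W conditioning: standardize the i-vectors so that the pooled
    within-class covariance becomes the identity matrix."""
    spk = np.asarray(spk)
    mu = X.mean(axis=0)
    # center each class on its own mean, then pool the residuals
    centered = np.vstack([X[spk == s] - X[spk == s].mean(axis=0)
                          for s in np.unique(spk)])
    W_cov = np.cov(centered, rowvar=False)      # within-class covariance
    vals, vecs = np.linalg.eigh(W_cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # inverse square root
    return (X - mu) @ W
```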
0:18:16 Thank you.
0:18:33 Question: Can you comment, with respect to the experiments, on the fact that you replaced the probabilistic approach of estimating the parameters with the deterministic one on the screen?
0:18:49 I think that, in the limit, if your training set has many speakers, these two solutions are exactly the same; the only difference is that you're putting the prior in the one case.
0:19:00 OK, so this depends on the number of speakers, the average number of sessions per speaker, and I guess that you can go to a small number of speakers when you train the model.
0:19:12 Yes, and the difference is that the deterministic approach is not intended to compete with the ML estimation, which remains the best way; but I was surprised by the slight gap of performance, and so I assume that maybe the EM ML cannot be optimal, because there is a problem of sphericity of the data, while the deterministic approach is not affected. And that's exactly the topic when we try to show whether the norm of the speaker factors follows the expected distribution.
0:19:56 Yes, I guess, because you have to treat them as random variables, because they're not simply points; under the PLDA scheme they have a posterior distribution. OK, a better way to consider whether they follow the distribution would probably be to add the trace of the posterior covariance matrix; it should also be added when you compute the norm, in order to see the overall distribution rather than dot products. OK.
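If I follow the suggestion, the point is the standard identity for the expected squared norm of a random vector, here applied to the posterior of the speaker factor (my formalization of the comment):

```latex
\mathbb{E}\left[\|y\|^2 \mid w\right]
= \|\hat{y}\|^2 + \operatorname{tr}\!\left(\Sigma_{\text{post}}\right),
\qquad \hat{y} = \mathbb{E}[y \mid w].
```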
0:20:30 Remark that the same rationale holds with the evaluation, the test i-vectors, as with the development corpus used as training vectors; the same effect, except that before length normalization the R² score is not close to zero. The R² score of the test i-vectors before length normalization, as provided by the extractor, is equal to 0.3, whereas the test i-vectors are not used for training the PLDA factor analysis. So there is a shift not only of the mean but also of this problem of homoscedasticity.
0:21:19 Just one quick one: I just missed your point when you said, I think you were saying, that trying to make the data spherically distributed you thought was inconsistent with it being Gaussian. Why is that?
0:21:36 It's empirical, but Gaussians in a high-dimensional space concentrate on a sphere.

0:21:47 Yes, but we constrain the speaker factors to lie on a sphere, and the assumption that the within-class variability would not be affected by the position is then a problem.

0:22:07 But the prior distribution of the i-vectors, zero mean and identity covariance, in a high-dimensional space will be approximately on a sphere; that's what happens mathematically in a high-dimensional space, so why is it inconsistent?

0:22:28 Here we actually have a spherical distribution of the data, and applying a model with equality of covariances is difficult on this surface. Maybe we can see that length normalization is a coarse technique that projects onto the sphere instead of taking this information into account, I think.
0:22:59 Good, but let's take this to the discussion.