0:00:15 | Hello. This work investigates discriminative training applied to i-vectors that have been previously normalized. |
0:00:28 | Here is the system on which we focus: a usual i-vector based system for speaker recognition, with normalization steps (within-class covariance normalization, length normalization), then Gaussian PLDA modeling, which provides parameters (a mean value mu and covariance matrices) and an LLR score. |
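As an illustration of this front-end chain, here is a minimal NumPy sketch (the helper name and the precomputed WCCN factor are assumptions for illustration, not part of the talk):

```python
import numpy as np

def normalize_ivectors(X, wccn_chol):
    """WCCN followed by length normalization.

    X         : (n, d) array of raw i-vectors, one per row
    wccn_chol : (d, d) Cholesky factor of the inverse within-class
                covariance, estimated beforehand on a training set
    """
    X = X @ wccn_chol                                    # within-class covariance normalization
    return X / np.linalg.norm(X, axis=1, keepdims=True)  # length normalization (unit sphere)
```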
0:00:57 | Some works have attempted to optimize the parameters of this PLDA modeling in a discriminative way. |
0:01:09 | These discriminative classifiers use logistic regression maximization, applied either to the coefficients of the PLDA score or directly to the PLDA parameters. |
0:01:30 | The goal here is to add a new step, an additional step, to the normalization procedure, which does not modify the distances between i-vectors and which is intended to constrain the discriminative training. |
0:01:49 | Once this additional normalization step is carried out, it is possible to train the discriminative classifier with a limited number of coefficients to optimize: close to d, the dimension of the i-vector. |
0:02:13 | Then we carry out the state-of-the-art logistic regression based discriminative training, and also a new approach that we call the orthonormal discriminative classifier, which is a novelty. |
0:02:28 | First, some notation: in the PLDA model, the residual term epsilon is assumed to be statistically independent of the speaker term, and the speaker term is constrained to lie in the range of Phi, the eigenvoice subspace. |
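In LaTeX, a sketch of the Gaussian PLDA model behind this notation (conventions vary across papers; here the residual covariance is written with the talk's Lambda):

```latex
w \;=\; \mu \;+\; \Phi\,\beta \;+\; \varepsilon,
\qquad \beta \sim \mathcal{N}(0, I),
\qquad \varepsilon \sim \mathcal{N}(0, \Lambda),
```

with Phi beta lying in the eigenvoice subspace (the range of Phi) and epsilon statistically independent of it.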
0:02:53 | Then a few comments about this model, which has long been the most commonly used model in speaker recognition. |
0:03:07 | The LLR score can be written as a second-degree polynomial function of the components of the two vectors of the trial, w1 and w2, which can be written algebraically with matrices P and Q. |
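A hedged reconstruction of this polynomial form (as in Burget et al.; the exact expressions of P, Q, c and k in terms of the PLDA covariances can be found in that work):

```latex
s(w_1, w_2) \;=\;
  w_1^{\mathsf T} Q\, w_1 \;+\; w_2^{\mathsf T} Q\, w_2
  \;+\; 2\, w_1^{\mathsf T} P\, w_2
  \;+\; c^{\mathsf T}(w_1 + w_2) \;+\; k
```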
0:03:28 | Recall that the state-of-the-art logistic regression based discriminative classifiers try to optimize coefficients initialized by the PLDA modeling. They use as a loss the probability of correctly classifying all training trials, target as target and non-target as non-target, called the total cross-entropy, and apply gradient descent with respect to some coefficients. |
0:03:59 | The coefficients to be optimized can be the PLDA score coefficients, that is, the matrices P and Q of the previous slide. |
0:04:11 | Following this way, proposed by Burget et al., the LLR score can be written as a dot product between an expanded vector of the trial and a weight vector which is initialized with the PLDA parameters. |
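A minimal sketch of this training scheme, assuming trials and labels are available in memory; expand() is a hypothetical helper whose dot product with omega reproduces the polynomial score, and the learning rate and iteration count are placeholders:

```python
import numpy as np

def expand(w1, w2):
    """Hypothetical trial expansion: score = omega @ expand(w1, w2)."""
    return np.concatenate([
        2.0 * np.outer(w1, w2).ravel(),                       # terms of matrix P
        np.outer(w1, w1).ravel() + np.outer(w2, w2).ravel(),  # terms of matrix Q
        w1 + w2,                                              # linear terms (vector c)
        [1.0],                                                # offset k
    ])

def train_logistic_regression(Phi, y, omega, lr=1e-4, n_iter=200):
    """Gradient descent on the total cross-entropy.

    Phi   : (n, m) expanded trial vectors
    y     : (n,) labels, 1 for target trials, 0 for non-target
    omega : (m,) weights initialized from the PLDA parameters
    """
    omega = omega.copy()
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Phi @ omega)))  # posterior of "target"
        omega -= lr * (Phi.T @ (p - y))           # cross-entropy gradient
    return omega
```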
0:04:30 | Borgström and McCree proposed in 2013 to optimize the PLDA parameters themselves (the mean value mu, the eigenvoice subspace matrix Phi, and the nuisance variability matrix Lambda) by using this total cross-entropy function. |
0:04:56 | Discriminative training suffers from some limitations. The first ones are well known: overfitting on the development data, and the respect of the parameter conditions (covariance matrices must be positive definite, and the matrices P and Q are expected to be negative or positive definite). |
0:05:22 | So some solutions have been proposed. |
0:05:27 | Constrained discriminative training attempts to train only a small number of parameters, of the order of d, where d is the dimension of the i-vector, instead of d squared. |
0:05:42 | Such solutions, proposed for example by Rohdin et al., following Burget and McCree, optimize only some coefficients for each dimension of the i-vector, and for each term that makes up the score. |
0:06:02 | Here you can see that the score decomposes into a sum of terms; it is possible to optimize a polynomial coefficient for each of these terms. |
0:06:21 | Also, only the mean vector and the eigenvalues of the PLDA matrices can be trained and optimized, or even a scaling factor, a unique scalar for each matrix. |
0:06:39 | It is also possible to work with a singular value decomposition of the P and Q parameters, so as to respect the parameter conditions. |
0:06:50 | If discriminative training has shown interesting results when i-vectors were not normalized, it struggles to improve speaker detection when i-vectors have first been normalized, whereas this normalization achieves the best performance. |
0:07:09 | Here we present our additional normalization step, a simple diagonalization, which is intended to constrain the discriminative training. |
0:07:19 | Recall that, after WCCN, the within-class covariance matrix W is isotropic; after length normalization, it has been shown that it remains almost exactly isotropic, I mean an identity matrix multiplied by a scalar. |
0:07:37 | We propose simply a rotation by the eigenvector basis of the between-class covariance matrix B of the training dataset, computed by an eigendecomposition of B; we apply this matrix of eigenvectors of B to each i-vector, training or test. |
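A minimal sketch of this diagonalization step, under the assumptions just stated (B estimated on labeled training i-vectors; the same orthogonal rotation then applied to every i-vector):

```python
import numpy as np

def between_class_eigenbasis(X, spk):
    """Eigenvector basis of the between-class covariance B of a training set.

    X   : (n, d) normalized training i-vectors
    spk : (n,) speaker labels
    """
    mu = X.mean(axis=0)
    d = X.shape[1]
    B = np.zeros((d, d))
    for s in np.unique(spk):
        Xs = X[spk == s]
        diff = Xs.mean(axis=0) - mu
        B += len(Xs) * np.outer(diff, diff)
    B /= len(X)
    eigval, eigvec = np.linalg.eigh(B)           # orthonormal eigenvectors
    return eigvec[:, np.argsort(eigval)[::-1]]   # sorted by decreasing eigenvalue

# The rotation is orthogonal, so distances between i-vectors are unchanged:
# V = between_class_eigenbasis(X_train, spk_train)
# X_rot = X @ V   # apply to every i-vector, training or test
```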
0:07:58 | This very simple rotation doesn't modify the distances between i-vectors. After this transformation, the matrix B is diagonal, and W remains almost exactly isotropic, and therefore diagonal, because the eigenvector basis of B is orthogonal. |
0:08:16 | The key point is that we assume that the PLDA matrices Phi Phi-transposed and Lambda become almost diagonal, and even isotropic for Lambda. As a consequence, the matrices P and Q involved in the LLR score become almost diagonal. |
0:08:36 | Moreover, the solution of LDA corresponds almost exactly to the subspace spanned by the first eigenvectors of B, because W is almost exactly equal to the identity, up to a constant. So the first components of the i-vectors approximate their projections onto the LDA subspace. |
0:09:00 | So the score can be written as a sum of terms plus a residual term: there is one term for each dimension of the i-vector, and the residual term gathers the off-diagonal terms of the initial scoring, the diagonal terms beyond the first dimensions, and the offsets. |
0:09:29 | We stress that a substantial proportion of the LLR score can be concentrated into this sum of per-dimension, independent terms. |
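With P and Q almost diagonal after the rotation, a hedged sketch of this decomposition (p_i, q_i, c_i taken from the diagonals, rho gathering everything else):

```latex
s(w_1, w_2) \;\approx\; \sum_{i}
  \Big( 2\,p_i\, w_{1,i}\, w_{2,i}
      \;+\; q_i \big(w_{1,i}^2 + w_{2,i}^2\big)
      \;+\; c_i \big(w_{1,i} + w_{2,i}\big) \Big)
  \;+\; \underbrace{\rho(w_1, w_2)}_{\text{residual term}}
```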
0:09:42 | Here is an analysis of the PLDA parameters before and after this diagonalization. We measure the diagonality (an entropy-based measure) of the matrices; a maximal value of 1 indicates an exactly diagonal matrix. |
0:09:58 | We can see that, after diagonalization, all the values are close to 1, thus the PLDA matrices are very close to diagonal, and so are the score matrices P and Q. |
0:10:14 | As for the LDA, using a projection-based distance between subspaces, we check that the LDA subspace is almost exactly equal to the subspace spanned by the first eigenvectors of B; the difference is negligible. |
0:10:34 | To assess the variance of the residual term, we compute, on the last line of the table, the ratio between the variance of the residual term and the variance of the whole score. We can see that, for both the male and female training sets, the values are close to zero. |
0:10:55 | In terms of performance, we compare the full PLDA baseline with the simplified scoring in which we have removed the residual term, considering it negligible: there is no, or only a negligible, degradation in speaker detection. |
0:11:18 | So we can carry out discriminative training applied to these vectors. First, the state-of-the-art logistic regression based approach, following Burget et al.: as there is now a limited number of coefficients, the discriminative training can be performed by optimizing the vector omega such that the score is a dot product between omega and an expanded vector of the trial, given the two i-vectors. Remark that omega now contains of the order of d coefficients, instead of d squared for the initial discriminative training. |
0:12:07 | Our second approach is based on the works of Borgström and McCree. It can be remarked that, as the matrices are close to diagonal, they are close to their diagonal matrices of eigenvalues. So, following Borgström and McCree, we perform a discriminative training intended to optimize only the diagonal of Phi Phi-transposed, the scalar of Lambda, and the mean value mu. |
0:12:44 | Then we introduce a novel alternative to the logistic regression based discriminative training. |
0:12:55 | We define the expanded vector of the score of the trial, with one component for each dimension of the eigenvoice subspace and a last component which is the residual term. The score is then equal to the dot product of this vector and a vector of ones. |
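A sketch of this expanded vector and of the scoring it induces (s_i denoting the per-dimension term, rho the residual):

```latex
\varphi(w_1, w_2) \;=\; \big(s_1,\; s_2,\; \dots,\; s_r,\; \rho\big)^{\mathsf T},
\qquad
s(w_1, w_2) \;=\; \mathbf{1}^{\mathsf T}\, \varphi(w_1, w_2),
```

where r is the dimension of the eigenvoice subspace.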
0:13:28 | The goal here is to replace this unique normal vector, the all-ones vector, by a basis of discriminant axes extracted by using the Fisher criterion. |
0:13:40 | Then, once we have extracted not one but several vectors, we have to combine this basis of discriminant axes to find the unique normal vector needed by speaker detection. |
0:13:58 | So we use the Fisher criterion to extract the discriminant axes in this space of expanded vectors. |
0:14:11 | Consider a dataset comprised of trials, target and non-target; for each one of them we compute the expanded vector of the trial. |
0:14:25 | On this dataset, we can compute the statistics of the target and non-target trials: the within-class and between-class covariance matrices of this dataset. |
0:14:45 | In this case of a two-class classifier (target versus non-target), we can extract the axis maximizing the Fisher criterion of equation 9. |
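The criterion in question, as a sketch (with B_e and W_e the between- and within-class covariance matrices of the expanded vectors over the two classes):

```latex
v^{\star} \;=\; \arg\max_{v}\;
\frac{v^{\mathsf T} B_e\, v}{\,v^{\mathsf T} W_e\, v\,}
```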
0:15:01 | The problem with two classes is that the between-class matrix is of rank one, so we can only extract one non-zero eigenvalue: only one axis can be extracted, because we are limited by the number of classes. |
0:15:25 | But a method was proposed some time ago in order to extract more axes than classes using the Fisher criterion; from it we derive our orthonormal discriminative classifier. It has sometimes been used in face recognition, where researchers applied it in the two thousands. |
0:15:52 | The idea is the following: given a training corpus of expanded vectors of score trials, target and non-target, we compute the statistics and extract the vector which maximizes the Fisher criterion, then normalize it. |
0:16:15 | We then project the dataset onto the orthogonal subspace of this vector: we extract a vector, and we project the data onto the hyperplane orthogonal to it. |
0:16:31 | And we iterate; this way we can extract more axes than classes. |
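A minimal sketch of this iterative extraction, assuming in-memory target and non-target expanded vectors and using the two-class closed form for each axis; as noted later in the talk, the explicit projection below can be replaced by updates of the statistics:

```python
import numpy as np

def fisher_axis(X_tar, X_non):
    """Two-class Fisher axis, closed form: v = W^{-1} (mu_tar - mu_non)."""
    n_t, n_n = len(X_tar), len(X_non)
    W = (np.cov(X_tar, rowvar=False) * (n_t - 1)
         + np.cov(X_non, rowvar=False) * (n_n - 1)) / (n_t + n_n)
    return np.linalg.solve(W + 1e-8 * np.eye(W.shape[0]),
                           X_tar.mean(axis=0) - X_non.mean(axis=0))

def orthonormal_discriminant_axes(X_tar, X_non, n_axes):
    """Extract several orthogonal discriminant axes by deflation."""
    axes = []
    for _ in range(n_axes):
        v = fisher_axis(X_tar, X_non)
        u = v / np.linalg.norm(v)
        axes.append(v)                           # unnormalized: norm reused as weight
        X_tar = X_tar - np.outer(X_tar @ u, u)   # project onto orthogonal subspace
        X_non = X_non - np.outer(X_non @ u, u)
    return axes
```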
0:16:41 | It can be noted that the Fisher criterion is a geometrical approach which doesn't need assumptions of Gaussianity; the vectors, corresponding here to latent scores, are not Gaussian-distributed. |
0:16:58 | Indeed, it can be shown that the components of the expanded score vector are independent, each component for a given dimension following non-central chi-squared distributions with distinct parameters for target trials and for non-target trials. |
0:17:14 | It can be noted, moreover, that if you carry out an experiment using expanded vector scores drawn from these chi-squared distributions, you obtain exactly the same results. |
0:17:26 | We then set this idea aside, because the chi-squared distribution brings no new information: it derives from the standard normal prior of the i-vectors and leads to the same results. |
0:17:49 | If we use this method to extract the discriminant axes, the remaining issue to address is how to combine this subspace of discriminant axes to obtain the unique normal vector needed by speaker detection: we need only one vector to apply. So we have to find weights to apply to each orthonormal discriminant vector. |
0:18:25 | We propose weights equal to the norms of these vectors, because this way it can be shown that the variance of the scores along the axes decreases over the iterations. |
0:18:43 | This method is thus similar to a singular value decomposition, in which we extract the most important axes in terms of score variability first, then the others with decreasing variance; remark that, at the end, the impact of the last discriminant vectors on the score is negligible. |
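A sketch of the resulting combination (v_k the unnormalized axes, u_k = v_k / ||v_k||):

```latex
\omega \;=\; \sum_{k} \lVert v_k \rVert\, u_k,
\qquad
s(w_1, w_2) \;=\; \omega^{\mathsf T}\, \varphi(w_1, w_2)
```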
0:19:14 | Equation 10 shows that, for a trial, we apply the rotation by B, compute the expanded vector between the two i-vectors, and take the dot product of this expanded vector with the weighted sum of the Fisher criterion axes. |
0:19:40 | For the training, even if the dimension of the expanded vector is of the order of d, which is not costly, we can have more than one hundred million non-target trials, and we have to compute the covariance matrices of a set of hundreds of millions of trials. |
0:20:11 | We parameterize the scores as statistics of the training set; since these statistics can be expressed as linear combinations of statistics of subsets, it is possible to split the task. For our experiments, we split the computation over chunks of the training dataset. |
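A minimal sketch of this splitting, relying only on the fact that counts, sums and second-moment matrices add across chunks (chunk handling and I/O are left out):

```python
import numpy as np

def chunk_stats(Phi):
    """Sufficient statistics of one chunk of expanded trial vectors."""
    return len(Phi), Phi.sum(axis=0), Phi.T @ Phi

def combine_stats(stats):
    """Merge per-chunk statistics into a global mean and covariance."""
    n = sum(s[0] for s in stats)
    s1 = sum(s[1] for s in stats)
    s2 = sum(s[2] for s in stats)
    mu = s1 / n
    return mu, s2 / n - np.outer(mu, mu)
```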
0:20:41 | Another remark, which was not made by the authors of this method: it theoretically needs to project the data onto the orthogonal subspace at each iteration, which is very long with billions of data points. But only by updating the statistics, it is possible to extract the discriminant vectors without any effective projection of the data at each iteration. |
0:21:26 | Here are the results on condition 5, the telephone extended condition, of NIST SRE 2010, with i-vectors provided by Brno University of Technology (thanks to our colleagues there) for the male set and the female set. |
0:21:55 | For each set, the first line is the PLDA baseline, then the two approaches using logistic regression (on the coefficients of the score or on the PLDA parameters), and the fourth line is our orthonormal discriminative classifier. |
0:22:15 | We can see first that the logistic regression based approaches struggle to improve the performance of PLDA. Even though the corresponding training is constrained, maybe they are overfitting on the development data; in any case the results are not better than PLDA. Maybe, as length normalization makes the i-vectors more Gaussian, logistic regression is then unable to improve the performance any further. |
0:23:01 | We remark that the orthonormal discriminative classifier is able to improve performance, in terms of equal error rate and DCF, for the male set as well as the female set. |
0:23:17 | Note that, to take into account the distortions of the DET curve in the critical region of low false alarms, the classifier can be made to learn only on the trials providing the highest scores: we retrained it with the densest and highest non-target trial scores, that is, with a reduced non-target set. |
0:23:56 | Here are results on the recent Speakers in the Wild evaluation, which is a good way to assess the robustness of an approach because the conditions are not controlled, with reverberation, noise, short durations, and a mix of male and female speakers. |
0:24:14 | We can see that, on this harder data, our method is able to slightly improve the performance of PLDA. |
0:24:26 | Note that the actual costs reported on the official scoreboard are worse than the ones presented here, because we did not correctly calibrate the scores on the development set. |
0:24:54 | As future works, we are working on short-duration utterances, for which our method is able to improve slightly, and sometimes more, on the PLDA baseline, in particular because the speaker variability estimation is not very accurate for short durations; and also on i-vector-like representations. |
0:25:22 | Following works which propose to extract low-dimensional probabilistic factors for speaker recognition by using deep neural networks, we showed that this PLDA framework is able to take such new representations into account and to deal with them. |
0:25:50 | Thank you. |