0:00:16 | Changhuai You, Haizhou Li, Ambikairajah, Kong Aik Lee |
0:00:27 | presented by |
0:00:48 | Good afternoon, everyone. |
0:00:51 | The paper I would like to present is entitled "Bhattacharyya-based GMM-SVM system with adaptive relevance factor for pair language recognition". |
0:01:06 | The outline of this presentation is shown here. |
0:01:11 | In this pair language recognition system we focus on three techniques: the Bhattacharyya-based GMM-SVM, the adaptive relevance factor, and strategies for pair language recognition. |
0:01:34 | Given a specified language pair, the task of pair language recognition is to decide which of the two languages is in fact spoken in a given segment. |
0:01:50 | We develop pair language recognition systems by studying the Bhattacharyya-based GMM-SVM: we introduce a mean supervector and a covariance supervector, and merge these two kinds of sub-kernels to obtain better performance in a hybrid system. |
0:02:17 | In order to compensate for duration effects, we also introduce an adaptive relevance factor for MAP in GMM-SVM systems. |
0:02:31 | For the purpose of pair language recognition, we introduce two sets of strategies. |
0:02:42 | We also report our system design process for the LRE 2011 submission. |
0:02:56 | In speaker and language recognition systems there are two typical kernels for the GMM-SVM: the Kullback-Leibler (KL) kernel and the Bhattacharyya kernel. |
0:03:12 | The conventional KL kernel includes only the mean information for recognition modeling; however, a symmetrized version of the KL divergence extends it to include a covariance term. |
0:03:38 | So why do we choose the Bhattacharyya-based kernel for language pair recognition? Across many experiments with speaker and language recognition systems, we have observed that the Bhattacharyya-based kernel performs better than the KL kernel. |
0:04:01 | The Bhattacharyya kernel can be split into three terms: the first term is contributed by both the mean and the covariance of the GMM, the second term involves the covariance only, and the third term involves only the weight parameters of the GMM. These three terms can be used independently to produce a recognition decision score, each with a different degree of information contribution. |
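For reference, the Bhattacharyya distance between two single Gaussians already exhibits this split; the following is a sketch of the standard form (the paper's mixture-level kernel, where summing over components brings in the weight-only third term, may be normalized differently):

```latex
% Bhattacharyya distance between N(\mu_a, \Sigma_a) and N(\mu_b, \Sigma_b),
% with \bar{\Sigma} = (\Sigma_a + \Sigma_b)/2:
\[
D_B = \underbrace{\tfrac{1}{8}\,(\mu_a - \mu_b)^{\top} \bar{\Sigma}^{-1} (\mu_a - \mu_b)}_{\text{mean and covariance}}
    + \underbrace{\tfrac{1}{2}\,\ln \frac{|\bar{\Sigma}|}{\sqrt{|\Sigma_a|\,|\Sigma_b|}}}_{\text{covariance only}}
\]
```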
0:04:46 | By using the first term of the Bhattacharyya kernel, with the covariance kept fixed rather than updated, we obtain the mean supervector. This kind of kernel can be used independently as a sub-model. |
0:05:10 | The second term includes only the covariance, so from it we obtain the covariance supervectors. We use only the first two terms of the Bhattacharyya kernel in our pair language recognition system design. |
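As a rough illustration of how the two supervectors could be assembled from a MAP-adapted, diagonal-covariance GMM, here is a minimal numpy sketch; the function name, the sqrt-weight scaling, and the log-variance choice are illustrative assumptions, not the paper's exact normalization:

```python
import numpy as np

def bhattacharyya_supervectors(weights, means, variances, ubm_variances):
    """Illustrative sketch (not the paper's exact scaling).

    weights:       (C,)   mixture weights
    means:         (C, D) MAP-adapted component means
    variances:     (C, D) MAP-adapted diagonal covariances
    ubm_variances: (C, D) UBM diagonal covariances, kept fixed for the
                   mean supervector (first kernel term)
    """
    # Mean supervector: means normalized by the fixed UBM covariance,
    # scaled by sqrt of the mixture weight (a common supervector scaling).
    mean_sv = np.concatenate(
        [np.sqrt(w) * m / np.sqrt(v0)
         for w, m, v0 in zip(weights, means, ubm_variances)])
    # Covariance supervector: built from the adapted variances only
    # (second kernel term); the log domain is one reasonable choice.
    cov_sv = np.concatenate(
        [np.sqrt(w) * np.log(v) for w, v in zip(weights, variances)])
    return mean_sv, cov_sv
```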
0:05:35 | The NAP projections for both the Bhattacharyya mean supervector and the covariance supervector are trained on different databases with a certain amount of overlap; the purpose is to strengthen the compensation. |
0:06:03 | For UBM training and relevance factor training, we use a database common to both the mean and covariance supervectors. |
0:06:21 | In order to compensate for duration variability, we introduce an adaptive relevance factor for MAP in the GMM-SVM; here we show where MAP sits in the GMM-SVM system. |
0:06:41 | This equation is the MAP mean update. Here x_i is the first-order sufficient statistic, and you can see that the relevance factor gamma_i indirectly affects the degree of update of the GMM mean vectors. |
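In standard notation, and consistent with the talk (the slide's exact symbols may differ), the MAP mean update and the role of the relevance factor look like this:

```latex
% Occupation count and first-order sufficient statistic for component i:
\[
N_i = \sum_{t} P(i \mid x_t), \qquad
\bar{x}_i = \frac{1}{N_i} \sum_{t} P(i \mid x_t)\, x_t
\]
% MAP mean update: the relevance factor \gamma_i controls how far the
% adaptation coefficient \alpha_i moves the mean from the UBM toward the data.
\[
\hat{\mu}_i = \alpha_i\, \bar{x}_i + (1 - \alpha_i)\, \mu_i,
\qquad \alpha_i = \frac{N_i}{N_i + \gamma_i}
\]
```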
0:07:06 | If we let this relevance factor be a function of duration, it becomes possible to do some compensation work within the mean update itself. |
0:07:27 | So far there are two types of relevance factors. In classical MAP we usually use a fixed value. The relevance factor can also be made data-dependent by the equation shown here, which is derived from factor analysis research; here phi is a diagonal matrix that can be trained on a development database. |
0:08:01 | Now assume the relevance factor is a function of the number of feature frames, which is tied to duration. Taking the expectation of the occupation count N_i, we see that it is directly proportional to the duration. |
0:08:34 | If we choose this duration function for the relevance factor, the expectation of the MAP mean adaptation coefficient tends to a constant vector, and we obtain the adaptive relevance factor from this equation, which makes the adapted GMM independent of duration. |
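Filling in the expectation step as a sketch, with w_i the UBM mixture weight, T the number of frames, and c an illustrative proportionality constant:

```latex
% The expected occupation count grows linearly with duration:
\[
\mathbb{E}[N_i] = \sum_{t=1}^{T} \mathbb{E}\big[ P(i \mid x_t) \big] \approx T\, w_i
\]
% A duration-proportional relevance factor \gamma_i = c\,T therefore makes
% the expected adaptation coefficient independent of duration:
\[
\mathbb{E}[\alpha_i] \approx \frac{T\, w_i}{T\, w_i + c\,T} = \frac{w_i}{w_i + c}
\]
```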
0:09:13 | Now we come to the third point of the presentation. We propose two strategies for pair language recognition. The first is a one-to-all strategy, also called core-to-pair modeling: we train a GMM-SVM model for each target language against all the other target languages, which gives us the score vectors shown here. |
0:09:45 | With these score vectors, and using our development database for all the target languages, we train Gaussian backend models for the N languages. Finally, the language pair scores are obtained through the log-likelihood ratios shown here. |
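A minimal sketch of core-to-pair scoring, assuming one Gaussian backend per language over the N-dimensional score vector; the names and the per-language full-covariance choice are illustrative assumptions:

```python
import numpy as np
from scipy.stats import multivariate_normal

def pair_llr(score_vec, backends, lang_a, lang_b):
    """Core-to-pair score for the pair (lang_a, lang_b).

    score_vec: (N,) one-vs-all GMM-SVM scores for one test utterance
    backends:  dict mapping language -> (mean, cov) of a Gaussian backend
               trained on development score vectors of that language
    Returns the log-likelihood ratio between the two pair languages.
    """
    mu_a, cov_a = backends[lang_a]
    mu_b, cov_b = backends[lang_b]
    log_a = multivariate_normal.logpdf(score_vec, mean=mu_a, cov=cov_a)
    log_b = multivariate_normal.logpdf(score_vec, mean=mu_b, cov=cov_b)
    return log_a - log_b  # positive favors lang_a, negative favors lang_b
```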
0:10:16 | The second strategy is a pairwise strategy, also called pair modeling. This modeling is very simple: we take the data of the two languages in the pair, directly train a GMM-SVM model on them, and obtain the scores from that model. |
0:10:44 | For the fusion of the two strategies we apply equal weights; that is, we assume the two strategies are equally important, and we obtain the final score by fusing them. |
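The equal-weight fusion itself is then just an average of the two scores; a one-line sketch, assuming both scores are already on comparable scales:

```python
def fuse_scores(core_to_pair_score, pair_score):
    # Equal weights: the two strategies are assumed equally important.
    return 0.5 * core_to_pair_score + 0.5 * pair_score
```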
0:11:03 | Here we show the hybrid pair language recognition system. From a test utterance we extract the Bhattacharyya mean supervector and covariance supervector and feed both into the two strategies; the two supervectors are merged within each strategy, and finally the two strategies are fused to give the final score. |
0:11:42 | We evaluate our pair language recognition design on the NIST LRE 2011 platform. There are twenty-four target languages, so in total there are 24 × 23 / 2 = 276 language pairs. We choose 512 Gaussian components for the GMM and UBM. In this paper we show results based on the thirty-second task, although our experiments also covered the other duration conditions. |
0:12:29 | We use 80-dimensional MFCC-SDC features, extracted with energy-based voice activity detection (VAD). |
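Shifted delta cepstra (SDC) stack time-shifted delta blocks onto each frame. The talk does not give the exact configuration behind the 80 dimensions (the static MFCCs are often appended as well), so the (d, P, k) values below are illustrative only; a minimal numpy sketch:

```python
import numpy as np

def sdc(cepstra, d=1, P=3, k=7):
    """Shifted delta cepstra with an assumed (d, P, k) configuration.

    cepstra: (T, N) frame-level MFCCs
    d: spread of each delta, P: shift between blocks, k: number of blocks
    Returns (T, N * k) SDC features; boundaries are edge-padded.
    """
    T, _ = cepstra.shape
    pad = d + (k - 1) * P
    c = np.pad(cepstra, ((pad, pad), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        lo = pad + i * P - d  # index of c(t + i*P - d) for t = 0
        hi = pad + i * P + d  # index of c(t + i*P + d) for t = 0
        blocks.append(c[hi:hi + T] - c[lo:lo + T])
    return np.concatenate(blocks, axis=1)
```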
0:12:42 | Performance is computed as the average cost over the N worst language pairs. |
0:12:51 | Here we list the training databases, covering both the CTS and BNBS sets, used for our language pair recognition training. |
0:13:06 | Now we show the experimental results. First we compare the effect of the fixed relevance factor against the adaptive relevance factor (ARF). Table 1 shows, under the core-to-pair strategy, the fixed relevance factor set to three different values (0.25, 8, and 32) together with the resulting EER and minimum cost, compared against the ARF. Comparing these data, we can say the adaptive relevance factor performs better than any of the fixed relevance factor settings. |
0:14:02 | Similar observations are found under the pair strategy: the ARF gives 12.75% in terms of EER, while the fixed relevance factor settings all give higher values. |
0:14:28 | The second experiment examines the effect of merging the two sets of supervectors, the mean supervector and the covariance supervector. The blue curve represents the mean supervector and the green curve the Bhattacharyya covariance supervector, both with 80-dimensional MFCC-SDC features and the adaptive relevance factor (ARF). We run this experiment under the core-to-pair strategy and show the merging effect in red: the merged system clearly outperforms either the mean or the covariance supervector alone. |
0:15:26 | This figure is plotted over the N worst language pairs in terms of EER, drawn from the N(N-1)/2 possible language pairs. |
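As a sketch, each point on such a curve is just the average of the metric over the worst pairs; assuming per-pair EERs (or costs) are already computed:

```python
import numpy as np

def avg_worst_pairs(pair_metric, n_worst):
    """Average EER/cost over the n_worst worst language pairs.

    pair_metric: (num_pairs,) values, e.g. 276 = 24*23/2 pair EERs
    """
    worst = np.sort(pair_metric)[::-1][:n_worst]  # worst (largest) first
    return worst.mean()
```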
0:15:45 | Similar results can be found under the pair strategy: for most of the language pairs, the red (merged) curve gives a lower minimum detection cost. |
0:16:10 | Finally, we show the fusion effect of the two strategies. The blue curve is the core-to-pair strategy and the green one the pair strategy; after fusing the two strategies, we obtain the final results, with an EER of ten point something percent and a minimum cost of 0.09. |
0:16:46 | Now we come to the conclusions. We have developed a hybrid Bhattacharyya-based GMM-SVM system for pair language recognition for the LRE 2011 submission. The performance gain from merging the mean supervector and the covariance supervector is clear. Compared with the fixed relevance factor, we observed that the adaptive relevance factor is effective for pair language recognition. Finally, we can say the fusion of the core-to-pair and pair strategies is useful. |
0:17:32 | Here we show some reference papers; in particular the first one, from Patrick Kenny, proposed the data-dependent relevance factor. |
0:17:44 | Thank you. |
0:18:11 | Okay. Firstly, we chose separate mean and covariance supervectors; that means we do not want to merge the mean and covariance information into one kernel. We keep them separate because we found that doing so gives better performance once the two supervectors are merged afterwards. |
0:18:44 | We did compare the two options: building a single kernel with the first and second terms merged together, versus separate kernels, that is, a mean kernel and a covariance kernel fused afterwards. The latter works better. |
0:19:17 | Okay. |
0:19:28 | I think it is at least because the training and testing environments and databases are different; overall, the effect is clear. |