0:00:18 | Okay. |
0:00:33 | The paper I am going to present is on the clustering of full covariance acoustic models. |
0:00:42 | I will follow this outline. |
0:00:44 | First I will discuss the overview of bootstrap |
0:00:47 | and the restructuring in the bootstrap (BS) based acoustic modeling framework, |
0:00:52 | and discuss the motivation: why we do clustering, and why we do it on full covariance. |
0:00:58 | Then I will discuss how to do the clustering, in two parts: |
0:01:02 | the distance measurements investigated, |
0:01:06 | including entropy, KL divergence, Bhattacharyya, the overlap measure, and Chernoff, |
0:01:11 | and some clustering algorithms proposed and investigated. |
0:01:17 | Then we discuss the experimental results on the proposed clustering algorithms, |
0:01:24 | and the experimental results on the BS strategies with the full covariance model. |
0:01:30 | Finally, the conclusion and future extension. |
0:01:37 | Okay, let's have some background on the bootstrap based acoustic modeling. |
0:01:43 | Basically, we sample the training data into N subsets, where each subset covers a fraction of the original data. |
0:01:56 | We combine all the data together to train the decision tree, with LDA and semi-tied covariance, |
0:02:03 | and for each subset we perform EM training, in parallel on the N subsets. |
0:02:13 | So we have N models, and we aggregate them together. |
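A minimal sketch of this sampling-and-pooling pipeline (the function names and the per-state GMM layout are my assumptions; the seventy percent sampling rate without replacement is taken from the Q&A at the end of the talk):

```python
import numpy as np

def bootstrap_subsets(n_utterances, n_subsets, rate=0.7, seed=0):
    """Sample n_subsets index sets, each a random `rate` fraction of the
    training utterances, drawn without replacement within each subset."""
    rng = np.random.default_rng(seed)
    size = int(rate * n_utterances)
    return [rng.choice(n_utterances, size=size, replace=False)
            for _ in range(n_subsets)]

def aggregate_gmms(models):
    """Pool the Gaussians of the N per-subset GMMs for one HMM state into
    one large mixture, renormalizing the mixture weights."""
    weights = np.concatenate([m["weights"] for m in models]) / len(models)
    means = np.vstack([m["means"] for m in models])
    covs = np.vstack([m["covs"] for m in models])   # full covariances
    return {"weights": weights, "means": means, "covs": covs}
```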
0:02:19 | Obviously, the aggregated model is very large, but it performs very well. |
0:02:26 | The problem is that it is too large, and restructuring is needed. |
0:02:31 | So there are two strategies here for the restructuring. |
0:02:37 | The first one is the BS plus diagonal strategy, which uses diagonal covariance modeling in all the steps. |
0:02:48 | The second one keeps the full covariance model at all the steps until the last step. |
0:02:58 | So here a full covariance clustering is needed, and as you can see from the framework, clustering is a critical step. |
0:03:09 | Doing this can remove the redundancy and scale down the model, so that we can put it on a mobile device. |
0:03:20 | And it is flexible; this is an advantage of the clustering, because you can train one large model |
0:03:27 | and scale it down to any size without new training. |
0:03:31 | And here the full covariance clustering is needed for the BS plus full-to-diagonal strategy. |
0:03:41 | Okay. |
0:03:42 | So let's take a look at the distance measurements for clustering. |
0:03:47 | We investigated several distance measurements, including the entropy distance, |
0:03:52 | which measures the change of entropy after two distributions merge, |
0:03:56 | and the KL divergence, where we use the symmetric KL divergence, defined in this form, |
0:04:02 | and the Bhattacharyya distance, |
0:04:08 | and the overlap measure, which measures the overlap of two distributions. |
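A hedged sketch of these distances for full covariance Gaussians (the closed forms below are the standard ones; the exact weighting conventions on the slides may differ):

```python
import numpy as np

def merge_gaussians(w1, m1, S1, w2, m2, S2):
    """Moment-matched merge of two weighted full-covariance Gaussians."""
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    S = (w1 * (S1 + np.outer(m1 - m, m1 - m))
         + w2 * (S2 + np.outer(m2 - m, m2 - m))) / w
    return w, m, S

def entropy_distance(w1, m1, S1, w2, m2, S2):
    """Weighted entropy increase caused by merging the pair; the Gaussian
    entropy is 0.5*log((2*pi*e)^d |S|), and the constant terms cancel."""
    w, _, S = merge_gaussians(w1, m1, S1, w2, m2, S2)
    ld = np.linalg.slogdet(S)[1]
    ld1 = np.linalg.slogdet(S1)[1]
    ld2 = np.linalg.slogdet(S2)[1]
    return 0.5 * (w * ld - w1 * ld1 - w2 * ld2)

def symmetric_kl(m1, S1, m2, S2):
    """0.5 * (KL(f||g) + KL(g||f)) for two full-covariance Gaussians."""
    d = m1.shape[0]
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    dm = m1 - m2
    return 0.25 * (np.trace(P2 @ S1) + np.trace(P1 @ S2) - 2 * d
                   + dm @ (P1 + P2) @ dm)

def bhattacharyya(m1, S1, m2, S2):
    """Bhattacharyya distance; it equals the Chernoff function at s = 0.5."""
    S = 0.5 * (S1 + S2)
    dm = m1 - m2
    ld = np.linalg.slogdet(S)[1]
    ld1 = np.linalg.slogdet(S1)[1]
    ld2 = np.linalg.slogdet(S2)[1]
    return 0.125 * dm @ np.linalg.solve(S, dm) + 0.5 * (ld - 0.5 * (ld1 + ld2))
```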
0:04:13 | But there is no closed form for the overlap, even for multivariate Gaussians, |
0:04:17 | so a variational approach is applied, based on the Chernoff distance. |
0:04:23 | The Chernoff distance function can be viewed as an upper bound of the overlap; |
0:04:29 | it is defined in this form, |
0:04:32 | and the Bhattacharyya distance is a special case of the Chernoff function, with s equal to zero point five. |
0:04:42 | How to obtain the Chernoff distance is elaborated in another paper, reference number two. |
0:04:51 | You can apply the Newton algorithm, but you have to obtain the first and second order derivatives, |
0:04:59 | or you can use a derivative-free approach based on the analytical form of the Chernoff function. |
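A minimal sketch of the derivative-free option (my own reconstruction, not the method of reference two; the one-dimensional search over s uses SciPy's bounded scalar minimizer):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def chernoff_function(s, m1, S1, m2, S2):
    """Chernoff function C(s) = -log of the integral of f(x)^s g(x)^(1-s)
    for two full-covariance Gaussians; exp(-C(s)) upper-bounds the overlap,
    and C(0.5) is exactly the Bhattacharyya distance."""
    Ss = s * S2 + (1.0 - s) * S1
    dm = m1 - m2
    ld = np.linalg.slogdet(Ss)[1]
    ld1 = np.linalg.slogdet(S1)[1]
    ld2 = np.linalg.slogdet(S2)[1]
    quad = dm @ np.linalg.solve(Ss, dm)
    return 0.5 * s * (1.0 - s) * quad + 0.5 * (ld - (1.0 - s) * ld1 - s * ld2)

def chernoff_distance(m1, S1, m2, S2):
    """Derivative-free tightening of the bound: a bounded 1-D maximization
    of C(s) over s in (0, 1), with no first or second derivatives needed."""
    res = minimize_scalar(lambda s: -chernoff_function(s, m1, S1, m2, S2),
                          bounds=(1e-3, 1.0 - 1e-3), method="bounded")
    return -res.fun
```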
0:05:08 | Okay, just now we discussed the distance measurements, |
0:05:12 | and now here is an outline of the investigated clustering algorithms. |
0:05:18 | One is based on the bottom-up, or so-called agglomerative, clustering, which is greedy, |
0:05:27 | and a distance refinement is proposed to improve the speed. |
0:05:31 | Some non-greedy approaches are also proposed for global optimization, including the K-step look-ahead |
0:05:40 | and searching the best path. |
0:05:42 | Finally, a two-pass strategy is proposed to improve the model structure. |
0:05:47 | Let's review the problem again. |
0:05:50 | We have a Gaussian mixture model F, which comes from the aggregated models, |
0:05:58 | and we want to compress it to a model G with N Gaussians. |
0:06:04 | If measured in entropy, we want to minimize the entropy change between F and G; that is our target. |
0:06:14 | This is a global optimization target. |
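In symbols (my reconstruction of the slide, assuming F is the original pooled mixture and G is constrained to N Gaussians):

```latex
G^{\ast} \;=\; \operatorname*{arg\,min}_{G \,:\, |G| = N} \; \bigl|\, H(G) - H(F) \,\bigr|
```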
0:06:16 | However, this is extremely hard to obtain, |
0:06:20 | so the conventional method is: each time, estimate the two most similar candidate Gaussians, |
0:06:28 | and combine them into one under some criterion. |
0:06:37 | So this idea is actually a greedy approach. |
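A minimal sketch of that greedy, agglomerative loop (it reuses merge_gaussians from the distance sketch above; `dist` is any of the pair distances, taking two (weight, mean, covariance) triples):

```python
def greedy_cluster(gaussians, target_n, dist):
    """Greedy agglomerative clustering: repeatedly merge the closest pair
    (under `dist`) until only target_n Gaussians remain."""
    comps = list(gaussians)  # each item is a (weight, mean, cov) triple
    while len(comps) > target_n:
        i, j = min(((a, b) for a in range(len(comps))
                    for b in range(a + 1, len(comps))),
                   key=lambda p: dist(comps[p[0]], comps[p[1]]))
        merged = merge_gaussians(*comps[i], *comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)]
        comps.append(merged)
    return comps
```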
0:06:44 | So a global approach is supposed to be better. |
0:06:50 | Here is an example of the K-step look-ahead. |
0:06:57 | Basically, the greedy approach will always choose the first-ranked combination. |
0:07:05 | However, if you take a look two steps further, |
0:07:10 | we find the best combining candidate is from the second-best order, |
0:07:16 | from here, the red path. |
0:07:20 | So this is a general way to approach the global optimum without searching everything. |
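A sketch of a two-step look-ahead choice (my reconstruction from the talk's description; top_k and depth are illustrative parameters, and it reuses merge_gaussians from above):

```python
def lookahead_choice(comps, dist, top_k=2, depth=2):
    """K-step look-ahead: score each of the top_k candidate merges by the
    best cumulative distance reachable `depth` merges ahead, then commit
    only to the first merge on the best path, which may well be the
    second-ranked pair (the 'red path' in the example)."""
    def best(state, d):
        pairs = sorted((dist(state[a], state[b]), a, b)
                       for a in range(len(state))
                       for b in range(a + 1, len(state)))[:top_k]
        if d == 1:
            c, a, b = pairs[0]
            return c, (a, b)
        best_cost, best_pair = None, None
        for c, a, b in pairs:
            nxt = [s for k, s in enumerate(state) if k not in (a, b)]
            nxt.append(merge_gaussians(*state[a], *state[b]))
            sub, _ = best(nxt, d - 1)
            if best_cost is None or c + sub < best_cost:
                best_cost, best_pair = c + sub, (a, b)
        return best_cost, best_pair
    return best(comps, depth)[1]
```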
0:07:29 | Another idea is to search the optimized path, which employs the breadth-first search idea, which is dynamic programming. |
0:07:39 | So if the beam is set to N, you keep N candidates at each layer, |
0:07:45 | and you extend to the next layer from the N candidates, so you have N squared possibilities, |
0:07:52 | and use pruning to cut it back to N. |
0:07:57 | After this searching process, you find the corresponding globally optimized point at the last layer. |
0:08:10 | So if the beam were unlimited, then the result would be truly globally optimized; |
0:08:19 | however, this is an NP-hard problem, and so we have to set a beam to do this job. |
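A minimal sketch of that beam search over merge sequences (again my reconstruction; it reuses merge_gaussians and a pair distance `dist` from the sketches above):

```python
def beam_search_cluster(gaussians, target_n, dist, beam=8):
    """Search the best merge path with a breadth-first, dynamic-programming
    flavor: keep `beam` partial solutions per layer, expand each by its
    `beam` cheapest merges (about beam^2 candidates), prune back to `beam`."""
    layer = [(0.0, list(gaussians))]          # (accumulated cost, components)
    while len(layer[0][1]) > target_n:
        candidates = []
        for cost, comps in layer:
            pairs = sorted((dist(comps[a], comps[b]), a, b)
                           for a in range(len(comps))
                           for b in range(a + 1, len(comps)))[:beam]
            for c, a, b in pairs:
                nxt = [s for k, s in enumerate(comps) if k not in (a, b)]
                nxt.append(merge_gaussians(*comps[a], *comps[b]))
                candidates.append((cost + c, nxt))
        candidates.sort(key=lambda t: t[0])   # prune back to the beam width
        layer = candidates[:beam]
    return layer[0][1]                        # cheapest full merge path found
```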
0:08:30 | So, the conventional method is that all the states have the same compression rate, |
0:08:41 | which could be not very optimized, |
0:08:45 | because different states can have different compression rates; that makes more sense. |
0:08:52 | So the Bayesian information criterion (BIC) is employed here, and a two-pass idea is employed. |
0:08:59 | In the first pass, we try to keep two K plus one compression rate candidates, with their BIC values, |
0:09:13 | and in the second pass, we fix the BIC value for all the states, |
0:09:19 | and therefore the different compression rates are derived here. |
0:09:25 | So this is how BIC is applied to our clustering algorithm. |
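A sketch of the second pass under one shared penalty (my reading of the two-pass idea; the per-state candidate bookkeeping from the first pass is an assumed data layout):

```python
import numpy as np

def pick_state_sizes(state_candidates, penalty):
    """Second pass (a sketch): `state_candidates` maps each HMM state to
    {candidate_size: (log_likelihood, n_params, n_frames)} recorded in the
    first pass; one shared BIC penalty then picks a per-state size, so the
    compression rate is allowed to differ from state to state."""
    chosen = {}
    for state, cands in state_candidates.items():
        chosen[state] = min(
            cands,
            key=lambda n: -2.0 * cands[n][0]
                          + penalty * cands[n][1] * np.log(cands[n][2]))
    return chosen
```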
0:09:33 | So that comes to the experiment setup. |
0:09:36 | We did the experiments on a Pashto dataset, |
0:09:41 | with one hundred and thirty five hours of training data |
0:09:44 | and ten hours of testing data. |
0:09:47 | The model is speaker independent, and both the training and the testing data are spontaneous speech. |
0:09:56 | The model we start clustering from is combined from the fourteen bootstrapped models; |
0:10:02 | it has six K states and one point eight million Gaussians. |
0:10:07 | This big model has a word error rate of thirty five point four six percent, in full covariance. |
0:10:16 | So now it comes to a problem: the Chernoff and KL distance measurements are just very slow, often. |
0:10:24 | From this figure you can see |
0:10:28 | KL is like six to ten times slower than entropy, |
0:10:31 | and Chernoff is like twenty or thirty times slower than entropy. |
0:10:35 | So the simple idea here is: entropy is fast and effective, so why don't we use entropy to find the N best candidate pairs, |
0:10:45 | and use Chernoff or KL to recalculate the distance, |
0:10:49 | to speed up the process. |
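A minimal sketch of this distance refinement (reusing entropy_distance and chernoff_distance from the sketches above; the n_best of ten matches the ten-best setting quoted below):

```python
def refined_best_pair(comps, n_best=10):
    """Distance refinement: rank all pairs with the cheap entropy distance,
    then rescore only the n_best survivors with the expensive Chernoff
    distance before committing to a merge."""
    pairs = sorted((entropy_distance(*comps[a], *comps[b]), a, b)
                   for a in range(len(comps))
                   for b in range(a + 1, len(comps)))[:n_best]
    rescored = [(chernoff_distance(comps[a][1], comps[a][2],
                                   comps[b][1], comps[b][2]), a, b)
                for _, a, b in pairs]
    c, a, b = min(rescored)
    return a, b
```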
0:10:51 | So after applying this idea, the speed improvement is significant, |
0:10:58 | and the word error rate also improves. |
0:11:02 | Let's take the KL divergence: the baseline is thirty six point two three, |
0:11:10 | and after using the entropy-selected ten best pairs, |
0:11:16 | there is an improvement, to thirty six point zero four. |
0:11:21 | So the reason, we believe, that this N-best refinement can provide such improvement is that the entropy distance |
0:11:31 | implicitly considers the weighting between the mixture components. |
0:11:38 | So I tried the weighted Bhattacharyya distance, |
0:11:42 | compared it with the unweighted distance, |
0:11:44 | and compared it with the N-best refinement approach, |
0:11:48 | on compressing to one hundred K Gaussians and to fifty K Gaussians. |
0:11:54 | From this figure we can see that the weighted distance is better than the unweighted distance, |
0:12:01 | which means the weighting is very important, |
0:12:03 | and the N-best refinement approach is better than the weighted one. |
0:12:09 | And another observation is that fifty K has the larger improvement, |
0:12:16 | which makes sense, because the weighting becomes more and more important when the compression rate is high. |
0:12:28 | So here are some experimental results for the global optimization. |
0:12:33 | Let's first take a look using the entropy criterion, and measure the overall entropy change between F, before compression, and G, after compression. |
0:12:46 | The two-step look-ahead has a tiny improvement, like zero point zero four, |
0:12:52 | but the search approach has a larger improvement, like six point three, |
0:12:59 | which means our approach is effective. |
0:13:04 | But the speed is slow, because you want to search all the paths, |
0:13:10 | and it is about twenty times slower than the baseline. |
0:13:16 | When we evaluate the word error rate, at one hundred K and at fifty K the proposed approach is better; |
0:13:26 | there is a positive improvement, but the improvement is small. |
0:13:32 | At the higher compression rate, the difference between our proposed approach |
0:13:38 | and the baseline approach is larger, |
0:13:41 | which means this work is effective. |
0:13:51 | Here are the experimental results on the two-pass structure optimization. |
0:13:59 | Comparing one pass and two passes, the two-pass is always better than the one-pass, |
0:14:06 | though the improvement is small. |
0:14:11 | So here is a figure of the three approaches: the baseline, |
0:14:25 | the BS plus diagonal covariance strategy, and the BS plus full-to-diagonal conversion strategy. |
0:14:32 | In this figure, they are evaluated on both maximum likelihood training |
0:14:39 | and discriminative training. |
0:14:43 | And the result is pretty interesting; the improvement is quite large. |
0:14:53 | If we compare maximum likelihood with the full-to-diagonal conversion against training the whole process using diagonal covariance, |
0:15:04 | it is like one percent in maximum likelihood, |
0:15:10 | and like zero point seven percent for discriminative training. |
0:15:18 | Okay, so, for future extension: |
0:15:23 | for the search based approach, the beam can be made adaptive. |
0:15:29 | The beam we are using is fixed; |
0:15:34 | at the beginning the beam can be small, but toward the ending the beam should be large, because you want to capture more candidates. |
0:15:45 | So we can use an adaptive idea to optimize the beam. |
0:15:52 | And the K-step look-ahead and searching the optimized path can be a general approach in optimization, and can be applied to other tasks, such as decision tree building. |
0:16:04 | And for the two-pass model structure optimization, we can try different criteria, such as MDL instead of BIC. |
0:16:13 | So these are the references. |
0:16:16 | Any questions? Thank you. |
0:16:28 | (Session chair) We have got one question; just wait for the mic, please. |
0:16:37 | (Audience) Thanks. I have two questions. The first one is: how do you divide the training set |
0:16:43 | into the different subsets at the very beginning? |
0:16:45 | And the second question is: if I understand correctly, |
0:16:49 | each model will have its own tree structure; so, if this is true, |
0:16:54 | how can you decide that two states, for example, can be merged? |
0:17:01 | (Presenter) Okay. So for the first question: each subset is sampled using random sampling, |
0:17:10 | without replacement, |
0:17:12 | with a sampling rate around seventy percent. |
0:17:16 | And for the second question: the models actually all share the same decision tree. |
0:17:22 | Here we combine all the bootstrap data together |
0:17:28 | to train one decision tree, |
0:17:31 | so there is no such problem as you mentioned. |
0:17:35 | (Audience) Thanks. You are doing full covariance Gaussian clustering in this case. |
0:17:42 | According to my experience, maybe one of the clusters will have a very small number of components. |
0:17:50 | So do you have any measure for this small cluster? |
0:17:55 | (Presenter) Actually, in the agglomerative clustering, you combine the two most similar Gaussians together, |
0:18:05 | so after this step you have N minus one Gaussians, and so on. |
0:18:15 | And I think the weight is very important here, |
0:18:19 | to avoid the case you mentioned. |
0:18:23 | (Audience) So you are using no explicit measure for this; you just do not |
0:18:32 | end up with that small number of components, the small clusters? |
0:18:38 | (Presenter) I do not have a separate measure for the small clusters; |
0:18:44 | a small cluster means the weight is small, right? |
0:18:47 | The weight represents whether it is small, |
0:18:53 | that is, the mixture weight. |
0:18:58 | (Audience) Yeah, but if you have, for example, just one component in one cluster, |
0:19:06 | so that it is isolated from all the others, how do you handle this? |
0:19:14 | (Presenter) Isolated from the others... if we do not know that, we do not need to worry about it. |
0:19:34 | So you mean cross-state clustering? |
0:19:37 | (Audience) No, I just mean when you do the clustering, |
0:19:42 | some of the clusters just have a very small number of components. |
0:19:48 | Sometimes, say, you have four clusters, |
0:19:53 | and the task is then to train some models, one model per cluster, for example; |
0:20:00 | then later that will create problems. |
0:20:04 | (Presenter) Right. So I did not see this small cluster issue, and I think the weight is very important here: |
0:20:09 | as I showed, the weighted distance is better than the unweighted distance. |
0:20:14 | So the weight is the representation of whether a cluster is small or large; |
0:20:21 | that is my perspective. |
0:20:24 | (Audience) Okay. Thank you, thank you. |