0:00:13 | models |
---|
0:00:13 | but i should each time |
---|
0:00:15 | but also in young can |
---|
0:00:16 | and it would be uh present a very uh she general |
---|
0:00:22 | so you G U |
---|
0:00:23 | okay |
---|
0:00:24 | uh my talk is on rapid feature-based fMLLR speaker adaptation with bilinear models |
---|
0:00:31 | okay uh |
---|
0:00:33 | here is the outline: first we introduce the background |
---|
0:00:36 | of the rapid and small-amount-data adaptation problem |
---|
0:00:41 | this will include two parts: the feature-based fMLLR adaptation and the |
---|
0:00:47 | bilinear models |
---|
0:00:49 | and then let's go to the combination of these two parts, so |
---|
0:00:53 | that's what we T |
---|
0:00:54 | um to combine the feature-based fMLLR and the bilinear models to do the adaptation work |
---|
0:01:00 | and then |
---|
0:01:01 | let's have a look at our experimental results, and last is the conclusion |
---|
0:01:07 | okay |
---|
0:01:08 | no the |
---|
0:01:09 | one of the issues in ASR is the mismatch between the training data and the testing data, that is, the |
---|
0:01:15 | distribution |
---|
0:01:16 | distribution between |
---|
0:01:18 | that this tour process |
---|
0:01:20 | so that's why we introduce the adaptation work |
---|
0:01:24 | oh |
---|
0:01:24 | people have proposed model-based adaptation methods |
---|
0:01:30 | uh we kind |
---|
0:01:31 | we have three basic categories of them |
---|
0:01:34 | the first one is the speaker clustering |
---|
0:01:36 | this is the eigenspace-based method, and then the Bayesian-based method |
---|
0:01:41 | and the third one |
---|
0:01:43 | is the MLLR |
---|
0:01:44 | so |
---|
0:01:45 | we can put |
---|
0:01:46 | on the model-based adaptation, but |
---|
0:01:49 | now we focus on rapid adaptation, because in model-based |
---|
0:01:53 | adaptation we have a lot of parameters to |
---|
0:01:56 | uh compute |
---|
0:01:58 | and then |
---|
0:01:58 | one uh and the speaker or uh when the number of the one B can you'd are the number of |
---|
0:02:04 | speakers so |
---|
0:02:06 | the new model parameter files will increase dramatically, so this is not a rapid |
---|
0:02:12 | adaptation, so we have to turn to the feature-based adaptation |
---|
0:02:16 | there we only use |
---|
0:02:18 | a matrix |
---|
0:02:19 | which we |
---|
0:02:20 | apply to the observed features, so that it will be |
---|
0:02:26 | much more rapid |
---|
0:02:28 | or |
---|
0:02:29 | uh |
---|
0:02:31 | let's go to the small amount of data |
---|
0:02:33 | a small amount of data |
---|
0:02:35 | means |
---|
0:02:37 | we have to reduce the number of parameters |
---|
0:02:41 | we have to compute |
---|
0:02:44 | or we have to find out |
---|
0:02:45 | which parameter or which factor is the most important, so we |
---|
0:02:51 | can do some eigenspace, |
---|
0:02:53 | eigenspace-based projection, and rank the eigenvalues |
---|
0:02:59 | and then we find out which vector is the most important, so this |
---|
0:03:06 | reduces the parameter space size |
---|
0:03:09 | so we can also |
---|
0:03:11 | add some constraints on |
---|
0:03:14 | the transformation, for example |
---|
0:03:18 | uh we have uh |
---|
0:03:20 | Q are we have a a lot of story |
---|
0:03:24 | a help that are gonna all an or i'm our all our map are are |
---|
0:03:29 | so we have tried i have my i'll a mess there are so it also performs the uh from score |
---|
0:03:35 | that's |
---|
0:03:36 | so uh |
---|
0:03:37 | when we use |
---|
0:03:39 | fMAPLR we should know the prior information, because MAP means |
---|
0:03:43 | maximum a posteriori, so we have to know |
---|
0:03:48 | the prior distribution; if |
---|
0:03:50 | the assumed prior distribution is the same as the real one |
---|
0:03:55 | this method will really give you a good performance, but this is not |
---|
0:03:59 | always |
---|
0:04:00 | the real case |
---|
0:04:01 | so |
---|
0:04:02 | assuming we |
---|
0:04:03 | to to |
---|
0:04:05 | the prior distributions so uh |
---|
0:04:08 | uh we have to |
---|
0:04:11 | for the case in which we do not know the prior information |
---|
0:04:14 | we propose the bilinear method, so in our method we can |
---|
0:04:19 | uh us any are lee |
---|
0:04:22 | for one dollar |
---|
0:04:25 | and that i have and also important factor |
---|
0:04:28 | i some |
---|
0:04:29 | like the eigenvalues, and then we don't need |
---|
0:04:33 | the prior distribution here, so let's |
---|
0:04:37 | have a look at the bilinear models |
---|
0:04:40 | okay let's first have a look at the |
---|
0:04:43 | fMLLR |
---|
0:04:44 | so |
---|
0:04:44 | uh |
---|
0:04:45 | this method is very easy to |
---|
0:04:47 | express; after knowing the observed feature at |
---|
0:04:54 | time t as a variable, so after knowing the observations we have a matrix |
---|
0:05:00 | uh |
---|
0:05:01 | working on these; the matrix is A, and we also have a bias b, so the |
---|
0:05:06 | combined matrix |
---|
0:05:08 | which is an n times (n plus one) matrix, is W |
---|
0:05:11 | so the plus one means the bias |
---|
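The transform described here can be sketched in a few lines of Python. This is only an illustrative sketch, not the speaker's code: the function name `apply_fmllr` and the toy numbers are assumptions. It shows that applying the matrix A and the bias b is the same as applying the combined n x (n+1) matrix W to the feature extended with a constant 1.

```python
def apply_fmllr(W, x):
    """Apply an n x (n+1) transform W = [A | b] to an n-dimensional feature x."""
    extended = list(x) + [1.0]  # append 1 so the last column of W acts as the bias
    return [sum(w * v for w, v in zip(row, extended)) for row in W]


# Toy 2-dimensional example (hypothetical numbers, not from the talk).
A = [[2.0, 0.0],
     [0.0, 0.5]]
b = [1.0, -1.0]
W = [row + [bias] for row, bias in zip(A, b)]  # combined matrix, shape 2 x 3

x = [3.0, 4.0]
y = apply_fmllr(W, x)
print(y)
```

The "plus one" column is what lets a single matrix carry both the linear part and the bias.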
0:05:14 | so |
---|
0:05:15 | uh |
---|
0:05:16 | also following the traditional pipeline we have the |
---|
0:05:19 | auxiliary function |
---|
0:05:21 | Q, and then what we need to do next is to maximise the Q |
---|
0:05:27 | still uh uh in this |
---|
0:05:29 | uh uh you this function |
---|
0:05:30 | uh |
---|
0:05:31 | a video |
---|
0:05:32 | it a is the low uh uh and the very well a both of the utterance |
---|
0:05:37 | transformation matrix W |
---|
0:05:40 | and |
---|
0:05:41 | K and G are some accumulations of the observed features |
---|
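For reference, the auxiliary function mentioned here is usually written row by row in the fMLLR (constrained MLLR) literature. The block below is that standard form, given as an assumption about what the slide shows rather than a copy of it. The symbols w_i (the i-th row of W), beta (the total frame occupancy) and the per-row statistics k_i and G_i follow the common notation and are not confirmed by the transcript.

```latex
Q(W) = \beta \log \lvert \det A \rvert
       + \sum_{i=1}^{n} \left( w_i k_i^{\top} - \tfrac{1}{2}\, w_i G_i w_i^{\top} \right),
\qquad W = [\, A \;\; b \,]
```

In this form k_i and G_i are accumulated from the observed features together with the model's Gaussian posteriors, means and variances, and the speaker transform is the W that maximises Q.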
0:05:49 | okay |
---|
0:05:49 | hmmm |
---|
0:05:50 | okay let's go to bilinear models |
---|
0:05:52 | in bilinear models we assume |
---|
0:05:55 | the observed features depend on |
---|
0:05:59 | two kinds of |
---|
0:06:00 | of |
---|
0:06:01 | factors; for example in speech recognition |
---|
0:06:04 | we have an observation, but this observation may depend on |
---|
0:06:09 | the speaker |
---|
0:06:10 | and at the same time it |
---|
0:06:12 | depends on the environment, so we have two factors |
---|
0:06:16 | what we need to do is to decompose |
---|
0:06:19 | um |
---|
0:06:20 | this feature y into these two factors a and b |
---|
0:06:26 | and meanwhile we have a coupling |
---|
0:06:28 | between a and b, so W is the coupling matrix; actually |
---|
0:06:34 | W is |
---|
0:06:37 | a super matrix |
---|
0:06:39 | because we have |
---|
0:06:40 | y is the |
---|
0:06:43 | y is a vector, so |
---|
0:06:45 | the super matrix W has the |
---|
0:06:48 | number of matrices |
---|
0:06:51 | in it |
---|
0:06:52 | equivalent to the number |
---|
0:06:54 | of |
---|
0:06:55 | elements in y |
---|
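A minimal sketch of this bilinear form, assuming the usual formulation in which each element of y has its own coupling matrix: y[k] is the sum over i and j of W[k][i][j] * a[i] * b[j]. The function name, the index layout and the toy numbers are illustrative assumptions, not taken from the slides.

```python
def symmetric_bilinear(W, a, b):
    """Return y where y[k] = sum_i sum_j W[k][i][j] * a[i] * b[j].

    W is a list of coupling matrices, one per element of y, so the number of
    matrices in W equals the number of elements in y.
    """
    return [
        sum(W_k[i][j] * a[i] * b[j] for i in range(len(a)) for j in range(len(b)))
        for W_k in W
    ]


# Toy example: two speaker factors, two environment factors, 3-dimensional y.
W = [
    [[1.0, 0.0], [0.0, 1.0]],  # coupling matrix for y[0]
    [[0.0, 1.0], [1.0, 0.0]],  # coupling matrix for y[1]
    [[1.0, 1.0], [1.0, 1.0]],  # coupling matrix for y[2]
]
a = [1.0, 2.0]  # hypothetical speaker factor
b = [3.0, 4.0]  # hypothetical environment factor
y = symmetric_bilinear(W, a, b)
print(y)
```

Neither a nor b appears inside W, which is why the two factors play interchangeable roles in this form.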
0:06:58 | so this is the so-called symmetric model, because a |
---|
0:07:02 | and b are the |
---|
0:07:04 | because W is independent |
---|
0:07:06 | of |
---|
0:07:06 | a and b |
---|
0:07:07 | so |
---|
0:07:09 | the form is also symmetric |
---|
0:07:12 | but generally we cannot find out that independent |
---|
0:07:15 | matrix W |
---|
0:07:17 | in most cases W has a relation with a; we assume |
---|
0:07:23 | W is related |
---|
0:07:25 | or always has a relation with a, so this is the asymmetric bilinear model |
---|
0:07:30 | so we |
---|
0:07:31 | multiply |
---|
0:07:33 | a and W, then we obtain the final big A; big A is the |
---|
0:07:39 | speaker |
---|
0:07:42 | combination |
---|
0:07:43 | so big A is the speaker-dependent combination of factor a and the transform matrix W |
---|
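The asymmetric form can be sketched as follows, assuming the speaker factor is folded into the coupling matrices first, giving a speaker-dependent matrix (the "big A" of the talk), which is then applied to the other factor b. The helper names, the index layout W[i][row][col] and the numbers are hypothetical.

```python
def speaker_matrix(W, a):
    """Fold the speaker factor into the coupling: A = sum_i a[i] * W[i]."""
    rows, cols = len(W[0]), len(W[0][0])
    return [
        [sum(a[i] * W[i][r][c] for i in range(len(a))) for c in range(cols)]
        for r in range(rows)
    ]


def apply_matrix(A, b):
    """Ordinary matrix-vector product y = A b."""
    return [sum(v * w for v, w in zip(row, b)) for row in A]


# Toy example with two coupling matrices, one per speaker-factor dimension.
W = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
a = [2.0, 3.0]   # hypothetical speaker factor
b = [1.0, -1.0]  # hypothetical environment factor

A = speaker_matrix(W, a)   # speaker-dependent matrix
y = apply_matrix(A, b)
print(A, y)
```

Once A has been computed for a speaker, only a plain matrix-vector product remains at run time.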
0:07:51 | so |
---|
0:07:51 | what we need to do is to obtain a and b |
---|
0:07:55 | after knowing the prior information of the transformation |
---|
0:08:00 | for example |
---|
0:08:01 | uh |
---|
0:08:02 | you know |
---|
0:08:03 | if we know a transformation matrix |
---|
0:08:06 | uh be she'd north has a |
---|
0:08:09 | in this the slide |
---|
0:08:10 | so |
---|
0:08:11 | we can use SVD to decompose |
---|
0:08:14 | this matrix |
---|
0:08:16 | into two |
---|
0:08:17 | factors, U and V; U and V |
---|
0:08:20 | can be considered as the speaker factor and the environment factor |
---|
0:08:26 | so the middle part, S, |
---|
0:08:28 | is the |
---|
0:08:29 | coupling between these two parts, and S, after SVD, is also the |
---|
0:08:36 | singular value |
---|
0:08:38 | matrix |
---|
0:08:38 | so this is something like the eigenvalues, so |
---|
0:08:43 | after ranking |
---|
0:08:44 | and rearranging, |
---|
0:08:46 | after rearranging the singular values according to |
---|
0:08:51 | their size |
---|
0:08:54 | so |
---|
0:08:55 | we can know |
---|
0:08:56 | the importance |
---|
0:09:00 | of the speaker information and the environment information |
---|
0:09:04 | so how to decrease the parameter space |
---|
0:09:08 | uh |
---|
0:09:09 | we can adopt |
---|
0:09:11 | only the first, maybe the first five or first ten, important singular values |
---|
0:09:17 | according to these |
---|
0:09:19 | we can |
---|
0:09:20 | estimate and obtain the simplified U and V |
---|
0:09:25 | then |
---|
0:09:25 | we will |
---|
0:09:27 | uh |
---|
0:09:28 | and then we can decrease the parameter space size |
---|
0:09:31 | so |
---|
0:09:32 | then we multiply U and |
---|
0:09:35 | S, and we can obtain |
---|
0:09:38 | the final form of the bilinear model, a and b |
---|
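The decomposition and truncation described here can be written compactly. The notation is an assumption made for illustration: Y stands for the matrix being decomposed (M x N), J for the number of singular values kept, and the assignment of S to the U side follows the remark about multiplying U and S; the slides may group the terms differently.

```latex
\begin{aligned}
Y &= U S V^{\top}, \qquad S = \operatorname{diag}(s_1, s_2, \dots), \quad s_1 \ge s_2 \ge \dots \ge 0 \\
Y &\approx U_J S_J V_J^{\top} = A B, \qquad A = U_J S_J \ (M \times J), \quad B = V_J^{\top} \ (J \times N)
\end{aligned}
```

Keeping only the J largest singular values replaces the M N entries of Y with J (M + N) entries in A and B, which is the reduction in parameter space the talk refers to.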
0:09:44 | so |
---|
0:09:44 | what we need to do is to find out |
---|
0:09:47 | a and b in the speech recognition case |
---|
0:09:51 | so |
---|
0:09:52 | um |
---|
0:09:53 | okay |
---|
0:09:54 | okay here is the pipeline |
---|
0:09:55 | for the combination of the bilinear model and fMLLR |
---|
0:09:59 | so W is the speaker transformation matrix in fMLLR, and it |
---|
0:10:06 | includes two parts: b is the bias, and A is the speaker |
---|
0:10:12 | uh, the speaker |
---|
0:10:14 | transform matrix |
---|
0:10:16 | so the final dimension is |
---|
0:10:19 | D times (D plus one) |
---|
0:10:21 | so |
---|
0:10:22 | and in the second step we find the average of the transform matrices, W-bar |
---|
0:10:27 | so |
---|
0:10:28 | we use each transform matrix W minus |
---|
0:10:34 | W-bar, and then we can |
---|
0:10:37 | uh |
---|
0:10:38 | we can stack |
---|
0:10:41 | we can stack the transform matrix |
---|
0:10:44 | uh |
---|
0:10:45 | into a |
---|
0:10:46 | high-dimensional vector |
---|
0:10:50 | and stack the matrices from different speakers |
---|
0:10:54 | together |
---|
0:10:55 | to |
---|
0:10:56 | compose this super matrix |
---|
0:10:59 | and then we perform SVD to find the speaker |
---|
0:11:04 | information and maybe the environment-related information |
---|
0:11:08 | and the singular values |
---|
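The preparation steps before the SVD (average, subtract, flatten, stack) can be sketched as below. This is a sketch under stated assumptions: the helper names are hypothetical, each speaker is placed in one row of the stacked matrix (the talk does not say whether rows or columns are used), and the SVD itself is left to a linear algebra library.

```python
def flatten(matrix):
    """Turn a D x (D+1) transform into one long vector, row by row."""
    return [value for row in matrix for value in row]


def build_supermatrix(transforms):
    """Mean-normalise per-speaker transforms and stack them.

    transforms: list of D x (D+1) matrices, one per training speaker.
    Returns (mean_vector, centered_rows), with one centered row per speaker.
    """
    count = len(transforms)
    vectors = [flatten(W) for W in transforms]
    mean_vector = [sum(column) / count for column in zip(*vectors)]
    centered_rows = [
        [value - mean for value, mean in zip(vector, mean_vector)]
        for vector in vectors
    ]
    return mean_vector, centered_rows


# Two toy speakers with D = 2 (hypothetical numbers).
speaker_1 = [[1.0, 0.0, 0.0],
             [0.0, 1.0, 0.0]]
speaker_2 = [[3.0, 0.0, 2.0],
             [0.0, 1.0, -2.0]]

mean_vector, centered_rows = build_supermatrix([speaker_1, speaker_2])
print(mean_vector)
print(centered_rows)
```

The stacked rows are what the SVD would then be applied to; the mean vector is kept so that it can be added back when a transform is rebuilt.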
0:11:12 | i |
---|
0:11:13 | uh |
---|
0:11:13 | so |
---|
0:11:15 | a and b are the decomposed factors, and W-bar is the |
---|
0:11:20 | bias; it is the average of |
---|
0:11:24 | the transform matrices |
---|
0:11:26 | uh |
---|
0:11:27 | which should be |
---|
0:11:30 | subtracted from W |
---|
0:11:33 | so |
---|
0:11:34 | uh |
---|
0:11:36 | a is just speaker dependent and b is only environment dependent, so in the decoding |
---|
0:11:43 | stage, when a new speaker's data is coming, we only |
---|
0:11:50 | need to compute a; actually b we have already obtained |
---|
0:11:56 | so the new speaker information is only related to a |
---|
0:12:00 | so how to compute a |
---|
0:12:02 | the same as in traditional fMLLR, Q |
---|
0:12:08 | given here is the auxiliary function, and |
---|
0:12:13 | we replace the |
---|
0:12:16 | W: we substitute a and b for W in this |
---|
0:12:20 | auxiliary function |
---|
0:12:22 | and |
---|
0:12:23 | then we remove |
---|
0:12:25 | that is, we remove all terms |
---|
0:12:27 | that do not depend on a |
---|
0:12:29 | and then we obtain the |
---|
0:12:32 | following forms |
---|
0:12:33 | so this is a very straightforward mathematical operation |
---|
0:12:40 | okay |
---|
0:12:42 | okay |
---|
0:12:42 | then we take the |
---|
0:12:46 | derivative of the auxiliary function with respect to a, and make |
---|
0:12:51 | the derivative equal to zero |
---|
0:12:54 | so we can obtain the solution of these functions |
---|
0:12:59 | so |
---|
0:13:00 | uh i think that uh |
---|
0:13:02 | I think all these mathematical |
---|
0:13:05 | operations can be found in a reference paper |
---|
0:13:08 | the name to the i'm i are i think of the read them by would land |
---|
0:13:15 | okay |
---|
0:13:16 | so |
---|
0:13:18 | okay this is the solution of this method; a problem is how to select J. J |
---|
0:13:24 | is the |
---|
0:13:26 | after SVD we have got a series of singular values, and |
---|
0:13:32 | we only want to keep |
---|
0:13:34 | uh |
---|
0:13:35 | a subset |
---|
0:13:37 | of these values, so |
---|
0:13:39 | the size of this subset is J |
---|
0:13:43 | so we only need to keep the first J important values |
---|
0:13:47 | but how to select this J? there are a group of methods |
---|
0:13:52 | the first one is selecting J according to the amount of the adaptation data |
---|
0:13:58 | there are some experimental results showing that |
---|
0:14:02 | the value of J |
---|
0:14:04 | has a log relationship |
---|
0:14:07 | that is, in the experiments it has a log relationship with |
---|
0:14:13 | the amount of the adaptation data, so we can use this log relationship; and then we can also |
---|
0:14:20 | use the singular values, maybe |
---|
0:14:22 | uh |
---|
0:14:24 | set a threshold |
---|
0:14:25 | at the point where the singular value |
---|
0:14:28 | decreases dramatically |
---|
0:14:30 | and based on this threshold we can |
---|
0:14:34 | select the singular values before this threshold |
---|
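One way to turn the "dramatic decrease" idea into code is to cut at the largest relative drop between consecutive singular values. This particular rule is an assumption chosen for illustration; the talk does not specify how the threshold is set, and the function name and numbers are hypothetical.

```python
def select_j_by_drop(singular_values):
    """Return how many leading singular values come before the sharpest relative drop.

    Assumes the values are sorted in descending order and non-negative.
    If no value is smaller than its predecessor, all values are kept.
    """
    best_j = len(singular_values)
    best_ratio = 1.0
    for j in range(1, len(singular_values)):
        previous, current = singular_values[j - 1], singular_values[j]
        ratio = current / previous if previous > 0 else 1.0
        if ratio < best_ratio:  # sharper drop than any seen so far
            best_j, best_ratio = j, ratio
    return best_j


# Toy spectrum: the values fall sharply after the third one.
values = [10.0, 8.0, 7.0, 1.0, 0.8, 0.5]
print(select_j_by_drop(values))
```

A fixed absolute threshold or a cumulative-energy criterion would be equally simple alternatives.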
0:14:38 | okay and we can also use a brute-force approach |
---|
0:14:44 | that is, we can |
---|
0:14:48 | maybe we can assume a set of values of J, then J minus one, minus one, minus one |
---|
0:14:53 | and for each one we need to compute the |
---|
0:14:58 | auxiliary function, that is also the fMLLR objective function |
---|
0:15:02 | to |
---|
0:15:03 | find out which J |
---|
0:15:05 | maximises the |
---|
0:15:07 | objective function; this is also a method |
---|
0:15:09 | in this paper we use the final method |
---|
0:15:13 | to |
---|
0:15:14 | iteratively compute the fMLLR |
---|
0:15:17 | objective function, but now we hope for |
---|
0:15:19 | some simpler methods, for example |
---|
0:15:22 | to use the |
---|
0:15:24 | log relationship |
---|
0:15:25 | with the amount of adaptation data |
---|
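The exhaustive selection of J can be sketched as a search over candidate values, scoring each with an objective function. In this sketch `objective` is a hypothetical stand-in: in practice it would estimate the speaker factor with J dimensions on the adaptation data and return the fMLLR auxiliary function value, which is not implemented here.

```python
def select_j_by_objective(candidates, objective):
    """Try every candidate J and return the one with the highest objective value."""
    return max(candidates, key=objective)


def toy_objective(j):
    """Hypothetical stand-in for the auxiliary function; it peaks at J = 4."""
    return -(j - 4) ** 2


# Start from a large J and step down by one, as described in the talk.
candidates = range(10, 0, -1)
best_j = select_j_by_objective(candidates, toy_objective)
print(best_j)
```

The cost is one estimation per candidate, which is why a cheaper rule based on the amount of adaptation data is attractive.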
0:15:28 | here are the experiment results |
---|
0:15:32 | what we want to show is the second experiment, the Mandarin |
---|
0:15:36 | voice search data, so |
---|
0:15:40 | um |
---|
0:15:41 | the and this is about i |
---|
0:15:43 | uh |
---|
0:15:44 | in this test data the utterance is very short; here it is |
---|
0:15:49 | the |
---|
0:15:50 | six seconds or ten seconds or so |
---|
0:15:53 | a can only a question it's use here only a questions that it's use here so after is it matters |
---|
0:15:58 | so when we use the traditional fMLLR |
---|
0:16:02 | the WER is fifteen point two percent; when |
---|
0:16:06 | we turn to the bilinear model |
---|
0:16:08 | uh |
---|
0:16:10 | the absolute WER |
---|
0:16:12 | decreases by one point five percent |
---|
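Taking the two figures as transcribed (a baseline WER of 15.2 percent and an absolute reduction of 1.5 points), the resulting WER and the relative reduction follow from simple arithmetic. The variable names are illustrative, and the derived values are only as reliable as the transcribed numbers.

```python
baseline_wer = 15.2        # percent, traditional fMLLR (as transcribed)
absolute_reduction = 1.5   # percentage points (as transcribed)

# WER after adaptation with the bilinear model.
bilinear_wer = round(baseline_wer - absolute_reduction, 1)

# Relative reduction with respect to the baseline, in percent.
relative_reduction = round(100.0 * absolute_reduction / baseline_wer, 1)

print(bilinear_wer, relative_reduction)
```

Under these figures the bilinear model would sit at 13.7 percent WER, which is roughly a 9.9 percent relative improvement.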
0:16:18 | okay |
---|
0:16:19 | so, our conclusions: bilinear |
---|
0:16:24 | models can effectively incorporate the |
---|
0:16:27 | prior information, as we said before, and effectively reduce the |
---|
0:16:33 | parameter space |
---|
0:16:35 | and the future work is |
---|
0:16:37 | the first |
---|
0:16:38 | is to select J |
---|
0:16:40 | is to select the |
---|
0:16:42 | number of singular values J in a simple way |
---|
0:16:46 | and the second is |
---|
0:16:48 | uh |
---|
0:16:49 | in our work |
---|
0:16:50 | we have |
---|
0:16:51 | assumed the bilinear model describes the speaker information and, for example, the environment information |
---|
0:16:58 | but actually we don't know the speaker information, we don't know the environment information, so the second future work |
---|
0:17:05 | is |
---|
0:17:06 | uh |
---|
0:17:07 | we can |
---|
0:17:09 | was not one |
---|
0:17:10 | if we can know |
---|
0:17:11 | the speaker information and the environment information, or the class-dependent information |
---|
0:17:19 | what can we do further |
---|
0:17:20 | to increase the performance |
---|
0:17:23 | of this bilinear model |
---|
0:17:25 | uh |
---|
0:17:26 | and the |
---|
0:17:27 | third one is how to control the speaker number |
---|
0:17:31 | still |
---|
0:17:32 | uh |
---|
0:17:33 | that's, uh |
---|
0:17:34 | that's our future work |
---|
0:17:36 | so if you have a |
---|
0:17:37 | do have a fast race |
---|
0:17:41 | a a quick question |
---|
0:17:46 | okay so if you want to know some details about this work please write to |
---|
0:17:50 | i have john has sinned dot I B M calm |
---|
0:17:54 | thank you very much |
---|
0:17:55 | you |
---|