0:00:13models
0:00:13but i should each time
0:00:15but also in young can
0:00:16and it would be uh present a very uh she general
0:00:22so you G U
0:00:23okay
0:00:24My talk is on rapid feature-based MLLR speaker adaptation with bilinear models.
0:00:31Okay.
0:00:33Here is the outline. First, we introduce the background
0:00:36of the rapid adaptation problem with a small amount of data.
0:00:41This will include two parts: the feature-based MLLR adaptation and the
0:00:47bilinear models.
0:00:49And then let's go to the combination of these two parts;
0:00:53that's what we did,
0:00:54to combine the feature-based MLLR and the bilinear models to do the adaptation work.
0:01:00And then
0:01:01let's have a look at our experimental results and, last, a conclusion.
0:01:07Okay.
0:01:08Now,
0:01:09one of the issues in ASR is the mismatch between the training data and the testing data, that is, the
0:01:15distribution
0:01:16mismatch between
0:01:18these two processes.
0:01:20So that's why we introduce the adaptation work.
0:01:24So
0:01:24people have proposed model-based adaptation methods.
0:01:30We can say
0:01:31that we have three basic categories.
0:01:34The first one is speaker clustering,
0:01:36that is, the eigenspace-based methods; then the Bayesian-based methods;
0:01:41and the third one
0:01:43is MLLR.
0:01:44So
0:01:45we can put all of these
0:01:46under model-based adaptation. But
0:01:49now we focus on rapid adaptation, because in model-based
0:01:53adaptation we have a lot of parameters to
0:01:56compute,
0:01:58and when the number of speakers becomes large, the number of
0:02:04model parameter files
0:02:06will increase dramatically, so this is not a rapid
0:02:12adaptation. So we have to turn to feature-based adaptation.
0:02:16There we only use
0:02:18a matrix,
0:02:19which
0:02:20is applied to the observed features, so it will be
0:02:26much more rapid.
0:02:28Okay,
0:02:31let's go to the small amount of data.
0:02:33A small amount of data
0:02:35means
0:02:37we have to reduce the number of parameters
0:02:41we have to compute,
0:02:44or we have to find out
0:02:45which parameter or which factor is the most important. So we
0:02:51can do some eigenspace-based
0:02:53decomposition and rank the eigenvalues,
0:02:59and then we find out which vector is the most important. So this
0:03:06reduces the parameter space size.
0:03:09So we can also
0:03:11add some constraints on
0:03:14the transformation. For example,
0:03:18we have
0:03:20Q are we have a a lot of story
0:03:24a help that are gonna all an or i'm our all our map are are
0:03:29so we have tried i have my i'll a mess there are so it also performs the uh from score
0:03:35that's
0:03:36So
0:03:37when we use
0:03:39MAPLR we should know the prior information, because MAP means
0:03:43maximum a posteriori, so we have to know
0:03:48the prior distribution. If
0:03:50the assumed prior distribution is the same as the real one,
0:03:55this method will give a good performance, but this is not
0:03:59always
0:04:00the real case.
0:04:01So,
0:04:02assuming we
0:04:03do not know
0:04:05the prior distribution,
0:04:08we have to
0:04:11handle the case in which we do not know the prior information.
0:04:14Then we propose the bilinear method. In our method we can
0:04:19find out
0:04:25the important factors,
0:04:28something
0:04:29like the eigenvalues, and then we don't need
0:04:33the prior distribution here. So let's
0:04:37have a look at the bilinear models.
0:04:40Okay, let's first have a look at
0:04:43fMLLR.
0:04:44So
0:04:45this method is very easy to
0:04:47express as a linear weighting. Knowing the observed features,
0:04:54that is, knowing the observations, we have a matrix.
0:05:01This matrix is A, and we also have a bias b, so the
0:05:06combined matrix,
0:05:08which is an n times (n plus one) matrix, is W.
0:05:11So the plus one means the bias.
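As a minimal sketch of the transform described here (not the speaker's code), the combined n times (n plus one) matrix W = [A | b] can be applied to one feature vector by appending a constant 1 to it. The function name `apply_fmllr` and the toy numbers are assumptions made for illustration only.

```python
import numpy as np

def apply_fmllr(W, x):
    """Apply an n x (n+1) transform W = [A | b] to one feature vector x."""
    n = x.shape[0]
    if W.shape != (n, n + 1):
        raise ValueError("W must have shape (n, n+1)")
    xi = np.append(x, 1.0)  # extended feature [x; 1]; the extra 1 picks up the bias
    return W @ xi           # same result as A @ x + b

# Toy example (hypothetical values).
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
b = np.array([1.0, -1.0])
W = np.hstack([A, b[:, None]])  # combined 2 x 3 matrix
x = np.array([3.0, 4.0])
y = apply_fmllr(W, x)
```

Storing the bias as the last column of W is what makes the "plus one" in the matrix size.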
0:05:14So,
0:05:16also following the traditional pipeline, we have
0:05:19the auxiliary function
0:05:21Q, and then what we need to do in the following is to maximize Q.
0:05:27So in this
0:05:29function,
0:05:30uh
0:05:31a video
0:05:32it a is the low uh uh and the very well a both of the utterance
0:05:37for information metrics that a up a little
0:05:40and
0:05:41K and G are some accumulations of the observed features.
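For reference, one common way to write the fMLLR auxiliary function in the literature, assuming Gaussians with diagonal covariances, is the row-wise form below. Apart from W, K and G, the symbols (the occupancies, means, variances and the extended feature vector) are standard notation assumed here, and the slide may arrange the terms differently.

```latex
\begin{aligned}
Q(W) &= \beta \log \lvert \det A \rvert
      + \sum_{i=1}^{n} \Bigl( w_i k_i^{\top} - \tfrac{1}{2}\, w_i G_i w_i^{\top} \Bigr),
      \qquad W = [\,A \;\; b\,], \\
\xi_t &= \begin{bmatrix} x_t \\ 1 \end{bmatrix}, \qquad
\beta = \sum_{t} \sum_{m} \gamma_m(t), \\
G_i &= \sum_{m} \frac{1}{\sigma_{m,i}^{2}} \sum_{t} \gamma_m(t)\, \xi_t \xi_t^{\top}, \qquad
k_i = \sum_{m} \frac{\mu_{m,i}}{\sigma_{m,i}^{2}} \sum_{t} \gamma_m(t)\, \xi_t^{\top}.
\end{aligned}
```

Here w_i is the i-th row of W, gamma_m(t) is the posterior occupancy of Gaussian m at frame t, and mu_{m,i} and sigma_{m,i}^2 are the i-th mean and variance components of that Gaussian.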
0:05:49Okay.
0:05:50Okay, let's go to bilinear models.
0:05:52In bilinear models we assume
0:05:55the observed features depend on
0:05:59two kinds
0:06:00of
0:06:01factors. For example, in speech recognition
0:06:04we have an observation, but this observation may depend on
0:06:09the speaker
0:06:10and at the same time
0:06:12depend on the environment, so we have two factors.
0:06:16What we need to do is to decompose
0:06:20this feature y into these two factors, a and b.
0:06:26And meanwhile we have a coupling
0:06:28between a and b, so W is the coupling matrix. Actually W is
0:06:37a super matrix,
0:06:39because
0:06:40y is
0:06:43a vector,
0:06:45and the super matrix W has a
0:06:48number of matrices
0:06:52equivalent to the number
0:06:54of
0:06:55elements in y.
0:06:58So this is the so-called symmetric model, because
0:07:02the coupling W
0:07:04is independent
0:07:06of
0:07:06a and b.
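The symmetric model just described can be sketched as follows, assuming the usual bilinear form in which each element of y has its own coupling matrix: y[k] = a^T W[k] b. The function name and the toy values are hypothetical.

```python
import numpy as np

def symmetric_bilinear(W, a, b):
    """Compute y[k] = a^T W[k] b for a stack W of K coupling matrices."""
    # W has shape (K, I, J): one I x J coupling matrix per element of y.
    return np.einsum("kij,i,j->k", W, a, b)

# Toy example (hypothetical values): y has two elements, so W stacks two matrices.
W = np.stack([
    np.eye(2),
    np.array([[0.0, 1.0],
              [1.0, 0.0]]),
])
a = np.array([1.0, 2.0])  # first factor, e.g. speaker
b = np.array([3.0, 4.0])  # second factor, e.g. environment
y = symmetric_bilinear(W, a, b)
```

The stack W does not change with a or b, which is the sense in which the coupling is independent of both factors.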
0:07:07So
0:07:09the other form is the asymmetric form.
0:07:12Generally we cannot find a factor-independent
0:07:15matrix W;
0:07:17in most cases W has a relationship with a, so we assume
0:07:23W
0:07:25always has a relationship with a. So this is the asymmetric bilinear model.
0:07:30So we
0:07:31multiply
0:07:33a and W, and then we obtain the final big A. Big A is the
0:07:39speaker-dependent
0:07:42combination;
0:07:43that is, big A is the speaker-dependent combination of the factor a and the transform matrix W.
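A minimal sketch of the asymmetric form, under the assumption that big A is obtained by folding the factor a into the coupling stack W, so that y = A b. The names and toy values are hypothetical.

```python
import numpy as np

def speaker_matrix(W, a):
    """Fold factor a into the coupling stack W: A[k, j] = sum_i W[k, i, j] * a[i]."""
    return np.einsum("kij,i->kj", W, a)

# Toy example (hypothetical values).
W = np.stack([
    np.eye(2),
    np.array([[0.0, 1.0],
              [1.0, 0.0]]),
])
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

A = speaker_matrix(W, a)  # depends on the first factor only
y_asym = A @ b            # asymmetric form: y = A b
y_sym = np.einsum("kij,i,j->k", W, a, b)  # symmetric form, for comparison
```

Both forms give the same y here; the difference is that in the asymmetric model A is treated as one matrix per speaker rather than being factored further.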
0:07:51So
0:07:51what we need to do is to obtain A and b
0:07:55after knowing the prior information of the transformation.
0:08:00For example,
0:08:03if we know a transformation matrix,
0:08:06which is denoted
0:08:09in this slide,
0:08:10so
0:08:11we can use SVD to decompose
0:08:14this matrix
0:08:16into
0:08:17two factors, U and V. U and V
0:08:20can be considered as the speaker factor and the environment factor.
0:08:26So the middle part, S,
0:08:28is the
0:08:29coupling between these two parts, and S in the SVD is also the
0:08:36singular value
0:08:38matrix.
0:08:38So this is something like the eigenvalues.
0:08:43After ranking,
0:08:46after reordering the singular values according to
0:08:51their size,
0:08:55we can know
0:08:56the importance
0:09:00of the speaker information and the environment information.
0:09:04So how to decrease the parameter space?
0:09:09We can adopt
0:09:11only the first, maybe the first five or the first ten, important singular values.
0:09:17According to these,
0:09:19we can
0:09:20at the same time obtain the simplified U and V,
0:09:28and then we can decrease the parameter space size.
0:09:31So
0:09:32then we multiply U and
0:09:35S, and we can obtain
0:09:38the final form of the bilinear model, A and B.
0:09:44So
0:09:44what we need to do is to find out
0:09:47the A and B in the speech recognition case.
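The truncation step can be sketched with NumPy as below: decompose a matrix with SVD, keep the first J singular values, and form A = U_J S_J and B = V_J^T. The function name, the random test matrix and the choice J = 3 are assumptions for illustration.

```python
import numpy as np

def truncated_factors(M, J):
    """Keep the J largest singular values of M and return (A, B, s) with A @ B ~ M."""
    # np.linalg.svd returns the singular values already sorted in descending order.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = U[:, :J] * s[:J]  # scale each kept column of U by its singular value
    B = Vt[:J, :]         # kept rows of V^T
    return A, B, s

# Toy example: a 6 x 8 matrix of rank 3, so J = 3 reproduces it exactly.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 8))
A, B, s = truncated_factors(M, J=3)
```

With a smaller J the product A @ B is only an approximation of M, but A and B hold fewer parameters.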
0:09:51So
0:09:54okay, here is the pipeline
0:09:55for the combination of the bilinear model and fMLLR.
0:09:59So W is the speaker transformation matrix in fMLLR, and it
0:10:06includes two parts, where b is the bias. For each speaker
0:10:12we have
0:10:14the transform matrix,
0:10:16so the final dimension is
0:10:19d times (d plus one).
0:10:21So
0:10:22in the second step we find the average of the transform matrices, W-bar.
0:10:27So
0:10:28from each transform matrix W we subtract
0:10:34W-bar, and then
0:10:41we can stack the transform matrices.
0:10:45So this is the
0:10:46high-dimensional matrix:
0:10:50this stacks the matrices from different speakers
0:10:54together
0:10:55to
0:10:56compose this super matrix.
0:10:59And then we perform SVD to find the speaker
0:11:04information and maybe the environment-related information,
0:11:08and the singular values.
0:11:13So
0:11:15A and B are the decomposition, and W-bar is the
0:11:20average of
0:11:24the transform matrices,
0:11:27which should be
0:11:30subtracted from W.
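The training pipeline described here can be sketched as follows. This assumes each speaker's d times (d plus one) transform is flattened into one row of the super matrix; the talk does not spell out the exact stacking layout, so that layout, the function names and the random data are all assumptions.

```python
import numpy as np

def train_bilinear_basis(transforms, J):
    """transforms: array of shape (S, d, d+1), one fMLLR matrix per training speaker."""
    S, d, d1 = transforms.shape
    flat = transforms.reshape(S, d * d1)  # super matrix: one row per speaker
    mean = flat.mean(axis=0)              # flattened average transform (W-bar)
    U, s, Vt = np.linalg.svd(flat - mean, full_matrices=False)
    A = U[:, :J] * s[:J]                  # speaker-dependent weights, shape (S, J)
    B = Vt[:J]                            # shared basis, shape (J, d*(d+1))
    return mean, A, B

def reconstruct(mean, a, B, d):
    """Rebuild one speaker's transform from its weight vector a."""
    return (mean + a @ B).reshape(d, d + 1)

# Toy example: 5 speakers, d = 3 (hypothetical random transforms).
rng = np.random.default_rng(1)
d = 3
transforms = rng.standard_normal((5, d, d + 1))
# After removing the mean, 5 rows span at most 4 dimensions, so J = 4 is exact.
mean, A, B = train_bilinear_basis(transforms, J=4)
W0 = reconstruct(mean, A[0], B, d)
```

For a new speaker, only the J entries of the weight vector have to be estimated, since the mean and B are fixed after training.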
0:11:33So
0:11:36A is speaker dependent and B is only environment dependent. So in the decoding
0:11:43stage, when an utterance from a new speaker is coming, we
0:11:50need to compute A and B; actually B we have already obtained from the SVD,
0:11:56so the new speaker information is only related to A.
0:12:00So how to compute A?
0:12:02As in the traditional fMLLR, Q
0:12:08is the auxiliary function, and
0:12:13we replace the
0:12:16transform: we substitute A and B into
0:12:20the auxiliary function,
0:12:22and
0:12:23then we remove
0:12:25all terms
0:12:27regardless of A,
0:12:29and then we obtain the
0:12:32following forms.
0:12:33So this is a very straightforward mathematical operation.
0:12:40Okay.
0:12:42Okay,
0:12:42then let's take the
0:12:46derivative of the objective function with respect to A and set
0:12:51the derivative to zero,
0:12:54so we can obtain the solution of this function.
0:12:59So
0:13:00I think
0:13:02all these mathematical
0:13:05operations can be found in a reference paper,
0:13:08the one on MLLR, I think, the algorithm by Woodland.
0:13:15okay
0:13:16So
0:13:18okay, this is the solution of this method. Now a problem: how to select J? J
0:13:24is this:
0:13:26after SVD we have got a series of singular values, and
0:13:32we only want to keep
0:13:35a subset
0:13:37of these values,
0:13:39so the size of this subset is J.
0:13:43So we only need to keep the first J important values.
0:13:47But how to select this J? There is a group of methods.
0:13:52The first one is selecting J according to the amount of adaptation data.
0:13:58There are some experimental results showing that
0:14:02the value of J
0:14:04has a log relationship;
0:14:07that is, in the experiments it has a log relationship with
0:14:13the amount of adaptation data. So we can use this log relationship. And then we can also
0:14:20use the singular values: maybe
0:14:24set a threshold
0:14:25at the point where the singular values
0:14:28decrease dramatically,
0:14:30and based on this threshold we can
0:14:34select the singular values before this threshold.
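The two heuristics mentioned here can be sketched as below. The talk gives neither the coefficients of the log relationship nor the threshold value, so `alpha`, `beta`, `rel_threshold` and both function names are hypothetical placeholders rather than values from the paper.

```python
import math
import numpy as np

def select_j_by_threshold(singular_values, rel_threshold=0.1):
    """Keep the singular values that are at least rel_threshold times the largest one."""
    s = np.asarray(singular_values, dtype=float)  # assumed sorted in descending order
    return max(1, int(np.sum(s >= rel_threshold * s[0])))

def select_j_by_log_rule(seconds, alpha, beta, j_max):
    """Hypothetical log rule: J = alpha * log(seconds) + beta, clipped to [1, j_max]."""
    j = round(alpha * math.log(seconds) + beta)
    return min(max(j, 1), j_max)

# Toy usage with made-up numbers.
j_thr = select_j_by_threshold([10.0, 8.0, 5.0, 0.5, 0.1], rel_threshold=0.1)
j_log = select_j_by_log_rule(seconds=10.0, alpha=2.0, beta=1.0, j_max=10)
j_capped = select_j_by_log_rule(seconds=10.0, alpha=2.0, beta=1.0, j_max=4)
```

In practice the coefficients of the log rule would have to be fitted on development data.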
0:14:38Okay, and we can also use an iterative method, which has a group of steps:
0:14:48maybe we can assume a set of values of J, and then J minus one, minus one, minus
0:14:53one, and compute iteratively; I need to compute the
0:14:58auxiliary function, that is also the objective function,
0:15:02to
0:15:03find out which J
0:15:05maximizes the
0:15:07objective function. This is also a method.
0:15:09In this paper we use this final method
0:15:13to
0:15:14iteratively compute the
0:15:17objective function, but now we hope for
0:15:19some simpler methods, for example
0:15:22to use the
0:15:24relationship
0:15:25with the amount of adaptation data.
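The search over J can be sketched generically as below. The `objective` argument is a hypothetical callable standing in for whatever score is evaluated for each candidate J (in the talk, the auxiliary function after estimating the speaker weights); the toy objective only demonstrates the loop.

```python
def select_j_by_search(objective, j_max):
    """Return the J in 1..j_max with the highest objective value."""
    best_j, best_value = None, float("-inf")
    # Start from the largest candidate and step down by one each time.
    for j in range(j_max, 0, -1):
        value = objective(j)
        if value > best_value:  # strict comparison: ties keep the larger J
            best_j, best_value = j, value
    return best_j

def toy_objective(j):
    """Made-up score that peaks at J = 3."""
    return -(j - 3) ** 2

best = select_j_by_search(toy_objective, j_max=8)
```

Each candidate J needs its own estimation pass, which is why a rule based on the amount of adaptation data would be cheaper.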
0:15:28Here are the experimental results.
0:15:32What we want to see is the second experiment, the Mandarin
0:15:36voice search data.
0:15:40um
0:15:41the and this is about i
0:15:43uh
0:15:44In this test data the utterances are very short; here it is
0:15:50six seconds or ten seconds,
0:15:53and only a question is used here. So
0:15:58when we use the traditional fMLLR,
0:16:02the WER is fifteen point two percent, but when
0:16:06we turn to the bilinear model,
0:16:10the absolute WER
0:16:12decreases by one point five percent.
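Taking the two figures as stated (a 15.2 percent baseline WER and a 1.5 percent absolute reduction), the implied WER with the bilinear model and the corresponding relative reduction work out as follows. These derived numbers are arithmetic on the quoted figures, not results reported in the talk.

```latex
\begin{aligned}
\mathrm{WER}_{\text{bilinear}} &= 15.2\% - 1.5\% = 13.7\%, \\
\text{relative reduction} &= \frac{1.5}{15.2} \approx 9.9\%.
\end{aligned}
```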
0:16:18Okay,
0:16:19our conclusion is that bilinear
0:16:24models can effectively incorporate,
0:16:27as we said before, the prior information, and effectively reduce the size of the
0:16:33parameter space.
0:16:35And the future work:
0:16:37the first
0:16:38is to select J,
0:16:40that is, to select the
0:16:42number of singular values, J.
0:16:46And the second is this:
0:16:49in our work
0:16:50we have
0:16:51assumed that the bilinear models capture the speaker information and, for example, the environment information,
0:16:58but actually we do not know the speaker information and we do not know the environment information. So the second future work
0:17:05is:
0:17:10if we can know
0:17:11the speaker information and the environment information, or the class-dependent information,
0:17:19what can we do further
0:17:20to increase the performance
0:17:23of this bilinear model?
0:17:26And the
0:17:27third one is how to control the speaker number.
0:17:34That's our future work.
0:17:37Do we have a
0:17:41quick question?
0:17:46Okay, so if you want to know some details about this work, please write to
0:17:50i have john has sinned dot I B M calm
0:17:54Thank you very much.
0:17:55Thank you.