0:00:13 | So, to give you an overview, I'm just going to give a brief description of how the acoustic model classes are organised, |
---|
0:00:23 | just to give you a flavour of what is meant by "the code is modular": parts that don't need to know about each other don't. |
---|
0:00:34 | So, just to reiterate: what we currently support is mainly the standard maximum likelihood training of acoustic models with GMMs, in that kind of maximum likelihood framework. |
---|
0:00:50 | We have the usual linear transforms, like LDA and STC. |
---|
0:00:56 | We also support speaker adaptation. Currently fMLLR is what we have tested in the recipes. |
---|
0:01:04 | For MLLR the code is there and it's unit tested, but somebody still needs to write the scripts and run them, |
---|
0:01:14 | so MLLR is not in the recipes yet, though it's almost done. |
---|
0:01:20 | MLLR obviously works with regression trees, and fMLLR has two variations: either it's just a global transform, or it uses regression trees. |
---|
0:01:34 | At this point, as Dan mentioned, we had some discussion about whether to support more generic, abstract model types. |
---|
0:01:46 | For now things are fairly simple, so we decided not to do it now; maybe in the future, if the need is felt. |
---|
0:01:57 | A couple of times during the course of this development it did seem it might be good to have a system like that, |
---|
0:02:04 | but currently a GMM is a very specific thing, with means and covariances, and you'll shortly see how the GMMs are implemented. |
---|
0:02:16 | Similarly to GMMs, we also have SGMMs, and fMLLR adaptation code for SGMMs, which I'll say a little bit about. |
---|
0:02:25 | There are a few results we previously published which are still not in this new code base, but they are going to be added. |
---|
0:02:34 | This has already been talked about: we have a GMM class, and it knows about really nothing other than what it contains, that is, its parameters. |
---|
0:02:48 | Then there is the acoustic model class, which is just a vector of GMMs (for implementation reasons, a vector of pointers, but that's not an interesting detail). |
---|
0:02:56 | The green arrows in these slides signify a technical term called "knows about". |
---|
0:03:07 | We have avoided inheritance as much as possible, so most of the time things are not inherited. |
---|
0:03:19 | If an object needs to keep track of another object, it either keeps a const reference to it, or, if it needs to change it, a specific method takes a pointer and modifies the object. |
---|
0:03:36 | So "knows about" is in the sense that, when you write the code, you have to include the header for the other thing. |
---|
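The "knows about" relations just described can be sketched in C++. This is a hypothetical, heavily simplified illustration (the names `Gmm`, `AcousticModel`, `ComponentCounter` and `FloorWeights` are made up here and are not Kaldi's actual API): an object that only reads another keeps a const reference, while a modifier is handed a pointer just for the duration of a call.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// A GMM knows only about its own parameters (just weights here, for brevity).
struct Gmm {
  std::vector<double> weights;
};

// The acoustic model is just a vector of pointers to GMMs, one per state.
class AcousticModel {
 public:
  void AddGmm(std::unique_ptr<Gmm> gmm) { gmms_.push_back(std::move(gmm)); }
  std::size_t NumGmms() const { return gmms_.size(); }
  const Gmm &GetGmm(std::size_t i) const {
    assert(i < gmms_.size());
    return *gmms_[i];
  }
  Gmm *GetGmmPtr(std::size_t i) {
    assert(i < gmms_.size());
    return gmms_[i].get();
  }

 private:
  std::vector<std::unique_ptr<Gmm>> gmms_;
};

// An object that only reads another one keeps a const reference to it.
class ComponentCounter {
 public:
  explicit ComponentCounter(const AcousticModel &am) : am_(am) {}
  std::size_t TotalComponents() const {
    std::size_t n = 0;
    for (std::size_t i = 0; i < am_.NumGmms(); ++i) n += am_.GetGmm(i).weights.size();
    return n;
  }

 private:
  const AcousticModel &am_;  // "knows about" AcousticModel, never modifies it
};

// Modification goes through a function that receives a pointer for the
// duration of the call and does not store it.
void FloorWeights(double floor, Gmm *gmm) {
  for (double &w : gmm->weights)
    if (w < floor) w = floor;
}
```

In this style only the header of `AcousticModel` is needed by `ComponentCounter`, and nothing inherits from anything.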
0:03:49 | The GMMs are parametrized using the natural parameters, "natural" in the sense of the parameters of an exponential-family distribution: |
---|
0:04:00 | if you write out the likelihood, you get the mean times the inverse covariance, and the inverse covariance; these are the natural parameters of the GMM. |
---|
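Expanding the log-density of one weighted diagonal-covariance component (weight $w_m$, mean $\mu_m$, covariance $\Sigma_m$, dimension $D$) makes these natural parameters explicit; this is the standard expansion, not a formula taken from the slides:

$$
\log\big(w_m\,\mathcal{N}(x;\mu_m,\Sigma_m)\big)
= g_m + (\Sigma_m^{-1}\mu_m)^{\top}x - \tfrac{1}{2}\,\operatorname{diag}(\Sigma_m^{-1})^{\top}(x\odot x),
$$

$$
g_m = \log w_m - \tfrac{D}{2}\log 2\pi - \tfrac{1}{2}\log\lvert\Sigma_m\rvert - \tfrac{1}{2}\,\mu_m^{\top}\Sigma_m^{-1}\mu_m .
$$

Stacking the components row-wise, with row $m$ of $\mathbf{M}$ equal to $(\Sigma_m^{-1}\mu_m)^{\top}$ and row $m$ of $\mathbf{V}$ equal to $\operatorname{diag}(\Sigma_m^{-1})^{\top}$, gives all component log-likelihoods at once: $\boldsymbol{\ell} = \mathbf{g} + \mathbf{M}x - \tfrac{1}{2}\mathbf{V}(x\odot x)$.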
0:04:15 | The reason for doing that is that you can then do the likelihood calculation using just two matrix-vector multiplications (one with the data vector, one with its element-wise square). If you have a diagonal-covariance system, |
---|
0:04:26 | you have the mean times the inverse covariance as a vector for each component, say for I components, and you have your data vector, |
---|
0:04:35 | and you just do these two matrix-vector products. |
---|
0:04:41 | There are BLAS routines for doing that, obviously. ATLAS BLAS is not the most optimised thing, but it's still a nice way of doing things. |
---|
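A minimal C++ sketch of the two-product likelihood computation follows. It assumes row-major M x D storage and uses plain loops where a real implementation would call BLAS; the class `DiagGmmNatural` is hypothetical and only follows the natural-parameter layout described above. It is not Kaldi's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Diagonal-covariance GMM stored as gconsts, means*invvars and invvars.
class DiagGmmNatural {
 public:
  DiagGmmNatural(const std::vector<double> &weights,
                 const std::vector<double> &means,  // M x D, row-major
                 const std::vector<double> &vars,   // M x D, row-major
                 std::size_t dim)
      : num_comp_(weights.size()),
        dim_(dim),
        gconsts_(weights.size()),
        means_invvars_(weights.size() * dim),
        inv_vars_(weights.size() * dim) {
    assert(num_comp_ > 0 && means.size() == num_comp_ * dim_ &&
           vars.size() == num_comp_ * dim_);
    const double log_2pi = std::log(2.0 * std::acos(-1.0));
    for (std::size_t m = 0; m < num_comp_; ++m) {
      // g_m = log w_m - D/2 log(2 pi) - 1/2 log|Sigma_m| - 1/2 mu' Sigma^-1 mu
      double g = std::log(weights[m]) - 0.5 * static_cast<double>(dim_) * log_2pi;
      for (std::size_t d = 0; d < dim_; ++d) {
        const std::size_t k = m * dim_ + d;
        inv_vars_[k] = 1.0 / vars[k];
        means_invvars_[k] = means[k] * inv_vars_[k];
        g -= 0.5 * (std::log(vars[k]) + means[k] * means_invvars_[k]);
      }
      gconsts_[m] = g;
    }
  }

  // Per-component log-likelihoods: g + M x - 0.5 V (x .* x).
  std::vector<double> ComponentLogLikes(const std::vector<double> &x) const {
    assert(x.size() == dim_);
    std::vector<double> ll(gconsts_);
    for (std::size_t m = 0; m < num_comp_; ++m) {
      double linear = 0.0, quadratic = 0.0;
      for (std::size_t d = 0; d < dim_; ++d) {
        const std::size_t k = m * dim_ + d;
        linear += means_invvars_[k] * x[d];          // first matrix-vector product
        quadratic += inv_vars_[k] * x[d] * x[d];     // second, on x squared
      }
      ll[m] += linear - 0.5 * quadratic;
    }
    return ll;
  }

  // Total log-likelihood via log-sum-exp over components.
  double LogLikelihood(const std::vector<double> &x) const {
    const std::vector<double> ll = ComponentLogLikes(x);
    const double max_ll = *std::max_element(ll.begin(), ll.end());
    double sum = 0.0;
    for (double v : ll) sum += std::exp(v - max_ll);
    return max_ll + std::log(sum);
  }

 private:
  std::size_t num_comp_, dim_;
  std::vector<double> gconsts_, means_invvars_, inv_vars_;
};
```

With BLAS, each of the two inner loops over `d`, taken over all components, becomes a single matrix-vector call.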
0:04:53 | Here is a graphical overview of what Dan has already said: we have the acoustic model class, but when it goes into the decoder it interacts through this decodable object. |
---|
0:05:09 | The decoder knows only about the decodable interface, and for each type of acoustic model we need to implement the decodable interface for that model. |
---|
0:05:22 | The decodable object is the one which knows about the features and does the likelihood computation. |
---|
0:05:30 | This is exactly what the decodable interface looks like. |
---|
0:05:34 | We try to avoid using inheritance; this is the only exception: we have interfaces for features, for the decodable, and for a few other things, |
---|
0:05:49 | and these are pure interfaces. That's the only case where we use inheritance. |
---|
0:05:58 | As you can see, it's simple: the main function is the likelihood computation, and the decoder can also find out when there are no more frames and how many states, essentially, you have. |
---|
0:06:17 | For every other model type, you then inherit from this interface. |
---|
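The interface described on this slide can be sketched as an abstract C++ class. The three method names are modelled on Kaldi's `DecodableInterface` but should be treated as assumptions here, not as a reference for any particular version; `DecodableScalarGaussians` (one 1-D Gaussian per state) and `SumOfBestScores` (a stand-in for a real decoder) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Pure interface: the only thing the decoder knows about.
class Decodable {
 public:
  virtual ~Decodable() = default;
  // Log-likelihood of the given frame for state (pdf) `index`.
  virtual double LogLikelihood(int frame, int index) = 0;
  // True if `frame` is the last available frame.
  virtual bool IsLastFrame(int frame) const = 0;
  // Number of states the model can score.
  virtual int NumIndices() const = 0;
};

// Hypothetical model type: it owns the features and does the likelihood
// computation; the decoder never sees the model itself.
class DecodableScalarGaussians : public Decodable {
 public:
  DecodableScalarGaussians(std::vector<double> feats, std::vector<double> means,
                           std::vector<double> vars)
      : feats_(std::move(feats)), means_(std::move(means)), vars_(std::move(vars)) {
    assert(!feats_.empty() && !means_.empty() && means_.size() == vars_.size());
  }
  double LogLikelihood(int frame, int index) override {
    const double diff = feats_.at(frame) - means_.at(index);
    return -0.5 * (std::log(2.0 * std::acos(-1.0) * vars_[index]) +
                   diff * diff / vars_[index]);
  }
  bool IsLastFrame(int frame) const override {
    return frame == static_cast<int>(feats_.size()) - 1;
  }
  int NumIndices() const override { return static_cast<int>(means_.size()); }

 private:
  std::vector<double> feats_, means_, vars_;
};

// Stand-in "decoder" using only the interface: sums the best state score per
// frame (a real decoder would search a graph instead).
double SumOfBestScores(Decodable &decodable) {
  double total = 0.0;
  for (int t = 0;; ++t) {
    double best = decodable.LogLikelihood(t, 0);
    for (int i = 1; i < decodable.NumIndices(); ++i)
      best = std::max(best, decodable.LogLikelihood(t, i));
    total += best;
    if (decodable.IsLastFrame(t)) break;
  }
  return total;
}
```

A new model type only has to provide these three methods; the decoder code does not change.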
0:06:23 | That was decoding. For training, we similarly have an object for training the GMMs, |
---|
0:06:33 | and in the same way that the acoustic model is just a vector of GMMs, the acoustic model trainer is just a vector of these per-GMM training objects. |
---|
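A hypothetical sketch of what such per-GMM training objects could accumulate for maximum likelihood re-estimation of a diagonal GMM: per-component occupancies plus first- and second-order sums. The names `DiagGmmAccs` and `AmDiagGmmAccs` are illustrative, and the per-frame component posteriors are assumed to be computed elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sufficient statistics for one diagonal GMM (M components, dimension dim).
struct DiagGmmAccs {
  std::size_t dim;
  std::vector<double> occ;  // per-component occupancy (sum of posteriors)
  std::vector<double> x;    // sum of gamma * x, M x dim row-major
  std::vector<double> x2;   // sum of gamma * x.^2, M x dim row-major

  DiagGmmAccs(std::size_t num_comp, std::size_t d)
      : dim(d), occ(num_comp, 0.0), x(num_comp * d, 0.0), x2(num_comp * d, 0.0) {}

  // `post` holds the component posteriors for this frame.
  void AccumulateFromPosteriors(const std::vector<double> &feat,
                                const std::vector<double> &post) {
    assert(feat.size() == dim && post.size() == occ.size());
    for (std::size_t m = 0; m < occ.size(); ++m) {
      occ[m] += post[m];
      for (std::size_t d = 0; d < dim; ++d) {
        x[m * dim + d] += post[m] * feat[d];
        x2[m * dim + d] += post[m] * feat[d] * feat[d];
      }
    }
  }
};

// Mirrors the acoustic model: one accumulator per GMM (state).
struct AmDiagGmmAccs {
  std::vector<DiagGmmAccs> per_state;

  void Accumulate(std::size_t state, const std::vector<double> &feat,
                  const std::vector<double> &post) {
    assert(state < per_state.size());
    per_state[state].AccumulateFromPosteriors(feat, post);
  }
};
```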
0:06:49 | Yeah, yes, sure. It seems my slides are not compatible. |
---|
0:07:02 | The red arrows mean that these classes modify those classes; modifying obviously implies that a class also knows about the other one. |
---|
0:07:11 | Typically, the modifying class doesn't keep a pointer or reference to an object of the other class; it has a method which takes that object and does the modification. |
---|
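A sketch of this "red arrow" style of modification, assuming hypothetical `DiagGmmParams` and `DiagGmmStats` structs: the ML update function is handed a pointer to the GMM, rewrites its weights, means and floored variances from the statistics, and keeps nothing afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameter and statistics types for one diagonal GMM
// (M components, dimension dim, row-major M x dim matrices).
struct DiagGmmParams {
  std::size_t dim;
  std::vector<double> weights, means, vars;
};
struct DiagGmmStats {
  std::size_t dim;
  std::vector<double> occ, x, x2;
};

// The updater keeps no reference to the GMM; it only modifies it through the
// pointer during this call. Components with zero occupancy keep their old
// means and variances.
void MleDiagGmmUpdate(const DiagGmmStats &stats, double var_floor,
                      DiagGmmParams *gmm) {
  const std::size_t num_comp = stats.occ.size();
  assert(gmm->weights.size() == num_comp && gmm->dim == stats.dim);
  double total_occ = 0.0;
  for (double o : stats.occ) total_occ += o;
  if (total_occ <= 0.0) return;  // nothing to update from
  for (std::size_t m = 0; m < num_comp; ++m) {
    gmm->weights[m] = stats.occ[m] / total_occ;
    if (stats.occ[m] <= 0.0) continue;
    for (std::size_t d = 0; d < stats.dim; ++d) {
      const std::size_t k = m * stats.dim + d;
      const double mean = stats.x[k] / stats.occ[m];
      const double var = stats.x2[k] / stats.occ[m] - mean * mean;
      gmm->means[k] = mean;
      gmm->vars[k] = var < var_floor ? var_floor : var;
    }
  }
}
```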
0:07:25 | So how do you do adaptation? Take feature-space MLLR, for example. |
---|
0:07:33 | If it's global, it's implemented as a simple matrix. The matrix doesn't need to know what it is; it's only the estimation which makes it fMLLR. |
---|
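A global feature transform really can be just a matrix. The sketch below assumes the common convention of a D x (D+1) matrix W = [A b] applied as y = A x + b; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = A x + b with W = [A b] stored D x (D+1), row-major. Nothing in the
// matrix says "fMLLR"; only its estimation would.
std::vector<double> ApplyAffineTransform(const std::vector<double> &W,
                                         const std::vector<double> &x) {
  const std::size_t dim = x.size();
  assert(W.size() == dim * (dim + 1));
  std::vector<double> y(dim);
  for (std::size_t i = 0; i < dim; ++i) {
    double sum = W[i * (dim + 1) + dim];  // offset b_i (last column)
    for (std::size_t j = 0; j < dim; ++j) sum += W[i * (dim + 1) + j] * x[j];
    y[i] = sum;
  }
  return y;
}

// Transform every frame of an utterance with the same (global) matrix.
std::vector<std::vector<double>> TransformFeatures(
    const std::vector<double> &W, const std::vector<std::vector<double>> &feats) {
  std::vector<std::vector<double>> out;
  out.reserve(feats.size());
  for (const auto &frame : feats) out.push_back(ApplyAffineTransform(W, frame));
  return out;
}
```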
0:07:45 | So the estimator knows about the acoustic model, and about the regression tree if you're using one. |
---|
0:07:51 | If you're using a regression tree, the transform object just has multiple transforms. |
---|
0:07:58 | The fMLLR object, however, doesn't know about the regression-tree concept; it just has a bunch of transforms. It's the decodable object which knows how to use them. |
---|
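With regression classes, one way to realise this split is sketched below: a plain list of affine transforms, and a hypothetical `DecodableRegressionFmllr` that maps each pdf to a class and scores the transformed feature against a single diagonal Gaussian per pdf. The log-determinant of each transform is assumed to be precomputed by the estimator; it is needed because different pdfs now see differently transformed features.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Just a transform: W = [A b] (D x (D+1), row-major) plus log|det A|.
struct AffineTransform {
  std::vector<double> W;
  double log_det;
};

class DecodableRegressionFmllr {
 public:
  DecodableRegressionFmllr(std::vector<std::vector<double>> feats,
                           std::vector<double> means,  // num_pdfs x dim
                           std::vector<double> vars,   // num_pdfs x dim
                           std::vector<AffineTransform> transforms,
                           std::vector<std::size_t> pdf_to_class, std::size_t dim)
      : feats_(std::move(feats)), means_(std::move(means)), vars_(std::move(vars)),
        transforms_(std::move(transforms)), pdf_to_class_(std::move(pdf_to_class)),
        dim_(dim) {
    assert(means_.size() == pdf_to_class_.size() * dim_ && vars_.size() == means_.size());
  }

  // log|det A_c| + log N(A_c x + b_c; mu_pdf, diag(var_pdf)), c = class(pdf).
  double LogLikelihood(std::size_t frame, std::size_t pdf) const {
    const AffineTransform &t = transforms_.at(pdf_to_class_.at(pdf));
    const std::vector<double> &x = feats_.at(frame);
    assert(x.size() == dim_);
    const double log_2pi = std::log(2.0 * std::acos(-1.0));
    double ll = t.log_det;
    for (std::size_t i = 0; i < dim_; ++i) {
      double y = t.W[i * (dim_ + 1) + dim_];
      for (std::size_t j = 0; j < dim_; ++j) y += t.W[i * (dim_ + 1) + j] * x[j];
      const std::size_t k = pdf * dim_ + i;
      const double diff = y - means_[k];
      ll -= 0.5 * (log_2pi + std::log(vars_[k]) + diff * diff / vars_[k]);
    }
    return ll;
  }

 private:
  std::vector<std::vector<double>> feats_;
  std::vector<double> means_, vars_;
  std::vector<AffineTransform> transforms_;
  std::vector<std::size_t> pdf_to_class_;  // regression class of each pdf
  std::size_t dim_;
};
```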
0:08:14 | It's similar with MLLR: obviously it has to know about the acoustic model, and MLLR can either take the acoustic model and give you an adapted model, i.e. transform all the means and give you a new model, |
---|
0:08:28 | or it can do it lazily. The decoder will ask the decodable for the likelihood from the adapted model; the decodable will query the MLLR object, which will then check whether it has already computed this (it caches the means). |
---|
0:08:49 | If not, it will get the mean from the acoustic model and transform it. |
---|
0:08:56 | That is how you would use it in a practical situation. |
---|
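The lazy MLLR path can be sketched as a cache in front of the model's means. Everything here is hypothetical (`LazyMllrMeans`, the D x (D+1) layout of W = [A b]); it only illustrates computing A mu + b on first request and reusing the result afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class LazyMllrMeans {
 public:
  LazyMllrMeans(const std::vector<std::vector<double>> &means,
                std::vector<double> W, std::size_t dim)
      : means_(means), W_(std::move(W)), dim_(dim),
        cached_(means.size(), false), adapted_(means.size()) {
    assert(W_.size() == dim_ * (dim_ + 1));
  }

  // Adapted mean of Gaussian g: computed on first request, then cached.
  const std::vector<double> &AdaptedMean(std::size_t g) {
    if (!cached_.at(g)) {
      const std::vector<double> &mu = means_[g];
      std::vector<double> out(dim_);
      for (std::size_t i = 0; i < dim_; ++i) {
        double sum = W_[i * (dim_ + 1) + dim_];  // offset b_i
        for (std::size_t j = 0; j < dim_; ++j) sum += W_[i * (dim_ + 1) + j] * mu[j];
        out[i] = sum;
      }
      adapted_[g] = std::move(out);
      cached_[g] = true;
      ++num_computed_;
    }
    return adapted_[g];
  }

  std::size_t NumComputed() const { return num_computed_; }

 private:
  const std::vector<std::vector<double>> &means_;  // owned by the acoustic model
  std::vector<double> W_;
  std::size_t dim_;
  std::vector<bool> cached_;
  std::vector<std::vector<double>> adapted_;
  std::size_t num_computed_ = 0;
};
```

Only the Gaussians the decoder actually touches get transformed, which is the point of the lazy variant.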
0:09:05 | SGMMs have a very similar structure. Again, there is the decodable for the SGMM (the label on the slide should say "SGMM"), |
---|
0:09:18 | and the GMM class: the SGMM model has a UBM, which is why it needs to know about the GMM classes as well. |
---|
0:09:30 | Just for convenience of coding, for the GMM classes the adaptation class is the same, but for SGMMs it is different, because there are many methods specific to SGMMs. |
---|
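For readers unfamiliar with SGMMs, the basic model from the published SGMM work (ignoring substates and the speaker subspace) derives each state's Gaussians from shared projections $\mathbf{M}_i$, weight vectors $w_i$ and covariances $\Sigma_i$, plus a low-dimensional state vector $v_j$. The $I$ shared Gaussians correspond to the UBM, which in the published formulation is used for initialisation and Gaussian selection:

$$
\mu_{ji} = \mathbf{M}_i v_j, \qquad
w_{ji} = \frac{\exp(w_i^{\top} v_j)}{\sum_{i'=1}^{I}\exp(w_{i'}^{\top} v_j)}, \qquad
p(x \mid j) = \sum_{i=1}^{I} w_{ji}\,\mathcal{N}(x;\mu_{ji},\Sigma_i).
$$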
0:09:47 | As for the things that are next: |
---|
0:09:51 | The first bullet point there, fMLLR basis for SGMMs, is already published. It's in the old code base, and we need to put it in the new one; that's actually partially done. |
---|
0:10:06 | Then, a while back, Dan presented the symmetric extension of SGMMs (people keep asking what "symmetric" means); that's also partially done. |
---|
0:10:24 | And as Dan mentioned, we're waiting for lattice generation to be finished, and then we can add these things. |
---|
0:10:34 | There are parts still under discussion and debate, and one of them is supporting multiple feature transforms. |
---|
0:10:42 | Currently you only have global transforms, and they are just put into one chain. |
---|
0:10:53 | Regression classes? Yes, you can have regression classes for fMLLR and MLLR, but then you may want to compose that with another transform which has multiple transforms as well. |
---|
0:11:03 | So that's what I mean. |
---|
0:11:16 | So that's the thing with feature transforms: there are two kinds of "multiple" here. First, for each type there are multiple transforms, and then multiple types are composed together. |
---|
0:11:29 | So far we haven't felt the need for it, but when we do, we'll think about how to do this. It will probably be handled at something like the decodable object level, because nothing else needs to know how the transforms are composed. |
---|
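For the global-only case, chaining transforms is simple because two affine transforms compose into one: applying [A1 b1] and then [A2 b2] equals [A2 A1, A2 b1 + b2]. A minimal sketch, assuming the same hypothetical D x (D+1) layout W = [A b]; once both sides have regression classes, this collapse no longer yields a single matrix, which is the open design question above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the single transform equivalent to applying W1 first, then W2.
std::vector<double> ComposeAffine(const std::vector<double> &W2,
                                  const std::vector<double> &W1,
                                  std::size_t dim) {
  const std::size_t cols = dim + 1;
  assert(W1.size() == dim * cols && W2.size() == dim * cols);
  std::vector<double> W(dim * cols, 0.0);
  for (std::size_t i = 0; i < dim; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      double sum = 0.0;  // (A2 A1)_ij for j < dim, (A2 b1)_i for j == dim
      for (std::size_t k = 0; k < dim; ++k) sum += W2[i * cols + k] * W1[k * cols + j];
      W[i * cols + j] = sum;
    }
    W[i * cols + dim] += W2[i * cols + dim];  // add b2
  }
  return W;
}

// Collapse a whole chain (applied first to last) into one transform.
std::vector<double> ComposeChain(const std::vector<std::vector<double>> &chain,
                                 std::size_t dim) {
  std::vector<double> total(dim * (dim + 1), 0.0);
  for (std::size_t i = 0; i < dim; ++i) total[i * (dim + 1) + i] = 1.0;  // identity
  for (const auto &W : chain) total = ComposeAffine(W, total, dim);
  return total;
}
```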
0:11:45 | So that's the end of my overview of the acoustic models. |
---|