0:00:14 | I think this is going to be the official theme tune. |
---|
0:00:21 | Okay, so I'm going to be talking about |
---|
0:00:24 | how to run Kaldi. |
---|
0:00:27 | So basically, |
---|
0:00:29 | we have two recipes: |
---|
0:00:32 | one is Resource Management and one is Wall Street Journal. |
---|
0:00:36 | We're going to go through some of the Resource Management recipe, |
---|
0:00:40 | and we'll have a few digressions to explain |
---|
0:00:43 | as much of the internals of Kaldi as you need to know to kind of understand it. |
---|
0:00:48 | We'll go through the installation process. I'll describe the Unix one, because that's the one most people will use, but |
---|
0:00:54 | it also has a Visual Studio Windows one. |
---|
0:00:57 | The scripts are all in bash, again for popularity reasons. |
---|
0:01:02 | Kaldi is kind of agnostic about the shell; there's nothing in it that's really specific to any one shell. |
---|
0:01:07 | Now suppose you want to download Kaldi and then you want to run it. |
---|
0:01:12 | You'd probably first go to this |
---|
0:01:14 | location: kaldi.sourceforge.net. |
---|
0:01:17 | A shorter address will also work. |
---|
0:01:18 | We have a page of documentation that explains |
---|
0:01:22 | pretty much everything Kaldi-related. |
---|
0:01:25 | We use a source control program called Subversion; the command name is svn. |
---|
0:01:30 | It'll typically be installed on most of the systems you will have. |
---|
0:01:36 | svn is a little bit like CVS, but it's a more modern implementation. |
---|
0:01:41 | So to check out Kaldi, you would just type this command, |
---|
0:01:44 | and you'll see |
---|
0:01:46 | a lot of stuff go by on the screen. It will check out a bunch of directories: the code, |
---|
0:01:50 | the scripts, |
---|
0:01:52 | installation instructions, the documentation source, |
---|
0:01:55 | and so on. |
---|
0:01:57 | For the installation instructions, you just |
---|
0:01:59 | look at the INSTALL file, |
---|
0:02:00 | and the installation is pretty simple. It's like: change directory to here, |
---|
0:02:04 | run install.sh; |
---|
0:02:06 | cd to here, run |
---|
0:02:08 | configure, run make. |
---|
0:02:10 | There is a rather nonzero probability that something will go wrong, |
---|
0:02:14 | because it does kind of depend on what you already have installed, |
---|
0:02:17 | but we provide instructions |
---|
0:02:19 | for the common cases, |
---|
0:02:21 | and if it doesn't install, please ask me and I'll try to |
---|
0:02:24 | help you get it installed. |
---|
0:02:28 | There is a directory of external tools, and |
---|
0:02:31 | we try to have a script to configure, |
---|
0:02:34 | download, and make |
---|
0:02:36 | all of these external tools so that you don't have to |
---|
0:02:39 | worry about that yourself. |
---|
0:02:40 | These include sph2pipe, |
---|
0:02:43 | to read Sphere files; |
---|
0:02:45 | IRSTLM, which is a language modeling toolkit (as I mentioned before, |
---|
0:02:49 | we chose it because it has a reasonably open license, though it has |
---|
0:02:54 | very limited features); |
---|
0:02:56 | OpenFst; and so on. |
---|
0:03:00 | So once you've done that, you've checked it out, you try to install it... |
---|
0:03:05 | no, scratch that sentence. |
---|
0:03:10 | In the future we're going to have version numbers and everything. Currently, because we haven't yet come to version 1.0, |
---|
0:03:15 | we just have trunk, |
---|
0:03:17 | which is the version-control term for whatever your current code is. |
---|
0:03:21 | Inside that you'll find |
---|
0:03:24 | tools, which is a place where we're going to download and compile various external tools. |
---|
0:03:29 | You'll find src, the source, which is |
---|
0:03:31 | where all of our source code is, including the source for our documentation, |
---|
0:03:36 | and these are the subdirectories in there. |
---|
0:03:38 | These are the names of the things that I showed you on that funny slide with |
---|
0:03:42 | the rectangles, |
---|
0:03:44 | so these are all the subdirectories of code. |
---|
0:03:46 | And egs is the directory of example scripts; |
---|
0:03:49 | they contain the Resource Management and |
---|
0:03:52 | Wall Street Journal scripts. |
---|
0:03:53 | You can probably see there's a kind of hubris in the naming scheme here. We went |
---|
0:03:59 | for a deep naming scheme because we |
---|
0:04:01 | believe that eventually there'll be tons of scripts |
---|
0:04:04 | in those directories. |
---|
0:04:10 | So once you've done |
---|
0:04:12 | the installation in tools, and that's just one script, |
---|
0:04:17 | you go to the source to run |
---|
0:04:18 | configure. |
---|
0:04:20 | Sometimes configure scripts are these vast |
---|
0:04:24 | scripts that are generated by things like autoconf |
---|
0:04:27 | or whatever it is, |
---|
0:04:29 | but this one is just a hand-written one that tries to find where your |
---|
0:04:32 | ATLAS libraries or CLAPACK libraries are, and if it finds them, |
---|
0:04:36 | then it compiles with them. |
---|
0:04:38 | And it detects certain |
---|
0:04:40 | systems like |
---|
0:04:42 | Cygwin and |
---|
0:04:43 | Mac OS that have |
---|
0:04:45 | particular setups that are common, and then it handles those separately. |
---|
0:04:51 | It's good to use -j when you make |
---|
0:04:54 | the code, because there are a lot of |
---|
0:04:56 | tools and the code is rather templated, so the compilation is a little bit slow; |
---|
0:05:00 | this makes it in parallel. |
---|
0:05:03 | Then make test goes to all the subdirectories and |
---|
0:05:06 | runs all the programs that end with -test. |
---|
0:05:09 | We have a lot of testing programs; |
---|
0:05:11 | they're mostly |
---|
0:05:12 | unit tests, |
---|
0:05:13 | to make sure that |
---|
0:05:15 | all of the code is working. Things like: |
---|
0:05:17 | you have a matrix |
---|
0:05:18 | multiplication or something, you do the multiplication and you |
---|
0:05:21 | verify that the answer was right, |
---|
0:05:23 | things like that. |
---|
0:05:27 | You can also type make valgrind; it runs a program called Valgrind to |
---|
0:05:31 | check for memory errors. |
---|
0:05:34 | Right now there are no errors, but that would detect |
---|
0:05:37 | things like |
---|
0:05:38 | accesses to unallocated memory. |
---|
0:05:41 | So suppose you've done make, you type make test, and |
---|
0:05:45 | nothing went wrong. |
---|
0:05:48 | So you cd to |
---|
0:05:49 | egs/rm/s1, and this is where the example script is. |
---|
0:05:53 | This just assumes that |
---|
0:05:56 | you're a member of the LDC or whatever, and |
---|
0:05:58 | you have access to it, |
---|
0:06:00 | that you've got this corpus. I think for members that's like three hundred dollars, and a lot of sites will |
---|
0:06:05 | have it already. |
---|
0:06:06 | So the Resource Management corpus is a really simple corpus, |
---|
0:06:10 | but we use it because it's really fast to run, and |
---|
0:06:15 | it's kind of an LVCSR-like task. It's really medium vocabulary, but |
---|
0:06:19 | because it contains words |
---|
0:06:21 | and has a lexicon and everything, it kind of behaves like a typical LVCSR system, even though it's |
---|
0:06:26 | not one. |
---|
0:06:29 | So that'll be in some directory, and you have to figure out what that directory is; |
---|
0:06:34 | at some point you have to pass it to one of the scripts. |
---|
0:06:37 | There's a bunch of commands here that you're supposed to run, but you're not really expected to run |
---|
0:06:41 | this file directly; it will just exit on you if you do that. |
---|
0:06:45 | It's just a sequence of commands you're expected to run by hand, |
---|
0:06:48 | because there's a high enough probability that any given one of them will fail that |
---|
0:06:53 | we thought it wasn't good to be over-optimistic and just make it a single script. |
---|
0:06:57 | I mean, the failures are going to be due to simple things like |
---|
0:07:00 | maybe the wrong directory name, |
---|
0:07:02 | but anyway. |
---|
0:07:03 | So I'm going to go through what this run.sh does. |
---|
0:07:08 | The first thing is data preparation, |
---|
0:07:10 | and so |
---|
0:07:13 | you'll see the directory called data_prep; you cd to there, |
---|
0:07:16 | and you give it the directory where your Resource Management |
---|
0:07:20 | data is. |
---|
0:07:21 | This is the LDC disk |
---|
0:07:24 | you give it, |
---|
0:07:25 | and it'll just do a bunch of stuff, basically converting whatever format this |
---|
0:07:29 | corpus has |
---|
0:07:31 | into a format that Kaldi likes, |
---|
0:07:34 | and, you know, |
---|
0:07:35 | creating lists of file names, things like that. |
---|
0:07:38 | Then |
---|
0:07:39 | you cd out. |
---|
0:07:41 | Just as an example of things that are created by this: |
---|
0:07:44 | there's a bunch of stuff actually in this directory that it |
---|
0:07:47 | creates. |
---|
0:07:48 | Here's an example: |
---|
0:07:49 | an .scp file. |
---|
0:07:51 | It contains the utterance ID, |
---|
0:07:54 | and then |
---|
0:07:55 | it's the pipe command. |
---|
0:07:58 | So here's a pipe command; this command is going to be run whenever some program tries to read |
---|
0:08:02 | this entry. |
---|
0:08:03 | Now, an scp file, this is a concept that |
---|
0:08:06 | is not really quite the same as HTK's notion of an scp file; |
---|
0:08:11 | I'll be explaining later exactly what it is. |
---|
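A minimal Python sketch of reading a script file as described here: each line is an utterance ID, and everything after the first space is an extended filename, which may be a pipe command ending in `|`. The function name and the sample lines are hypothetical; Kaldi's own readers are C++ classes, not this function.

```python
def parse_scp(text):
    """Map each utterance ID to the rest of its line (an extended filename).

    Hypothetical helper for illustration only.
    """
    table = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        key, sep, rxfilename = line.partition(" ")  # split at the first space only
        if not sep or not rxfilename.strip():
            raise ValueError(f"line {line_no}: expected '<utt-id> <extended-filename>'")
        if key in table:
            raise ValueError(f"line {line_no}: duplicate utterance ID {key!r}")
        table[key] = rxfilename.strip()
    return table


# Hypothetical entries: one pipe command and one archive offset.
example = """\
utt1 sph2pipe -f wav /path/to/utt1.sph |
utt2 /data/feats.ark:1234
"""
scp = parse_scp(example)
print(scp["utt1"])  # sph2pipe -f wav /path/to/utt1.sph |
```

Because the value is kept as an opaque string, the same parser works whether the entry is a plain file, a pipe, or a file offset; interpreting it is a separate step.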
0:08:13 | Another thing that's created here is |
---|
0:08:16 | a decoding graph in FST format. |
---|
0:08:18 | In some other scripts, like in the Wall Street Journal script, |
---|
0:08:22 | this stage wouldn't be creating any FSTs; it would just create an ARPA |
---|
0:08:25 | language model, but because RM doesn't use an ARPA language model, we do it like this. |
---|
0:08:32 | At this stage of data preparation, |
---|
0:08:34 | oh yeah, this is created in that directory too; it comes from |
---|
0:08:38 | stuff that's on the Resource Management disk. |
---|
0:08:40 | It'll create a lexicon for you in this form; |
---|
0:08:43 | it's pretty obvious, |
---|
0:08:45 | and we'll turn it into an FST. |
---|
0:08:47 | The Kaldi tools don't deal with this directly; |
---|
0:08:50 | we deal with FSTs, so the lexicon that you give to Kaldi |
---|
0:08:53 | is going to be an FST in OpenFst format. |
---|
0:08:57 | There's also the speaker map. |
---|
0:09:01 | These are the utterance IDs, and this is the speaker ID. |
---|
0:09:04 | The file that contains all of these |
---|
0:09:06 | is how Kaldi, |
---|
0:09:08 | you know, maps utterances to speakers and vice versa. |
---|
0:09:11 | There's no notion of, like, |
---|
0:09:12 | filename masks or anything like that. |
---|
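Mapping utterances to speakers "and vice versa" amounts to inverting an explicit table. A minimal sketch, assuming one `<utt-id> <speaker-id>` pair per line (the layout of Kaldi's utt2spk file); the helper names and the sample IDs are hypothetical. Note that the speaker is read from the second column, never parsed out of the utterance ID.

```python
from collections import defaultdict


def read_utt2spk(text):
    """Parse '<utt-id> <speaker-id>' lines into a dict."""
    utt2spk = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"bad utt2spk line: {line!r}")
        utt, spk = fields
        utt2spk[utt] = spk
    return utt2spk


def invert(utt2spk):
    """Build speaker -> sorted list of utterance IDs (the 'vice versa' direction)."""
    spk2utt = defaultdict(list)
    for utt, spk in utt2spk.items():
        spk2utt[spk].append(utt)
    return {spk: sorted(utts) for spk, utts in sorted(spk2utt.items())}


utt2spk = read_utt2spk("""\
spkA_utt2 spkA
spkA_utt1 spkA
spkB_utt1 spkB
""")
spk2utt = invert(utt2spk)
for spk, utts in spk2utt.items():
    print(spk, " ".join(utts))
# spkA spkA_utt1 spkA_utt2
# spkB spkB_utt1
```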
0:09:17 | So, yeah, the concept of an utterance ID is quite important in Kaldi. |
---|
0:09:21 | There's no notion of, like, parsing a filename and saying the last element is the utterance ID or whatever; |
---|
0:09:27 | you have to have an explicit list, |
---|
0:09:30 | and all of these scp files and archives are indexed by this utterance ID that you have |
---|
0:09:34 | to decide on. |
---|
0:09:38 | That script also creates a text format of the transcriptions, but we'll convert this into an integer format for Kaldi, |
---|
0:09:44 | just so that Kaldi doesn't need to have, |
---|
0:09:47 | for all of the programs, some kind of map between the |
---|
0:09:50 | text and integer forms of |
---|
0:09:53 | the words. |
---|
0:09:55 | So this is the text format of the transcriptions. |
---|
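Converting the text transcriptions to integers is a lookup against the word symbol table. A minimal sketch, assuming a symbol table of `<word> <integer>` lines and transcription lines of `<utt-id> <word> <word> ...`; the function names and the sample words are hypothetical.

```python
def read_symbol_table(text):
    """Parse '<symbol> <id>' lines into a symbol -> id dict."""
    table = {}
    for line in text.splitlines():
        if line.strip():
            sym, idx = line.split()
            table[sym] = int(idx)
    return table


def transcript_to_int(line, word2id):
    """'<utt-id> w1 w2 ...' -> '<utt-id> id1 id2 ...'; an unknown word raises KeyError."""
    utt, *words = line.split()
    return " ".join([utt] + [str(word2id[w]) for w in words])


# Hypothetical symbol table; epsilon takes ID 0.
words = read_symbol_table("""\
<eps> 0
SHOW 1
ALL 2
SHIPS 3
""")
print(transcript_to_int("utt1 SHOW ALL SHIPS", words))  # utt1 1 2 3
```

Failing loudly on an unknown word, rather than silently mapping it to something, is a design choice of this sketch.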
0:10:00 | The next step, after the data prep stuff, is that we |
---|
0:10:02 | prepare the graphs. |
---|
0:10:04 | There's going to be |
---|
0:10:06 | a bunch of OpenFst commands in here, and scripts to convert |
---|
0:10:10 | from the lexicon to the FST format. |
---|
0:10:16 | The lexicon FST actually contains the silence, |
---|
0:10:18 | and the script |
---|
0:10:20 | kind of adds it; it's not something that's |
---|
0:10:22 | very deeply embedded |
---|
0:10:24 | in Kaldi. |
---|
0:10:25 | These little files, if you've ever used the AT&T toolkit or OpenFst, you'll know what these are: |
---|
0:10:31 | they're symbol tables. |
---|
0:10:33 | So this is |
---|
0:10:36 | the text form of zero, the text form of one, et cetera, |
---|
0:10:39 | and eps, for epsilon, is always zero. |
---|
0:10:43 | This is kind of |
---|
0:10:45 | a common thing in FST toolkits: epsilon is ID zero. |
---|
0:10:49 | So this is why phones are one-based, because |
---|
0:10:52 | zero is always reserved for epsilon. |
---|
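Why phones end up one-based can be seen by building a symbol table: ID 0 is handed to epsilon first, and the real symbols are numbered from 1. The helper name and the phone list are hypothetical; the `<symbol> <id>` line layout is the usual OpenFst text symbol-table format.

```python
def make_symbol_table(symbols, epsilon="<eps>"):
    """Return '<symbol> <id>' lines with epsilon reserved as ID 0."""
    if epsilon in symbols:
        raise ValueError("epsilon is added automatically; do not list it")
    lines = [f"{epsilon} 0"]
    # Real symbols start at 1 because 0 is taken by epsilon.
    for i, sym in enumerate(symbols, start=1):
        lines.append(f"{sym} {i}")
    return "\n".join(lines) + "\n"


phones_txt = make_symbol_table(["sil", "aa", "ae", "ah"])  # hypothetical phone set
print(phones_txt, end="")
# <eps> 0
# sil 1
# aa 2
# ae 3
# ah 4
```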
0:10:59 | So we create |
---|
0:11:01 | all of the FSTs. OpenFst does have a capability to put |
---|
0:11:05 | symbol tables on the FSTs, so that the FSTs kind of know what the words were. |
---|
0:11:10 | We haven't used that, because |
---|
0:11:13 | it quickly becomes very difficult once you decide to have symbol tables on the FSTs. |
---|
0:11:17 | We basically use integer format throughout. |
---|
0:11:23 | It outputs these files. This is |
---|
0:11:26 | G, the grammar, or it could be the language model used for decoding. |
---|
0:11:30 | Then there's the lexicon, |
---|
0:11:31 | L, and L_disambig, which is the lexicon with the disambiguation symbols. |
---|
0:11:35 | If anyone has read the papers of |
---|
0:11:38 | Mohri et al. that describe the standard recipe for FST-based ASR, |
---|
0:11:44 | you'll know what these are: |
---|
0:11:46 | little symbols like #1, #2 |
---|
0:11:49 | that they put on the lexicon at the ends of words |
---|
0:11:52 | to ensure determinizability. |
---|
0:11:56 | But I'm not going to go into that in more detail, or it's going to |
---|
0:11:59 | suck up the entire time of the talk. |
---|
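The disambiguation-symbol idea from the Mohri et al. recipe can be sketched as follows: a pronunciation shared by several words, or one that is a prefix of another word's pronunciation, gets a distinct `#n` appended, so the lexicon transducer can be determinized. This is a simplified Python illustration with a made-up toy lexicon, not Kaldi's own script; numbering the symbols per pronunciation starting from `#1` is an assumption of the sketch.

```python
from collections import Counter


def add_disambig(lexicon):
    """lexicon: list of (word, [phones]). Returns (word, [phones (+ '#n')]) pairs."""
    prons = [tuple(p) for _, p in lexicon]
    count = Counter(prons)
    # Every proper prefix of every pronunciation.
    prefixes = {pron[:k] for pron in prons for k in range(1, len(pron))}
    next_id = Counter()  # per-pronunciation counter for #1, #2, ...
    out = []
    for (word, phones), pron in zip(lexicon, prons):
        phones = list(phones)
        if count[pron] > 1 or pron in prefixes:
            next_id[pron] += 1
            phones.append(f"#{next_id[pron]}")
        out.append((word, phones))
    return out


toy = [
    ("RED", ["r", "eh", "d"]),
    ("READ", ["r", "eh", "d"]),  # homophone of RED
    ("RE", ["r", "iy"]),
    ("REED", ["r", "iy", "d"]),  # RE's pronunciation is a prefix of this one
]
for word, phones in add_disambig(toy):
    print(word, " ".join(phones))
# RED r eh d #1
# READ r eh d #2
# RE r iy #1
# REED r iy d
```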
0:12:06 | Preparing integer lists of silence and nonsilence phones: we create little files, |
---|
0:12:11 | things like this. |
---|
0:12:13 | This is needed later on by the scripts, because occasionally a script will need to know what the IDs of the |
---|
0:12:17 | silence phones are, |
---|
0:12:19 | and because the Kaldi tools work with integer formats, it's going to need that |
---|
0:12:22 | as a list of integers. |
---|
0:12:25 | Computing MFCCs. Okay, so this is |
---|
0:12:29 | just a command to invoke another script |
---|
0:12:33 | that computes the MFCCs, |
---|
0:12:37 | and here is the command; I believe I actually showed this before. |
---|
0:12:43 | It basically writes the MFCCs to some disk, |
---|
0:12:46 | and then this is going to be a text file that contains, |
---|
0:12:51 | on each line, the utterance ID |
---|
0:12:53 | and then the filename, colon, an integer offset, so it can kind of |
---|
0:13:01 | directly go to that |
---|
0:13:03 | part of the file using fseek. |
---|
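The `filename:offset` entries let a reader jump straight to one object inside a big archive. A minimal sketch of the idea in Python, using a made-up text archive of `<key> <payload>` lines (real feature archives are binary, and this is not Kaldi's reader); `seek` plays the role of `fseek`, and keys are assumed to be ASCII so that character and byte counts agree.

```python
import os
import tempfile


def write_archive(path, items):
    """Write key/payload records and return {key: 'path:offset'}, like a script file."""
    index = {}
    with open(path, "wb") as f:
        for key, payload in items:
            offset = f.tell() + len(key) + 1  # payload starts right after "key "
            f.write(f"{key} {payload}\n".encode())
            index[key] = f"{path}:{offset}"
    return index


def read_at(rxfilename):
    """Open 'path:offset', seek to the offset, and read one line of payload."""
    path, _, offset = rxfilename.rpartition(":")
    with open(path, "rb") as f:
        f.seek(int(offset))
        return f.readline().decode().rstrip("\n")


tmpdir = tempfile.mkdtemp()
ark = os.path.join(tmpdir, "feats.ark")
index = write_archive(ark, [("utt1", "1.0 2.0 3.0"), ("utt2", "4.0 5.0")])
print(read_at(index["utt2"]))  # 4.0 5.0
```

Splitting on the last colon keeps paths that themselves contain colons intact, as long as the offset is the final field.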
0:13:07 | Okay, I think this is what I just said. |
---|
0:13:09 | This is the |
---|
0:13:10 | script format that I mentioned. |
---|
0:13:12 | Of course, |
---|
0:13:12 | the script format is very generic: this whole thing after the key doesn't have to be of this form; it's |
---|
0:13:18 | any one of our extended filenames. It might be a real file, something of this form, |
---|
0:13:23 | a pipe, whatever. |
---|
0:13:26 | I'm showing you what the archive format looks like; it really is binary data, as you can |
---|
0:13:31 | see. |
---|
0:13:35 | But in some cases it would be text: |
---|
0:13:38 | you can give it the option to write in text, and very often it'll be a nice line-by-line format. |
---|
0:13:48 | Yeah, so, I think I mentioned this before: |
---|
0:13:51 | in the script format, |
---|
0:13:53 | the key is an important concept, |
---|
0:13:55 | because there's this concept of |
---|
0:13:58 | a collection of objects indexed by a key, in this case a string. |
---|
0:14:02 | Think of it a little bit like an STL map, |
---|
0:14:05 | where, you know, it would be a map from string to whatever object. |
---|
0:14:11 | So the archives and the scripts both make use of this concept, |
---|
0:14:15 | and I'll explain that concept in a little bit more detail in the next slide, but the script format is the |
---|
0:14:19 | key and then |
---|
0:14:20 | some kind of extended filename, blah blah blah. |
---|
0:14:24 | I think I mentioned this before, but the types of extended filenames include an actual file; |
---|
0:14:30 | a command |
---|
0:14:31 | for piping output, |
---|
0:14:33 | which is the pipe symbol then the command; the input version |
---|
0:14:36 | is the other way around, with the pipe at the end, and that one is only for |
---|
0:14:41 | reading, inputting from a pipe; |
---|
0:14:43 | and an offset into a file, which is |
---|
0:14:47 | useful where you want to write a big archive but have random access into it. |
---|
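The extended-filename forms listed here can be told apart by simple string rules. A sketch under stated assumptions: a trailing `|` means an input pipe, a leading `|` an output pipe, `-` a standard stream, and a trailing `:<digits>` a file offset. The function and its labels are hypothetical, and Kaldi's own classification handles more cases than this.

```python
import re


def classify(filename):
    """Return (kind, detail) for an extended filename string."""
    name = filename.strip()
    if name == "-":
        return ("standard stream", None)
    if name.endswith("|"):
        return ("input pipe", name[:-1].strip())   # command whose stdout we read
    if name.startswith("|"):
        return ("output pipe", name[1:].strip())   # command whose stdin we write
    m = re.fullmatch(r"(.+):(\d+)", name)
    if m:
        return ("file with offset", (m.group(1), int(m.group(2))))
    return ("plain file", name)


for x in ["-", "gunzip -c feats.ark.gz |", "| gzip -c > out.gz",
          "/data/raw_mfcc.ark:1024", "/data/lexicon.txt"]:
    print(f"{x!r:30} -> {classify(x)}")
```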
0:14:54 | This might seem like a very |
---|
0:14:55 | minor thing, but I think it's important: |
---|
0:14:57 | if you ever want to use Kaldi, it's important to understand how this works. |
---|
0:15:02 | There's the concept of a table, |
---|
0:15:04 | and this table doesn't really correspond to any concrete object or class; it's |
---|
0:15:08 | a generic concept. The idea is |
---|
0:15:11 | a collection of objects of some known type, |
---|
0:15:15 | all indexed by |
---|
0:15:17 | a key, which is a string. |
---|
0:15:19 | We define a key as a nonempty, space-free string. |
---|
0:15:24 | We have to make it space-free, |
---|
0:15:26 | otherwise we get into all kinds of issues. |
---|
0:15:31 | There are three templated classes that relate to tables: |
---|
0:15:35 | the TableWriter, |
---|
0:15:36 | the SequentialTableReader, and the RandomAccessTableReader. |
---|
0:15:40 | So there are three ways you can do something with a table. |
---|
0:15:44 | You can write a table, |
---|
0:15:46 | and what you do with this is |
---|
0:15:49 | you say, write me something with this key |
---|
0:15:52 | and this object, |
---|
0:15:53 | and it's going to write it to the table, |
---|
0:15:56 | and you keep doing that. |
---|
0:15:58 | You can read a table sequentially, |
---|
0:16:01 | which means it repeatedly gives you the next key and the next object. |
---|
0:16:05 | Or you can do random access on a table, which means: |
---|
0:16:09 | do you have an object with this key, and if so, |
---|
0:16:12 | give me the object. |
---|
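The three ways of using a table (write, read sequentially, random access) can be mirrored in a few lines of Python. This is a conceptual sketch over an in-memory text archive of `<key> <value>` lines; the class names echo Kaldi's `TableWriter`, `SequentialTableReader` and `RandomAccessTableReader`, but these are hypothetical stand-ins, not bindings to the C++ classes.

```python
import io


def check_key(key):
    # Keys are non-empty and contain no whitespace.
    if not key or any(c.isspace() for c in key):
        raise ValueError(f"invalid key {key!r}")


class TableWriter:
    """'Write me something with this key and this object.'"""
    def __init__(self, stream):
        self.stream = stream

    def write(self, key, value):
        check_key(key)
        self.stream.write(f"{key} {value}\n")


class SequentialTableReader:
    """Yields (key, value) pairs in archive order."""
    def __init__(self, stream):
        self.stream = stream

    def __iter__(self):
        for line in self.stream:
            key, _, value = line.rstrip("\n").partition(" ")
            yield key, value


class RandomAccessTableReader:
    """Answers 'do you have this key?' and 'give me its object'."""
    def __init__(self, stream):
        self.items = dict(SequentialTableReader(stream))  # naive: loads everything

    def has_key(self, key):
        return key in self.items

    def value(self, key):
        return self.items[key]


buf = io.StringIO()
writer = TableWriter(buf)
writer.write("utt1", "5")
writer.write("utt2", "7")

buf.seek(0)
total = sum(int(v) for _, v in SequentialTableReader(buf))

buf.seek(0)
reader = RandomAccessTableReader(buf)
print(total, reader.has_key("utt2"), reader.has_key("utt3"))  # 12 True False
```

The naive random-access reader here loads the whole archive into memory, which is exactly the cost that matters when the archive comes from a pipe.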
0:16:13 | That's how you interact with it. These templates are templated on... |
---|
0:16:17 | I'm going to describe next |
---|
0:16:19 | what they're templated on. |
---|
0:16:21 | They're not actually templated on the object. |
---|
0:16:23 | It would be most natural to template |
---|
0:16:26 | on the object that's in the table, |
---|
0:16:28 | but the problem is that doesn't work very well with |
---|
0:16:32 | fundamental types like integers and so on, |
---|
0:16:34 | because normal Kaldi objects |
---|
0:16:37 | have a Read function and a Write function that have a particular |
---|
0:16:41 | behavior |
---|
0:16:42 | common to all of them, |
---|
0:16:43 | but we can't just assume that everything we want to read and write will have that form. |
---|
0:16:47 | For instance, how would we write an STL vector? |
---|
0:16:52 | And it would be ridiculous and unpleasant to somehow have to derive a class that's an integer |
---|
0:16:57 | and give it a Read function; |
---|
0:16:59 | the integer's not a class. |
---|
0:17:02 | So we template on what we call a holder. |
---|
0:17:04 | A holder class is a class that has Read and Write functions, |
---|
0:17:10 | and it has a typedef T inside it |
---|
0:17:13 | that is the actual type the table holds. |
---|
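The holder idea, sketched in Python: the generic table code only talks to a small object that knows how to read and write one value, so a plain integer never needs a Read method of its own. The names `IntHolder` and `FloatVectorHolder` and the text encoding are hypothetical; in Kaldi the holders are C++ classes passed as template arguments, and the class attribute `T` below only plays the role of the holder's typedef.

```python
class IntHolder:
    """Holder for a fundamental type: int itself has no Read/Write methods."""
    T = int  # the type the table actually holds

    @staticmethod
    def write(value):
        return str(int(value))

    @staticmethod
    def read(text):
        return int(text)


class FloatVectorHolder:
    """Holder for a container type (a list of floats, standing in for an STL vector)."""
    T = list

    @staticmethod
    def write(values):
        return " ".join(repr(float(v)) for v in values)

    @staticmethod
    def read(text):
        return [float(x) for x in text.split()]


def write_table(holder, items):
    """Generic table writer: parameterized on the holder, not on the value type."""
    return "".join(f"{key} {holder.write(value)}\n" for key, value in items)


def read_table(holder, text):
    """Generic sequential reader: the same holder decodes each value."""
    for line in text.splitlines():
        key, _, rest = line.partition(" ")
        yield key, holder.read(rest)


ints = write_table(IntHolder, [("utt1", 3), ("utt2", 8)])
vecs = write_table(FloatVectorHolder, [("utt1", [0.5, 1.0])])
print(dict(read_table(IntHolder, ints)))          # {'utt1': 3, 'utt2': 8}
print(dict(read_table(FloatVectorHolder, vecs)))  # {'utt1': [0.5, 1.0]}
```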
0:17:18 | If you don't know C++, knowing all of this stuff |
---|
0:17:20 | really doesn't matter, |
---|
0:17:25 | because this is just how the internals of this work, |
---|
0:17:31 | and you don't need to know it to understand how the whole thing works. |
---|
0:17:37 | So here's an example of how, |
---|
0:17:39 | at the C++ level, you use the table concept. |
---|
0:17:44 | We introduce some terminology here that may seem a bit annoying, but |
---|
0:17:48 | eventually becomes clarifying. |
---|
0:17:50 | An rspecifier is a string that tells the table code how to read a table, |
---|
0:17:57 | and here's an example of one: |
---|
0:18:00 | ark, colon, this filename. |
---|
0:18:03 | So the table code is going to parse this, and |
---|
0:18:06 | when it reads this, it says, okay, |
---|
0:18:08 | this is telling me that this is an archive, |
---|
0:18:10 | a thing that has |
---|
0:18:11 | key, object, key, object, |
---|
0:18:14 | and this is an extended filename that tells it how to open a pipe |
---|
0:18:18 | or open a file. |
---|
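Parsing an rspecifier like the one on the slide amounts to splitting off the part before the first colon and reading its comma-separated options. A hedged sketch: it only knows the `ark`/`scp` types and the options that come up in this talk (`s`, `cs`, `t`, `b`), and the function name is hypothetical; Kaldi's own parser accepts more options than this.

```python
from dataclasses import dataclass

KNOWN_TYPES = {"ark", "scp"}
KNOWN_OPTIONS = {"s", "cs", "t", "b"}  # only the options mentioned in this talk


@dataclass
class Specifier:
    kind: str       # "ark" or "scp"
    options: set    # e.g. {"s", "cs"}
    filename: str   # extended filename: file, "cmd |", "-", "file:offset", ...


def parse_specifier(spec):
    prefix, sep, filename = spec.partition(":")  # split on the FIRST colon only
    if not sep:
        raise ValueError(f"no ':' in {spec!r}")
    kind, *options = prefix.split(",")
    if kind not in KNOWN_TYPES:
        raise ValueError(f"unknown table type {kind!r}")
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown options {sorted(unknown)}")
    return Specifier(kind, set(options), filename)


spec = parse_specifier("ark,s,cs:gunzip -c feats.ark.gz |")
print(spec.kind, sorted(spec.options), spec.filename)
# ark ['cs', 's'] gunzip -c feats.ark.gz |
```

Splitting only on the first colon matters because the extended filename may itself contain a colon, as in a `file:offset` entry.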
0:18:23 | Now, this is the type name: |
---|
0:18:24 | SequentialTableReader templated on this holder type, |
---|
0:18:28 | so this is |
---|
0:18:29 | if we're reading something of type int32. |
---|
0:18:33 | This is the object name, |
---|
0:18:36 | and to initialize it we're giving it this string. |
---|
0:18:40 | So as soon as you initialize the object, it's |
---|
0:18:42 | opening the pipe; |
---|
0:18:43 | it's saying, we're going to read from this. |
---|
0:18:47 | So now we're using this object with this for loop, |
---|
0:18:49 | for blah blah blah. |
---|
0:18:52 | What this code is doing is getting each key and value from the SequentialTableReader, |
---|
0:18:58 | and of course, since this is the SequentialTableReader, that's what this object |
---|
0:19:02 | expects us to do. |
---|
0:19:03 | So the point is that |
---|
0:19:06 | there may be errors, right? |
---|
0:19:07 | Some of the objects may not be there; |
---|
0:19:09 | sometimes, you know, something may go wrong. |
---|
0:19:13 | The templated code is going to handle that, so your kind of |
---|
0:19:16 | user-level code |
---|
0:19:18 | just sees it as a |
---|
0:19:19 | sequential access. |
---|
0:19:22 | I think this |
---|
0:19:25 | is stuff that I've already told you. |
---|
0:19:28 | There are some things the table code has to do that are a little bit tricky. |
---|
0:19:32 | One of these things is: |
---|
0:19:34 | very often you |
---|
0:19:35 | want to do random access on objects that are |
---|
0:19:38 | in an archive, and that archive may be in a pipe; |
---|
0:19:41 | we use a lot of pipes. |
---|
0:19:43 | And the problem is, |
---|
0:19:44 | suppose for some reason you query a key that was not in the archive. |
---|
0:19:49 | In order to tell you it wasn't in the archive, |
---|
0:19:52 | it's going to have to read each one in the archive, |
---|
0:19:54 | go to the end of the pipe, and then say no. |
---|
0:19:57 | But that means that, since it doesn't know that you're not going to ask for something else too, it has got to |
---|
0:20:01 | store all of that stuff in memory. |
---|
0:20:04 | So in order to |
---|
0:20:07 | stop it from having to do this, |
---|
0:20:09 | you can specify in the rspecifier a little |
---|
0:20:12 | comma-separated s or cs |
---|
0:20:14 | option that tells it |
---|
0:20:15 | this archive is sorted on key, |
---|
0:20:18 | or we're going to call this archive in sorted order. |
---|
0:20:22 | So basically that gives the code enough information to know that |
---|
0:20:25 | it doesn't have to store all the stuff in memory, and it can still |
---|
0:20:28 | be correct. |
---|
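Why the `s` and `cs` options save memory can be shown with a small random-access reader over a stream that can only be read forward, like a pipe. A simplified sketch, not Kaldi's implementation: with `sorted_keys` (the `s` promise) a failed lookup can stop as soon as it reads a key larger than the one requested, and with `called_sorted` (the `cs` promise) anything smaller than the requested key can be discarded instead of cached. The class name is hypothetical.

```python
class ForwardOnlyRandomAccessReader:
    """Random access over (key, value) pairs that can only be read forward."""

    def __init__(self, pairs, sorted_keys=False, called_sorted=False):
        self._it = iter(pairs)
        self._sorted = sorted_keys            # 's': archive is sorted on key
        self._called_sorted = called_sorted   # 'cs': lookups arrive in sorted order
        self._cache = {}                      # objects read from the stream, still needed
        self.peak_cache = 0                   # largest cache size seen, for the demo

    def value(self, key):
        """Return the object stored under key, or None if the archive lacks it."""
        if self._called_sorted:
            # No future lookup can ask for a smaller key, so those objects can go.
            for k in [k for k in self._cache if k < key]:
                del self._cache[k]
        while key not in self._cache:
            item = next(self._it, None)
            if item is None:
                break                          # end of stream: the key is absent
            k, v = item
            if self._called_sorted and k < key:
                continue                       # will never be requested; discard now
            self._cache[k] = v
            self.peak_cache = max(self.peak_cache, len(self._cache))
            if self._sorted and k > key:
                break                          # sorted archive: key cannot appear later
        return self._cache.get(key)


archive = [(f"utt{i:03d}", i) for i in range(1000)]
queries = ["utt010", "utt011", "utt500x", "utt999"]  # "utt500x" is absent
naive = ForwardOnlyRandomAccessReader(archive)
smart = ForwardOnlyRandomAccessReader(archive, sorted_keys=True, called_sorted=True)
for reader in (naive, smart):
    print([reader.value(q) for q in queries], reader.peak_cache)
# [10, 11, None, 999] 1000
# [10, 11, None, 999] 1
```

Both readers return the same answers; only the naive one ends up holding the whole archive in memory after the missing-key query.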
0:20:30 | I'm going to go a little bit faster, since time is short. |
---|
0:20:34 | I think we went through this: computing MFCCs. |
---|
0:20:38 | Monophone training: |
---|
0:20:42 | you would invoke this script, |
---|
0:20:46 | and we're going to go through the script a little bit. |
---|
0:20:48 | It sets some variables in bash: the directory where you're doing your experiment, |
---|
0:20:53 | the features. I think we saw one of these strings before; |
---|
0:20:57 | this is an rspecifier, which I mentioned before. |
---|
0:21:00 | This part tells Kaldi that |
---|
0:21:03 | we're going to interpret this stream as an archive; this tells it how to open the stream. |
---|
0:21:08 | And of course this is a pipe, and the command inside it has its own specifiers; |
---|
0:21:12 | sometimes they can even be nested, but beyond one level of nesting |
---|
0:21:16 | the shell escaping would become too tricky. |
---|
0:21:23 | This is a pipe. |
---|
0:21:25 | In fact, this is an output; outputs always go on the right. |
---|
0:21:28 | So what this says is that |
---|
0:21:31 | it's reading in this |
---|
0:21:33 | script file that says where the features are, |
---|
0:21:36 | and it's outputting them as an archive on the standard output. |
---|
0:21:39 | And then this says that this whole thing is a pipe, |
---|
0:21:43 | so this part gets interpreted by the program that |
---|
0:21:46 | it's given to. |
---|
0:21:50 | Yeah, you get used to it. |
---|
0:21:55 | So, on to |
---|
0:21:57 | the monophone training script. |
---|
0:22:00 | We create a file called topo |
---|
0:22:04 | that specifies the HMM topology |
---|
0:22:07 | to Kaldi. |
---|
0:22:13 | You can read this file; it's fairly self-explanatory. |
---|
0:22:22 | There are three states, and then this is the final state; |
---|
0:22:26 | the last state always has a final probability of one. |
---|
0:22:33 | This is gmm-init-mono, to initialize the GMM, and |
---|
0:22:37 | it initializes it with a dimension of thirty-nine and outputs the model here. |
---|
0:22:41 | And this also outputs a tree, a very trivial tree that doesn't really have any splits, |
---|
0:22:46 | and that's how we handle a monophone system: |
---|
0:22:48 | even a monophone system has a decision tree, |
---|
0:22:51 | just so that, you know, all the code is unified. |
---|
0:22:57 | Let's see what we have. |
---|
0:23:02 | Okay: creating decoding graphs for training. |
---|
0:23:05 | All of the training scripts have a command of this form that |
---|
0:23:09 | creates an archive that has all of the FSTs, one for each utterance, |
---|
0:23:13 | and we do this |
---|
0:23:14 | as a separate command because otherwise, if we did it on each iteration, it would be |
---|
0:23:19 | a little bit too slow. So |
---|
0:23:21 | it takes the initial model, |
---|
0:23:23 | the lexicon FST, |
---|
0:23:27 | and the training transcriptions in integer format, |
---|
0:23:31 | and the output goes to this pipe, which just gzips it and puts it in |
---|
0:23:36 | that file. |
---|
0:23:39 | So this is just the format of the .tra file. It's just an integer |
---|
0:23:44 | transcription, where we've converted all of the strings into integer |
---|
0:23:48 | numbers, |
---|
0:23:50 | not something people would want to read. |
---|
0:23:56 | Okay, so |
---|
0:23:57 | the very first stage of monophone training is the flat start, where |
---|
0:24:03 | you divide the utterance equally |
---|
0:24:05 | among the phones or whatever, |
---|
0:24:08 | and create an alignment according to that. |
---|
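The flat start, dividing an utterance's frames equally among the units of its transcription, can be written down directly. A sketch assuming one label per unit and plain string labels; the function name is hypothetical, the choice to give leftover frames to the earliest units is an assumption, and the program in the talk writes transition-ids rather than these labels.

```python
def equal_align(num_frames, units):
    """Flat-start alignment: split num_frames as evenly as possible across units.

    Returns one label per frame; if the split is uneven, the earliest units
    get one extra frame each.
    """
    n = len(units)
    if n == 0 or num_frames < n:
        raise ValueError("need at least one frame per unit")
    base, extra = divmod(num_frames, n)
    alignment = []
    for i, unit in enumerate(units):
        alignment.extend([unit] * (base + (1 if i < extra else 0)))
    return alignment


print(equal_align(7, ["sil", "a", "sil"]))
# ['sil', 'sil', 'sil', 'a', 'a', 'sil', 'sil']
```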
0:24:13 | The output of this program is something called an alignment, |
---|
0:24:16 | which is |
---|
0:24:17 | basically, for each utterance, a vector of integers. |
---|
0:24:20 | Each of those integers is an ID that I touched on earlier, which we call a transition-id. |
---|
0:24:25 | It's something that behaves roughly similarly to the pdf |
---|
0:24:30 | index, the pdf-id, |
---|
0:24:32 | but it has a little bit more information, so you know the phone and you know what the transition was, |
---|
0:24:36 | so it contains sufficient |
---|
0:24:38 | information to update the model. |
---|
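A toy illustration of why a transition-id carries more than a pdf-id: if every (phone, HMM state, transition) triple gets its own integer, several transition-ids share one pdf, yet each still tells you the phone and which transition was taken. The numbering scheme here (IDs from 1, two transitions per state, a made-up phone-to-pdf map) is an assumption for illustration, not Kaldi's exact TransitionModel layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionInfo:
    phone: str
    hmm_state: int
    transition: str   # "self-loop" or "forward" in this toy topology
    pdf: int


def build_transition_ids(phones, states_per_phone, pdf_of):
    """Enumerate (phone, state, transition) triples; IDs start at 1 (0 left unused)."""
    table = {}
    next_id = 1
    for phone in phones:
        for state in range(states_per_phone):
            for transition in ("self-loop", "forward"):
                table[next_id] = TransitionInfo(phone, state, transition,
                                                pdf_of[(phone, state)])
                next_id += 1
    return table


# Hypothetical monophone setup: 2 phones x 3 states, one pdf per (phone, state).
phones = ["aa", "b"]
pdf_of = {(p, s): i for i, (p, s) in enumerate((p, s) for p in phones for s in range(3))}
tids = build_transition_ids(phones, 3, pdf_of)

alignment = [1, 1, 2, 3, 4, 5, 6]             # transition-ids, one per frame
print([tids[t].pdf for t in alignment])        # [0, 0, 0, 1, 1, 2, 2]
print({tids[t].phone for t in alignment})      # {'aa'}
```

Mapping the alignment down to pdf-ids loses the phone and transition information, which is why the alignment keeps transition-ids.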
0:24:42 | So we put that into this |
---|
0:24:43 | program, gmm-acc-stats-ali. |
---|
0:24:45 | The suffix "ali" means that it reads an alignment, |
---|
0:24:49 | because there are different versions of this program that read alignments, or read in |
---|
0:24:53 | posteriors, Gaussian-level posteriors, and different things. |
---|
0:24:59 | So it takes the model, it takes the features (this is the shell variable I described), |
---|
0:25:04 | it reads this stuff from the standard input, |
---|
0:25:07 | and it outputs this. |
---|
0:25:10 | By the way, |
---|
0:25:13 | whenever something has ark: or scp: on it, |
---|
0:25:17 | that's an rspecifier or wspecifier; that means it's a collection of objects being passed around, indexed |
---|
0:25:23 | by key. |
---|
0:25:24 | But if you don't see that, |
---|
0:25:26 | like here, then it's just a file, just a single stream; |
---|
0:25:29 | there's no notion of indexing. |
---|
0:25:36 | I think I covered this. |
---|
0:25:38 | And this is the GMM update: |
---|
0:25:41 | it takes the |
---|
0:25:44 | original model and the accumulated statistics, and outputs the new model. |
---|
0:25:50 | So this is the Viterbi stage of training. |
---|
0:25:53 | What we do during training is, on selected iterations, we redo the alignment. |
---|
0:25:59 | We don't necessarily do that every iteration, simply because |
---|
0:26:02 | this is the thing that takes most of the time, |
---|
0:26:06 | and if you have multiple Gaussians, |
---|
0:26:12 | this is not the only thing that's going on in training, so it makes sense to |
---|
0:26:16 | not do it every iteration. |
---|
0:26:17 | so |
---|
0:26:19 | i think this is pretty obvious that should be to that she's here |
---|
0:26:24 | You give it the beam, |
---|
0:26:25 | the model, this stream that has all the FSTs in it, |
---|
0:26:32 | and the features, |
---|
0:26:35 | and it's going to write... |
---|
0:26:37 | Sorry, that's an option. I briefly mentioned options |
---|
0:26:41 | on these rspecifiers and wspecifiers; in this case it's a wspecifier, and it says to write in |
---|
0:26:44 | text format. |
---|
0:26:46 | The default is binary, but you could write ",b" if you want to emphasize that. |
---|
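The specifier distinction can be illustrated with a deliberately simplified Python parser. A specifier with an `ark` or `scp` prefix names a keyed collection. Anything else is a single plain stream. The comma options after the prefix control the format: `t` for text, and `b` for binary, which is the default. Real Kaldi specifiers accept more options than the two handled here. `Specifier`, `parse_specifier` and `writes_text` are illustrative names, not Kaldi code.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Specifier:
    kind: str                                    # "ark", "scp", or "plain"
    options: Set[str] = field(default_factory=set)
    target: str = ""                             # filename, or "-" for stdin/stdout


def parse_specifier(spec: str) -> Specifier:
    """Simplified parse of strings like 'ark:1.ali', 'ark,t:-' or 'scp:feats.scp'.

    Anything without an ark/scp prefix is treated as a plain file: a single
    stream with no notion of an index. Only the 't' and 'b' options are handled.
    """
    head, sep, rest = spec.partition(":")
    parts = head.split(",")
    if not sep or parts[0] not in ("ark", "scp"):
        return Specifier(kind="plain", target=spec)
    options = set(parts[1:])
    unknown = options - {"t", "b"}
    if unknown:
        raise ValueError(f"option(s) not handled by this sketch: {sorted(unknown)}")
    if {"t", "b"} <= options:
        raise ValueError("'t' and 'b' are mutually exclusive")
    return Specifier(kind=parts[0], options=options, target=rest)


def writes_text(spec: Specifier) -> bool:
    """Binary is the default; ',t' asks for text, ',b' just makes binary explicit."""
    return "t" in spec.options


print(parse_specifier("ark,t:-"))                 # Specifier(kind='ark', options={'t'}, target='-')
print(writes_text(parse_specifier("ark:1.ali")))  # False
print(parse_specifier("1.mdl").kind)              # plain
```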
0:26:51 | In monophone training we realign on almost every iteration, because |
---|
0:26:56 | I think I found that that worked better, or maybe it's because you usually have a single Gaussian |
---|
0:27:01 | at that point, I think. |
---|
0:27:04 | After that you don't have to do it |
---|
0:27:07 | so often, so typically during, say, a typical triphone training run we'd only |
---|
0:27:12 | realign three or four times. |
---|
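The following Python sketch shows a training loop that realigns only on selected iterations. The iteration numbers and the `align`/`update` callbacks are hypothetical stand-ins for running the alignment, accumulation and update programs. For monophone-style training you would pass nearly every iteration in `realign_iters`; for triphone training you would pass just three or four.

```python
from typing import Callable, Iterable, List


def train(num_iters: int,
          realign_iters: Iterable[int],
          align: Callable[[], list],
          update: Callable[[list, int], None]) -> List[int]:
    """Run num_iters iterations, realigning only on the selected ones.

    align() stands in for the expensive alignment step; update(alignment, it)
    stands in for accumulating stats and re-estimating the model.
    Returns the iterations on which a realignment happened.
    """
    realign_set = set(realign_iters)
    alignment = align()  # initial alignment
    realigned_on = []
    for it in range(1, num_iters + 1):
        if it in realign_set:
            alignment = align()
            realigned_on.append(it)
        update(alignment, it)
    return realigned_on


calls = {"align": 0, "update": 0}


def fake_align():
    calls["align"] += 1
    return []


def fake_update(alignment, it):
    calls["update"] += 1


print(train(35, [10, 20, 30], fake_align, fake_update))  # [10, 20, 30]
print(calls)  # {'align': 4, 'update': 35}
```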
0:27:15 | Mixing up, to increase the number of Gaussians, |
---|
0:27:17 | is maybe slightly against the whole Kaldi philosophy, but |
---|
0:27:20 | it's just an option to the update program. |
---|
0:27:25 | As for the way we allocate Gaussians: we don't have a constant number of Gaussians per state. |
---|
0:27:33 | It's a power law; it's proportional to the count raised to a power, |
---|
0:27:36 | and this shouldn't be what it says here; it should be 0.2. I don't know why it's written that |
---|
0:27:40 | way. |
---|
0:27:42 | It's just slightly better than having a constant number. |
---|
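Here is a minimal Python sketch of power-law allocation. It assumes positive per-state occupation counts are available and uses a power-law exponent of 0.2. The rounding scheme (one Gaussian per state up front, then largest-remainder rounding) is a simplification chosen for this sketch. `allocate_gaussians` is a hypothetical name, not Kaldi's routine.

```python
import math
from typing import List, Sequence


def allocate_gaussians(counts: Sequence[float], total_gauss: int,
                       power: float = 0.2) -> List[int]:
    """Share total_gauss Gaussians across states in proportion to count**power.

    Every state first gets one Gaussian; the rest is shared by weight using
    largest-remainder rounding, so the result sums exactly to total_gauss.
    Assumes all counts are positive.
    """
    n = len(counts)
    if total_gauss < n:
        raise ValueError("need at least one Gaussian per state")
    weights = [c ** power for c in counts]
    z = sum(weights)
    spare = total_gauss - n
    shares = [spare * w / z for w in weights]
    alloc = [1 + math.floor(s) for s in shares]
    leftover = total_gauss - sum(alloc)
    # Give the leftover Gaussians to the states with the largest fractional parts.
    order = sorted(range(n), key=lambda i: shares[i] - math.floor(shares[i]),
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


print(allocate_gaussians([1, 32, 1024, 100000], 29))  # [2, 4, 7, 16]
```

With a power of 0.2, a state seen 100,000 times as often as another gets only ten times its weight (100000 ** 0.2 = 10). That compression of the counts is the point of using a power law rather than raw counts.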
0:27:46 | So this is the schedule we use to allocate the Gaussians. Typically |
---|
0:27:50 | you start from a set number, |
---|
0:27:52 | you increase it linearly, |
---|
0:27:54 | and then it |
---|
0:27:55 | levels out. It would probably be more natural to increase it logarithmically, |
---|
0:28:00 | or with some kind of power law, but |
---|
0:28:02 | it just didn't work as well, |
---|
0:28:03 | so we do it linearly. |
---|
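The linear-then-flat schedule is straightforward to write down. In this Python sketch the starting count, final count and ramp length are placeholder parameters, not values from any recipe. Each iteration's target would be passed to the update program's mixing-up option.

```python
from typing import List


def gaussian_schedule(init_gauss: int, final_gauss: int,
                      ramp_iters: int, num_iters: int) -> List[int]:
    """Target total number of Gaussians per iteration: linear ramp, then flat."""
    if ramp_iters <= 0:
        return [final_gauss] * num_iters
    step = (final_gauss - init_gauss) / ramp_iters
    return [int(round(init_gauss + step * min(it, ramp_iters)))
            for it in range(num_iters)]


print(gaussian_schedule(250, 1000, 5, 8))
# [250, 400, 550, 700, 850, 1000, 1000, 1000]
```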
0:28:10 | Okay. |
---|
0:28:11 | So, triphone training. |
---|
0:28:13 | The first stage is that we align all of the data. |
---|
0:28:18 | For the monophone we used a subset, because there's no point using it all; |
---|
0:28:22 | it's just a small system. |
---|
0:28:23 | So we realign all of the data and output alignments. |
---|
0:28:27 | We accumulate a special kind of stats for training the decision tree. |
---|
0:28:31 | What this is: |
---|
0:28:32 | for each unique |
---|
0:28:34 | triphone context, in this case, |
---|
0:28:36 | it's going to accumulate |
---|
0:28:39 | the stats for a single Gaussian, and these are used to train the tree the standard way. |
---|
0:28:44 | There's some stuff in the script that |
---|
0:28:47 | automatically does clustering to produce the questions; |
---|
0:28:51 | we don't use handwritten questions, because it's a hassle to |
---|
0:28:54 | find them. |
---|
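Below is a minimal Python sketch of these tree statistics and how a tree builder uses them. It assumes a 1-dimensional feature per frame and a floored variance to keep tiny clusters finite. For each distinct (left, center, right) context it keeps a count, a sum and a sum of squares. A candidate question is scored by the log-likelihood gain from giving its "yes" and "no" sets separate Gaussians. This is the standard likelihood-based criterion. The phone labels, the question and the function names are made up for the example and are not taken from Kaldi's scripts.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Context = Tuple[str, str, str]  # (left phone, center phone, right phone)


def context_stats(frames: Iterable[Tuple[Context, float]]) -> Dict[Context, List[float]]:
    """Accumulate [count, sum, sum_sq] of a scalar feature per triphone context."""
    stats = defaultdict(lambda: [0.0, 0.0, 0.0])
    for ctx, x in frames:
        s = stats[ctx]
        s[0] += 1.0
        s[1] += x
        s[2] += x * x
    return dict(stats)


def pooled(stats_list) -> List[float]:
    return [sum(s[i] for s in stats_list) for i in range(3)]


def gaussian_loglike(n: float, sx: float, sxx: float, var_floor: float = 1e-2) -> float:
    """Log-likelihood of the pooled data under a single 1-D Gaussian
    with the ML mean and a floored ML variance."""
    if n == 0:
        return 0.0
    mean = sx / n
    var_ml = sxx / n - mean * mean
    var = max(var_ml, var_floor)
    return -0.5 * (n * math.log(2 * math.pi * var) + n * var_ml / var)


def split_gain(stats: Dict[Context, List[float]],
               question: Callable[[Context], bool]) -> float:
    """Log-likelihood gain from splitting the contexts by a yes/no question."""
    yes = [s for ctx, s in stats.items() if question(ctx)]
    no = [s for ctx, s in stats.items() if not question(ctx)]
    everything = list(stats.values())
    return (gaussian_loglike(*pooled(yes)) + gaussian_loglike(*pooled(no))
            - gaussian_loglike(*pooled(everything)))


frames = [(("m", "a", "t"), 5.0), (("m", "a", "t"), 5.2), (("n", "a", "t"), 4.8),
          (("p", "a", "t"), 0.1), (("t", "a", "t"), -0.1), (("p", "a", "t"), 0.0)]
stats = context_stats(frames)
print(split_gain(stats, lambda c: c[0] in {"m", "n"}) > 0)  # True: "left is nasal?" helps
print(split_gain(stats, lambda c: c[2] == "t"))             # 0.0: every context says yes
```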
0:28:58 | These are producing various files that will be read by Kaldi, |
---|
0:29:01 | so a lot of the actual control over how the tree gets set up is at the script level. |
---|
0:29:07 | Building the tree: this is the Kaldi command to build the tree. |
---|
0:29:12 | What it actually does is go up to fifteen hundred leaves, |
---|
0:29:15 | and then it clusters it back down a little bit, |
---|
0:29:18 | but by an unpredictable amount, because |
---|
0:29:20 | the threshold it uses for |
---|
0:29:23 | the clustering after the initial splitting |
---|
0:29:26 | is the same as that of the last successful split. |
---|
0:29:29 | So you can't quite predict how big it'll be, but normally it shrinks by twenty percent |
---|
0:29:34 | or so. |
---|
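The "split too far, then cluster back" behaviour can be sketched in Python as greedy bottom-up merging. The algorithm repeatedly merges the pair of leaves whose merge loses the least log-likelihood, and stops once the cheapest merge costs at least the threshold. Because the number of merges depends on the data, the final leaf count is not known in advance. Leaves here are 1-D single-Gaussian stats. The all-pairs search is an illustrative simplification rather than Kaldi's clustering code.

```python
import math
from itertools import combinations
from typing import List, Tuple

Stats = Tuple[float, float, float]  # (count, sum, sum_sq) of a 1-D feature


def loglike(s: Stats, var_floor: float = 1e-2) -> float:
    """Log-likelihood of a leaf's data under its own ML Gaussian (floored variance)."""
    n, sx, sxx = s
    mean = sx / n
    var_ml = sxx / n - mean * mean
    var = max(var_ml, var_floor)
    return -0.5 * (n * math.log(2 * math.pi * var) + n * var_ml / var)


def merge(a: Stats, b: Stats) -> Stats:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def merge_cost(a: Stats, b: Stats) -> float:
    """Log-likelihood lost by modelling two leaves with one Gaussian."""
    return loglike(a) + loglike(b) - loglike(merge(a, b))


def cluster_leaves(leaves: List[Stats], threshold: float) -> List[Stats]:
    """Greedily merge the cheapest pair of leaves while its cost is below threshold."""
    leaves = list(leaves)
    while len(leaves) > 1:
        i, j = min(combinations(range(len(leaves)), 2),
                   key=lambda p: merge_cost(leaves[p[0]], leaves[p[1]]))
        if merge_cost(leaves[i], leaves[j]) >= threshold:
            break
        merged = merge(leaves[i], leaves[j])
        leaves = [leaf for k, leaf in enumerate(leaves) if k not in (i, j)] + [merged]
    return leaves


# Leaves A and B have nearly the same distribution; C is far away.
A, B, C = (10.0, 0.0, 10.0), (10.0, 1.0, 10.1), (10.0, 50.0, 260.0)
print(len(cluster_leaves([A, B, C], threshold=1.0)))  # 2: only A and B merge
```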
0:29:37 | So then you initialize the model |
---|
0:29:39 | for this tree. This program doesn't know whether it's a GMM or an |
---|
0:29:44 | SGMM you're going to create, |
---|
0:29:47 | so there's a separate program to initialize the model. |
---|
0:29:52 | A nice feature of the whole alignment |
---|
0:29:54 | setup is that |
---|
0:29:55 | you can actually take an alignment produced for one model |
---|
0:29:58 | and convert it to be valid for another model. |
---|
0:30:01 | That means that |
---|
0:30:03 | you can avoid a certain amount of regenerating alignments. |
---|
0:30:10 | Okay, so if you want to decode, you have to build the decoding graph. |
---|
0:30:14 | This is how we do graph generation. |
---|
0:30:18 | The first thing it does is compose L with G, |
---|
0:30:22 | determinize and minimize, and you get LG. |
---|
0:30:26 | There's some |
---|
0:30:27 | stuff for disambiguation symbols going on; |
---|
0:30:31 | I'm not going to go through that. |
---|
0:30:35 | Then you have to compose in the context dependency; |
---|
0:30:38 | it kind of expands the phones into context-dependent phones, |
---|
0:30:43 | and there's a kind of dynamic generation of |
---|
0:30:46 | the context FST going on that happens within the program here. |
---|
0:30:50 | I'm not going to go |
---|
0:30:51 | into more detail about what's going on here. |
---|
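Composition is the basic operation in these graph-building steps. As a rough illustration, here is a Python sketch of composing two epsilon-free weighted transducers in the tropical semiring, where costs add along a path. Real L, C and H transducers need epsilon handling, and the pipeline also determinizes, minimizes and deals with disambiguation symbols, normally through OpenFst; none of that is shown. The dictionary format and the one-token "pronunciations" are assumptions made to keep the example small.

```python
from collections import deque

# Toy FST format (an assumption of this sketch):
# {"start": s, "finals": {state: cost},
#  "arcs": {state: [(ilabel, olabel, cost, next_state), ...]}}


def compose(a, b):
    """Compose epsilon-free transducers a and b in the tropical semiring.

    An arc of a pairs with an arc of b when a's output label equals b's input
    label; costs add. Only state pairs reachable from the start pair are built.
    """
    start = (a["start"], b["start"])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        pair = queue.popleft()
        sa, sb = pair
        arcs[pair] = []
        if sa in a["finals"] and sb in b["finals"]:
            finals[pair] = a["finals"][sa] + b["finals"][sb]
        for ilab, mid, cost_a, next_a in a["arcs"].get(sa, []):
            for mid_b, olab, cost_b, next_b in b["arcs"].get(sb, []):
                if mid != mid_b:
                    continue
                nxt = (next_a, next_b)
                arcs[pair].append((ilab, olab, cost_a + cost_b, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}


# "L": one-token toy pronunciations -> words. "G": word costs from a language model.
L = {"start": 0, "finals": {1: 0.0},
     "arcs": {0: [("k_ae_t", "cat", 0.0, 1), ("d_ao_g", "dog", 0.0, 1)]}}
G = {"start": 0, "finals": {1: 0.0},
     "arcs": {0: [("cat", "cat", 1.2, 1), ("dog", "dog", 0.4, 1)]}}
LG = compose(L, G)
print(LG["arcs"][(0, 0)])
# [('k_ae_t', 'cat', 1.2, (1, 1)), ('d_ao_g', 'dog', 0.4, (1, 1))]
```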
0:30:54 | And then this last one, |
---|
0:30:56 | make-h-transducer, makes the H FST. |
---|
0:31:01 | Basically we expand the HMMs, |
---|
0:31:03 | so on the right you have the context-dependent phones, and on the left |
---|
0:31:06 | you've got the PDFs, but it adds all the HMM network structure. |
---|
0:31:09 | So this is creating H; that's what does that. And the last stage is to |
---|
0:31:14 | compose H with CLG |
---|
0:31:19 | and determinize and minimize it. |
---|
0:31:21 | We did that without self-loops, so at the end we add them. |
---|
0:31:24 | This is just to |
---|
0:31:26 | make these steps more memory efficient: |
---|
0:31:28 | we wait till the very end to add the self-loops. |
---|
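The following toy Python example illustrates why the self-loops are left until the end. Each emitting state needs one extra arc, so a graph without self-loops stays smaller through composition, determinization and minimization, and the loops can be added in a single pass afterwards. The loop probability here is one made-up constant, and the graph format is an assumption. The real step takes its probabilities from the acoustic model, which this sketch does not attempt to reproduce.

```python
import math


def add_self_loops(arcs, p_loop):
    """arcs: {state: [(label, cost, next_state), ...]}, costs = -log(prob).

    Every state entered by an arc gets a self-loop on that arc's label with
    cost -log(p_loop), and its outgoing arcs get an extra -log(1 - p_loop).
    Toy restrictions: all arcs into a state carry the same label, every state
    appears as a key, and final weights are not modelled.
    """
    entering = {}
    for out in arcs.values():
        for label, _, dst in out:
            if entering.setdefault(dst, label) != label:
                raise ValueError(f"state {dst} is entered by different labels")
    loop_cost = -math.log(p_loop)
    stay_cost = -math.log(1.0 - p_loop)
    new_arcs = {}
    for state, out in arcs.items():
        if state not in entering:  # e.g. the start state
            new_arcs[state] = list(out)
            continue
        leave = [(lab, cost + stay_cost, dst) for lab, cost, dst in out]
        new_arcs[state] = [(entering[state], loop_cost, state)] + leave
    return new_arcs


graph = {0: [("a", 0.0, 1)], 1: [("b", 0.0, 2)], 2: []}
looped = add_self_loops(graph, p_loop=0.5)
print(sum(len(v) for v in graph.values()), sum(len(v) for v in looped.values()))  # 2 4
```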
0:31:34 | So, |
---|
0:31:34 | this just goes through the prefixes |
---|
0:31:36 | for the decoding script. |
---|
0:31:38 | We create a shell variable that says what the features will be, |
---|
0:31:42 | then we invoke this |
---|
0:31:44 | program. There are three decoders, and |
---|
0:31:47 | this variable determines the command: it could be gmm-decode-simple, |
---|
0:31:50 | gmm-decode-faster, or a third one. |
---|
0:31:52 | It's kind of the medium one. |
---|
0:31:54 | gmm-decode-simple is mainly there for debugging, |
---|
0:31:57 | because it's so simple that |
---|
0:31:59 | you know there can't be anything wrong with it, |
---|
0:32:01 | so we just compare the two. |
---|
0:32:06 | So, |
---|
0:32:08 | the decoding beam is twenty. This is the beam, and this is the acoustic scale, which is kind of the inverse of a language-model |
---|
0:32:12 | scale: |
---|
0:32:13 | we only scale the acoustics, not the language model. |
---|
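To show how the beam and the acoustic scale interact, here is a textbook token-passing Viterbi search with beam pruning in Python. Graph costs (language model and transitions) are left unscaled, and only the acoustic log-likelihoods are multiplied by `acoustic_scale`. After each frame, tokens more than `beam` worse than the best are dropped. The graph format is an assumption, epsilon arcs are ignored, and this is not the code of any Kaldi decoder.

```python
from typing import Dict, List, Optional, Tuple


def beam_viterbi(arcs: Dict[int, List[Tuple[int, float, int]]],
                 start: int,
                 finals: Dict[int, float],
                 loglikes: List[Dict[int, float]],
                 beam: float,
                 acoustic_scale: float) -> Optional[Tuple[float, List[int]]]:
    """Beam-pruned Viterbi search over an epsilon-free graph.

    arcs: state -> [(pdf_id, graph_cost, next_state)]; one arc consumed per frame.
    loglikes: per frame, pdf_id -> acoustic log-likelihood.
    Path cost = graph costs (unscaled) - acoustic_scale * acoustic log-likelihoods.
    Returns (best cost, pdf sequence), or None if no final state survives.
    """
    tokens = {start: (0.0, [])}
    for frame in loglikes:
        new_tokens = {}
        for state, (cost, path) in tokens.items():
            for pdf, graph_cost, nxt in arcs.get(state, []):
                total = cost + graph_cost - acoustic_scale * frame[pdf]
                if nxt not in new_tokens or total < new_tokens[nxt][0]:
                    new_tokens[nxt] = (total, path + [pdf])
        if not new_tokens:
            return None
        best = min(c for c, _ in new_tokens.values())
        # Beam pruning: drop tokens more than `beam` worse than the best one.
        tokens = {s: t for s, t in new_tokens.items() if t[0] <= best + beam}
    ended = [(cost + finals[s], path) for s, (cost, path) in tokens.items() if s in finals]
    return min(ended) if ended else None


graph = {0: [(0, 0.0, 0), (1, 0.0, 1)], 1: [(1, 0.0, 1)]}
frames = [{0: -1.0, 1: -5.0}, {0: -4.0, 1: -1.0}]
print(beam_viterbi(graph, 0, {1: 0.0}, frames, beam=20.0, acoustic_scale=0.1))
# (0.2, [0, 1])
```

A smaller beam makes the search faster but risks pruning the path that would have been best. A smaller acoustic scale gives the graph (language model) costs relatively more weight.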
0:32:17 | This is just to get more human-readable output. |
---|
0:32:21 | Then the model, |
---|
0:32:22 | the FST, |
---|
0:32:25 | the features. |
---|
0:32:26 | This is a wspecifier that specifies how to write the transcription; |
---|
0:32:31 | it says to do it |
---|
0:32:31 | in text format, |
---|
0:32:33 | but they'll be integers, so |
---|
0:32:35 | we're going to have to change them to |
---|
0:32:38 | text before scoring. |
---|
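Because the transcriptions come out as integers, scoring needs a lookup back to words through the word symbol table. Here is a Python sketch that assumes an OpenFst-style text symbol table (one `symbol integer` pair per line) and transcripts written as an utterance id followed by integers. The function names and the sample table are hypothetical.

```python
from typing import Dict, Iterable, List


def read_symbol_table(lines: Iterable[str]) -> Dict[int, str]:
    """Parse 'symbol integer' lines into {integer: symbol}."""
    table = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"bad symbol-table line: {line!r}")
        symbol, index = fields
        table[int(index)] = symbol
    return table


def ints_to_words(transcripts: Iterable[str], id2word: Dict[int, str]) -> List[str]:
    """Turn 'utt-id 3 1 2' into 'utt-id THE CAT SAT'; unknown ids raise KeyError."""
    out = []
    for line in transcripts:
        if not line.strip():
            continue
        utt, *ids = line.split()
        out.append(" ".join([utt] + [id2word[int(i)] for i in ids]))
    return out


words = read_symbol_table(["<eps> 0", "CAT 1", "SAT 2", "THE 3"])
print(ints_to_words(["utt1 3 1 2"], words))  # ['utt1 THE CAT SAT']
```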
0:32:41 | And this is the alignment. |
---|
0:32:45 | It isn't really useful if you're just decoding, |
---|
0:32:48 | but you might want to do adaptation later using that |
---|
0:32:51 | decoding as the supervision, |
---|
0:32:53 | so we just always produce it, because it doesn't cost much. |
---|
0:33:02 | Okay, so |
---|
0:33:04 | I think that basically brings us to the end of this talk. |
---|
0:33:07 | I've given you a very vague idea of how the scripts work and what's in them. |
---|
0:33:11 | Obviously there's a lot more detail that you'd need |
---|
0:33:13 | to find out before using them, but a lot of that stuff is in the documentation. |
---|
0:33:17 | You have to kind of dig around the documentation; |
---|
0:33:20 | I've been told that it's not very clear where to start, |
---|
0:33:23 | but there is a lot of it there, if you're willing to read it all. |
---|
0:33:27 | It's also heavily cross-referenced, |
---|
0:33:31 | so if you find something that's kind of related to what you need, |
---|
0:33:34 | there'll usually be a link you can click on that will take you to what you need. |
---|
0:33:41 | Okay, so that's it. |
---|
0:33:49 | Are there any questions? |
---|
0:33:55 | Yeah? |
---|
0:34:04 | Kaldi never really deals directly |
---|
0:34:07 | with any of those symbols, because it's all integers, |
---|
0:34:11 | so it really doesn't matter, as far as Kaldi is concerned, what they are. So yeah, I think you could use any |
---|
0:34:16 | UTF-8. |
---|
0:34:21 | Well, those have to... |
---|
0:34:24 | they have to contain no whitespace and be non-empty. |
---|
0:34:29 | I don't... |
---|
0:34:31 | Yeah, it's not going to worry, as long as the UTF-8 is not whitespace. I think |
---|
0:34:35 | it never checks, but it's actually treating it as ASCII. |
---|
0:34:38 | But I mean, |
---|
0:34:39 | I think UTF-8 is such that if a character is not ASCII, |
---|
0:34:43 | there's never going to be whitespace, because its bytes are always 128 or more. |
---|
0:34:47 | I don't... |
---|
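The UTF-8 property mentioned here can be checked directly. Every byte of a multi-byte UTF-8 character is at least 0x80 (128), while ASCII whitespace bytes are all below 0x80, so splitting on ASCII whitespace never cuts a non-ASCII symbol in half. `valid_symbol` and `multibyte_bytes_are_high` are hypothetical helpers for checking a symbol list, not Kaldi functions.

```python
ASCII_WHITESPACE = b" \t\n\v\f\r"


def multibyte_bytes_are_high(text: str) -> bool:
    """True if every byte encoding a non-ASCII character is >= 0x80 (128)."""
    return all(b >= 0x80
               for ch in text if ord(ch) >= 0x80
               for b in ch.encode("utf-8"))


def valid_symbol(symbol: str) -> bool:
    """A symbol-table entry must be non-empty and contain no ASCII whitespace."""
    raw = symbol.encode("utf-8")
    return len(raw) > 0 and not any(b in ASCII_WHITESPACE for b in raw)


line = "naïve 日本語".encode("utf-8")
# bytes.split() with no argument splits on ASCII whitespace only.
print([f.decode("utf-8") for f in line.split()])  # ['naïve', '日本語']
print(multibyte_bytes_are_high("naïve 日本語"))     # True
```

This is a byte-level view. Python's own `str.split()` also splits on Unicode whitespace such as U+00A0 (no-break space). Tools that treat the text as Unicode rather than bytes can therefore disagree about where fields end, which is one way other tools in the pipeline could cause surprises.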
0:34:49 | But it should be fine for any of that. |
---|
0:34:51 | I don't really think it's a good idea to put UTF-8 in those things, |
---|
0:34:54 | because, I mean... |
---|
0:34:59 | you have old-fashioned... |
---|
0:35:04 | Uh-huh. |
---|
0:35:08 | Yeah, |
---|
0:35:09 | I think it should work, but |
---|
0:35:11 | you'd want to be concerned about |
---|
0:35:14 | the shell, because it could be that the shell is doing some kind of |
---|
0:35:17 | manipulation on the lines if they contain weird characters. |
---|
0:35:21 | I don't know whether that will work, |
---|
0:35:23 | you know. But |
---|
0:35:27 | it should be easily changeable to handle that if it really becomes an issue. |
---|
0:35:38 | Uh-huh. |
---|
0:35:44 | I believe that can... I think, if I'm remembering right... |
---|
0:35:46 | I don't recall it ever being tested. |
---|
0:35:49 | a pulse go |
---|
0:35:49 | so it's two percent |
---|
0:35:53 | Well, the file that I showed can have those probabilities, but |
---|
0:35:56 | I think the Perl script that |
---|
0:35:58 | creates the lexicon actually |
---|
0:36:00 | has a flag so it'll accept them, |
---|
0:36:02 | or at least it did at one point. But |
---|
0:36:04 | it's just a little Perl script, so it's not |
---|
0:36:08 | like... |
---|
0:36:13 | Yeah, yeah, so Kaldi doesn't really care whether the lexicon has probabilities; it's just an FST. |
---|
0:36:18 | So yeah. |
---|
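Since the lexicon is just an FST, pronunciation probabilities only change arc weights. Here is a Python sketch that turns lines of the assumed form `word prob phone1 phone2 ...` into lexicon-transducer arcs, putting -log(prob) on the first arc of each pronunciation. The line format, the `<eps>` label and the state layout are assumptions. A real lexicon FST also handles optional silence and disambiguation symbols, which are left out here.

```python
import math
from typing import Iterable, List, Tuple

EPS = "<eps>"
Arc = Tuple[int, int, str, str, float]  # (src, dst, phone, word-or-eps, cost)


def lexicon_arcs(lines: Iterable[str], start: int = 0, end: int = 1) -> List[Arc]:
    """Build lexicon-FST arcs from 'word prob phone1 phone2 ...' lines.

    Each pronunciation is a chain of arcs from `start` to `end` with phones as
    input labels. The word and the cost -log(prob) go on the first arc; later
    arcs output epsilon at zero cost. Pass start == end for a looping lexicon.
    """
    arcs: List[Arc] = []
    next_state = max(start, end) + 1
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) < 3:
            raise ValueError(f"bad lexicon line: {line!r}")
        word, prob, phones = fields[0], float(fields[1]), fields[2:]
        if not 0.0 < prob <= 1.0:
            raise ValueError(f"probability out of range: {line!r}")
        src = start
        for k, phone in enumerate(phones):
            if k == len(phones) - 1:
                dst = end
            else:
                dst = next_state
                next_state += 1
            if k == 0:
                arcs.append((src, dst, phone, word, -math.log(prob)))
            else:
                arcs.append((src, dst, phone, EPS, 0.0))
            src = dst
    return arcs


lexicon = ["TOMATO 0.6 t ah m ey t ow", "TOMATO 0.4 t ah m aa t ow", "A 1.0 ah"]
arcs = lexicon_arcs(lexicon)
print(len(arcs))  # 13
```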
0:36:26 | Yeah. |
---|
0:36:38 | Well, |
---|
0:36:40 | they can get very large. |
---|
0:36:42 | I mean, we've had ones that were like |
---|
0:36:44 | one and a half gigabytes; |
---|
0:36:45 | I think that was with a trigram LM. |
---|
0:36:50 | They do get very big, but at some point we're going to create decoders that |
---|
0:36:53 | don't suffer from that problem. |
---|
0:36:56 | I mean, I think the FST approach |
---|
0:36:57 | as a whole is great, because the decoder is simple, but |
---|
0:37:02 | maybe the memory is something |
---|
0:37:04 | we're going to work on. |
---|
0:37:08 | If there are no more questions, I guess we can call it a day. |
---|
0:37:11 | Oh, one more. |
---|
0:37:14 | I guess for... but if there... |
---|
0:37:21 | You have to redo them... |
---|
0:37:28 | Yeah. |
---|