0:00:06 | okay thank you |
---|
0:00:10 | the title of our paper is parallel acoustic model adaptation for improving phonotactic language recognition |
---|
0:00:18 | in general |
---|
0:00:20 | a phonotactic language recognition system consists of two components |
---|
0:00:25 | the first one is the front-end phone recognition |
---|
0:00:29 | which can be a single phone recognizer or parallel phone recognizers |
---|
0:00:34 | which we use for the |
---|
0:00:36 | phonotactic information extraction |
---|
0:00:39 | and the second one is |
---|
0:00:40 | the backend classifier |
---|
0:00:43 | that uses the extracted |
---|
0:00:45 | phonotactic information |
---|
0:00:47 | to distinguish between the target languages |
---|
0:00:50 | in phonotactic language recognition |
---|
0:00:54 | the idea of feature diversification |
---|
0:00:56 | is widely applied |
---|
0:01:00 | examples include |
---|
0:01:02 | using parallel phone recognizers of different languages |
---|
0:01:06 | and using multiple hypotheses |
---|
0:01:08 | in the phone lattice decoding |
---|
0:01:13 | to reduce the speaker- |
---|
0:01:14 | and session- |
---|
0:01:16 | induced variability |
---|
0:01:17 | in the speech data |
---|
0:01:19 | which generally involves telephone speech |
---|
0:01:24 | mllr adaptation and |
---|
0:01:26 | speaker adaptive training (SAT) |
---|
0:01:28 | applied to the phone lattice decoding |
---|
0:01:32 | have been proposed recently |
---|
0:01:35 | so in this piece of work |
---|
0:01:38 | we would like to investigate |
---|
0:01:40 | different types of |
---|
0:01:41 | adaptation techniques |
---|
0:01:44 | and to |
---|
0:01:45 | quantitatively measure |
---|
0:01:47 | the diversity |
---|
0:01:48 | between two sets of |
---|
0:01:50 | phonotactic features |
---|
0:01:51 | and finally |
---|
0:01:52 | we investigate whether |
---|
0:01:54 | parallel acoustic model adaptation |
---|
0:01:57 | can provide further |
---|
0:01:58 | feature diversification |
---|
0:02:01 | in particular |
---|
0:02:02 | we will work on the mean-only mllr adaptation |
---|
0:02:05 | and the variance-only mllr adaptation |
---|
0:02:13 | this slide shows |
---|
0:02:14 | the general structure |
---|
0:02:17 | of a |
---|
0:02:18 | phonotactic |
---|
0:02:19 | language recognition system |
---|
0:02:23 | it contains the two components |
---|
0:02:25 | that i mentioned before |
---|
0:02:26 | the parallel phone recognizers |
---|
0:02:29 | and also the backend |
---|
0:02:30 | in the backend we can use either |
---|
0:02:32 | vector |
---|
0:02:33 | space modelling |
---|
0:02:35 | or n-gram language modelling |
---|
0:02:38 | in our experiments we use |
---|
0:02:41 | the vector space modelling |
---|
0:02:47 | i'm sorry |
---|
0:02:48 | there seems to be some problem with the slide |
---|
0:02:52 | but anyway |
---|
0:02:55 | each target language obtains a score |
---|
0:02:58 | from each svm model |
---|
0:03:00 | and then we would like to combine them |
---|
0:03:02 | to get the final score |
---|
0:03:06 | in fact the L and F here |
---|
0:03:08 | represent the different target languages |
---|
0:03:10 | and the different phone recognizers |
---|
0:03:12 | so we combine the scores per language |
---|
0:03:15 | and since we have several |
---|
0:03:16 | phone recognizers we also combine their scores |
---|
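The score combination just described can be sketched as follows. The equal-weight linear fusion here is an assumption, since the talk does not specify the fusion weights, and the language names and scores are illustrative only.

```python
# Fuse per-recognizer, per-language backend scores.
# scores[r][lang] is the backend score for target language `lang`
# produced by phone recognizer r. Equal weights are an assumption;
# the actual fusion scheme is not specified in the talk.

def fuse_scores(scores):
    """Average each language's score across all recognizers."""
    langs = scores[0].keys()
    n = len(scores)
    return {lang: sum(s[lang] for s in scores) / n for lang in langs}

recognizer_scores = [
    {"english": 1.2, "mandarin": -0.3},   # recognizer 1
    {"english": 0.8, "mandarin": -0.1},   # recognizer 2
]
fused = fuse_scores(recognizer_scores)   # one fused score per language
```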
0:03:27 | and in our |
---|
0:03:28 | work |
---|
0:03:30 | we investigate whether |
---|
0:03:32 | further feature |
---|
0:03:34 | diversification can be achieved |
---|
0:03:35 | using different model adaptations |
---|
0:03:38 | you can see that there are N phone recognizers |
---|
0:03:42 | and for each phone recognizer we have |
---|
0:03:45 | several model adaptations |
---|
0:03:46 | so if for each |
---|
0:03:49 | phone recognizer |
---|
0:03:50 | we use say eight adaptation setups |
---|
0:03:53 | then there are eight times N |
---|
0:03:55 | scores from the resulting ensemble to be combined |
---|
0:04:00 | in our experiments |
---|
0:04:02 | we set |
---|
0:04:04 | N equal to one that means |
---|
0:04:07 | we use |
---|
0:04:08 | a single phone recognizer |
---|
0:04:10 | in our experiments |
---|
0:04:13 | but we believe |
---|
0:04:15 | that the improvement we find |
---|
0:04:18 | using the parallel model adaptation |
---|
0:04:24 | can still be obtained |
---|
0:04:26 | when we use parallel phone recognizers |
---|
0:04:38 | to further reduce the speaker- and session-induced variation |
---|
0:04:42 | we use the mllr |
---|
0:04:45 | mean and |
---|
0:04:47 | variance adaptation |
---|
0:04:49 | in the phone lattice decoding |
---|
0:04:53 | the transformations can be |
---|
0:04:56 | formulated |
---|
0:04:57 | by these two equations |
---|
0:04:59 | mu-hat = A mu + b and Sigma-hat = H Sigma H^T |
---|
0:05:03 | where A b and H are the transforms to be computed |
---|
0:05:06 | and mu and Sigma |
---|
0:05:07 | are the gaussian mean and covariance matrix |
---|
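The two adaptation formulas, mu-hat = A mu + b (mean MLLR) and Sigma-hat = H Sigma H^T (variance MLLR), can be written out directly. This is a minimal sketch for a single two-dimensional Gaussian with toy transform values, not the estimation procedure itself.

```python
# Apply mean and variance MLLR transforms to one Gaussian (2-D toy case):
#   mean-only:  mu_hat    = A mu + b
#   variance:   Sigma_hat = H Sigma H^T

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def adapt_mean(A, b, mu):
    # mean-only MLLR: mu_hat = A mu + b
    Amu = matvec(A, mu)
    return [Amu[i] + b[i] for i in range(len(mu))]

def adapt_cov(H, Sigma):
    # variance MLLR: Sigma_hat = H Sigma H^T
    return matmul(matmul(H, Sigma), transpose(H))

# Toy transforms (illustrative values, not estimated from data):
mu_hat = adapt_mean([[2, 0], [0, 2]], [1, -1], [3, 4])
sigma_hat = adapt_cov([[1, 0], [0, 2]], [[1, 0], [0, 1]])
```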
0:05:13 | these are |
---|
0:05:15 | the different types of adaptation techniques we tested |
---|
0:05:18 | by the way we also tested the iterative mllr adaptation |
---|
0:05:21 | and also |
---|
0:05:22 | adaptation with multiple |
---|
0:05:24 | regression classes |
---|
0:05:26 | but we found that no obvious improvement could be obtained there |
---|
0:05:29 | so we did not report those results |
---|
0:05:31 | in detail in the paper |
---|
0:05:39 | this is how we apply the model adaptation |
---|
0:05:42 | to the phone lattice decoding using unsupervised two-pass decoding |
---|
0:05:43 | the process is |
---|
0:05:45 | first of all we generate a single-best |
---|
0:05:48 | phone sequence |
---|
0:05:48 | and then we estimate the transforms A b and H |
---|
0:05:52 | and then based on the transformed |
---|
0:05:54 | acoustic model we generate |
---|
0:05:56 | the phone lattice |
---|
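The two-pass procedure just outlined can be sketched as a control-flow skeleton. All three helper functions below are hypothetical stand-ins (the real system is an HMM phone recognizer); they only show the order of the steps: first-pass 1-best decode, transform estimation, then lattice generation with the adapted model.

```python
# Unsupervised two-pass adaptation decoding (control flow only; the
# decoder and estimator bodies are stubs, not a real recognizer).

def first_pass_decode(model, utterance):
    # Pass 1: produce a single-best phone sequence (stub output).
    return ["sil", "ah", "t", "sil"]

def estimate_transforms(model, utterance, phone_seq):
    # Estimate the MLLR transforms A, b, H from the 1-best hypothesis (stub).
    return {"A": None, "b": None, "H": None}

def second_pass_lattice(model, transforms, utterance):
    # Pass 2: decode with the adapted model and emit a phone lattice (stub).
    return {"adapted_with": sorted(transforms), "lattice": "..."}

def two_pass_decode(model, utterance):
    seq = first_pass_decode(model, utterance)
    transforms = estimate_transforms(model, utterance, seq)
    return second_pass_lattice(model, transforms, utterance)

lattice = two_pass_decode(None, None)
```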
0:06:08 | apart from the acoustic model adaptation on the |
---|
0:06:11 | test data |
---|
0:06:12 | we can also apply |
---|
0:06:14 | speaker adaptive training |
---|
0:06:16 | on the training data |
---|
0:06:18 | of the phone recognizer |
---|
0:06:23 | in which a feature-level transform |
---|
0:06:26 | is applied to each |
---|
0:06:30 | training utterance |
---|
0:06:31 | when training the phone recognizer |
---|
0:06:34 | in our experiments |
---|
0:06:37 | three types of adaptation techniques |
---|
0:06:40 | were applied |
---|
0:06:48 | in the svm vector space modelling |
---|
0:06:53 | the phone lattice |
---|
0:06:58 | is converted |
---|
0:07:00 | to expected n-gram counts |
---|
0:07:04 | we use n-grams |
---|
0:07:07 | up to order three |
---|
0:07:09 | and then |
---|
0:07:10 | it is converted to a high-dimensional |
---|
0:07:13 | phonotactic feature vector |
---|
0:07:15 | that contains the unigram bigram and |
---|
0:07:18 | trigram count |
---|
0:07:19 | statistics |
---|
0:07:24 | the size of this high-dimensional phonotactic feature |
---|
0:07:28 | that is the dimension S |
---|
0:07:31 | is determined by the n-gram |
---|
0:07:33 | order |
---|
0:07:34 | and also the phone set size |
---|
0:07:39 | after we generate |
---|
0:07:41 | the high-dimensional |
---|
0:07:42 | phonotactic features |
---|
0:07:45 | we feed them into the |
---|
0:07:47 | svm training to build the svm language classifiers |
---|
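The feature extraction step can be sketched as follows. The talk accumulates expected n-gram counts over the whole phone lattice; for simplicity this sketch counts n-grams of order one to three on a single 1-best phone sequence instead, and the phone labels are made up.

```python
# Build a phonotactic n-gram count vector from a phone sequence.
# Simplification: counts on a 1-best sequence rather than expected
# counts over a lattice as in the talk.

from collections import Counter

def ngram_counts(phones, max_order=3):
    counts = Counter()
    for n in range(1, max_order + 1):          # unigram, bigram, trigram
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    return counts

# Toy phone sequence (hypothetical labels):
feats = ngram_counts(["sil", "ah", "t", "ah"])
```

The dimension of the full feature vector grows with the phone set size P roughly as P + P^2 + P^3 for n-grams up to order three, which matches the remark that the dimension S is determined by the n-gram order and the phone set size.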
0:07:53 | moreover we also define |
---|
0:07:54 | a feature diversity measure |
---|
0:07:58 | between two phonotactic feature vectors |
---|
0:08:05 | the idea is that |
---|
0:08:08 | the diversity between |
---|
0:08:11 | the two feature vectors |
---|
0:08:12 | C_A |
---|
0:08:13 | and C_B |
---|
0:08:14 | is computed based on their nonzero n-gram |
---|
0:08:17 | statistics |
---|
0:08:23 | U_{A,B} means the set of n-gram statistics |
---|
0:08:26 | which are nonzero in |
---|
0:08:28 | both C_A and C_B |
---|
0:08:31 | and the measure |
---|
0:08:32 | uses the sizes |
---|
0:08:33 | of these sets |
---|
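One plausible instantiation of this measure is sketched below. The exact formula is not spelled out in the talk; a Jaccard-style distance over the sets of nonzero n-gram entries is assumed here, so a smaller overlap between the nonzero sets gives a higher diversity.

```python
# Diversity between two phonotactic feature vectors C_A and C_B, based
# on their nonzero n-gram statistics. The Jaccard-style form is an
# assumption; the talk only says the measure uses the sizes of the
# nonzero sets.

def diversity(c_a, c_b):
    nz_a = {k for k, v in c_a.items() if v != 0}
    nz_b = {k for k, v in c_b.items() if v != 0}
    inter = nz_a & nz_b          # nonzero in both C_A and C_B
    union = nz_a | nz_b          # nonzero in either
    return 1.0 - len(inter) / len(union)
```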
0:08:41 | our systems |
---|
0:08:41 | have been evaluated |
---|
0:08:43 | using the thirty-second task |
---|
0:08:45 | in the two thousand and seven nist language recognition evaluation |
---|
0:08:50 | in which fourteen target languages are involved |
---|
0:08:53 | in the detection task |
---|
0:08:55 | the system |
---|
0:08:56 | determines whether the |
---|
0:08:58 | target language is spoken |
---|
0:09:00 | in the speech utterance |
---|
0:09:03 | the average equal error rate |
---|
0:09:05 | which is |
---|
0:09:06 | calculated from the eer of each |
---|
0:09:09 | target language is reported |
---|
0:09:11 | we use this average eer |
---|
0:09:16 | to ensure that each |
---|
0:09:16 | target language |
---|
0:09:18 | has an equal contribution to the metric |
---|
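The evaluation metric can be sketched as follows: an equal error rate is computed per target language by sweeping the score threshold to the point where the miss rate and false-alarm rate meet, and the per-language EERs are then averaged so each language contributes equally. The scores below are toy values.

```python
# Per-language equal error rate and its average across target languages.

def eer(target_scores, nontarget_scores):
    """Sweep the threshold; return (miss + fa) / 2 where they are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target_scores + nontarget_scores)):
        miss = sum(s < t for s in target_scores) / len(target_scores)
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(miss - fa) < best_gap:
            best_gap, best_eer = abs(miss - fa), (miss + fa) / 2
    return best_eer

def average_eer(per_language):
    """per_language: {lang: (target_scores, nontarget_scores)}."""
    return sum(eer(t, n) for t, n in per_language.values()) / len(per_language)
```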
0:09:24 | in our experimental setup a single phone recognizer |
---|
0:09:28 | is used |
---|
0:09:32 | a forty-nine dimension mfcc feature |
---|
0:09:35 | and standard three-state |
---|
0:09:37 | left-to-right hmms |
---|
0:09:38 | with thirty-two gaussian components per state are used |
---|
0:09:41 | in all acoustic models |
---|
0:09:45 | for the training data |
---|
0:09:47 | fifteen hours of |
---|
0:09:49 | switchboard-one english data |
---|
0:09:52 | is used to train the phone recognizer |
---|
0:09:57 | and a null phone-loop grammar is used in the decoding |
---|
0:10:03 | for the training data of the target languages |
---|
0:10:06 | we use the callfriend |
---|
0:10:08 | corpora and also the training data set of |
---|
0:10:12 | the lre two thousand and seven |
---|
0:10:18 | so |
---|
0:10:19 | in the first experiment |
---|
0:10:21 | we compare |
---|
0:10:22 | the different adaptation techniques |
---|
0:10:27 | and this figure shows |
---|
0:10:28 | the average eer |
---|
0:10:32 | of the |
---|
0:10:32 | speaker-independent and the SAT phone models |
---|
0:10:38 | first of all |
---|
0:10:40 | we found that all adaptation techniques |
---|
0:10:42 | provide an improvement |
---|
0:10:45 | you can see that in system A0 we did not use any adaptation technique |
---|
0:10:50 | and in all the others |
---|
0:10:50 | we use different kinds of adaptation techniques |
---|
0:10:54 | both when using the SAT model |
---|
0:10:58 | and the |
---|
0:11:00 | speaker-independent phone model |
---|
0:11:04 | the mean-only mllr |
---|
0:11:06 | adaptation performed the best |
---|
0:11:09 | and also you can find that |
---|
0:11:13 | a further improvement can be |
---|
0:11:15 | obtained |
---|
0:11:16 | when we use |
---|
0:11:17 | the SAT phone model |
---|
0:11:24 | secondly we test whether |
---|
0:11:26 | two phonotactic systems with different types of |
---|
0:11:30 | adapted acoustic |
---|
0:11:33 | models |
---|
0:11:34 | provide complementary information to each other |
---|
0:11:38 | and whether |
---|
0:11:38 | the corresponding system fusion |
---|
0:11:41 | can provide a further system |
---|
0:11:44 | improvement |
---|
0:11:46 | by considering |
---|
0:11:48 | pairwise fusion of the eight |
---|
0:11:51 | phonotactic systems |
---|
0:11:54 | we can generate twenty-eight possible two-system fusions |
---|
0:12:00 | and then we plot |
---|
0:12:02 | their corresponding |
---|
0:12:05 | average feature diversity |
---|
0:12:07 | and also the eer of the fused systems |
---|
0:12:13 | and you can find that |
---|
0:12:16 | the systems using the mean-only |
---|
0:12:19 | adaptation and the variance-only adaptation |
---|
0:12:24 | both of them |
---|
0:12:25 | can provide a relatively higher feature |
---|
0:12:29 | diversity |
---|
0:12:31 | and also you can see the trend over all |
---|
0:12:33 | twenty-eight possible combinations |
---|
0:12:36 | when you obtain a |
---|
0:12:40 | higher |
---|
0:12:41 | feature diversity you can obtain a |
---|
0:12:44 | lower eer |
---|
0:12:53 | in the last experiment |
---|
0:12:54 | we fuse the systems using the mean-only and the variance-only adaptation |
---|
0:13:00 | that means the systems |
---|
0:13:02 | A3 and A4 and |
---|
0:13:04 | B2 and B3 |
---|
0:13:07 | you can see the results of the |
---|
0:13:08 | individual systems |
---|
0:13:10 | here |
---|
0:13:10 | and then the fusion results |
---|
0:13:18 | and we can also see that fusing the two |
---|
0:13:21 | systems |
---|
0:13:24 | can provide an obvious improvement |
---|
0:13:26 | for example |
---|
0:13:28 | when the speaker-independent model is used |
---|
0:13:31 | and |
---|
0:13:31 | A3 and A4 are fused |
---|
0:13:34 | it can outperform |
---|
0:13:36 | the system B1 in which the |
---|
0:13:38 | SAT model is used |
---|
0:13:41 | and also |
---|
0:13:42 | when we use the SAT model |
---|
0:13:45 | fusing B2 plus B3 |
---|
0:13:47 | can provide |
---|
0:13:48 | a further |
---|
0:13:50 | improvement |
---|
0:13:52 | and overall |
---|
0:13:53 | when you compare |
---|
0:13:54 | this result using the SAT model |
---|
0:13:57 | with the one before any |
---|
0:14:00 | adaptation techniques |
---|
0:14:02 | we can obtain overall around forty percent relative improvement |
---|
0:14:12 | to sum up |
---|
0:14:13 | we have studied |
---|
0:14:14 | different types of mllr and SAT adaptation techniques |
---|
0:14:19 | for phonotactic language |
---|
0:14:20 | recognition |
---|
0:14:22 | and we illustrated |
---|
0:14:25 | the idea of parallel acoustic model adaptation |
---|
0:14:28 | and we found that |
---|
0:14:30 | the variance-only mllr adaptation and the phonotactic features |
---|
0:14:35 | derived from it |
---|
0:14:35 | can provide complementary information to the one using the |
---|
0:14:39 | mean-only mllr |
---|
0:14:40 | adaptation |
---|
0:14:41 | and our ongoing work includes |
---|
0:14:44 | studying the interaction with the recognizer fusion |
---|
0:14:48 | and also investigating more sophisticated |
---|
0:14:51 | adaptation techniques |
---|
0:14:54 | and that is all of my presentation |
---|
0:14:56 | thank you |
---|
0:15:03 | let's see |
---|
0:15:11 | [inaudible audience question] |
---|
0:15:16 | you mean for the test data |
---|
0:15:19 | yes |
---|
0:15:25 | no i didn't do it |
---|
0:15:28 | we only tried it on the first experiment |
---|
0:15:31 | [audience] but would that be a problem if you tested it on the backend |
---|
0:15:37 | yeah but it is likely to be no problem |
---|
0:15:41 | [inaudible exchange with the audience] |
---|
0:16:01 | yeah sure sure |
---|
0:16:04 | exactly |
---|
0:16:05 | yeah but |
---|
0:16:08 | in this very first study we found that even using the simplest most convenient |
---|
0:16:13 | adaptation method we can still get some improvement |
---|
0:16:15 | but of course you are right |
---|
0:16:17 | we can do something more for example interpolation with |
---|
0:16:23 | a universal |
---|
0:16:25 | adaptation transform |
---|
0:16:31 | [inaudible audience question] |
---|
0:16:39 | sorry |
---|
0:16:40 | [inaudible] |
---|
0:16:46 | you mean using |
---|
0:16:48 | the phonotactic |
---|
0:16:49 | or the acoustic |
---|
0:16:52 | [inaudible] |
---|
0:16:58 | oh you mean fusion with an |
---|
0:17:00 | acoustic |
---|
0:17:02 | system no i didn't |
---|
0:17:05 | yes |
---|
0:17:11 | yeah sure sure |
---|
0:17:13 | but i didn't check the numbers so that depends on it |
---|
0:17:16 | yeah |
---|
0:17:22 | questions |
---|
0:17:29 | okay |
---|