0:00:18 [Session chair introduces the speaker; largely inaudible.]
0:00:29 My presentation today is on improving the ROVER system. Let me go to the next slide and present the outline.
0:00:44 The outline of my talk is as follows: first we start by presenting the ROVER system, then we outline our proposed approach to improve the system, followed by some experimental results, and we conclude with a summary and future directions.
0:01:01 The motivation of our work is due to the widespread use of large-vocabulary continuous speech recognition systems. With the abundance of applications using this type of system, there is an increasing requirement for highly accurate and robust speech decoders.
0:01:21 Some of the common solutions to this problem are enhancing the feature extraction; we can also combine speech features from different front ends; and we can also combine the outputs of several decoders. We will focus on this last family of solutions.
0:01:48 For output combination, some of the common approaches are the Recognizer Output Voting Error Reduction, the ROVER system; there is also the confusion network combination; and the minimum time frame word error rate framework. The idea is to combine the different outputs coming from different speech decoders into one single composite output that hopefully will lead to a reduced word error rate.
0:02:21 We are going to focus on ROVER; we are trying to improve this ROVER system. ROVER was developed by Jonathan Fiscus in 1997 within NIST, and the goal is to produce a composite ASR output with a reduced word error rate. This technique is now known as a baseline technique, and any new decoder combination technique is compared to the ROVER technique.
0:02:53 The ROVER process is a two-step process. First, it starts by creating a composite word transition network from the outputs of the different speech decoders. Then this network is browsed by a voting algorithm that tries to select the best word at each slot of the word transition network. To do this, Fiscus presented back in 1997 three voting schemes: one of them uses only the frequency of occurrence of each word at each slot of the network, and the two others also make use of word confidence values.
0:03:36 So basically this is the main scoring equation of the ROVER system: the score of a word w at slot i combines the frequency of occurrence of the word at that slot with its confidence value, weighted by a trade-off parameter alpha, i.e. Score(w, i) = alpha * N(w, i) / Ns + (1 - alpha) * C(w, i), where N(w, i) is the number of decoders proposing w at slot i, Ns is the number of combined systems, and C(w, i) is the confidence. Since we use the frequency for the voting, we don't need to worry about the confidence term for now.
0:04:10 Some of the shortcomings of the ROVER system: the scoring mechanism, the voting mechanism, only works if the different hypotheses coming from each decoder are different from each other; otherwise, even if we combine them, we don't gain anything, because we will end up with the same output. Also, the alignment into the word transition network does not guarantee an optimal result: the combination is order-dependent, so if you combine outputs A and B, the result can be different from combining B and A. Moreover, two of the three voting mechanisms use confidence values, which are still not reliable in speech recognition. Finally, ROVER requires one best sequence from more than one recognizer, so it is not adapted to off-the-shelf ASR systems where only one single ASR output, the decoded sequence of words, is available.
0:05:11 Several works have been done to try to tackle these problems, especially through the use of machine learning techniques in the voting mechanism. But still, the performance of the system has reached a plateau, and it is very difficult to reduce the word error rate further.
0:05:34 Our proposed approach here is to inject a contextual analysis, a context word analysis, before the voting mechanism, to try to filter out and remove the errors from the composite word transition network before applying the voting mechanism. Once we have identified the errors in the composite network, we remove them, and then we apply the usual ROVER voting.
0:06:04 Let's start by presenting the error detection technique first. We define a few terms. The neighbourhood of a word is the set of words in its left and right context. The PMI, the pointwise mutual information, of a word pair is nothing else than the log of the probability of these words occurring together, divided by the product of their individual probabilities. These probabilities we can get from a large corpus: the number of occurrences of word i divided by the total number of words.
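As a minimal sketch of this definition (the counts are hypothetical; a real system would read unigram and pair frequencies from a large corpus):

```python
import math

def pmi(pair_count, count_i, count_j, total_words):
    """Pointwise mutual information of a word pair: log of the joint
    probability divided by the product of the individual probabilities,
    all estimated as counts over the corpus size."""
    p_pair = pair_count / total_words
    p_i = count_i / total_words
    p_j = count_j / total_words
    return math.log(p_pair / (p_i * p_j))

# Toy numbers: this pair co-occurs 100x more often than chance predicts.
print(round(pmi(pair_count=50, count_i=1000, count_j=500,
                total_words=1_000_000), 3))  # → 4.605
```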
0:06:29 Once we have the pointwise mutual information, we can compute the semantic coherence values, for instance the harmonic mean of the PMI scores, for each word.
0:06:47 The error detection algorithm is as follows. Given a sentence, for each word we define the neighbourhood; then we compute the PMI scores for all the pairs of words in that sentence; then we compute the semantic score, as shown before, using either the harmonic mean, the maximum, or the summation. Once we have computed the semantic scores, we compute the average of all these scores, and then we flag the errors: if the semantic score of a word falls below this average by more than a threshold, that word is marked as an error; otherwise it is marked as a correct output.
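The steps above can be sketched as follows (a simplified illustration with a toy PMI table; the function names and the exact relative-threshold rule are assumptions, not the paper's code):

```python
import statistics

def semantic_score(idx, words, pmi_fn, aggregate="harmonic"):
    """Aggregate the PMI of words[idx] with every other word in the
    sentence, using the harmonic mean, the maximum, or the summation."""
    scores = [pmi_fn(words[idx], w) for i, w in enumerate(words) if i != idx]
    if aggregate == "max":
        return max(scores)
    if aggregate == "sum":
        return sum(scores)
    return statistics.harmonic_mean(scores)  # assumes positive PMI scores

def detect_errors(words, pmi_fn, threshold=0.5, aggregate="harmonic"):
    """Flag a word as an error when its semantic score falls below the
    sentence average by more than `threshold` (a relative margin)."""
    scores = [semantic_score(i, words, pmi_fn, aggregate)
              for i in range(len(words))]
    avg = sum(scores) / len(scores)
    return [s < avg * (1 - threshold) for s in scores]

# Toy PMI table: "radar" coheres poorly with the rest of the sentence.
table = {frozenset(k): v for k, v in {
    ("the", "cat"): 2.0, ("the", "sat"): 1.5, ("cat", "sat"): 3.0,
    ("the", "radar"): 0.1, ("cat", "radar"): 0.1, ("sat", "radar"): 0.1,
}.items()}
pmi_fn = lambda a, b: table[frozenset((a, b))]
print(detect_errors(["the", "cat", "radar", "sat"], pmi_fn))
# → [False, False, True, False]
```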
0:07:32 The second part of the approach is integrating this technique within the ROVER process. We start by building the composite word transition network; here we have the output of the first decoder, the second one, and so on. On this network we run the error detection, and we can use more than one error classifier. We apply it to each hypothesis of the network, mark the erroneous words, and we obtain a filtered word transition network.
0:08:15 So the algorithm works as follows: we extract each hypothesis of the network, we apply the PMI-based error classifier to detect the erroneous words, we replace them by the null transition, and then we apply the voting algorithm.
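A minimal sketch of this filtering-then-voting step, assuming the word transition network is a list of aligned slots and that an error mask per decoder is already available (the representation and the handling of null transitions in the vote are simplifying assumptions):

```python
from collections import Counter

NULL = None  # the null transition

def filter_hypotheses(wtn, error_masks):
    """Replace flagged words by the null transition.

    wtn[slot][decoder] is the word a decoder proposes at a slot;
    error_masks[decoder][slot] is True when the classifier flags it."""
    return [[NULL if error_masks[d][s] else w for d, w in enumerate(slot)]
            for s, slot in enumerate(wtn)]

def frequency_vote(wtn):
    """Frequency-of-occurrence voting: keep the most frequent word in each
    slot; as a simplification, a null transition only wins an empty slot."""
    composite = []
    for slot in wtn:
        words = [w for w in slot if w is not NULL]
        composite.append(Counter(words).most_common(1)[0][0] if words else NULL)
    return composite

# Three aligned outputs; the classifier flags "bat" for decoders 1 and 2.
wtn = [["the", "the", "the"], ["cat", "bat", "bat"], ["sat", "sat", "sat"]]
masks = [[False, False, False], [False, True, False], [False, True, False]]
print(frequency_vote(wtn))                            # → ['the', 'bat', 'sat']
print(frequency_vote(filter_hypotheses(wtn, masks)))  # → ['the', 'cat', 'sat']
```

Without the filtering step, the majority vote keeps the error; removing the flagged words first lets the single correct hypothesis win the slot.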
0:08:31 Now, some experimental results. For the experimental framework, we used a test set of recorded speech [corpus name unclear], and the CMU open-source Sphinx-4 Java-based speech decoder. We tried to combine three decoder configurations: one decoder [name unclear] with its language model, and Sphinx-4 with two different language models. For the PMI counts, as we explained before, we need probability estimates, so we had to use a huge corpus: we used an open corpus with seventeen million unigrams and over three hundred million bigrams to get those frequencies.
0:09:18 The measures used are: the word error rate, i.e. the number of deletions, substitutions, and insertions divided by the number of words output by the recognizer; precision and recall of the error classifier; the F-measure, i.e. the harmonic mean of precision and recall; and the negative predictive value.
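The measures just listed can be written down as follows (standard definitions; the numbers in the usage lines are made up for illustration):

```python
def word_error_rate(deletions, substitutions, insertions, num_words):
    """WER = (deletions + substitutions + insertions) / number of words."""
    return (deletions + substitutions + insertions) / num_words

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def negative_predictive_value(true_neg, false_neg):
    """Share of words predicted 'correct' that really are correct."""
    return true_neg / (true_neg + false_neg)

print(word_error_rate(3, 5, 2, 100))      # → 0.1
print(round(f_measure(0.8, 0.5), 3))      # → 0.615
print(negative_predictive_value(90, 10))  # → 0.9
```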
0:09:47 Let's first show the assessment of the error classifier alone, before integrating it within the ROVER system. Here we have plotted the measures of the classifier against the threshold; this threshold controls how aggressive the filtering of the errors is. We also plotted, for the different thresholds, the deviation of the PMI over all the semantic scores, and we notice here that most of the aggregation schemes give very similar results; only some of them, when the threshold changes, give slightly better results.
0:10:35 On the next slide we look at the rejection rate: as the threshold increases, more words are rejected, because we are targeting what the classifier reports as incorrect words.
0:10:52 Now, for the assessment of our approach, we have done two experiments: we applied the error detection on all the words, and then on all the words except the stop words, i.e. we removed the stop words; I will explain why later on. Integrated within ROVER, we report experiments for these settings, and we can see that we gain up to one point five percent absolute in word error rate. We notice that when we exclude the stop words from the error detection, we get better results, because, by the definition of a stop word, it is a word that lacks semantic meaning. If you look at the formula for the PMI, it tries to see whether each word is an outlier; but with a stop word it is very difficult to tell whether it is an outlier in the sentence or not.
0:11:54 Now for this setting, the decoder combination with the complete approach: here we only have one error classifier, i.e. we do not filter two of the ASR outputs; and when we do that, we see that we still get an improvement.
0:12:35 To summarize, we have proposed in this paper an approach to improve the ROVER system, which we call C-ROVER. We have injected a context word analysis through the use of an error classifier, namely the PMI-based error classifier, and we have achieved up to one point five percent absolute word error rate reduction.
0:12:59 As future directions, we can use other error classifiers, like the LSA, the latent semantic analysis error classifier; we can also combine classifiers to compensate for the low precision rate; and we have to look again at the additional complexity of the C-ROVER and at the scalability of this system.
0:13:35 [Session chair] Any questions?
0:13:43 [Question] In your presentation, did you try to use the confidence scores computed on the words?
0:13:49 Yeah, that is a good question. In this paper we only use one of the voting schemes, the frequency-of-occurrence one, because most of the confidence values, if not all of them, coming from the speech decoders we used are not reliable: when you have a sentence, all the words have a confidence value of one. So basically we could not use, and see the impact of, the confidence values with this system. But this approach can be applied with them: what we are talking about is before the voting mechanism, and we are not touching that part; we are trying to remove errors and then go back to the original ROVER, so it does not affect the voting.
0:14:46 Yes, it does; ROVER provides the three voting mechanisms, and you can choose which one you like. For now, since we do not have a good confidence measure, we do not use those schemes; we use the first one.
0:15:00 [Session chair] Any other questions?
0:15:05 [Question] Well, maybe, can you please elaborate a bit more on the computational complexity of the C-ROVER system?
0:15:13 Yeah, for the C-ROVER we have to check: when we add this error detection classifier, what will happen, do we need much more computation? I think we have to have a huge corpus to be able to extract those probabilities. How does this affect the system in terms of memory, in terms of CPU power? We still have to quantify that and give those measures.
0:15:46 [Session chair] Any other questions?
0:15:54 [Inaudible question]
0:16:03 So yeah, we can still get better results, because what we are actually doing is helping the voting mechanism by removing errors beforehand: if we remove the errors, and we are confident that they are errors, then the voting mechanism will for sure improve.
0:16:24 [Session chair] If there are no other questions, let's thank the speaker.