0:00:06 Okay, let me begin. My paper is entitled "Parallel acoustic model adaptation for improving phonotactic language recognition."

0:00:18 In general, a phonotactic language recognition system consists of two complementary components. The first one is a front-end phone recognizer, which may be a single phone recognizer or a set of parallel phone recognizers, used for phonotactic information extraction. The second one is a back-end classifier that uses the extracted phonotactic information to distinguish between the target languages.
0:00:50 In phonotactic language recognition, the idea of feature diversification has been widely applied. Examples include using parallel phone recognizers, and using multiple high-level features in the phone lattice decoding.
0:01:13 To reduce the speaker- and session-induced variation in the speech data, which generally involves telephone speech, MLLR adaptation and speaker adaptive training (SAT) applied to the phone lattice decoding have been used; however, their benefits have not yet been studied seriously.
0:01:35 So in this piece of work, we would like to investigate different types of adaptation techniques. We also quantitatively measure the diversity between two sets of phonotactic features. And finally, we investigate whether parallel acoustic model adaptation can provide further feature diversification. In particular, we will work on mean-only MLLR adaptation and mean-and-variance MLLR adaptation.
0:02:13 This slide shows the general structure of a phonotactic language recognition system. It contains the two components that I mentioned before: the parallel phone recognizers, and the back-end. In the back-end we can use either vector space modeling or n-gram language modeling; in our experiments we use vector space modeling.
0:02:47 I'm sorry, there seems to be some problem with the slide, but anyway: each SVM model outputs a score, and we would like to combine these to get a fused score. The index F here represents the different phone recognizers, and we combine the scores; so if we have several phone recognizers, we combine the scores across all of them.
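The score combination described here can be sketched as a weighted sum over the per-system scores. This is a hypothetical minimal sketch; the talk does not specify the fusion rule, so equal weights are assumed, and the language names and score values below are made up for illustration.

```python
# Hypothetical sketch of fusing per-recognizer SVM scores by a simple
# weighted sum. The actual fusion rule is not specified in the talk;
# equal weights are assumed here.

def fuse_scores(scores_per_system, weights=None):
    """scores_per_system: list of {language: score} dicts, one per
    phone recognizer (or per adapted model). Returns fused scores."""
    if weights is None:
        weights = [1.0 / len(scores_per_system)] * len(scores_per_system)
    fused = {}
    for w, scores in zip(weights, scores_per_system):
        for lang, s in scores.items():
            fused[lang] = fused.get(lang, 0.0) + w * s
    return fused

# Example: two systems scoring two target languages (made-up numbers).
sys_a = {"english": 1.2, "mandarin": -0.4}
sys_b = {"english": 0.8, "mandarin": 0.2}
fused = fuse_scores([sys_a, sys_b])
```

The same function covers both fusion across parallel phone recognizers and fusion across differently adapted models of a single recognizer.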
0:03:27 In our work, we diversify the phonotactic features using different model adaptations. You can see that for each phone recognizer we have several adapted models; in this example we use eight of them per recognizer, and then we fuse the scores from the resulting ensemble.
0:04:00 In our experiments, we set F equal to one, which means we use a single phone recognizer. But we believe that the improvement we find using the adapted models of a single phone recognizer can still be obtained when parallel phone recognizers are used.
0:04:38 To further reduce the speaker- and session-induced variation, we use MLLR adaptation in the phone lattice decoding. The transformation can be formulated by these two equations:

mu' = A mu + b
Sigma' = H Sigma H^T

where A, b and H are the transforms to be computed, and mu and Sigma are the Gaussian mean and covariance matrix.
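The two MLLR variants discussed in the talk (mean-only versus mean-and-variance) can be illustrated by applying the transforms to a single Gaussian. This is a sketch of applying, not estimating, the transforms: in a real system A, b and H are estimated by maximum likelihood from the adaptation data, whereas the placeholder values below are arbitrary.

```python
import numpy as np

# Mean-only MLLR updates mu' = A mu + b; mean-and-variance MLLR also
# updates Sigma' = H Sigma H^T. Placeholder transforms are used here;
# they would normally be estimated from adaptation data.

def adapt_gaussian(mu, sigma, A, b, H=None):
    mu_new = A @ mu + b                  # mean transform
    if H is None:                        # mean-only MLLR
        return mu_new, sigma
    return mu_new, H @ sigma @ H.T       # mean-and-variance MLLR

d = 3
mu = np.zeros(d)
sigma = np.eye(d)
A, b, H = np.eye(d), np.ones(d), 0.5 * np.eye(d)

mu1, sig1 = adapt_gaussian(mu, sigma, A, b)      # mean-only
mu2, sig2 = adapt_gaussian(mu, sigma, A, b, H)   # mean + variance
```

With these placeholders, mean-only adaptation shifts the mean and leaves the covariance untouched, while the variance transform scales the covariance by H on both sides.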
0:05:13 These are the different types of adaptation techniques that we test. By the way, we also tested iterative MLLR adaptation, and adaptation with multiple regression classes, but we found that no obvious improvement could be obtained there; the detailed results are reported in the paper.
0:05:39 This slide shows the model adaptation applied to the phone lattice decoding using MLLR. The process is as follows: first, we generate the single-best phone sequence; then we estimate the transforms A, b and H; and then, based on the transformed acoustic model, we generate the phone lattice.
0:06:08 Besides acoustic model adaptation on the test data, we can apply speaker adaptive training (SAT) to the training data of the phone recognizer, in which a feature-level transform is applied to each training utterance when training the phone recognizer. In our experiments, we then test the three types of adaptation techniques listed on the right.
0:06:48 In the SVM vector space modeling back-end, the phone lattice is first converted to expected n-gram counts. These expected counts are then converted to a high-dimensional phonotactic feature vector that contains the unigram, bigram and trigram statistics. The size of this high-dimensional phonotactic feature, that is, its dimension, is determined by the n-gram order and also by the phone set size.

0:07:39 After we generate the high-dimensional phonotactic feature, we put it into the SVM training to obtain the score of each system in the ensemble.
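To make the feature construction concrete, here is a minimal sketch that computes 1- to 3-gram statistics from a single-best phone string. The system in the talk uses expected counts over the phone lattice, which would require a full decoder, so the 1-best simplification is an assumption for illustration only.

```python
from collections import Counter
from itertools import product

def ngram_feature(phones, phone_set, max_order=3):
    """Map a phone sequence to a high-dimensional vector of relative
    unigram/bigram/trigram frequencies. With a phone set of size P the
    dimension is P + P**2 + P**3, i.e. it grows with both the n-gram
    order and the phone set size, as noted in the talk."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    total = sum(counts.values())
    # A fixed ordering over all possible n-grams puts every utterance
    # in the same vector space, suitable as SVM input.
    dims = [g for n in range(1, max_order + 1)
            for g in product(phone_set, repeat=n)]
    return [counts[g] / total for g in dims]

# Toy example: a two-phone "phone set" and a four-phone utterance.
feat = ngram_feature(list("abba"), phone_set=["a", "b"])
```

For a realistic phone set of 40-50 phones, the same construction yields the very high-dimensional, sparse feature vectors that the SVM back-end operates on.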
0:07:53 Moreover, we also define a diversity measure to compute the diversity between two phonotactic features. The idea is to compare the two features, C_A and C_B, based on their nonzero n-gram statistics. Here U denotes the set of n-gram statistics which are nonzero in both C_A and C_B, and the diversity is computed using the size of this set.
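The exact diversity formula is not stated in this part of the talk; the sketch below assumes a Jaccard-style distance over the sets of nonzero n-gram entries, which matches the description (the set U of entries nonzero in both features), but the normalisation is my assumption.

```python
# Hypothetical sketch of a diversity measure between two phonotactic
# feature vectors, assuming a Jaccard-style distance on the supports.

def diversity(feat_a, feat_b):
    """feat_a, feat_b: equal-length phonotactic feature vectors."""
    nz_a = {i for i, v in enumerate(feat_a) if v != 0}
    nz_b = {i for i, v in enumerate(feat_b) if v != 0}
    union = nz_a | nz_b
    if not union:
        return 0.0
    u = nz_a & nz_b          # the set U: n-grams nonzero in both
    # High overlap -> low diversity; disjoint supports -> diversity 1.
    return 1.0 - len(u) / len(union)

d1 = diversity([0.5, 0.5, 0.0], [0.5, 0.0, 0.5])
d2 = diversity([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])
```

Identical supports give diversity 0, while completely disjoint supports give diversity 1, so a larger value means the two adapted models produce more complementary phonotactic statistics.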
0:08:41 Our systems are evaluated using the 30-second task of the 2007 NIST Language Recognition Evaluation, in which fourteen target languages are involved. In the detection task, the system determines whether the target language is spoken in the speech. The average equal error rate, which is calculated from the EER of each target language, is reported. We use this average EER to ensure that each target language has an equal contribution to the metric.
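A minimal sketch of this average-EER metric, assuming a simple threshold sweep to locate the point where the false-alarm and miss rates cross (real evaluations use more careful score interpolation; the trial data below are made up):

```python
def eer(scores, labels):
    """Approximate equal error rate for one language's detection trials.
    scores: detection scores; labels: True for target-language trials."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    fa, miss = 0, n_tar            # threshold above all scores: reject all
    best_gap = abs(fa / n_non - miss / n_tar)
    best_eer = (fa / n_non + miss / n_tar) / 2
    # Lower the threshold one trial at a time; the EER is read off
    # where the false-alarm rate and the miss rate cross.
    for _, is_target in pairs:
        if is_target:
            miss -= 1
        else:
            fa += 1
        gap = abs(fa / n_non - miss / n_tar)
        if gap < best_gap:
            best_gap = gap
            best_eer = (fa / n_non + miss / n_tar) / 2
    return best_eer

def average_eer(per_language_trials):
    """Average the per-language EERs so that each target language
    contributes equally, regardless of its number of trials."""
    vals = [eer(s, l) for s, l in per_language_trials]
    return sum(vals) / len(vals)

# Two toy languages: one perfectly separated, one with a score swap.
perfect = ([0.9, 0.8, 0.2, 0.1], [True, True, False, False])
swapped = ([0.9, 0.4, 0.6, 0.1], [True, True, False, False])
avg = average_eer([perfect, swapped])
```

Averaging per-language EERs, rather than pooling all trials, is what gives every target language equal weight in the reported number.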
0:09:24 In our experimental setup, a single phone recognizer is used. A 49-dimension MFCC feature is used, and a standard three-state, left-to-right HMM with thirty-two Gaussian components per state is used in the acoustic model. For the training data, fifteen hours of the Switchboard-1 English data are used to train the phone recognizer, and a phone-loop grammar is used in the decoding. For the training data of the target languages, we use the CallFriend corpora and also the training data set of the LRE 2007.
0:10:18 In the first experiment, we compare the different adaptation techniques, with both the speaker-independent (SI) and the SAT acoustic models. First of all, we found that all adaptation techniques provide improvement: you can see that system A1, where we did not apply any adaptation technique, is outperformed by all the others, which use the different kinds of adaptation. With both the SI phone model and the SAT model, mean-only MLLR adaptation performed the best. You can also find that a further improvement can be obtained when we use the SAT phone model.
0:11:24 Secondly, we test whether two phonotactic systems built with different types of adapted acoustic models provide complementary information to each other, and whether the corresponding system fusion provides a further improvement. By considering the pairwise fusion of the eight phonotactic systems, we can generate twenty-eight possible two-system fusions. We then plot, for each fusion, the corresponding average feature diversity against the EER of the fused system. You can find that the systems using mean-only adaptation and mean-and-variance adaptation, marked here, both provide a relatively higher feature diversity. You can also see the trend over all twenty-eight possible combinations: when you obtain a higher feature diversity, you obtain a lower EER.
0:12:53 In the last experiment, we fuse the two systems using mean-only and mean-and-variance adaptation; these are the systems labelled A3 and A4, and B2 and B3. You can see the results of the individual systems here, and then the fusion results. We find that fusing the two systems provides an obvious improvement. For example, when the SI model is used and A3 and A4 are fused, the fusion can outperform system B1, in which the SAT model is used. Also, when we use the SAT model, the fusion B2 plus B3 provides a further improvement. Overall, comparing this result using the SAT model against system A1, before any adaptation technique, we obtain around forty percent relative improvement.
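The forty-percent figure quoted here is a relative improvement; as a quick illustration of the arithmetic (the EER values below are made up for illustration, not the paper's actual results), it is the EER reduction divided by the baseline EER:

```python
# Relative improvement = (baseline EER - adapted EER) / baseline EER.
# The numbers below are invented to illustrate the arithmetic only.

def relative_improvement(baseline_eer, adapted_eer):
    return (baseline_eer - adapted_eer) / baseline_eer

ri = relative_improvement(10.0, 6.0)  # 0.4, i.e. 40% relative
```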
0:14:12 To summarize, we have studied different types of MLLR adaptation techniques for phonotactic language recognition, and we have illustrated parallel acoustic model adaptation. We found that the phonotactic features derived with mean-and-variance MLLR adaptation provide complementary information to those derived with mean-only MLLR adaptation. Our ongoing work includes studying the interaction with the number of parallel phone recognizers, and investigating more sophisticated adaptation techniques. That's all of my presentation; thank you.
0:15:03 [Question-and-answer session; the audience questions were mostly inaudible on the recording.]

Q: [inaudible] ...the test data?
A: You mean for the test data? Yes. No, I didn't do that.
Q: Would that be a problem if you tested it on different channel conditions?
A: It is likely to be no problem. [partially inaudible]
Q: [inaudible]
A: Yeah, sure, exactly. But in this very preliminary study we found that even using the simplest, most convenient method, we can still get some improvement. Of course you are right: we could do something more sophisticated, for example an interpolation with some kind of universal adaptation transform.
Q: [inaudible]
A: You mean using the phonotactic together with the acoustic features? Oh, you mean fusion with an acoustic system? No, I didn't test that. Yeah, sure, but I don't remember the exact numbers, so that depends.

[Session chair] Any more questions? Okay.