0:00:06 Okay, I will start. My name is [unintelligible], and I will present the work that we have done on [unintelligible].
0:00:22 This presentation will be about feature extraction for phonotactic language recognition, done by PCA.
0:00:33 This is the overview of the whole talk: first I will speak a little bit about the motivation of this work, why we want to do it, and then I will describe the results on the NIST language recognition evaluation.
0:00:56 So, basically, for the introduction: if we want to recognise languages by phonotactic modelling, we can either use n-gram models, like language models, where we compute the likelihood of an utterance given specific n-gram models of the languages, or we can try to use discriminative models like SVM-based models, which usually perform better, and that is what we will be talking about in this presentation.
0:01:31 Usually, for these SVM models, a linear kernel and a soft margin are used, which basically means that we allow some outliers.
0:01:44 So the problem with this SVM approach is that we need to deal with very large feature vectors if we use, let's say, trigrams or four-grams, and going for higher orders is computationally almost impossible, because the growth of the feature set is exponential, as is shown here on the slide.
0:02:13 We can easily compute that, for some typical values: if the set of phonemes is large and we are using four-grams, we can easily end up with much more than a million possible features. Of course, not all of these features are present in the data, but this is like the theoretical limit.
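(As a rough illustration of the exponential growth mentioned above, here is a minimal Python sketch; the phoneme inventory size is a made-up example, not the one actually used in the talk.)

    # How the phonotactic n-gram feature space grows with the n-gram order.
    # The phoneme inventory size below is a hypothetical example.
    num_phonemes = 33
    for order in (2, 3, 4):
        print(f"{order}-grams: {num_phonemes ** order:,} possible features")
    # With 4-grams this already exceeds a million possible features,
    # most of which never occur in the training data.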
0:02:42 So we need to somehow limit this space, and usually we can either discard features, that is, perform some feature selection, or we can do a combination of the features, which we will call here feature extraction, and that is what I will be describing later.
0:03:01 So, basically, for the feature selection we can either choose the features that occur frequently in the data. Like, it is useless to keep in the feature vector combinations of phonemes that form n-grams that never occur in the training set, which can easily happen: some combinations of phonemes are very unlikely, and with some reasonable pruning such n-grams never occur. And of course there are also other combinations of phonemes, n-grams that can happen and do occur sometimes, but it is not very meaningful to keep them in the feature set, and we can usually discard them based on some threshold value. So, basically, all the n-grams that occur fewer times than some threshold in the training sample can be discarded.
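(A minimal sketch of this count-threshold pruning, assuming the n-gram statistics have already been collected into a matrix; the data and the threshold here are purely illustrative.)

    import numpy as np

    # X: utterances x n-gram features, e.g. expected n-gram counts collected
    # from phone lattices (random toy data here, just for illustration).
    rng = np.random.default_rng(0)
    X = rng.poisson(0.01, size=(200, 10000)).astype(float)

    min_total_count = 5                      # example pruning threshold
    keep = X.sum(axis=0) >= min_total_count  # drop n-grams that (almost) never occur
    X_pruned = X[:, keep]
    print(X.shape, "->", X_pruned.shape)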
0:03:56 The other approach is to use discriminative information, and that means that we will try to keep all the n-grams that are actually good for the classification of languages. It is slightly different, because you can imagine that some low-frequency n-grams that are quite rare in general, across languages, might be quite discriminative, or quite informative, for the discrimination of some particular language. So this can be like a better way to discard features.
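(One possible way to do such discriminative selection, sketched on toy data; the per-feature chi-squared test against the language labels is just an example criterion, not necessarily the one the authors used.)

    import numpy as np
    from sklearn.feature_selection import SelectKBest, chi2

    rng = np.random.default_rng(0)
    X = rng.poisson(0.1, size=(500, 10000)).astype(float)  # toy n-gram counts
    y = rng.integers(0, 14, size=500)                      # toy language labels

    # Keep the 1000 n-grams that are most informative about the language label.
    selector = SelectKBest(chi2, k=1000).fit(X, y)
    X_selected = selector.transform(X)
    print(X_selected.shape)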
0:04:34 Here I would like to show the idea of why we tried to use feature extraction. We can easily see that, for example, some combinations of phonemes will have values like zero and we can discard them, as I said on the previous slide. But on the other hand, there can be combinations, n-grams like the ones I have written here, which just sound almost the same in some cases. For example, it can easily happen that some phoneme combinations have very similar pronunciation variants and then frequently co-occur in the lattices. And of course, even if the frequency of these n-grams is quite high, it would be a good idea to at least cluster them together somehow, so that we would not need to deal with this amount of features, which is kind of useless: it can be seen that you don't need, like, four features here, it would be enough to have just one. So this is like a motivating example of what we try to do.
0:05:55 We have tried to use a simple PCA approach, like a linear projection, which can be used to compute a transformation matrix, some linear projection from the original feature space to some lower-dimensional feature space. And it seems like a good way to fight the curse of dimensionality that is caused by the exponentially increasing number of parameters when we increase the size of the context, like when we go from trigrams to four-grams and so on. Actually, quite a similar idea works in normal n-gram language modelling, so it sounds like a reasonable way to go. And the other advantages are that we don't need to tune many parameters to try this out; it's very fast, simple, and there are plenty of tools that can be used to compute PCA. So simplicity is one of the reasons for using this technique.
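(A minimal sketch of this kind of linear projection with scikit-learn on toy data; the dimensionalities are illustrative, not the exact ones from the evaluation systems.)

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X_train = rng.poisson(0.1, size=(1000, 8000)).astype(float)  # toy n-gram counts
    X_test = rng.poisson(0.1, size=(100, 8000)).astype(float)

    pca = PCA(n_components=500)           # project the n-gram space down to 500 dims
    Z_train = pca.fit_transform(X_train)  # training: estimate the projection, then apply it
    Z_test = pca.transform(X_test)        # testing: only apply the already estimated projection
    print(Z_train.shape, Z_test.shape)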
0:06:59 And now I will discuss the results that we obtained on the NIST Language Recognition Evaluation 2009, on the closed-set condition; the results for all the durations will be in the last line.
0:07:14 So, for our development set that we used for NIST 2009, we first tried to see what happens when we discard features by their frequency, so we keep only the most frequently appearing five thousand features, and so on. You can actually see that the accuracy of the system grows as we add more features, and it seems natural that it would be good to keep all the features, let the SVM choose which are the useful features, and not discard any of them. But of course this is impossible: we don't have a result in this table for what would happen if we used all the features, because the full feature space is over one million combinations. Of course, not all these combinations really appear in the training set, but the number of combinations that actually do appear is still several hundreds of thousands, and it is simply impossible to compute that in a reasonable time. So this is a result that can be interpreted as: we cannot go further this way.
0:08:38 And then we have tried to use the PCA. Actually, this is shown on the trigram system, because, as I have said previously, with four-grams and a recogniser with a larger phoneme set you are not able to compute the full feature space as a baseline. So this is on trigrams. We can see that the full system is around 2.3 Cavg, and that's the last line; the previous lines are when we reduce this feature space from thirty-six thousand features to, like, one hundred, five hundred, and so on. You can actually see that when we go to something like five hundred or one thousand features, which is many times smaller than the original feature space, we get almost the same performance. And the speed-up, which is described in more detail in the paper, can be quite large, both for training and for testing. Actually, the testing phase is even faster than the training phase, because in the training phase we basically need to first estimate the PCA, while in the testing phase we don't need to do this, we only project the data. So it can be seen from this table that it actually seems to work reasonably well.
0:10:06 Yes, and maybe I can add that we actually tried more toolkits that are freely available to compute these SVM models, and we tried to tune all of them to obtain the best performance. My personal experience is that one of them gave quite good results, and another one is like ten times faster but about five percent worse in accuracy.
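(A sketch of the back-end step being discussed: a linear, soft-margin SVM trained on the reduced feature vectors. scikit-learn and the toy data are illustrative here, not the toolkits that were actually compared.)

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    Z = rng.normal(size=(2000, 500))     # toy PCA-reduced feature vectors
    y = rng.integers(0, 14, size=2000)   # toy language labels

    svm = LinearSVC(C=1.0)               # linear kernel; C controls the soft margin
    svm.fit(Z, y)
    print(svm.decision_function(Z[:3]).shape)   # one score per target language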
0:10:40 And now for the results with multiple systems, because we have trained a Hungarian phoneme recogniser, an English phoneme recogniser, and a Russian phoneme recogniser, and in the end we fuse all the results together; we will see that later. We can see in this table what basically happened. If you focus on the trigrams, you can see that actually five hundred features work quite well, but when we go to one thousand it is actually better, and then we don't observe any real gain from going to four thousand features. So it seems like a value around one thousand features is quite good. Then the interesting thing is that four-grams actually work worse than trigrams. Of course, the feature space of the four-grams is not full; we needed to perform some feature selection there, because otherwise the estimation of the PCA would be difficult to do. So, basically, it seems reasonable to use just trigrams, and it should work okay.
0:11:49 We will see those results in more detail later. Now, for the English system, we can see basically the same thing as for the Hungarian system; it even seems that it would be enough to keep just five hundred features. Of course, the optimal size of this reduced space depends also on the density of values in the lattices and these things, so it's not like some single number; it should be tuned somehow for every system. But of course, using bigger feature vectors is not much of a problem, only the computational time goes up.
0:12:28 And then the Russian system: it was the last one, with the largest phoneme set, and it was quite difficult to train. We actually tried to use some more training data: in all the other systems we have used around ten thousand training samples to train the systems, but for the last one we actually used almost fifty thousand, which is already very large, and the original feature space was more than one hundred thousand features. So there was also a very large SVM, and it took quite long even to train the PCA, but it can be seen that in the end it works the best. It would definitely be good to train all the systems on all the available data, as can be seen from this result; in the end we did not do that.
0:13:26 Here we have the final results: what actually happens when we fuse all the trigram systems from the previous slides and all the four-gram systems. It can be seen that the trigrams perform better than the four-grams, and from the combination of the trigrams and the four-grams we can get some small improvement that goes across all the conditions, but it is very small. What is more useful is adding the system that was trained on more data, that is, the Russian trigram system, and that actually gives us a better improvement than using four-grams. And in the end, when we were able to fix the development set, as was described in the presentation this morning, it was possible to get an even much better result, around 1.8 on the thirty-second condition, which is like a nice number. I don't have the results for the fusion here, but that is in the paper we are presenting.
0:14:39 So, for the conclusion, we can say that we can achieve a very high speed-up, as was shown in the previous tables: we can get a system that is trained more than a hundred times faster, and we lose almost no performance in terms of accuracy. And of course, this technique can be used to reduce the size of the parameter space, or of the feature space, which would allow us to use more complicated techniques, like SVMs with nonlinear kernels, where we would need to tune more parameters, which is quite difficult to do when we are operating on the full feature space.
0:15:21 And some ideas for future work: of course, we can think about some more complicated feature reduction technique, something nonlinear, maybe some neural network, and it can be expected that this kind of technique could give even bigger feature space reductions with similar performance. And of course, in these examples, for the speed-up results that I have shown here, we have been estimating the PCA on all the data, which is not really needed; we have some other results from which we know that we can estimate the PCA just on a subset of the data, and then you can get even faster training. So that would be all; thanks for attention.
0:16:17 Questions?
0:16:22 Thanks. I mean, that's very surprising, that one hundred dimensions are enough to capture this.
0:16:34 Yeah.
0:16:37 Usually, when you do principal components on some sort of data, you estimate the eigenvalues as well, and then you see how many are needed to account for ninety percent, ninety-five percent of the variability. I wondered, do you know how much of the variability is captured?
0:17:08 Well, actually, I didn't try to compute this; I was just looking at the final results, like the accuracy of the system when I reduced the data, so I am not able to answer this.
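(For reference, the quantity being asked about, the cumulative explained variance, is easy to read off most PCA implementations; a scikit-learn sketch on toy data.)

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2000))    # toy feature matrix

    pca = PCA(n_components=100).fit(X)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    print(f"variance captured by 100 components: {cumulative[-1]:.1%}")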
0:17:20 I think it might be interesting to do the calculation when you have the right transcription or not, to see whether the variability here is really due to the redundancy in the lattices, where the n-grams you get are many but very similar.
0:17:47 Yes, yeah.
0:17:50 So it's a small subspace that you can project to, right; if you just had the correct transcription it might not be there, but you don't have that.
0:18:02 Yeah, my intuition would be that the phonotactic variability in a thirty-second utterance that is still usable would be of much higher dimension. Well, that's my impression. I think it would be nice if we could see something like that.
0:18:31 Okay.
0:18:41 Yeah, I assume that if you just, you know, in such a case you might get some nicer [unintelligible]; well, I mean, it's not the nicest [unintelligible].
0:19:04 Yeah, so, I guess that PCA is like the simplest technique to use, and that was the reason why we used it here, just to see whether the idea would work or not. But of course I am not saying that it is the optimal thing. Thank you.
0:19:23 Questions?
0:19:27 Yeah, this is based on ignorance, but, you see, my question is: you said that using PCA is easy, yet the dimension is up to a million. So how specifically do you do that? As far as I know, to do PCA you need the covariance matrix, which would be huge, I should think.
0:19:52 In our paper we cite another paper of ours where a technique is described for estimating PCA on large amounts of data quite efficiently, and you can even find code for this estimation of PCA where you don't need to compute the whole covariance matrix; it is based on an iterative algorithm that doesn't need the full covariance matrix.
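(One generic way to obtain a PCA projection without ever forming the full covariance matrix is incremental PCA over mini-batches, sketched here with scikit-learn on toy data; this is not necessarily the exact algorithm cited in the paper.)

    import numpy as np
    from sklearn.decomposition import IncrementalPCA

    rng = np.random.default_rng(0)
    n_features = 20000                    # toy high-dimensional n-gram space

    ipca = IncrementalPCA(n_components=200)
    for _ in range(5):                    # stream the data in mini-batches
        batch = rng.poisson(0.01, size=(400, n_features)).astype(float)
        ipca.partial_fit(batch)           # updates the projection without a d x d covariance matrix

    z = ipca.transform(rng.poisson(0.01, size=(1, n_features)).astype(float))
    print(z.shape)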
0:20:25 Well, do you have more questions? Anything else? Thank you.