0:00:13 I am going to talk today about an approach to the selection of
0:00:17 high-order n-grams
0:00:19 in the frame of phonotactic
0:00:21 language recognition.
0:00:23 Lunch time is arriving, so we will try to go quickly.
0:00:30 I will do a short introduction, then I will show you
0:00:34 the feature selection method that we propose,
0:00:38 I will show you the experimental setup
0:00:40 and the results,
0:00:41 and I will finish with a short summary
0:00:44 of the work.
0:00:45 So, the motivation. The main reason is that
0:00:48 in phonotactic language recognition,
0:00:50 higher-order n-grams are expected to carry more
0:00:53 discriminative information
0:00:55 about the language,
0:00:56 I mean, more language-specific information.
0:01:00 The problem is that
0:01:02 the number of n-grams
0:01:04 grows exponentially
0:01:06 as n increases,
0:01:07 so there are some computational limits,
0:01:10 and that is why
0:01:12 many systems usually
0:01:17 stick to n-gram orders of three or four.
0:01:20 Also, we cannot directly apply
0:01:22 dimensionality reduction techniques,
0:01:24 like PCA,
0:01:26 to such a huge space.
0:01:28 There have been some
0:01:31 works
0:01:33 related to
0:01:34 feature selection; I would mention just
0:01:36 two of them.
0:01:37 The first one was presented at ICASSP 2008,
0:01:41 where a filter method
0:01:46 was used to select the most discriminative n-grams based on SVM weights,
0:01:52 and those n-grams were
0:01:56 added to get a suitable subset of n-grams.
0:02:00 The second one is
0:02:02 quite a similar work:
0:02:05 they used the same kind of filter method,
0:02:07 but with two discriminative criteria:
0:02:10 first, SVM weights, which is basically the same
0:02:15 as in the previous one,
0:02:16 and second, they also used a chi-square measure.
0:02:20 The fact is that, in both cases,
0:02:22 there was no improvement, or even a degradation,
0:02:27 when higher than 4-grams were used.
0:02:30 We also faced
0:02:36 quite a similar problem
0:02:38 in a previous work,
0:02:39 where we did
0:02:41 phonotactic language recognition using
0:02:44 cross-decoder
0:02:45 co-occurrences of phone n-grams.
0:02:47 In that case the feature space was very big,
0:02:50 and the key idea we used was just to do a
0:02:54 frequency-based feature selection:
0:02:57 to build the sparse vectors of counts
0:03:01 using only the most frequent
0:03:03 units.
0:03:05 But the problem is that with higher-order n-grams the feature space
0:03:10 is
0:03:10 really huge,
0:03:12 so even a simple frequency-based selection
0:03:15 can be a challenge.
0:03:18 So, let us consider frequency-based
0:03:20 feature selection. Let V be the number of phonetic units of an acoustic decoder;
0:03:26 then there are on the order of V^n
0:03:27 possible n-grams of order n.
0:03:30 If the number of units of an acoustic decoder were a few tens
0:03:34 and seven the order of the n-grams,
0:03:37 the feature space would be really huge.
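As a rough illustration of that growth (the unit count here is a hypothetical value, not the figure from the talk), the number of possible n-grams up to order n is V + V^2 + ... + V^n:

```python
# Rough illustration of how the n-gram feature space explodes with the order n.
# V is a hypothetical number of phonetic units, not the exact figure from the talk.
V = 60
for n in range(3, 8):
    total = sum(V**k for k in range(1, n + 1))  # all possible n-grams up to order n
    print(f"order {n}: about {total:.2e} possible features")
```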
0:03:40 But we must take into account that most of those features
0:03:46 will not be seen in the training data, so we can forget about them.
0:03:53 Moreover, most of the seen features will have very low counts, so we can
0:03:58 forget about them too and
0:04:00 simply select the most frequent
0:04:02 features.
0:04:03 The problem is that, with such a huge
0:04:06 feature space,
0:04:11 we cannot even
0:04:15 store the cumulative counts
0:04:19 of all the features seen
0:04:21 in the training set.
0:04:23 So, since we cannot directly store
0:04:27 all the cumulative counts, we must approximate them.
0:04:32 The idea is quite simple.
0:04:33 We want to use the full training set,
0:04:36 but we do not want to build
0:04:38 a huge table with the cumulative counts of all the seen features,
0:04:42 so the table is periodically pruned:
0:04:45 every C accumulated counts,
0:04:47 we retain only those entries with
0:04:50 higher counts than a given threshold tau;
0:04:54 that is, periodically, all the entries with low counts are discarded,
0:05:00 set to zero.
0:05:03 Both C and tau
0:05:05 are heuristic constants, so these parameters must be tuned
0:05:09 to control the size of the
0:05:10 table.
0:05:11 In our experiments we usually aim at
0:05:15 tables about ten times bigger
0:05:18 than the desired size,
0:05:19 that is, than the number of features we finally want to select:
0:05:23 we try to keep a table about ten times bigger than that.
0:05:26 So the proposed algorithm works as follows: we start with an empty table.
0:05:31 C is the parameter that establishes
0:05:33 how many cumulative counts we aggregate
0:05:36 since the last
0:05:40 update.
0:05:42 For every training utterance,
0:05:46 we accumulate
0:05:47 the counts
0:05:48 of its n-grams
0:05:49 in the table,
0:05:50 and when the number of counts accumulated since the last update exceeds C,
0:05:55 we update the table: we remove
0:05:57 all the entries with counts below the threshold tau.
0:06:00 At the end we must do a final update;
0:06:05 the final size of the table is still bigger than
0:06:08 the desired
0:06:09 size,
0:06:10 so then we take the table and just keep the most frequent n-grams.
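A minimal sketch of that pruned counting table, as described above; the function name, signature and data structures are mine, not the authors' implementation:

```python
from collections import Counter

def select_frequent_ngrams(utterances, n, C, tau, k):
    """Approximate the k most frequent n-grams (orders 1..n) with a pruned table.

    utterances: iterable of phone-label sequences (lists of strings)
    C: counts accumulated between prunings, tau: pruning threshold,
    k: number of n-grams finally kept.
    """
    table = Counter()
    since_update = 0
    for utt in utterances:
        # accumulate the counts of all n-grams of orders 1..n in this utterance
        for order in range(1, n + 1):
            for i in range(len(utt) - order + 1):
                table[tuple(utt[i:i + order])] += 1
                since_update += 1
        # periodic pruning: discard entries whose count is still below tau
        if since_update >= C:
            table = Counter({g: c for g, c in table.items() if c >= tau})
            since_update = 0
    # final update, then keep only the k most frequent entries
    table = Counter({g: c for g, c in table.items() if c >= tau})
    return [g for g, _ in table.most_common(k)]
```

Because a pruned entry loses whatever counts it had accumulated, the result is only an approximation of the true most frequent n-grams; that is why the table is kept roughly ten times bigger than the number of features that is finally selected.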
0:06:15 As for
0:06:16 the modeling, we use quite a common approach in
0:06:19 phonotactic language recognition: we use phone decoders,
0:06:23 we estimate lattices,
0:06:26 we use SVM-based language models with backoff n-grams,
0:06:29 and then
0:06:30 a Gaussian backend and linear fusion.
0:06:34 The training, development and test corpora were those gathered for the 2007
0:06:39 NIST Language Recognition Evaluation.
0:06:43 We used
0:06:44 ten conversations for development, for fusion and calibration,
0:06:49 and those conversations were
0:06:52 split into
0:06:54 thirty-second
0:06:56 segments.
0:06:57 The evaluation was carried out on the core condition (closed-set, thirty-second segments).
0:07:04 The phone decoders used were
0:07:06 those from the
0:07:07 Brno University of Technology (BUT)
0:07:10 group,
0:07:10 for Czech,
0:07:12 Hungarian and
0:07:13 Russian.
0:07:14 The decodings were computed with HTK using the Brno recognizers;
0:07:19 SVM modeling was done using LIBLINEAR, which is quite fast.
0:07:27 The Gaussian backend and the fusion were done using the FoCal toolkit from Niko Brümmer.
0:07:34 Before doing the decodings with the Brno decoders,
0:07:39 we removed the non-speech
0:07:41 segments from the training signals,
0:07:45 and all the non-phonetic units were
0:07:48 mapped to silence.
0:07:51 We must remark that we do use lattices: we use the phone decoders
0:07:55 to produce lattices, which are used to get estimates of the
0:07:59 expected counts of n-grams.
0:08:02 Those features were modeled by means of
0:08:07 support vector machines, using
0:08:09 the counts
0:08:10 of backoff n-grams
0:08:14 and the standard
0:08:15 background probability weighting.
0:08:18 The training was done in a one-versus-all fashion.
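The "standard background probability weighting" is not spelled out in the talk; a common choice in SVM-based phonotactic systems is a TFLLR-style scaling, in which each relative frequency is multiplied by the square root of the inverse background probability. A minimal sketch under that assumption (the function, the optional cap and the toy numbers are mine):

```python
import numpy as np

def weight_counts(counts, background_probs, cap=None):
    """TFLLR-style weighting of one utterance's n-gram count vector
    (an assumption, not necessarily the exact variant used in this system)."""
    rel_freq = counts / max(counts.sum(), 1.0)   # p(feature | utterance)
    weights = 1.0 / np.sqrt(background_probs)    # sqrt(1 / p(feature | all data))
    if cap is not None:                          # optional cap for very rare features
        weights = np.minimum(weights, cap)
    return rel_freq * weights

# toy usage with made-up numbers
counts = np.array([5.0, 0.0, 2.0])
background = np.array([0.4, 0.5, 0.1])
print(weight_counts(counts, background))
```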
0:08:23 So let us
0:08:25 jump from three-grams to four-grams.
0:08:28 If we just take all the 4-grams seen
0:08:31 in training,
0:08:32 we see that we get only around
0:08:36 two million features
0:08:38 in training,
0:08:40 with somewhat different numbers for each decoder.
0:08:43 So there is no need to apply the proposed approach: with two million features we can
0:08:48 simply count them
0:08:49 and select the most frequent ones.
0:08:52 If we use the full
0:08:55 two million features,
0:08:56 in fact
0:08:57 not all of them are really needed:
0:09:01 the average size of the sparse count vectors
0:09:05 was found to be only about
0:09:07 seventy thousand.
0:09:11 Using the full
0:09:12 4-grams we get an improvement in the equal error rate,
0:09:18 but we should take into account that
0:09:20 the average vector size
0:09:21 when we use the full
0:09:23 4-grams is
0:09:25 quite a bit bigger
0:09:26 than in the 3-gram baseline system,
0:09:28 so that would be a problem if we had a lot more data, for example for
0:09:32 the 2009 evaluation,
0:09:34 where the database was much bigger.
0:09:37 So the first thing was just to select the most frequent units
0:09:41 from the full set of
0:09:43 4-grams.
0:09:45 In this figure you can see the results
0:09:47 when we select, starting from the full set,
0:09:50 fewer and fewer units.
0:09:55 Obviously, as we select
0:09:56 less and less units,
0:09:57 the vectors get smaller, but
0:09:59 the equal error rate grows, not monotonically but with some oscillations. That is why we prefer to look at
0:10:05 the cost
0:10:07 measure
0:10:08 for evaluation, because it is somehow more significant
0:10:12 than the equal error rate:
0:10:14 small perturbations around
0:10:17 the operating point
0:10:19 can lead to quite different EER values.
0:10:22 So we marked
0:10:24 two selection points:
0:10:25 the first one at
0:10:26 one hundred thousand features, and the second one at thirty thousand.
0:10:29 The reason is that one hundred thousand is more or less the same number of features as
0:10:34 in the full
0:10:35 trigram
0:10:36 vector,
0:10:36 and the thirty thousand point was selected because
0:10:40 the average vector size is more or less equivalent,
0:10:43 more or less equal in length,
0:10:44 to the trigram counts case, so the computational cost of the thirty-thousand 4-gram system
0:10:49 was more or less the same
0:10:51 as in the trigram system.
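As a rough illustration of that kind of check (the helper and the toy data below are hypothetical, not the actual procedure from the experiments), one could estimate the average number of active features per utterance under a candidate cut-off and compare it with the trigram baseline:

```python
def avg_active_features(utterance_counts, selected):
    """Average number of selected features with a non-zero count per utterance."""
    actives = [sum(1 for g in counts if g in selected) for counts in utterance_counts]
    return sum(actives) / max(len(actives), 1)

# toy usage with made-up data
utts = [{("a", "b", "c", "d"): 2, ("b", "c", "d", "a"): 1},
        {("a", "b", "c", "d"): 1}]
selected = {("a", "b", "c", "d")}
print(avg_active_features(utts, selected))  # -> 1.0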
0:10:54 So let us try to jump from
0:10:57 4-grams to higher orders.
0:10:59 Here we simply fixed the C and tau values
0:11:02 so as to end up with at least two million
0:11:04 features at the end of the algorithm.
0:11:07 I would
0:11:08 just note that the chosen threshold is equivalent
0:11:11 to more than two hours of voice,
0:11:14 so that means that
0:11:15 the features that get discarded
0:11:17 are those whose counts remain really low
0:11:22 even after two hours of audio.
0:11:26 Also, as n increases, as we use higher and higher orders,
0:11:33 the number of n-grams that survive the selection decreases.
0:11:35 In this table we can see
0:11:37 how many such n-grams
0:11:39 we find as we change the order from three to seven, and you can see
0:11:44 that when we
0:11:46 get to the most frequent
0:11:47 seven-grams,
0:11:48 there are around
0:11:50 only twelve thousand of them.
0:11:52 So we selected seven as the highest order.
0:11:55 And in
0:11:58 this table you can see the results
0:12:02 for the two selection levels,
0:12:05 one hundred thousand and
0:12:07 thirty thousand features,
0:12:09 using n-gram orders from three
0:12:12 up to seven.
0:12:15 As you can see,
0:12:17 the results with 4-grams and
0:12:21 5-grams
0:12:22 improve on the trigram baseline,
0:12:26 and with five, six and seven
0:12:28 the results do not differ much from
0:12:33 the 4-gram
0:12:34 system anyway.
0:12:35 The good news was that,
0:12:39 even if the results were not better, they were not worse:
0:12:43 I mean, they are somehow stable,
0:12:46 even though
0:12:47 they include
0:12:50 quite a big number of
0:12:51 higher-order n-grams;
0:12:52 in the last case
0:12:54 we collected around
0:12:56 eighteen
0:12:57 thousand of them.
0:13:00 So
0:13:02 I will try to
0:13:04 finish my presentation.
0:13:05 A dynamic feature selection method has been proposed
0:13:11 which allows us to
0:13:16 perform phonotactic SVM-based language recognition
0:13:18 with high-order n-grams.
0:13:21 Performance improvements
0:13:22 with regard to the baseline trigram SVM system have been reported in experiments on the
0:13:27 NIST 2007 LRE database,
0:13:30 when applying the proposed approach to select the most frequent units up to orders four, five, six
0:13:36 and seven.
0:13:38 The best performance was obtained
0:13:40 when selecting the
0:13:42 hundred thousand most frequent units up to 5-grams,
0:13:46 which yielded a relative improvement of eleven percent
0:13:49 with regard to the baseline 3-gram system.
0:13:52 We are currently working on the evaluation of smarter selection criteria within
0:13:58 this approach.
0:14:00 So that is all.
0:14:02 Thank you.
0:14:09 [Questions from the audience]
0:14:33 I think what we noticed was that with each lower-order n-gram you had a different dynamic range.
0:14:39 I was wondering if you
0:14:40 tried to scale them differently,
0:14:42 or weight them separately, or something like that.
0:14:45 [Answer largely inaudible] No,
0:15:03 we just leave them all in the same vector.
0:15:14 [Further exchange inaudible]
0:15:38 Okay, thank you.
0:15:43 [A second question and answer followed; the exchange is largely inaudible.]