0:00:13 | i am going to talk today about an approach to the selection of high-order n-grams for phonotactic language recognition |
---|
0:00:23 | since time is tight, i will try to go quickly |
---|
0:00:30 | i will do a short introduction, then i will show you the feature selection method that we propose, then the experimental setup and results, and i will finish with a short summary of the work |
---|
0:00:45 | so the motivation, the main reason, is that in phonotactic language recognition, higher-order n-grams are expected to convey more discriminative information about the languages |
---|
0:01:00 | the problem is that the number of n-grams grows exponentially as N increases, so there are computational limits, and that is why many systems usually stick to n-gram orders of three or four |
---|
0:01:20 | and we cannot directly apply dimensionality reduction techniques like PCA, due to the huge feature space |
---|
0:01:28 | there are already some works related to feature selection; i will mention just two of them |
---|
0:01:37 | the first one, presented at ICASSP 2008, reported a wrapper-like filter method that was used to select the most discriminative candidate n-grams based on SVMs, and those n-grams were aggregated to get a subset of n-grams |
---|
0:02:00 | the second is a quite similar work: they used the same wrapper-like filter method, but with two discriminative criteria: first, SVM weights, which is basically the same as in the previous work, and second, a chi-square measure |
---|
0:02:20 | the fact is that, in both cases, there was no improvement, or even a degradation, when higher than 4-grams were used |
---|
0:02:30 | we also faced a quite similar problem in a previous work, where we did phonotactic language recognition using co-occurrences of phone n-grams |
---|
0:02:47 | in that case the feature space was very big, and the key idea we used was just to do a count-based feature selection: to build the sparse vectors of expected counts using only the most frequent units |
---|
0:03:05 | but the problem is that for higher-order n-grams the feature space is really huge, so even a simple frequency-based selection can be a challenge |
---|
0:03:18 | so let us define the frequency-based feature selection: let V be the number of phonetic units of an acoustic decoder; then V^n is the number of possible n-grams |
---|
0:03:30 | if the number of units of an acoustic decoder were, say, 64, and seven the order of the n-grams, the feature space would be really huge |
---|
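for a sense of the scale involved, a small sketch assuming the 64-unit, order-7 example mentioned above (the function names are mine, just for illustration):

```python
# Size of the n-gram feature space for a decoder with V phonetic units.
# With V = 64 and order n = 7 the space is in the trillions, which is why
# the cumulative counts of all possible n-grams cannot simply be stored.

def ngram_space_size(v: int, n: int) -> int:
    """Number of distinct n-grams of exactly order n over v units."""
    return v ** n

def cumulative_space_size(v: int, n: int) -> int:
    """Number of distinct k-grams over all orders k = 1..n."""
    return sum(v ** k for k in range(1, n + 1))

print(ngram_space_size(64, 7))        # 64^7 = 4398046511104, about 4.4e12
print(cumulative_space_size(64, 7))   # all orders up to 7 together
```

this is why, as the talk argues next, only the features actually seen in training (a tiny fraction of this space) can be considered at all.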
0:03:40 | but we must take into account that most of those features will never be seen in the training data, so we can forget them |
---|
0:03:53 | moreover, most of the seen features will have very low counts, so we can forget them too and simply select the most frequent features |
---|
0:04:03 | the problem is that with such a huge feature space we cannot even store the cumulative counts of all the features seen in the training set |
---|
0:04:23 | so, instead of storing all the cumulative counts directly, we must prune them periodically |
---|
0:04:32 | the idea is quite simple: we want to use the full training set, and we want something like a table with the cumulative counts of the seen features, but one that is periodically pruned |
---|
0:04:45 | so, every K accumulated counts, we want to retain only those entries with counts higher than a given threshold tau |
---|
0:04:54 | all the other entries, those with lower counts, are discarded, that is, they are set to zero |
---|
0:05:03 | both K and tau would be task-dependent constants, so these parameters must be tuned in order not to get too big a table |
---|
0:05:11 | in our experiments we usually aimed at tables about ten times bigger than the desired size; that is, if N is the desired size, the number of features we finally want to select, we try to keep tables of about ten times N entries |
---|
0:05:26 | so the proposed algorithm works as follows: we start with an empty table; C will be the parameter that tells how many cumulative counts have been aggregated since the last update |
---|
0:05:42 | for every training sentence, we accumulate the counts and store them in the table, and then we update the C parameter; when the C parameter gets higher than the K parameter, we update the table: all the entries with counts lower than tau are pruned |
---|
0:06:00 | at the end, we must do a final update, to ensure that the final size of the table is not much bigger than the desired size N |
---|
0:06:10 | then we sort the table and just use the N most frequent features |
---|
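a minimal sketch in Python of the count-and-prune procedure just described (the function name, data layout and toy parameters are mine; the talk's exact bookkeeping may differ):

```python
from collections import Counter

def select_frequent_ngrams(sentences, n, K, tau, N):
    """Approximate selection of the N most frequent n-grams.

    Counts are accumulated in a table; once roughly K new counts have
    been aggregated since the last update, entries whose cumulative
    count is still at or below the threshold tau are dropped.
    A final prune and sort yields the N survivors.
    """
    table = Counter()
    c = 0  # counts aggregated since the last table update
    for sent in sentences:
        for i in range(len(sent) - n + 1):
            table[tuple(sent[i:i + n])] += 1
            c += 1
        if c >= K:  # periodic pruning step
            table = Counter({g: v for g, v in table.items() if v > tau})
            c = 0
    table = Counter({g: v for g, v in table.items() if v > tau})  # final update
    return [g for g, _ in table.most_common(N)]
```

note the selection is approximate: an n-gram pruned early loses the counts it had accumulated so far, which is the price paid for keeping the table bounded (about ten times N entries, as in the talk).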
0:06:15 | the system architecture we used is a quite common approach in phonotactic language recognition: we use parallel phone decoders, we estimate lattices, we use SVM-based language modeling with backoff n-gram counts, and finally a gaussian backend and linear fusion |
---|
0:06:34 | the training, development and test corpora were those defined for the 2007 NIST language recognition evaluation |
---|
0:06:44 | we used ten conversations per language for development, for fusion and calibration, and those development conversations were split into thirty-second segments |
---|
0:06:57 | the evaluation was carried out on the core condition, closed-set, thirty seconds |
---|
0:07:04 | the phone decoders used for evaluation were those from the Brno University of Technology (BUT) group, for czech, hungarian and russian |
---|
0:07:14 | the lattices were obtained with HTK using the BUT recognizers; SVM modeling was done using a linear kernel implementation, which is quite fast; and the backend and fusion were done using the FoCal toolkit from Niko Brummer |
---|
0:07:34 | before doing the decoding with the BUT decoders, we removed the non-speech segments from the training segments, and all the non-phonetic units were mapped to silence |
---|
0:07:51 | we used a common approach to deal with lattices: we used the decoder lattices to get estimates of the expected counts of the n-grams |
---|
0:08:02 | the target languages were modeled by means of support vector machines, using a linear kernel with the expected counts of backoff n-grams, and using the standard background-probability weighting |
---|
0:08:18 | the training was done in a one-versus-all fashion |
---|
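a sketch of how such SVM input vectors might be built from n-gram counts; the talk only says "standard background-probability weighting", so the sqrt(1/p) TF-LLR-style form shown here is a common choice in this literature and is my assumption, as are the function name and the toy data:

```python
import math
from collections import Counter

def tfllr_vector(counts, background, selected):
    """Map raw (or expected) n-gram counts to a weighted SVM input vector.

    counts:     n-gram -> count for one utterance
    background: n-gram -> probability over the whole training set
    selected:   list of selected n-grams fixing the vector dimensions
    """
    total = sum(counts.values()) or 1
    vec = []
    for g in selected:
        p = counts.get(g, 0) / total        # relative frequency in utterance
        w = math.sqrt(1.0 / background[g])  # TF-LLR-style background weight
        vec.append(w * p)
    return vec

# toy example with hypothetical phone bigrams
selected = [("a", "b"), ("b", "a")]
background = {("a", "b"): 0.04, ("b", "a"): 0.01}
utt_counts = Counter({("a", "b"): 6, ("b", "a"): 2, ("b", "c"): 2})
print(tfllr_vector(utt_counts, background, selected))
```

the weighting damps frequent, uninformative n-grams and boosts rare ones, so a single linear kernel can treat all dimensions comparably.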
0:08:23 | so let us jump from 3-grams to 4-grams: if we take all the 4-grams seen in training, we see that we get only around about two million features |
---|
0:08:39 | we have different numbers for each decoder, but there is no need to apply the proposed technique: with two million features we can just count them and select the most frequent ones |
---|
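as the talk notes, with only around two million seen n-grams, exact counting fits in memory and no pruning is needed; a sketch of that simpler path (toy data hypothetical):

```python
from collections import Counter

def top_ngrams_exact(sentences, n, top_n):
    """Exact frequency-based selection: count every seen n-gram, keep top_n.

    Feasible whenever the number of *seen* n-grams (about two million for
    the 4-grams discussed here) fits comfortably in memory.
    """
    counts = Counter(
        tuple(sent[i:i + n])
        for sent in sentences
        for i in range(len(sent) - n + 1)
    )
    return [g for g, _ in counts.most_common(top_n)]

print(top_ngrams_exact([list("abcabcabc"), list("abcd")], 3, 2))
```

the count-and-prune table of the previous section only becomes necessary at orders where even the seen-feature set overflows memory.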
0:08:52 | if we use the full two million features, in fact not all the features will be really needed: the average size of the sparse vectors of counts was found to be about seventy thousand |
---|
0:09:11 | using the full 4-gram counts we got an improvement in terms of equal error rate, but we should take into account that the average vector size when we used the full 4-grams was quite a bit bigger than in the 3-gram baseline system |
---|
0:09:28 | so that would be a problem if we had a lot more data, for example for the 2009 evaluation, for which the database was much bigger |
---|
0:09:37 | so the first thing was just to select the most frequent units from the full 4-gram set |
---|
0:09:45 | in this figure you can see the results when we select a certain number of the most frequent units; obviously, as we select fewer and fewer units, the performance degrades |
---|
0:09:59 | the equal error rate grows, not monotonically but with some oscillations; that is why we prefer the average cost (Cavg) for evaluation, because it is somehow more significant than the equal error rate: small perturbations around the operating point lead to quite different equal error rate values |
---|
0:10:22 | so we marked two selection points: the first at one hundred thousand features and the second at thirty thousand |
---|
0:10:29 | the reason is that one hundred thousand features is more or less the same number of features as with the full 3-gram set |
---|
0:10:36 | and the thirty thousand point was selected because the average vector size is more or less equivalent to that of the 3-gram case, so the computational cost of the thirty-thousand 4-gram system was more or less the same as that of the 3-gram system |
---|
0:10:54 | so let us try to jump from 4-grams to higher orders; we fixed the K and tau values to ensure at least two million features at the end of the algorithm |
---|
0:11:08 | just to note that the K value is equivalent to more than two hours of voice, so that means that the features that get pruned are those whose counts are really low even after two hours of audio |
---|
0:11:26 | also, as N increases, as we use higher n-gram orders, the number of likely n-grams decreases |
---|
0:11:35 | in this table you can see how many likely n-grams we can find as we change the order from three to seven, and you can see that if we take the most frequent 7-grams, there are only around twelve thousand of them |
---|
0:11:52 | so we selected seven as the highest order |
---|
0:11:58 | in this table you can see the results, in terms of average cost and equal error rate, for the two selection levels, one hundred thousand and thirty thousand, as the n-gram order grows from three up to seven |
---|
0:12:14 | as you can see, the best results were obtained with 4-grams and 5-grams; it seems that once we add features from 6-grams and 7-grams, they do not improve on the 4-gram system |
---|
0:12:35 | anyway, a good thing to note is that even those results were not bad, they were not worse; i mean, they are somehow stable, even though quite a big number of higher-order n-grams was included, in the last case around eighteen thousand of them |
---|
0:13:00 | so i will try to finish my presentation: a dynamic feature selection method has been proposed, whose goal was to perform phonotactic SVM-based language recognition with higher-order n-grams |
---|
0:13:21 | performance improvements with regard to the baseline trigram SVM system have been reported in experiments on the NIST 2007 language recognition evaluation database, when applying the proposed algorithm to select the most frequent units up to orders four, five, six and seven |
---|
0:13:38 | the best performance was obtained when selecting the one hundred thousand most frequent units up to 5-grams, which yielded a relative improvement of eleven percent with regard to the baseline 3-gram system |
---|
0:13:52 | we are currently working on the evaluation of smarter selection criteria under this approach |
---|
0:14:00 | that is all, thank you very much |
---|
0:14:09 | questions? [inaudible] |
---|
0:14:33 | i think what we noticed was that with each order of n-gram you had a different dynamic range; i was wondering if you tried to scale them differently, or weight them separately, or something like that |
---|
0:14:45 | [inaudible reply] |
---|
0:15:03 | we just leave them in the vector |
---|
0:15:07 | [inaudible] |
---|
0:15:38 | thank you |
---|
0:15:43 | [mostly inaudible question about the phonetic decoders] |
---|
0:16:08 | sure, but we have somehow the same phonotactics [partially inaudible] |
---|
0:16:17 | [partially inaudible exchange about the acoustic decoders and the baseline] |
---|
0:16:32 | yes, i think that could work, but i am not sure [partially inaudible]; anyway, i think somehow something like this |
---|
0:16:50 | [inaudible closing question] |
---|