0:00:21 | um so hmmm coding everybody so um |
---|
0:00:25 | my my is that's the most |
---|
0:00:26 | and the result in junior at three yeah |
---|
0:00:30 | and the work i'm bring to present you uh as been then by one of my critiques scheme we we |
---|
0:00:35 | who is associate professor |
---|
0:00:36 | at at key uh on the and all of the |
---|
0:00:40 | and the uh i set |
---|
0:00:43 | um |
---|
0:00:45 | the problem we are |
---|
0:00:47 | addressing in this work |
---|
0:00:48 | is the acoustic-to-articulatory inversion |
---|
0:00:51 | and we propose to use a new model in this domain |
---|
0:00:54 | uh which is an episodic memory |
---|
0:00:58 | so here is the outline of my talk |
---|
0:01:00 | um in the first part uh i'm going to briefly present you |
---|
0:01:04 | uh |
---|
0:01:05 | what is the problem of the uh acoustic-to-articulatory inversion |
---|
0:01:09 | uh also um |
---|
0:01:11 | a brief presentation of episodic modeling |
---|
0:01:14 | and uh the motivation of |
---|
0:01:15 | is |
---|
0:01:17 | uh then i will present you the proposed approach |
---|
0:01:21 | so uh which we call the generative episodic memory |
---|
0:01:25 | and this will be followed by an evaluation |
---|
0:01:29 | before the conclusion |
---|
0:01:33 | so um |
---|
0:01:35 | what's the acoustic-to-articulatory inversion problem uh the aim is to recover |
---|
0:01:40 | the uh articulatory gestures |
---|
0:01:42 | from a uh speech signal |
---|
0:01:45 | uh this is an interesting problem because many applications can take advantage |
---|
0:01:49 | uh of the |
---|
0:01:51 | knowledge about the articulatory |
---|
0:01:53 | such as uh language learning |
---|
0:01:55 | speech therapy or also speech recognition |
---|
0:01:59 | this is an interesting problem but also a very difficult one |
---|
0:02:02 | because this problem |
---|
0:02:03 | uh is highly uh nonlinear |
---|
0:02:05 | and uh |
---|
0:02:07 | the mapping between the acoustic and the articulatory space |
---|
0:02:11 | uh is not unique |
---|
0:02:15 | so uh we think that um in fact the dynamics |
---|
0:02:19 | the articulatory dynamics can help us to solve |
---|
0:02:22 | uh at least partially |
---|
0:02:23 | the non-uniqueness of the solution |
---|
0:02:26 | because uh |
---|
0:02:28 | the the dynamics |
---|
0:02:30 | uh accounts for uh |
---|
0:02:32 | some that when only effect |
---|
0:02:33 | uh such as coarticulation |
---|
0:02:35 | um |
---|
0:02:36 | it accounts also for the physical properties of the articulators |
---|
0:02:40 | such as the elasticity |
---|
0:02:43 | uh the degrees of freedom |
---|
0:02:45 | and also it accounts uh |
---|
0:02:48 | for the strategy |
---|
0:02:49 | that the uh speaker uses |
---|
0:02:51 | to articulate |
---|
0:02:56 | um |
---|
0:02:57 | so what about episodic modeling um |
---|
0:03:00 | in the psycholinguistic literature many works |
---|
0:03:03 | uh agree |
---|
0:03:04 | uh on the existence of an episodic memory |
---|
0:03:07 | in fact this is a part of the brain |
---|
0:03:10 | uh where we encode uh the uh |
---|
0:03:13 | events |
---|
0:03:14 | we experience in our life |
---|
0:03:16 | and this uh |
---|
0:03:18 | uh |
---|
0:03:18 | experiences uh are recorded uh into episodes that |
---|
0:03:23 | can be retrieved |
---|
0:03:24 | uh at any time |
---|
0:03:26 | and they are they are maybe that's is that's we use the order to may be speech processing |
---|
0:03:31 | and uh in fact you can uh retrieve past episodes in order to interpret present events |
---|
0:03:37 | and also to um |
---|
0:03:39 | uh |
---|
0:03:41 | two |
---|
0:03:42 | and to speak uh you we knew |
---|
0:03:45 | for |
---|
0:03:47 | oh so they it but it can be uh or we use the uh in a |
---|
0:03:51 | i think to to to of speech uh processing |
---|
0:03:55 | uh such as speech recognition |
---|
0:03:57 | so template-based speech recognition and also |
---|
0:04:00 | uh with uh speech synthesis |
---|
0:04:02 | uh with unit selection |
---|
0:04:05 | which can be also uh seen as a |
---|
0:04:08 | so um |
---|
0:04:09 | this model |
---|
0:04:11 | these models |
---|
0:04:12 | are in fact |
---|
0:04:15 | collections of uh acoustic realizations of lexical units |
---|
0:04:21 | which can be phones diphones syllables or words |
---|
0:04:25 | and uh most of the time these uh |
---|
0:04:28 | episodes are described uh by the uh acoustic sequences |
---|
0:04:33 | and uh with contextual information |
---|
0:04:36 | um |
---|
0:04:39 | the results of the |
---|
0:04:41 | this model uh |
---|
0:04:43 | for both speech recognition and speech synthesis are uh most of the time expressed |
---|
0:04:47 | uh as a concatenation of episodes |
---|
0:04:50 | and this concatenation should uh best explain |
---|
0:04:53 | the |
---|
0:04:54 | input signal for speech recognition |
---|
0:04:57 | so the input speech signal would be uh described as a sequence of episodes |
---|
0:05:02 | and for speech synthesis |
---|
0:05:03 | uh the synthesized speech will also be expressed as a concatenation |
---|
0:05:08 | of episodes |
---|
0:05:09 | so |
---|
0:05:10 | uh i i call these uh are sure uh |
---|
0:05:14 | was the decay uh a memory as compared to |
---|
0:05:18 | so let's go back |
---|
0:05:19 | to our problem which is the |
---|
0:05:21 | acoustic-to-articulatory inversion |
---|
0:05:24 | so uh |
---|
0:05:25 | episodic memory is attractive for this problem for uh two reasons |
---|
0:05:29 | the first one is that it relies on uh observed |
---|
0:05:32 | uh synchronized acoustic and articulatory data |
---|
0:05:35 | so we don't have to form any assumption about the mapping function |
---|
0:05:39 | uh the second is that the articulatory dynamics are preserved within the |
---|
0:05:45 | episodes |
---|
0:05:45 | and that can help to solve |
---|
0:05:47 | the problem of the non-uniqueness |
---|
0:05:51 | um however there is um |
---|
0:05:54 | maybe |
---|
0:05:55 | more a practical problem than uh |
---|
0:05:57 | uh a theoretical problem |
---|
0:05:59 | um i mean |
---|
0:06:00 | if we consider speech recognition and speech synthesis |
---|
0:06:03 | um |
---|
0:06:05 | the mapping is uh from a continuous space to a discrete space |
---|
0:06:08 | for speech recognition so we try to map an acoustic signal to a sequence of |
---|
0:06:13 | lexical units |
---|
0:06:14 | the speech synthesis |
---|
0:06:15 | tries to map |
---|
0:06:16 | uh |
---|
0:06:17 | the sequence of lexical units |
---|
0:06:18 | so that's a phone type one |
---|
0:06:20 | two and a |
---|
0:06:23 | but if you consider the uh acoustic-to-articulatory inversion problem |
---|
0:06:27 | the mapping is between two |
---|
0:06:29 | continuous spaces |
---|
0:06:32 | so um |
---|
0:06:34 | usually for speech recognition and speech synthesis the memories are based on uh |
---|
0:06:39 | let's say a few hours up to tens of hours of speech |
---|
0:06:43 | uh to have a uh reason it one uh press |
---|
0:06:47 | but uh |
---|
0:06:49 | the uh corpora uh available |
---|
0:06:52 | for uh articulatory information are very small for now |
---|
0:06:56 | uh |
---|
0:06:58 | typically we have a few minutes |
---|
0:07:00 | or uh |
---|
0:07:01 | at most |
---|
0:07:02 | two tenths of |
---|
0:07:03 | and this |
---|
0:07:04 | uh |
---|
0:07:05 | small amount of data |
---|
0:07:07 | uh cannot allow |
---|
0:07:08 | us to efficiently |
---|
0:07:09 | uh model the |
---|
0:07:11 | variation in the |
---|
0:07:13 | the uh |
---|
0:07:15 | acoustic and articulatory space |
---|
0:07:23 | so um |
---|
0:07:25 | we propose to um |
---|
0:07:28 | to frank |
---|
0:07:29 | to combine uh the episodes and uh this combination |
---|
0:07:34 | uh will be based on the local similarity between these episodes |
---|
0:07:39 | uh this way of combining episodes can uh produce |
---|
0:07:44 | unseen uh articulatory trajectories |
---|
0:07:46 | and can uh |
---|
0:07:47 | bit there are or a nice about the |
---|
0:07:50 | that these we can |
---|
0:07:51 | the memory will be able to produce variation of fixed |
---|
0:07:56 | so |
---|
0:07:57 | here is a very basic example just to illustrate uh what i mean |
---|
0:08:01 | by combining episodes |
---|
0:08:02 | so just consider a |
---|
0:08:04 | a very simple labyrinth problem |
---|
0:08:06 | and just imagine that i give you this labyrinth and |
---|
0:08:10 | uh |
---|
0:08:11 | ask you |
---|
0:08:12 | to try to solve this problem |
---|
0:08:14 | and we think uh only a to six |
---|
0:08:18 | and |
---|
0:08:19 | imagine that you find two trajectories |
---|
0:08:23 | uh within this labyrinth |
---|
0:08:24 | uh the the um |
---|
0:08:26 | the red one and the blue one |
---|
0:08:28 | and after that |
---|
0:08:29 | i can ask you could you |
---|
0:08:31 | a a give me or their solution to do so |
---|
0:08:35 | and we get |
---|
0:08:38 | i see three point point |
---|
0:08:41 | uh let's say the some sort of a real E |
---|
0:08:44 | so from the to previously five |
---|
0:08:46 | uh |
---|
0:08:47 | trajectory |
---|
0:08:48 | uh we think the like and we can find a what of want |
---|
0:08:52 | name |
---|
0:08:54 | i and |
---|
0:08:56 | and |
---|
0:08:56 | yeah |
---|
0:08:57 | models |
---|
0:08:59 | but this is a very basic problem and it is only spatial |
---|
0:09:03 | and and of course |
---|
0:09:04 | here we don't have to deal with a temporal dimension |
---|
0:09:08 | uh |
---|
0:09:11 | a a a a a a a solution |
---|
0:09:13 | um |
---|
0:09:14 | so |
---|
0:09:16 | here right spend oh i bits my memory um |
---|
0:09:20 | we consider an episode as a sequence of synchronized acoustic and articulatory observations |
---|
0:09:25 | uh and uh the considered lexical unit is the phone |
---|
0:09:28 | were |
---|
0:09:30 | so |
---|
0:09:31 | what |
---|
0:09:32 | um |
---|
0:09:33 | do we consider as local similarity so |
---|
0:09:35 | uh local uh similarity |
---|
0:09:37 | is uh |
---|
0:09:40 | two uh similar articulatory configurations which appear at |
---|
0:09:45 | similar times |
---|
0:09:46 | similar instants |
---|
0:09:47 | during the realization of a given phone |
---|
0:09:51 | so you have to deal with two uh dimensions |
---|
0:09:54 | the first one is the temporal dimension |
---|
0:09:56 | and the second one is the spatial |
---|
0:09:57 | dimension |
---|
0:09:58 | so we use uh a DTW uh to uh |
---|
0:10:03 | deal with the temporal dimension |
---|
0:10:05 | and we you also if the and |
---|
0:10:07 | not the to uh |
---|
0:10:08 | make the |
---|
0:10:10 | the mapping |
---|
0:10:11 | uh symmetric |
---|
0:10:13 | and to be able to compare different uh |
---|
0:10:16 | uh distances |
---|
0:10:17 | between episodes |
---|
0:10:20 | and uh uh also a constraint uh allows us to uh control the |
---|
0:10:25 | distortion the time distortion |
---|
0:10:27 | a a of the at |
---|
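The temporal alignment mentioned here can be sketched with a small dynamic time warping routine. This is only an illustration and not the speaker's implementation: the function name `dtw_distance`, the Euclidean frame distance, the normalisation by the summed sequence lengths (used so that scores of episode pairs with different durations stay comparable), and the band constraint controlled by the hypothetical `window` parameter are all assumptions, since the talk does not say which step pattern or constraint was actually used.

```python
import math


def dtw_distance(a, b, window=None):
    """Length-normalised DTW cost between two sequences of feature vectors.

    `window` is a hypothetical band half-width limiting how far the
    alignment may drift from the diagonal (a way to bound time distortion).
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be wide enough to reach the final cell (n, m).
    window = max(window, abs(n - m))

    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Symmetric step pattern: both sequences are treated alike,
            # so dtw_distance(a, b) equals dtw_distance(b, a).
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m] / (n + m)
```

With a narrow `window`, two realizations of the same phone that differ only by a slight stretch still align at zero cost, while strongly warped alignments are ruled out.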
0:10:29 | um |
---|
0:10:30 | for spatial similarity uh let's consider |
---|
0:10:33 | uh |
---|
0:10:34 | the plot in the bottom right corner |
---|
0:10:37 | um uh uh just say that it's the trajectory of one articulator |
---|
0:10:41 | and just consider at a time |
---|
0:10:43 | the |
---|
0:10:45 | uh the position uh X i |
---|
0:10:49 | and we uh just say that X i plus one is the natural |
---|
0:10:54 | target of uh X i |
---|
0:10:57 | and we just |
---|
0:10:58 | make this |
---|
0:10:59 | the following assumption |
---|
0:11:00 | um |
---|
0:11:01 | that's X i plus one would have been is found |
---|
0:11:05 | uh without a significant impact uh on the uh acoustics |
---|
0:11:10 | so we define uh |
---|
0:11:12 | an interval |
---|
0:11:13 | uh centered around uh X i plus one |
---|
0:11:17 | and we just uh say that any uh articulatory configuration |
---|
0:11:23 | uh within this interval |
---|
0:11:24 | can be uh |
---|
0:11:26 | considered as similar |
---|
0:11:27 | to uh X Y |
---|
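The interval test described here can be written down in a few lines. This is a hedged sketch only: the names `within_tolerance` and `admissible_targets` and the `half_width` parameter are hypothetical, and the talk does not say how the width of the interval is chosen or whether it differs per articulator.

```python
def within_tolerance(x_next, candidate, half_width):
    """True if `candidate` lies inside the interval centred on `x_next`.

    Both arguments are articulatory configurations (sequences of
    coordinates); `half_width` is an assumed, uniform tolerance.
    """
    return all(abs(c - t) <= half_width for c, t in zip(candidate, x_next))


def admissible_targets(x_next, candidates, half_width):
    """Indices of the configurations that could stand in for `x_next`."""
    return [k for k, cand in enumerate(candidates)
            if within_tolerance(x_next, cand, half_width)]
```

For example, with `x_next = (1.0, 2.0)` and a tolerance of `0.3`, a frame at `(1.1, 2.1)` from another episode would count as an alternative target, while a frame at `(1.6, 2.0)` would not.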
0:11:33 | so |
---|
0:11:34 | um |
---|
0:11:35 | let's consider two episodes now |
---|
0:11:37 | um |
---|
0:11:39 | of a given phone so |
---|
0:11:41 | let's say for example two uh acoustic and articulatory realizations of the phone G or |
---|
0:11:48 | know |
---|
0:11:49 | um |
---|
0:11:50 | don't um |
---|
0:11:52 | let |
---|
0:11:53 | uh see uh how to build uh |
---|
0:11:56 | the generative memory |
---|
0:11:57 | so |
---|
0:11:58 | we just check first |
---|
0:12:01 | that uh X and Y are similar enough |
---|
0:12:04 | uh because |
---|
0:12:05 | uh two realizations of uh |
---|
0:12:08 | some uh phone |
---|
0:12:10 | uh can be quite different |
---|
0:12:12 | uh because some articulators are not critical for a phone |
---|
0:12:19 | um |
---|
0:12:20 | so we we we map uh first uh |
---|
0:12:23 | let's say episode uh Y onto the episode X |
---|
0:12:27 | uh i've represented the aligned observations |
---|
0:12:31 | we've the got collides |
---|
0:12:33 | so the right one |
---|
0:12:36 | oh |
---|
0:12:38 | okay okay |
---|
0:12:39 | two |
---|
0:12:43 | um |
---|
0:12:44 | so uh just to illustrate that uh from two episodes |
---|
0:12:47 | uh the generative memory can synthesize |
---|
0:12:50 | uh uh at the bottom of the figure |
---|
0:12:53 | as you can see |
---|
0:12:54 | uh eight |
---|
0:12:55 | that |
---|
0:12:55 | through good it it is so the memory is able to produce a |
---|
0:13:00 | from uh two episodes eight uh |
---|
0:13:03 | episodes which are uh acceptable uh from an articulatory |
---|
0:13:07 | point of view |
---|
0:13:08 | uh |
---|
0:13:10 | a but it and can uh and that |
---|
0:13:14 | so the inversion consists in uh so the generative memory |
---|
0:13:18 | uh is an oriented graph |
---|
0:13:20 | so each node is the |
---|
0:13:22 | uh |
---|
0:13:23 | synchronized acoustic and articulatory observation |
---|
0:13:26 | and the edges are the allowed uh transitions |
---|
0:13:29 | derived from the |
---|
0:13:31 | preceding mapping from uh an episode |
---|
0:13:34 | onto another |
---|
0:13:36 | and the inversion consists in finding in this graph |
---|
0:13:39 | uh |
---|
0:13:40 | the |
---|
0:13:41 | the path which best matching |
---|
0:13:43 | best matches the |
---|
0:13:45 | uh |
---|
0:13:47 | the input uh acoustic to be birds |
---|
0:13:51 | and uh of course the articulatory gesture |
---|
0:13:53 | uh uh is derived from the articulatory component of each node |
---|
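The path search described above can be sketched as a dynamic programme over the graph. This is an illustrative simplification under stated assumptions, not the speaker's algorithm: the function name `invert`, the data layout (`nodes` as pairs of acoustic and articulatory vectors, `edges` as a successor dictionary, `starts` as the allowed entry nodes), the Euclidean acoustic distance, and the rule that every input frame advances exactly one node are all hypothetical choices.

```python
import math


def invert(nodes, edges, starts, acoustic_frames):
    """Return the articulatory vectors along the best-matching path.

    nodes: list of (acoustic_vector, articulatory_vector) pairs
    edges: dict mapping a node index to the indices it may transition to
    starts: node indices where a path is allowed to begin
    acoustic_frames: the input acoustic sequence to invert
    """
    n = len(nodes)
    preds = [[] for _ in range(n)]
    for u, successors in edges.items():
        for v in successors:
            preds[v].append(u)

    # Cost of the best partial path ending in each node.
    cost = [math.inf] * n
    for s in starts:
        cost[s] = math.dist(nodes[s][0], acoustic_frames[0])

    backpointers = []
    for frame in acoustic_frames[1:]:
        new_cost = [math.inf] * n
        pointer = [-1] * n
        for v in range(n):
            for u in preds[v]:
                if cost[u] < new_cost[v]:
                    new_cost[v] = cost[u]
                    pointer[v] = u
            if pointer[v] != -1:
                new_cost[v] += math.dist(nodes[v][0], frame)
        backpointers.append(pointer)
        cost = new_cost

    end = min(range(n), key=lambda v: cost[v])
    if math.isinf(cost[end]):
        raise ValueError("no path of the required length exists in the graph")

    # Trace the best path back and read off the articulatory components.
    path = [end]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return [nodes[v][1] for v in path]
```

Because only transitions present in `edges` can be followed, the recovered gesture always respects the dynamics stored in the memory, which is the property the talk relies on to reduce the non-uniqueness of the mapping.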
0:13:59 | so um |
---|
0:14:00 | for the evaluation we have compared uh |
---|
0:14:02 | uh the generative memory |
---|
0:14:04 | with a concatenative one and we also compared uh this approach |
---|
0:14:09 | we the me call uh uh a a constraint |
---|
0:14:12 | um here are the corpora we use MOCHA |
---|
0:14:15 | uh which contains two speakers a male and a female |
---|
0:14:19 | uh the language is english and uh |
---|
0:14:21 | we use uh seven coils |
---|
0:14:24 | uh two on the lips |
---|
0:14:26 | on the lower incisors the tongue tip the tongue body |
---|
0:14:29 | the tongue dorsum and the velum |
---|
0:14:31 | and we use also a french corpus we have recorded |
---|
0:14:33 | not your got |
---|
0:14:35 | uh we don't fix the coil |
---|
0:14:38 | on the velum but uh on the the route |
---|
0:14:41 | that |
---|
0:14:44 | um |
---|
0:14:46 | okay okay |
---|
0:14:47 | that |
---|
0:14:49 | and the would do that |
---|
0:14:51 | evaluation um |
---|
0:14:52 | of the estimated uh trajectories |
---|
0:14:55 | uh is based on the root mean square error and the Pearson correlation which quantifies the similarity |
---|
0:14:59 | and synchrony between two |
---|
0:15:01 | a accounts and it's meeting up to a that we |
---|
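The two evaluation measures named here can be computed as follows. This is a minimal sketch for a single articulatory channel, assuming that the correlation referred to in the talk is the Pearson coefficient; the function names `rmse` and `pearson` are illustrative and the talk does not describe how scores are averaged across coils or utterances.

```python
import math


def rmse(estimated, reference):
    """Root mean square error between two equally long trajectories."""
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(reference))


def pearson(estimated, reference):
    """Pearson correlation: similarity in shape and timing of two trajectories."""
    n = len(reference)
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_e * var_r)
```

The two measures are complementary: an estimate shifted by a constant offset has a non-zero RMS error but a correlation of one, whereas a jerky estimate that stays close to the reference can have a low error and a poor correlation.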
0:15:06 | so here are the results |
---|
0:15:08 | um |
---|
0:15:10 | the red bar is the codebook uh results |
---|
0:15:13 | the blue the concatenative memory and the green bar |
---|
0:15:15 | the generative memory and |
---|
0:15:17 | we can observe the same uh improvement trend |
---|
0:15:20 | uh over all the three corpora |
---|
0:15:22 | so over the two languages um over the three speakers |
---|
0:15:26 | uh |
---|
0:15:27 | the generative memory uh always outperforms |
---|
0:15:30 | the concatenative memory and the codebook |
---|
0:15:33 | and uh the graph quantifies the probability of improvement |
---|
0:15:37 | so we can expect an improvement |
---|
0:15:38 | between five and and percent |
---|
0:15:40 | with an eight nine person computer |
---|
0:15:42 | uh |
---|
0:15:43 | for the G-Mem over the C-Mem and uh |
---|
0:15:46 | between ten and fifteen percent |
---|
0:15:48 | uh for this unit level |
---|
0:15:53 | here is you a uh |
---|
0:15:55 | a |
---|
0:15:55 | so uh as you can see the codebook provides very jerky trajectories |
---|
0:16:00 | while the |
---|
0:16:01 | um episodic memories |
---|
0:16:03 | uh provide us with the uh |
---|
0:16:05 | trajectory |
---|
0:16:06 | because it i it it's better model |
---|
0:16:10 | so it correspond to the movement of them |
---|
0:16:12 | a along the at that X |
---|
0:16:15 | for the french and sure |
---|
0:16:16 | she's to can "'cause" extreme the boss |
---|
0:16:21 | okay that uh |
---|
0:16:22 | a compile the the is you the of the or results |
---|
0:16:25 | uh we can say that |
---|
0:16:27 | we have uh |
---|
0:16:28 | reasonably good performance |
---|
0:16:30 | uh for example i |
---|
0:16:32 | i propose to some uh |
---|
0:16:34 | machine learning algorithm |
---|
0:16:37 | uh which have been proved over something to based and uh are we can see that |
---|
0:16:40 | the uh mean square and price all between |
---|
0:16:43 | a a one point four and one once |
---|
0:16:47 | but um |
---|
0:16:48 | it has been reported in an article that uh |
---|
0:16:51 | the uh articulatory data acquisition error is about |
---|
0:16:56 | zero point four millimeters so |
---|
0:16:57 | we can just say that a a okay |
---|
0:16:59 | uh we have different uh method but |
---|
0:17:02 | maybe |
---|
0:17:03 | as we don't share exactly the same process |
---|
0:17:05 | thing uh |
---|
0:17:06 | that that process |
---|
0:17:07 | and uh because of the uh that the position error |
---|
0:17:10 | we are more |
---|
0:17:12 | and |
---|
0:17:14 | so |
---|
0:17:15 | um |
---|
0:17:16 | we proposed a uh generative episodic memory so this model is uh interesting because |
---|
0:17:21 | it does not require any assumption about the mapping function |
---|
0:17:24 | uh the memory is able to uh model the dynamics |
---|
0:17:30 | and uh |
---|
0:17:32 | it is a also so to produce and seen to uh gesture and just can should are a i about |
---|
0:17:37 | it |
---|
0:17:38 | so |
---|
0:17:39 | for future work |
---|
0:17:40 | uh we're focusing on the use of more robust distances because for |
---|
0:17:45 | uh |
---|
0:17:45 | this work we have used the euclidean distance for the acoustic space and |
---|
0:17:50 | G |
---|
0:17:51 | this distance is known not to be |
---|
0:17:53 | uh robust |
---|
0:17:54 | for the |
---|
0:17:56 | we would like also to take into account |
---|
0:17:58 | the uh the correlation between the articulators |
---|
0:18:01 | because the articulators can compensate each other |
---|
0:18:04 | and uh |
---|
0:18:06 | we think this |
---|
0:18:07 | uh |
---|
0:18:08 | correlation can add to get for that |
---|
0:18:11 | uh we would like also to uh move from uh |
---|
0:18:15 | a pure phonetic segmentation |
---|
0:18:17 | during the |
---|
0:18:18 | the building of the memory |
---|
0:18:19 | to uh |
---|
0:18:20 | but not cry just based uh |
---|
0:18:23 | that tension should propose or something but i uh i don't think used |
---|
0:18:27 | and finally can uh |
---|
0:18:29 | proceed |
---|
0:18:30 | or to get further improvement local the application |
---|
0:18:33 | uh because the memory is able to produce new trajectories but face |
---|
0:18:38 | uh two |
---|
0:18:39 | uh |
---|
0:18:40 | precisely map uh an acoustic frame it is uh |
---|
0:18:44 | in to the up that i've got made if |
---|
0:18:47 | uh |
---|
0:18:48 | synchronise of solution um |
---|
0:18:52 | thank |
---|
0:18:53 | i you i |
---|
0:18:58 | we have time for a question |
---|
0:19:08 | i and that's just one thing linear and it seems to me there is room for combining the codebook |
---|
0:19:12 | and the generative model and that the codebook be some kind of a starting trajectory arrears |
---|
0:19:18 | i i was is it possible to combine the codebook and the generative model so the |
---|
0:19:22 | codebook stuff as you are |
---|
0:19:24 | yeah no initialization annotation so to speak are |
---|
0:19:27 | it's |
---|
0:19:28 | yeah i think um |
---|
0:19:29 | space |
---|
0:19:31 | and i to the search or would that be computationally too |
---|
0:19:34 | expensive |
---|
0:19:37 | oh |
---|
0:19:37 | i in the memory it's |
---|
0:19:39 | it's uh |
---|
0:19:40 | and is that as a kind of code |
---|
0:19:42 | it's |
---|
0:19:42 | it's much data could because |
---|
0:19:44 | uh |
---|
0:19:45 | we have to dump for information within the memory |
---|
0:19:48 | uh this is and see that the could |
---|
0:19:50 | uh |
---|
0:19:51 | missus |
---|
0:20:03 | but |
---|
0:20:04 | okay so thank you again |
---|