0:00:15 | Hello everyone, and thank you for attending this presentation. |
---|
0:00:18 | My name is [inaudible], and I will be presenting work done with my colleagues |
---|
0:00:22 | on |
---|
0:00:24 | the unsupervised domain adaptation of a language identification system, with the goal of |
---|
0:00:30 | being robust to the transmission channel. |
---|
0:00:33 | In this work, we address the problem of language identification under a transmission channel change, |
---|
0:00:39 | a channel which has not been observed during the training of the system, and for which we only have |
---|
0:00:43 | unlabelled data from this target transmission channel. |
---|
0:00:47 | This problem is called unsupervised domain adaptation. |
---|
0:00:51 | We propose to add a regularization loss function to the classification loss function of the embedding |
---|
0:00:57 | extractor during its training. |
---|
0:00:59 | In this presentation, we first define the task of unsupervised domain adaptation for language |
---|
0:01:06 | identification. |
---|
0:01:08 | Then we describe the proposed method: regularization of the embedding extractor. |
---|
0:01:14 | And finally we present our experiments and results. |
---|
0:01:18 | So first, the task of unsupervised domain adaptation for language identification. |
---|
0:01:24 | We use a standard language identification system based on x-vectors. |
---|
0:01:29 | This system is made up of three parts. First, we have a feature extractor whose aim |
---|
0:01:35 | is to extract frame-level features. |
---|
0:01:38 | It is a stacked neural network which has been trained to predict triphones, |
---|
0:01:44 | and |
---|
0:01:46 | frame-level embeddings are extracted from this network, and they are used as input to |
---|
0:01:51 | the x-vector extractor. |
---|
0:01:53 | The x-vector extractor is a neural network, discriminatively trained |
---|
0:01:58 | to predict |
---|
0:01:58 | language labels. |
---|
0:02:01 | We extract a segment-level embedding from this neural network, and finally |
---|
0:02:07 | a language classifier, composed of a dimension reduction and a support vector machine, |
---|
0:02:14 | computes a score for each target language. |
---|
0:02:19 | We train such a system on the RATS corpus, which we use in this |
---|
0:02:25 | study. |
---|
0:02:26 | It contains five languages: Arabic, Dari, Farsi, Pashto, and Urdu. |
---|
0:02:32 | We have recordings for these five languages on nine transmission channels: |
---|
0:02:37 | first a telephone channel, |
---|
0:02:39 | and eight radio channels. So now let me play a few audio files: first |
---|
0:02:45 | a telephone recording, |
---|
0:02:53 | and now the same speech |
---|
0:03:01 | retransmitted |
---|
0:03:07 | over radio channels. |
---|
0:03:14 | As you may have noticed, the retransmitted versions of the original files present very different |
---|
0:03:20 | noise and distortion characteristics, so this is a real challenge for domain adaptation |
---|
0:03:28 | methods. |
---|
0:03:29 | We use three-second speech segments for both training and testing of the |
---|
0:03:35 | system. |
---|
0:03:36 | Our first experiment |
---|
0:03:38 | was to investigate the domain mismatch issue within this corpus. So we trained a language |
---|
0:03:44 | identification system for each of the nine transmission channels, |
---|
0:03:49 | which correspond to the rows of this table, and we evaluated each |
---|
0:03:54 | system on the nine transmission channels, which correspond to the columns of this table. |
---|
0:03:59 | So first, |
---|
0:04:01 | on the diagonal, we have the performance in matched conditions, and we achieve error |
---|
0:04:06 | rates ranging between 3 and 15 percent, which is acceptable performance. |
---|
0:04:15 | Outside of the diagonal, when we test on a channel which has not |
---|
0:04:20 | been observed during training, we observe poor performance. |
---|
0:04:24 | so |
---|
0:04:25 | It means that the domain mismatch is a real issue for this system. |
---|
0:04:30 | Conversely, |
---|
0:04:32 | on the last line, we train a system with data from the nine transmission |
---|
0:04:37 | channels, and we achieve good performance on all channels, meaning that the |
---|
0:04:43 | language identification system has the capacity |
---|
0:04:49 | to work well on all channels, as long as data from these channels is observed during training. |
---|
0:04:56 | The goal of our work is to improve performance outside of the diagonal of this |
---|
0:05:02 | table, |
---|
0:05:02 | without using any labelled data from the target |
---|
0:05:07 | transmission channel. |
---|
0:05:09 | This problem is called unsupervised domain adaptation, |
---|
0:05:13 | where domains correspond to transmission channels. |
---|
0:05:16 | so |
---|
0:05:17 | We have a source domain, called S, with labelled data: x is an audio |
---|
0:05:23 | recording and y is the corresponding language label. And we have unlabelled data |
---|
0:05:29 | from a target domain, T. |
---|
0:05:33 | Our goal |
---|
0:05:33 | is to achieve good language identification performance on the target domain. |
---|
0:05:40 | So now we describe |
---|
0:05:43 | our method for unsupervised domain adaptation, which is a regularization of the embedding |
---|
0:05:48 | extractor. |
---|
0:05:50 | A lot of unsupervised domain adaptation methods are based on a very simple idea: |
---|
0:05:57 | making the distributions of representations of both domains similar, |
---|
0:06:03 | that is, aligning the distributions of representations of the two domains using only unlabelled data. |
---|
0:06:10 | Then, with these similar representations, you can train a classifier with labelled data from |
---|
0:06:17 | the source domain, and if the representations are invariant between domains, |
---|
0:06:22 | this classifier will also achieve good performance on the target domain. Here |
---|
0:06:26 | is a diagram to understand this idea: you have labelled data from the source |
---|
0:06:32 | domain, so you are able to train a classifier, but it does not perform |
---|
0:06:36 | well on an unseen target domain. |
---|
0:06:39 | Consequently, we use unlabelled data from both domains to learn |
---|
0:06:44 | a space of representations where the representations of both domains have the same distribution. In that case, |
---|
0:06:52 | if a classifier is trained on the source domain, it will also work well |
---|
0:06:57 | on the target domain. |
---|
0:06:59 | So now the question is: |
---|
0:07:03 | where should we enforce invariance of the representations within the language identification system? |
---|
0:07:09 | Our |
---|
0:07:11 | idea is to apply it to the x-vector. This seems natural since it is |
---|
0:07:16 | a representation with language information, directly extracted from a neural network trained to predict the language. |
---|
0:07:26 | So our method, to |
---|
0:07:29 | create a domain-invariant x-vector, is to add a domain adaptation regularization loss function |
---|
0:07:35 | to the loss function of the embedding extractor. Classically, the embedding extractor |
---|
0:07:40 | is trained with a classification loss function, |
---|
0:07:43 | namely the cross-entropy, |
---|
0:07:46 | which is a loss function |
---|
0:07:48 | that takes labelled data, x and y, from the source |
---|
0:07:53 | domain. |
---|
0:07:55 | We add to this loss function a regularization term, whose |
---|
0:08:01 | aim is to enforce invariance between the distributions of the |
---|
0:08:06 | x-vectors for both domains. Here, the hyperparameter lambda |
---|
0:08:10 | controls a compromise between invariance of the representations between domains |
---|
0:08:15 | and classification performance |
---|
0:08:17 | on the source domain. |
---|
0:08:19 | As the regularization term, we decided to use the maximum mean discrepancy. |
---|
0:08:25 | The maximum mean discrepancy |
---|
0:08:27 | is a divergence function that corresponds |
---|
0:08:30 | to the supremum |
---|
0:08:32 | of the difference between the averages of a function |
---|
0:08:36 | over both domains, |
---|
0:08:38 | where the supremum is taken over a space of functions H. |
---|
0:08:42 | If H is the unit ball of a reproducing kernel Hilbert |
---|
0:08:48 | space, |
---|
0:08:49 | the maximum mean discrepancy |
---|
0:08:52 | is an expectation of |
---|
0:08:55 | kernel values of points sampled from both domains, |
---|
0:08:58 | and it can be estimated with finite samples. |
---|
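Written out explicitly (standard MMD notation, added here for clarity and not verbatim from the talk; p_S and p_T denote the source and target distributions):

```latex
% MMD as a supremum over a function class H
\mathrm{MMD}(p_S, p_T)
  = \sup_{h \in \mathcal{H}}
    \left( \mathbb{E}_{x \sim p_S}[h(x)] - \mathbb{E}_{y \sim p_T}[h(y)] \right)

% When H is the unit ball of a reproducing kernel Hilbert space with
% kernel k, the squared MMD reduces to expectations of kernel values,
% which can be estimated from finite samples of both domains:
\mathrm{MMD}^2(p_S, p_T)
  = \mathbb{E}_{x, x' \sim p_S}[k(x, x')]
  + \mathbb{E}_{y, y' \sim p_T}[k(y, y')]
  - 2\,\mathbb{E}_{x \sim p_S,\, y \sim p_T}[k(x, y)]
```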
0:09:03 | With mini-batch training of the system, |
---|
0:09:06 | we do exactly that: during training, for each mini-batch, we compute the maximum |
---|
0:09:13 | mean discrepancy between the embeddings of both domains, and we add it to the classification loss function. |
---|
0:09:19 | In this work, we use a Gaussian kernel to define the space of |
---|
0:09:24 | functions. |
---|
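As an illustration, the per-mini-batch computation described above could be sketched as follows (a minimal NumPy sketch, not the authors' code; the kernel bandwidth `sigma` and the weight `lam` are assumed example values):

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of x and the rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(xs, xt, sigma=1.0):
    # Biased estimate of the squared maximum mean discrepancy between
    # source embeddings xs and target embeddings xt (one mini-batch each).
    k_ss = rbf_kernel(xs, xs, sigma).mean()
    k_tt = rbf_kernel(xt, xt, sigma).mean()
    k_st = rbf_kernel(xs, xt, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

def total_loss(classification_loss, xs, xt, lam=10.0, sigma=1.0):
    # Regularized objective: classification loss on the labelled source
    # mini-batch plus lambda times the MMD between the embeddings of the
    # source and target mini-batches.
    return classification_loss + lam * mmd2(xs, xt, sigma)
```

In a real system, the gradient of this regularized loss would be back-propagated through the embedding extractor; the functions only show how the per-mini-batch quantities combine.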
0:09:28 | We compare this method of regularization of the embedding extractor to feature-based |
---|
0:09:33 | domain adaptation methods, namely correlation alignment. |
---|
0:09:39 | The idea of a feature-based domain adaptation method is to transform representations of the |
---|
0:09:45 | source domain to make them more similar to the target domain, and then |
---|
0:09:50 | train |
---|
0:09:51 | the following blocks of the system |
---|
0:09:55 | with labelled data from the source domain that have been transformed, |
---|
0:09:59 | and then apply this classifier on the target domain. |
---|
0:10:03 | With correlation alignment, the transformation applied to make source data similar to target data is |
---|
0:10:09 | a matrix multiplication, with the goal of making the covariance matrices of both domains equal. |
---|
0:10:16 | We apply this method to |
---|
0:10:18 | two |
---|
0:10:20 | parts of the system: |
---|
0:10:21 | the x-vector extractor, so |
---|
0:10:23 | we |
---|
0:10:24 | transform |
---|
0:10:26 | the frame-level bottleneck features, |
---|
0:10:28 | and the classifier, so we apply correlation alignment to the segment-level x-vectors. |
---|
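As an illustration of this transformation (a minimal NumPy sketch of the standard CORAL recipe, not the authors' implementation; the regularization constant `eps` is an assumed value):

```python
import numpy as np

def _mat_pow(m, p, eps=1e-6):
    # Matrix power of a symmetric positive-definite matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, eps, None)
    return (vecs * vals ** p) @ vecs.T

def coral_transform(xs, xt, eps=1e-6):
    # Whiten the source features with the inverse square root of their
    # covariance, then re-colour them with the square root of the target
    # covariance: the transformed source data then matches the
    # second-order statistics (covariance) of the target domain.
    cs = np.cov(xs, rowvar=False) + eps * np.eye(xs.shape[1])
    ct = np.cov(xt, rowvar=False) + eps * np.eye(xt.shape[1])
    return xs @ _mat_pow(cs, -0.5, eps) @ _mat_pow(ct, 0.5, eps)
```

Applied at the frame level, this would transform the bottleneck features; applied at the segment level, it would transform the x-vectors.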
0:10:37 | And finally, we use a third domain adaptation method for the |
---|
0:10:43 | language classifier. Since our goal is to prove that domain adaptation |
---|
0:10:50 | of the embedding extractor is superior to domain adaptation of the |
---|
0:10:55 | classifier, |
---|
0:10:56 | we simply train the classifier with labelled data from the target domain. |
---|
0:11:01 | So this is not really domain adaptation, but supervised training on target data, |
---|
0:11:07 | and it gives us a bound on the potential performance of an adaptation |
---|
0:11:12 | of the back-end classifier to the target domain. |
---|
0:11:18 | so |
---|
0:11:19 | in this work we compare four methods: |
---|
0:11:24 | two |
---|
0:11:25 | feature-based domain adaptation methods that are applied to the embedding extractor or to the |
---|
0:11:30 | classifier, our model-based method applied to the embedding |
---|
0:11:35 | extractor, and |
---|
0:11:37 | an upper bound on the performance that could be achieved by adaptation of the |
---|
0:11:43 | final classifier. |
---|
0:11:47 | So now let us present the experiments. |
---|
0:11:51 | so we |
---|
0:11:53 | trained systems with these methods |
---|
0:11:56 | with the same settings: |
---|
0:11:59 | the same feature extractor, which is the pre-trained neural network |
---|
0:12:05 | with bottleneck features, |
---|
0:12:08 | and the same TDNN architecture for the |
---|
0:12:10 | x-vector extractor. |
---|
0:12:12 | and |
---|
0:12:13 | We perform one training for the regularization of the embedding extractor, |
---|
0:12:18 | with adaptation for a single channel, as if only one domain adaptation scenario |
---|
0:12:23 | existed, and we select the hyperparameter lambda that offers a compromise between the |
---|
0:12:29 | source loss function and performance on the target domain. |
---|
0:12:32 | This domain adaptation scenario gives us the value of lambda, selected once and |
---|
0:12:37 | applied to all of the other domain adaptation scenarios. |
---|
0:12:41 | Why is this important? |
---|
0:12:43 | It is because, |
---|
0:12:44 | in a real domain adaptation scenario, we cannot use labelled data from the |
---|
0:12:49 | target domain, so this parameter lambda has to be robust. |
---|
0:12:55 | Then we have to choose the source domain. We always use |
---|
0:12:59 | the telephone channel as the source, since |
---|
0:13:02 | most |
---|
0:13:04 | language recognition corpora |
---|
0:13:07 | are telephone corpora, |
---|
0:13:09 | and |
---|
0:13:10 | the target domain is each of the eight radio channels, |
---|
0:13:15 | so we |
---|
0:13:16 | only have unlabelled data from this domain. |
---|
0:13:21 | So first, we have to select the hyperparameter lambda. |
---|
0:13:26 | So we train |
---|
0:13:27 | the embedding extractor |
---|
0:13:29 | with different values of lambda, corresponding to the colors of this |
---|
0:13:33 | plot, |
---|
0:13:35 | and we plot the value of the regularization loss function on the validation set. |
---|
0:13:42 | We see the behavior we would expect: at the beginning of training, |
---|
0:13:49 | the maximum mean discrepancy is close to zero, |
---|
0:13:52 | since the extractor is |
---|
0:13:55 | randomly initialized and the distributions of both domains are similar. |
---|
0:14:00 | Then it increases during training because |
---|
0:14:03 | classification training creates differences between the domains, |
---|
0:14:08 | and the value it reaches is controlled by the value of |
---|
0:14:14 | the regularization parameter lambda. |
---|
0:14:19 | So now let us look at the classification loss function, the cross-entropy. |
---|
0:14:23 | In these plots, we have the classification loss function on the source domain in |
---|
0:14:29 | solid lines, |
---|
0:14:30 | and on the target domain in dashed lines. The dashed lines correspond to the |
---|
0:14:35 | cross-entropy computed on the target domain, which would not be available in a real |
---|
0:14:40 | unsupervised domain adaptation |
---|
0:14:42 | training experiment, |
---|
0:14:43 | but we can use it in this analysis to understand what happens. |
---|
0:14:49 | So when the hyperparameter of the regularization, lambda, is small, as for the red curves |
---|
0:14:54 | here, |
---|
0:14:56 | the cross-entropy is reduced on the source domain but explodes on the target domain. |
---|
0:15:03 | By |
---|
0:15:05 | increasing the value of lambda, we manage to reduce the gap between both |
---|
0:15:11 | domains, as shown by the green |
---|
0:15:13 | and |
---|
0:15:14 | orange curves. |
---|
0:15:16 | But it slows down training on the source domain for high values of |
---|
0:15:23 | lambda: |
---|
0:15:24 | for lambda equal to one hundred, we are not able to |
---|
0:15:30 | converge on the source domain. |
---|
0:15:32 | So the choice of lambda |
---|
0:15:34 | is a compromise |
---|
0:15:35 | between |
---|
0:15:37 | reducing the gap between domains |
---|
0:15:39 | and keeping good convergence on the source domain, |
---|
0:15:42 | and we selected the value of ten |
---|
0:15:45 | for lambda. |
---|
0:15:47 | Then we used this value for all domain adaptation scenarios, with the telephone channel as source |
---|
0:15:52 | domain and each of the eight radio channels as target. |
---|
0:15:56 | So in this table, we only present the performances of two target domains, |
---|
0:16:03 | channel A and one other channel, |
---|
0:16:05 | because they are the |
---|
0:16:07 | best and worst performances for the source-trained system on the target domains, as well as the average |
---|
0:16:12 | performance over the eight channels. |
---|
0:16:17 | But results from all channels are consistent. |
---|
0:16:21 | So first, we look at the performance of the baseline systems: |
---|
0:16:25 | the system trained on the source domain and the system trained on the target domain. The |
---|
0:16:33 | performance of the system trained on the source domain is really poor, with an |
---|
0:16:38 | average error rate of 42, |
---|
0:16:40 | while training on the target domain achieves an average error rate of 12. |
---|
0:16:46 | then we have |
---|
0:16:47 | the results |
---|
0:16:48 | for systems trained with the baseline domain adaptation methods. First, when the feature-based domain |
---|
0:16:54 | adaptation method |
---|
0:16:55 | is applied to the classifier, |
---|
0:16:59 | we go from 42 to 39 percent average error rate. |
---|
0:17:07 | So we achieve a slight improvement. |
---|
0:17:10 | We observe that |
---|
0:17:11 | the feature-based domain adaptation method is more efficient when applied to the embedding |
---|
0:17:16 | extractor, |
---|
0:17:17 | supporting our idea |
---|
0:17:20 | that adaptation of the embedding extractor is needed. |
---|
0:17:25 | Once the embedding extractor is adapted, |
---|
0:17:29 | adding |
---|
0:17:30 | a feature-based adaptation of the classifier does not further improve performance. |
---|
0:17:35 | Finally, |
---|
0:17:36 | supervised training |
---|
0:17:38 | of the classifier, with the embedding extractor trained on the source domain, |
---|
0:17:43 | achieves good performance, but it is significantly worse than the system |
---|
0:17:48 | trained on the target domain. It means that |
---|
0:17:53 | embeddings trained on the source domain are not perfectly suited for the |
---|
0:17:57 | target domain, and it shows the potential gain of adapting the embedding extractor. |
---|
0:18:04 | so |
---|
0:18:05 | domain adaptation |
---|
0:18:07 | of only the classifier |
---|
0:18:10 | cannot compensate for the domain mismatch in the space of the embeddings. |
---|
0:18:15 | And finally, we can look at the results obtained with |
---|
0:18:20 | the maximum mean discrepancy regularization of the embedding extractor. |
---|
0:18:24 | So first, when the back-end classifier is trained on the source domain, it is a truly |
---|
0:18:30 | unsupervised domain adaptation experiment, and for seven of the eight |
---|
0:18:36 | channels of the corpus, we achieve a better performance than the system |
---|
0:18:42 | trained on the source corpus and tested on the target domain, |
---|
0:18:46 | so with the exception of one channel. |
---|
0:18:48 | So this is a very good result, showing that invariance in the space |
---|
0:18:52 | of the embeddings |
---|
0:18:53 | is useful, and it is acquired with the addition of the maximum mean discrepancy regularization. |
---|
0:18:58 | And, as shown in the last line |
---|
0:19:01 | of the table, |
---|
0:19:03 | if we train the back-end classifier on the target domain, we are still |
---|
0:19:08 | able to improve performance with adaptation of the embeddings. |
---|
0:19:12 | It means that these embeddings could be improved further, |
---|
0:19:16 | and that future work could improve again the invariance of these embeddings, |
---|
0:19:21 | or we could combine this method with an unsupervised domain adaptation |
---|
0:19:27 | of the classifier. |
---|
0:19:30 | So in this paper, we studied the transmission channel mismatch for a language |
---|
0:19:36 | identification system and proposed an unsupervised domain adaptation method for such a system. |
---|
0:19:43 | The proposed method |
---|
0:19:45 | is to add a regularization loss function |
---|
0:19:48 | to the loss function of the embedding extractor, |
---|
0:19:51 | and this loss function is the maximum mean discrepancy. |
---|
0:19:55 | So we showed |
---|
0:19:56 | that this method |
---|
0:19:58 | rivals |
---|
0:19:59 | supervised training of the whole system on the target domain, and we |
---|
0:20:06 | obtained experimental evidence supporting the idea that adaptation of the embedding extractor is more |
---|
0:20:12 | efficient than adaptation |
---|
0:20:14 | of the classifier |
---|
0:20:16 | in an x-vector-based language identification system. |
---|
0:20:20 | Thank you. |
---|