0:00:15 Hello, and welcome to this presentation. I will be presenting joint work with my colleagues on the unsupervised domain adaptation of a language identification system, with the goal of being robust to the transmission channel.
0:00:33 In this work, we address the problem of language identification on a transmission channel which has not been observed during training of the system, but for which we have unlabeled data. This problem is called unsupervised domain adaptation.
0:00:51 We propose to add a regularization loss function to the classification loss function of the embedding extractor during its training.
0:00:59 In this presentation, we first define the task of unsupervised domain adaptation for language identification. Then we describe the proposed method of regularization of the embedding extractor, and finally we present our experiments and results.
0:01:18 So first, the task of unsupervised domain adaptation for language identification. We use a standard language identification system based on x-vectors. This system is made of three blocks. First, a frame-level feature extractor, whose aim is to extract frame-level features; it is a stack of neural networks which have been trained to predict triphones. The frame-level embeddings are extracted and used as input of the x-vector extractor. The x-vector extractor is a neural network discriminatively trained to predict languages; we extract a segment-level embedding from this neural network. Finally, a language classifier, composed of a dimension reduction and a support vector machine, computes a score for each target language.
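To make the structure of this pipeline concrete, here is a minimal numpy sketch of the segment-level pooling step of an x-vector extractor, with a toy linear scorer standing in for the back-end; the dimensions, random weights, and the linear scorer are illustrative assumptions, not the actual system.

```python
import numpy as np

def stats_pooling(frames):
    """Collapse (T, d) frame-level features into a single (2d,) segment-level
    vector by concatenating the per-dimension mean and standard deviation,
    as done by the pooling layer of an x-vector extractor."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def language_scores(xvector, W, b):
    """Toy stand-in for the back-end (dimension reduction + SVM):
    one linear score per target language."""
    return W @ xvector + b

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 4))            # 200 frames of 4-dim features
xvec = stats_pooling(frames)                  # (8,) segment-level embedding
W, b = rng.normal(size=(5, 8)), np.zeros(5)   # 5 target languages
print(language_scores(xvec, W, b).shape)      # (5,)
```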
0:02:19 We train such a system on the RATS corpus. It contains five languages: Arabic, English, Farsi, Pashto and Urdu. We have recordings for these five languages on nine transmission channels: first a telephone channel, and eight radio channels. So now let me play you an example: first, a telephone recording.
0:02:53 [Audio examples: the same recording retransmitted over different radio channels.]
0:03:14 As you may have heard, these versions of the original file present very difficult noise and distortion characteristics, so this is a real challenge for domain adaptation. We use these retransmitted files for both training and testing of the system.

Our first work was to investigate the domain mismatch issue with this corpus. We trained a language identification system for each of the nine transmission channels, corresponding to the rows of this table, and we tested each system on the nine transmission channels, the columns of the table. First, on the diagonal, we have the performance in matched conditions: we achieve an equal error rate ranging between 3 and 15 percent, which is acceptable performance. Outside of the diagonal, when we test on a channel which has not been observed during training, we observe very poor performance. It means that domain mismatch is a real issue for this task.

Conversely, on the last line, we train a system with data from all nine transmission channels, and we achieve good performance on all channels. This means that the language identification system has the capacity to perform well on all channels, provided they are observed during training. The goal of our work is to improve performance outside of the diagonal of this table, without using any labeled data from the target transmission channels.
0:05:09 This problem is called unsupervised domain adaptation, where domains correspond to transmission channels. We have a source domain S, for which we have labeled data: x is an audio recording and y is the corresponding language of the recording. And we have unlabeled data from a target domain T. Our goal is to achieve good language identification performance on the target domain.

So now we describe our method for unsupervised domain adaptation, which is a regularization of the embedding extractor. A lot of unsupervised domain adaptation methods are based on a very simple idea: making the distributions of the representations of both domains similar, using only unlabeled data, by aligning the distributions of representations. With similar representations, you can train a classifier with labeled data from the source domain, and if the representations are invariant between domains, this classifier should also achieve good performance on the target domain. This diagram illustrates the idea: labeled data allows you to train a classifier on the source domain, but this classifier does not perform well on an unseen target domain. Consequently, we use unlabeled data of both domains to learn a space of representations in which the representations of both domains have the same distribution. Then, if a classifier is trained on the source domain, it will also work well on the target domain.
0:06:59 So now the question is: where can we enforce invariance of the representations within the language identification system? Our idea is to apply it to the x-vector. This seems natural, since it is a representation with language information, directly extracted from a neural network trained to predict languages. So our method, to create a domain-invariant x-vector, is to add a domain adaptation regularization loss function to the loss function of the embedding extractor. The classical embedding extractor is trained with a classification loss function, a cross-entropy, which is a loss function that takes labeled data (x_s, y_s) from the source domain.
0:07:55 We add to this loss function a regularization term, whose role is to make the extractor enforce invariance between the distributions of the x-vectors of both domains. The hyperparameter lambda controls the compromise between invariance of the representations between domains and classification performance on the source domain.
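The combined objective just described can be written compactly; the symbols e_s and e_t for source- and target-domain x-vectors are our notation, assumed from context:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\text{class}}(x_s, y_s) \;+\; \lambda\, \mathcal{L}_{\text{reg}}(e_s, e_t)
```

where the classification term is the cross-entropy computed on labeled source data, the regularization term measures the divergence between the distributions of source and target x-vectors, and lambda sets the trade-off between the two.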
0:08:19 As the regularization term, we decided to use the maximum mean discrepancy. The maximum mean discrepancy is a divergence function that corresponds to the supremum, over a space of basis functions H, of the difference between the averages of a function over both domains. If H is the unit ball of a reproducing kernel Hilbert space, the maximum mean discrepancy is an expectation of kernel values between pairs of embeddings of both domains, and it can be estimated with a finite sample.
0:09:03 So we can compute it on each mini-batch during training of the system. We do exactly that: during training, for each mini-batch, we compute the maximum mean discrepancy between the batches of the two domains, and we add it to the classification loss function. In this work, we use a Gaussian kernel to define the space of functions.
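As an illustration, here is a minimal numpy sketch of this mini-batch estimator with a Gaussian kernel; the embedding dimension, batch size, and kernel bandwidth are illustrative assumptions, and a real implementation would compute this term on the x-vectors inside the training graph.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for every pair of rows
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=4.0):
    """Biased empirical estimate of the squared maximum mean discrepancy
    between a source mini-batch X and a target mini-batch Y."""
    return (gaussian_kernel(X, X, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean())

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(64, 8))       # x-vectors of the source batch
tgt_same = rng.normal(0.0, 1.0, size=(64, 8))  # target batch, same distribution
tgt_far = rng.normal(3.0, 1.0, size=(64, 8))   # target batch, shifted distribution

# matched distributions give a small value, mismatched a large one
assert mmd2(src, tgt_same) < mmd2(src, tgt_far)

# during training, this term is weighted and added to the classification
# loss:  loss = cross_entropy + lam * mmd2(src_xvectors, tgt_xvectors)
```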
0:09:28 We compare this method of regularization of the embedding extractor to two baseline domain adaptation approaches. The first is a feature-based method called correlation alignment. The idea of feature-based domain adaptation methods is to transform the representations of the source domain to make them more similar to those of the target domain, then to train the following blocks of the system with labeled data from the source domain that have been transformed, and finally to apply this classifier on the target domain. For correlation alignment, the transformation applied to the source data is a matrix multiplication, with the goal of making the covariance matrices of both domains similar.
0:10:16 We apply this method to two parts of the system: for the x-vector extractor, we transform the frame-level features; for the classifier, we apply correlation alignment to the segment-level x-vectors.
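A minimal numpy sketch of the correlation alignment transform, following the usual CORAL recipe of whitening the source features and re-coloring them with the target covariance; the regularization constant and the random toy data are assumptions for illustration.

```python
import numpy as np

def coral(source, target, eps=1e-6):
    """Whiten source features with their own covariance, then re-color them
    with the target covariance, so both domains share second-order statistics."""
    def sym_power(C, p):
        # power of a symmetric positive-definite matrix via eigendecomposition
        w, V = np.linalg.eigh(C)
        return (V * np.clip(w, eps, None) ** p) @ V.T

    d = source.shape[1]
    Cs = np.cov(source, rowvar=False) + eps * np.eye(d)
    Ct = np.cov(target, rowvar=False) + eps * np.eye(d)
    # x -> x Cs^{-1/2} Ct^{1/2}: a single matrix multiplication per vector
    return source @ sym_power(Cs, -0.5) @ sym_power(Ct, 0.5)

rng = np.random.default_rng(0)
src = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # source features
tgt = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # target features
aligned = coral(src, tgt)

# after alignment, the source covariance matches the target covariance
assert np.allclose(np.cov(aligned, rowvar=False),
                   np.cov(tgt, rowvar=False), atol=1e-3)
```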
0:10:37 Finally, the second baseline is for the language classifier. Since our goal is to prove that adaptation of the embedding extractor is superior to adaptation of the classifier, we simply train the classifier with labeled data from the target domain. This is not domain adaptation but supervised training; it gives us a bound on the potential performance of an adaptation of the backend classifier to the target domain.
0:11:18 So, in this work we compare four methods: two feature-based domain adaptation methods, applied to the embedding extractor and to the final classifier; a model-based method applied to the embedding extractor; and, for comparison, a bound on the performance that could be achieved by adaptation of the final classifier.
0:11:47 So now let's present the experiments. We trained systems with these methods in the same setting: the same frame-level feature extractor, which is a pre-trained neural network, and the same TDNN architecture for the x-vector extractor; we then perform training with the regularization of the embedding extractor. We select the hyperparameter lambda, which sets the compromise between the two loss functions, based on the performance on the target domain of a single domain adaptation scenario, and this value is then applied to all the other domain adaptation scenarios. This is important because, in a real domain adaptation scenario, we cannot choose lambda with labeled data from the target domain, so this value of lambda has to be robust.
0:12:55 Then we have to choose the source domain. We always use the telephone channel as the source for this task, since most language recognition corpora are telephone corpora, and the target domain is each of the eight radio channels, from which we have unlabeled data.
0:13:21 So first we have to select the hyperparameter lambda. We trained the embedding extractor with different values of lambda, corresponding to the colors of this plot, and we plot the value of the regularization loss function on the validation set. We observe the behavior we would expect: at the beginning of training, the maximum mean discrepancy is close to zero, since the extractor is randomly initialized and the distributions of both domains are similar. Then it increases during training, because classification tends to make a difference between the domains, and the value at which it stabilizes is controlled by the value of the regularization parameter lambda.
0:14:19 Now let's look at the classification loss function, the cross-entropy. In these plots we have the classification loss function on the source domain, in solid lines, and on the target domain, in dotted lines. The dotted lines, corresponding to the cross-entropy computed on the target domain, are not part of the loss function; they are computed here only to understand what happens. When the regularization hyperparameter lambda is small, as for the red curves here, the cross-entropy is reduced on the source domain but explodes on the target domain. By increasing the value of lambda, we manage to reduce it on both domains, as for the green and orange curves. But it slows down training on the source domain, and for a high value of lambda, such as lambda equal to one hundred, we are not able to converge on the source domain. So the choice of lambda is a compromise between reducing the gap between domains and quick convergence on the source domain. We selected an intermediate value of lambda for the rest of the experiments.
0:15:47 This value is used for all domain adaptation scenarios, with telephone as the source domain and each of the eight radio channels as the target. In this table, we only present performance on the target domain; for lack of space we report the best and worst performance for each system across the target domains, and the average performance over the eight channels, but the results for all channels are consistent. So first, we observe the performance of the baseline systems: the system trained on the source domain and tested on the target domain performs really poorly, with an average equal error rate of 42 percent, whereas training on the target domain achieves an equal error rate of around 12 percent.
0:16:46 Then we have four systems trained with the baseline domain adaptation methods. First, the feature-based domain adaptation: if it is applied to the classifier, we go from 42 to 39 percent average equal error rate, so we achieve a slight improvement. We observe that the feature-based domain adaptation method is more efficient when applied to the embedding extractor, supporting our idea that adaptation of the embedding extractor is more useful than adaptation of the classifier. And once the embedding extractor is adapted, a feature-based adaptation of the classifier does not bring any further improvement in performance.
0:17:35 Finally, the supervised training of the classifier, with embeddings trained on the source domain, achieves good performance, but it remains significantly worse than a system trained on target domain data. It means that embeddings trained on the source domain are not perfectly suited for the target domain, and that there should also be a gain in adapting the embedding extractor. So domain adaptation of the classifier cannot compensate the domain mismatch in the space of embeddings.
0:18:15 And finally, we can look at the results of the maximum mean discrepancy regularization of the embedding extractor. First, when the backend classifier is trained on the source domain, this is our best unsupervised domain adaptation method: for seven of the eight channels of the corpus, we achieve better performance than the baseline adaptation methods, with the exception of a single channel. This is a very good result, showing that invariance in the space of embeddings is useful and can be acquired with the addition of the maximum mean discrepancy. But the most interesting result is the last line of the table: if we train the backend classifier on the target domain, we are still able to improve performance with the adaptation of the embeddings. It means that these embeddings can be improved further, and that future work on improving the invariance of these embeddings, with an unsupervised domain adaptation method, would be worthwhile.
0:19:30 So, in this paper, we studied the transmission channel mismatch for a language identification system and proposed an unsupervised domain adaptation method for such a system. The proposed method is to add a regularization loss function to the loss of the embedding extractor; this loss function is the maximum mean discrepancy. We showed that the resulting system approaches the performance of supervised training of the whole system on the target domain, and we gave experimental evidence that adaptation of the embedding extractor is more efficient than adaptation of the classifier in an x-vector-based language identification system. Thank you.