Hello, and welcome to this presentation about the unsupervised domain adaptation of a language identification system, with the goal of being robust to the transmission channel.

In this work, we address the problem of language identification under transmission channel mismatch: the system is tested on a transmission channel which has not been observed during training, and we only have unlabeled data from this target transmission channel. This problem is called unsupervised domain adaptation.

We propose to add a regularization loss function to the classification loss function of the embedding extractor during its training.

In this presentation, we first define the task of unsupervised domain adaptation for language identification. Then we describe the proposed method of regularization of the embedding extractor. And finally, we present our experiments and results.

So first, the task of unsupervised domain adaptation for language identification.

We use a standard language identification system based on x-vectors. This system is made of three blocks. First, a bottleneck feature extractor, whose aim is to extract frame-level features; it is a stacked neural network which has been trained to predict triphones. The frame-level embeddings extracted from this network are used as input of the x-vector extractor. The x-vector extractor is a neural network discriminatively trained to predict the language; we extract a segment-level embedding from its penultimate layer. And finally, a language classifier, which consists of a dimension reduction and a support vector machine, computes a score for each target language.
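As a side note on how a segment-level embedding can be obtained from frame-level features, here is a minimal numpy sketch of mean-and-standard-deviation pooling, a common choice in x-vector architectures (the dimensions are illustrative assumptions, not the exact configuration of the system described here):

```python
import numpy as np

def stats_pooling(frame_features):
    """Collapse a (num_frames, feat_dim) matrix of frame-level features
    into one fixed-size segment-level vector by concatenating the
    per-dimension mean and standard deviation over time."""
    mean = frame_features.mean(axis=0)
    std = frame_features.std(axis=0)
    return np.concatenate([mean, std])

# A 200-frame segment of 64-dimensional frame-level features becomes a
# single 128-dimensional segment-level representation, whatever the
# duration of the input segment.
segment = stats_pooling(np.random.randn(200, 64))
print(segment.shape)  # (128,)
```

This fixed-size output is what allows a segment-level classifier to be trained on recordings of variable duration.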

We train such a system on the corpus used in this work. It contains five languages: Arabic, English, Farsi, Pashto, and Urdu. We have recordings for these five languages on nine transmission channels: a telephone channel and eight radio channels. So now, let me play some examples for you: first a telephone recording, and now the same speech file retransmitted through a radio channel.

As you may have heard, these versions of the original file present very different noise and distortion characteristics, so this is a real challenge for the domain adaptation of this system. We use these nine transmission channels for both training and testing of the system.

Our first work was to investigate the domain mismatch issue with this corpus. We trained a language identification system for each of the nine transmission channels, which correspond to the rows of this table, and we tested each of them on the nine transmission channels, which are the columns of this table. First, on the diagonal, we have the performance in matched conditions: we achieve an equal error rate ranging between three and fifteen percent, which is an acceptable performance. Outside of the diagonal, when we test on a channel which has not been observed during training, we see a huge drop in performance.

So it means that the domain mismatch is a real issue for this system. Conversely, on the last line, we train a system with data from all nine transmission channels, and we achieve good performance on all channels, meaning that the language identification system has the capacity to work well on all channels, provided that they are observed during training.

The goal of our work is to improve performance outside of the diagonal of this table, without using any labeled data from the target transmission channels. This problem is called unsupervised domain adaptation, where domains correspond to transmission channels.

So we have a source domain S for which we have labeled data: x are the audio recordings and y is the corresponding language of each recording. And we have unlabeled data from a target domain T. Our goal is to achieve good language identification performance on the target domain.
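To fix notation (the symbols below are my own shorthand for the setup just described, not slides from the talk), the unsupervised domain adaptation problem can be written as:

```latex
% Source domain: labeled audio recordings
\mathcal{D}_S = \{(x_i, y_i)\}_{i=1}^{n_S}, \qquad y_i \in \{1, \dots, L\} \text{ (language labels)}
% Target domain: unlabeled recordings only
\mathcal{D}_T = \{x_j\}_{j=1}^{n_T}
% Goal: a classifier f with low error under the target distribution P_T
\min_{f} \; \mathbb{E}_{(x, y) \sim P_T}\big[\mathbf{1}\{f(x) \neq y\}\big]
```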

So now, we describe our method for unsupervised domain adaptation, which is a regularization of the embedding extractor.

A lot of unsupervised domain adaptation methods are based on a very simple idea: making the distributions of the representations of both domains similar, by using only unlabeled data to align these distributions. With such similar representations, you can train a classifier with labeled data from the source domain; and if the representations are invariant between domains, the classifier will also achieve good performance on the target domain. Here is a diagram to understand this idea: with labeled data from the source domain, we are able to train a classifier, but it does not perform well on an unseen target domain. Consequently, we use unlabeled data from both domains to learn a space of representations where the representations of both domains have the same distribution. Then, if a classifier is trained on the source domain, it will also work well on the target domain.

So now, the question is: where do we enforce invariance of the representations within the language identification system? Our idea is to apply it to the x-vector, which seems natural since it is a representation with language information, directly extracted from a neural network trained to predict the language.

So our method to create a domain-invariant x-vector is to add a domain adaptation regularization loss function to the loss function of the embedding extractor. The x-vector extractor is classically trained with a classification loss function, a cross-entropy computed on the source samples, which takes as input the labeled data of the source domain. We add to this loss function a regularization term: its role is to enforce invariance between the distributions of the x-vectors of both domains. Here, the hyperparameter lambda controls the compromise between invariance of the representations between domains and classification performance on the source domain.

As the regularization loss, we decided to use the maximum mean discrepancy. The maximum mean discrepancy is a divergence function that corresponds to the supremum, over a class of functions H, of the difference between the averages of a function over both domains. If H is the unit ball of a reproducing kernel Hilbert space, the maximum mean discrepancy is an expectation of kernel values over pairs of samples of both domains, and it can be estimated with a finite number of samples, for instance over a mini-batch during training of the system. We do exactly that: during training, for each mini-batch, we compute the maximum mean discrepancy between the two domains and we add it to the classification loss function. In this work, we use a Gaussian kernel to define the space of functions.
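A minimal numpy sketch of the mini-batch MMD estimator with a Gaussian kernel may help here; the kernel bandwidth, batch sizes, and the way the term is combined with the classification loss are illustrative assumptions, not the exact settings of the system:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy
    between two batches of embeddings (x-vectors here)."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

# During training, the regularized per-mini-batch objective would be:
# loss = cross_entropy_on_source + lam * mmd2(source_xvecs, target_xvecs)
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, (64, 8))
target_same = rng.normal(0.0, 1.0, (64, 8))       # same distribution
target_shifted = rng.normal(3.0, 1.0, (64, 8))    # mismatched domain
print(mmd2(source, target_same, sigma=3.0)
      < mmd2(source, target_shifted, sigma=3.0))  # True
```

Pushing the gradient of this term back through the extractor is what drives the two x-vector distributions together.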

We compare this method of regularization of the embedding extractor to feature-based domain adaptation; we chose correlation alignment, the CORAL method. The idea of feature-based domain adaptation methods is to transform the representations of the source domain to make them more similar to the target domain, then to train the following blocks of the system with labeled data from the source domain that have been transformed, and then to apply this classifier on the target domain. Correlation alignment is a transformation, estimated with the unlabeled target data, which is a matrix multiplication, with the goal of making the covariance matrices of both domains similar.

We apply this method to two parts of the system: to the x-vector extractor, where we transform the frame-level features, and to the classifier, where we apply correlation alignment to the segment-level x-vectors.
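The correlation alignment transform can be sketched in a few lines of numpy; this is a generic illustration of the technique (regularization constant and dimensions are assumptions), not the exact implementation used in the experiments:

```python
import numpy as np

def _sqrtm(mat):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def coral_transform(source, target, eps=1e-6):
    """Correlation alignment (CORAL): whiten the source features with
    their own covariance, then re-color them with the covariance of the
    unlabeled target features, so both domains share their second-order
    statistics. The whole transform is a single matrix multiplication."""
    d = source.shape[1]
    cs = np.cov(source, rowvar=False) + eps * np.eye(d)  # source covariance
    ct = np.cov(target, rowvar=False) + eps * np.eye(d)  # target covariance
    transform = np.linalg.inv(_sqrtm(cs)) @ _sqrtm(ct)
    return source @ transform

# After alignment, the source covariance matches the target covariance.
rng = np.random.default_rng(1)
source = rng.normal(size=(500, 4)) * np.array([1.0, 2.0, 0.5, 1.5])
target = rng.normal(size=(500, 4)) * np.array([3.0, 0.5, 1.0, 2.0])
aligned = coral_transform(source, target)
print(np.allclose(np.cov(aligned, rowvar=False),
                  np.cov(target, rowvar=False), atol=1e-2))  # True
```

Note that only unlabeled target data is needed to estimate the transform, which is why it qualifies as unsupervised adaptation.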

Finally, for the language classifier, we could use any domain adaptation method. Since the goal of our work is to prove that adaptation of the embedding extractor is superior to adaptation of the classifier, we simply train the classifier with labeled data from the target domain. This is not domain adaptation but supervised training with target data; it gives us an upper bound on the potential performance of any adaptation of the backend classifier to the target domain.

So in this work, we compare four systems: two feature-based domain adaptation methods, applied to the embedding extractor or to the classifier, and our model-based method applied to the embedding extractor; they are compared to an upper bound, the performance that could be achieved by any adaptation of the final classifier.

So now, let us present the experiments.

We trained systems with these methods in the same setting: the same pre-trained bottleneck feature extractor, and the same TDNN architecture for the x-vector extractor.

For the regularization of the embedding extractor, we first run a training for one domain adaptation scenario, and on this scenario we select the hyperparameter lambda that sets the compromise between the two loss functions. For the other domain adaptation scenarios, this selected value of lambda is applied as is. This point is important because, in a real domain adaptation scenario, we cannot choose lambda with labeled data from the target domain, so this lambda value has to be robust.

Then we have to choose the source domain. We always use the telephone channel as the source for this task, since most language recognition corpora are telephone corpora. As the target domain, we use each of the eight radio channels, from which we have unlabeled data.

So first, we have to select the hyperparameter lambda. We train embedding extractors with different values of lambda, corresponding to the curves of this plot, and we plot the value of the regularization loss function on the validation set. It behaves as we would expect: at the beginning of training, the maximum mean discrepancy is close to zero, since the extractor is randomly initialized and the distributions of both domains are similar. Then it increases during training, because the classification tends to make a difference between the domains, and the value at which it stabilizes is controlled by the value of the regularization parameter lambda.

Now let us look at the classification loss function, the cross-entropy. In these plots, we have the classification loss function on the source domain in solid lines, and on the target domain in dashed lines. The dashed lines, corresponding to the cross-entropy computed on the target domain, are not part of the learning: the cross-entropy of the target domain is only computed for these training experiments, but it is presented here to understand what happens.

When the hyperparameter of the regularization, lambda, is small (the red curves here), the cross-entropy is reduced on the source domain but explodes on the target domain. By increasing the value of lambda, we manage to reduce the gap between both domains, as shown by the green and orange curves, but it slows down training on the source domain: for a high value of lambda, lambda equal to one hundred, we are no longer able to converge on the source domain.

So the choice of lambda is a compromise between reducing the gap between domains and achieving good convergence on the source domain. We selected an intermediate value of lambda, and this value is then used for all domain adaptation scenarios, with telephone as the source domain and each of the eight radio channels as target.

In this table, we cannot present the performance for all eight target domains, so we present the best and the worst performance of the source-trained system on the target domain, as well as the average performance over the eight channels; the results from all channels are consistent.

So first, we report the performance of the baseline systems. The system trained on the source domain performs well on the source domain, but its performance on the target domains is really poor, with an average equal error rate of forty-two percent, whereas training on the target domain achieves an equal error rate of about twelve percent.

Then we have the systems trained with the baseline domain adaptation methods. First, the feature-based domain adaptation method: when it is applied to the classifier, we go from forty-two to thirty-nine percent average equal error rate, so we achieve a slight improvement.

We observe that the feature-based domain adaptation method is more efficient when applied to the embedding extractor, supporting our idea that adaptation of the embedding extractor is preferable. And once the embedding extractor is adapted, a feature-based adaptation of the classifier does not bring any further improvement in performance.

Finally, supervised training of the classifier on the target domain, with embeddings trained on the source domain, achieves a good performance, but it is still significantly worse than a system fully trained on target domain data. It means that embeddings trained on the source domain are not perfectly suited for the target domain, and it shows the potential gain of adapting the embedding extractor. So domain adaptation of the classifier alone cannot compensate the domain mismatch in the space of embeddings.

And finally, we can look at the last rows: the results of the maximum mean discrepancy regularization of the embedding extractor. First, when the backend classifier is trained on the source domain, this is our unsupervised domain adaptation experiment: for seven of the eight channels of the corpus, we achieve a better performance than the system whose classifier is trained in a supervised way on the target domain, so with the exception of only one channel. This is a very good result, showing that invariance in the space of embeddings is useful, and that it is achieved by the maximum mean discrepancy regularization.

And this is the last line of the table: if we train the backend classifier on the target domain, we are still able to improve performance with the adapted embeddings. It means that these embeddings are not perfectly domain-invariant, and that we could work to improve their invariance further, or combine this method with an unsupervised domain adaptation of the classifier.

So, in this paper, we studied the transmission channel mismatch for a language identification system and proposed an unsupervised domain adaptation method for such a system. The proposed method is to add a regularization loss function to the training of the embedding extractor, and this loss function is the maximum mean discrepancy.

We showed that this method rivals the supervised training of the whole system on the target domain, and we gave experimental evidence supporting the idea that adaptation of the embedding extractor is more efficient than adaptation of the classifier in an x-vector based language identification system.

Thank you.