Hello, and welcome to this presentation. I will be presenting our work on unsupervised domain adaptation of a language identification system, with the goal of being robust to transmission channel change.
In this work, we address the problem of language identification on a transmission channel which has not been observed during training of the system, but for which we have unlabeled data. This problem is called unsupervised domain adaptation. We propose to add a regularization loss function to the classification loss function of the embedding extractor during its training.
In this presentation, we first define the task of unsupervised domain adaptation for language identification. Then we describe the proposed method of regularization of the embedding extractor. And finally, we present our experiments and results.
So first, the task of unsupervised domain adaptation for language identification. We use a standard language identification system based on x-vectors. This system is made of three blocks. First, a frame-level feature extractor, whose aim is to extract frame-level features: it is a neural network which has been trained to predict triphones. Frame-level embeddings are extracted from this network, and they are used as input of the x-vector extractor. The x-vector extractor is a neural network discriminatively trained to predict languages; we extract a segment-level embedding from one of its hidden layers. And finally, a language classifier, made of a dimension reduction step and a support vector machine, computes a score for each target language.
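As a rough sketch of this three-block data flow (not the actual architecture: all layer sizes here are made up, and the trained networks are replaced by fixed random projections purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input: 200 frames of 30-dimensional acoustic features.
frames = rng.normal(size=(200, 30))

# Block 1 stand-in: the frame-level network (trained to predict triphones)
# is replaced by a random projection, just to show the data flow.
w_frame = rng.normal(size=(30, 64))
frame_embeddings = np.tanh(frames @ w_frame)           # (200, 64)

# Block 2 stand-in: statistics pooling (mean + std over frames), then a
# projection, gives one segment-level embedding (the x-vector).
stats = np.concatenate([frame_embeddings.mean(0), frame_embeddings.std(0)])
w_seg = rng.normal(size=(128, 50))
x_vector = stats @ w_seg                               # (50,)

# Block 3 stand-in: a linear scorer in place of the dimension reduction
# plus SVM, producing one score per target language.
w_cls = rng.normal(size=(50, 5))
scores = x_vector @ w_cls                              # (5,)
```

The point is only the shape of the pipeline: many frame-level vectors are pooled into a single segment-level embedding, which is then scored per language.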
We train such a system on the RATS corpus. It contains five languages: Arabic, English, Farsi, Pashto, and Urdu. We have recordings for these five languages on nine transmission channels: first a telephone channel, and then eight radio channels. Now let us listen to a few files: first a telephone recording, and then a few radio recordings. As you may have heard in these samples, the files present very different noise and distortion characteristics, so this is a real challenge for domain adaptation. We use the RATS data for both training and testing of this system.
Our first experiment was to investigate the domain mismatch issue with this corpus. We trained a language identification system for each of the nine transmission channels, corresponding to the rows of this table, and we tested each of these systems on the nine transmission channels, corresponding to its columns. First, on the diagonal, we have the performance in matched conditions: we achieve an equal error rate ranging between three and fifteen percent, which is acceptable performance. Outside of the diagonal, when we test on a channel which has not been observed during training, we see a huge drop in performance. It means that domain mismatch is a real issue for this task.
Conversely, on the last line, we train a single system with data from all nine transmission channels, and we achieve good performance on all channels. This means that the language identification system has the capacity to work well on all channels, provided that they are observed during training. The goal of our work is to improve performance outside of the diagonal of this table, without using any labeled data from the target transmission channel.
This problem is called unsupervised domain adaptation, where domains correspond to transmission channels. We have a source domain, called S, for which we have labeled data: x is an audio recording and y is the corresponding language of the recording. And we have unlabeled data from a target domain, called T. Our goal is to achieve good language identification performance on the target domain.
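Formally, with notation of my own choosing, the setting can be written as:

```latex
S = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}
\qquad
T = \{x_j^t\}_{j=1}^{n_t}
```

where the $y_i^s$ are language labels, no labels are available for $T$, and the goal is to minimize the classification error on the target domain.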
So now we describe our method for unsupervised domain adaptation, which is a regularization of the embedding extractor. A lot of unsupervised domain adaptation methods are based on a very simple idea: making the distributions of the representations of both domains similar, using only unlabeled data. Once the representations have similar distributions, you can train a classifier with labeled data from the source domain; and if the representations are invariant between domains, this classifier will also achieve good performance on the target domain. This diagram helps to understand the idea. With labeled data from the source domain, we are able to train a classifier, but it does not perform well on an unseen target domain. Consequently, we use unlabeled data from both domains to learn a space of representations in which the representations of both domains have the same distribution. Then, if a classifier is trained on the source domain, it will also work well on the target domain.
So now the question is: where do we enforce invariance of the representations within the language identification system? The idea of applying it to the x-vector seems natural, since it is a representation carrying language information, directly extracted from a neural network trained to predict languages. So our method to create a domain-invariant x-vector is to add a domain adaptation regularization loss function to the loss function of the embedding extractor. The x-vector extractor is classically trained with a classification loss function, computed on the source samples: a cross-entropy loss that takes as input the x-vector and the language label of the recording. We add to this loss function a regularization term, whose role is to enforce invariance between the distributions of x-vectors of both domains. Here, the hyperparameter lambda controls the compromise between invariance of the representations across domains and classification performance on the source domain.
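In equation form (notation mine, consistent with the description above): writing $e(x)$ for the x-vector of recording $x$ and $f$ for the classifier head used during embedding training, the objective is

```latex
\mathcal{L}
= \frac{1}{|S|}\sum_{(x,y)\in S} \ell_{\mathrm{CE}}\big(f(e(x)),\, y\big)
\;+\; \lambda \, R\big(\{e(x)\}_{x\in S},\, \{e(x)\}_{x\in T}\big)
```

where $R$ is the domain adaptation regularizer measuring the divergence between the two x-vector distributions, and $\lambda$ trades invariance against source classification performance.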
As the regularization loss, we decided to use the maximum mean discrepancy. The maximum mean discrepancy is a divergence function that corresponds to the supremum, over a space of functions H, of the difference between the averages of a function over both domains. If H is the unit ball of a reproducing kernel Hilbert space, the maximum mean discrepancy becomes an expectation of kernel values between embeddings of both domains, and it admits an estimator computed on finite samples. So with mini-batches, during training of the system, we do exactly that: for each mini-batch, we compute the maximum mean discrepancy between the two domains, and we add it to the classification loss function. In this work, we use a Gaussian kernel to define the space of functions.
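To make the estimator concrete, here is a minimal NumPy sketch of the biased finite-sample estimator of the squared MMD with a Gaussian kernel (the kernel width `sigma` is a placeholder, not the setting used in the experiments):

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian kernel values between the rows of a and b.
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased estimator of the squared maximum mean discrepancy between
    # two samples x and y (e.g. x-vectors of the two domains in a batch):
    # mean k(x, x') + mean k(y, y') - 2 mean k(x, y).
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())
```

During training, a differentiable version of `mmd2` would be computed on the source and target halves of each mini-batch and added, weighted by lambda, to the cross-entropy loss.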
We compare this method of regularization of the embedding extractor to a feature-based domain adaptation method called correlation alignment (CORAL). The idea of a feature-based domain adaptation method is to transform the representations of the source domain to make them more similar to those of the target domain, then train the following blocks of the system with the transformed labeled data from the source domain, and finally apply this classifier on the target domain. With correlation alignment, the transformation applied to the source data is a matrix multiplication, with the goal of making the covariance matrices of both domains similar.
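As an illustration of this transformation, here is a sketch of the standard CORAL recipe (whiten with the source covariance, re-color with the target covariance); this is the textbook form, not necessarily the exact variant used in these experiments, and the mean handling here is my own choice:

```python
import numpy as np

def _mat_pow(m, p):
    # Power of a symmetric positive-definite matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.power(vals, p)) @ vecs.T

def coral(source, target, eps=1e-6):
    # Align second-order statistics only: whiten the source features with
    # their own covariance, then re-color with the target covariance.
    # eps regularizes the covariances; the source mean is left unchanged.
    d = source.shape[1]
    cs = np.cov(source, rowvar=False) + eps * np.eye(d)
    ct = np.cov(target, rowvar=False) + eps * np.eye(d)
    centered = source - source.mean(axis=0)
    aligned = centered @ _mat_pow(cs, -0.5) @ _mat_pow(ct, 0.5)
    return aligned + source.mean(axis=0)
```

After this transform, the covariance of the source representations matches that of the target, so a classifier trained on the transformed source data sees target-like second-order statistics.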
We apply this method to two parts of the system: the x-vector extractor, in which case we transform the frame-level embeddings, and the classifier, in which case we apply correlation alignment to the segment-level embeddings, the x-vectors. Finally, the third method we compare to is not a true domain adaptation method. Since our goal is to prove that adaptation of the embedding extractor is superior to adaptation of the classifier, we simply train the classifier with labeled data from the target domain. This is not domain adaptation but supervised training, and it gives us a bound on the potential performance of any adaptation of the backend classifier to the target domain.
So in this work, we compare four methods: two feature-based domain adaptation methods, applied to the embedding extractor and to the classifier respectively; a model-based method applied to the embedding extractor; and a bound on the performance that could be achieved by any adaptation of the final classifier.
So now, let us present the experiments. We trained systems with these methods under the same settings: the same pre-trained frame-level feature extractor, and the same TDNN architecture for the x-vector extractor. We performed hyperparameter selection for the regularization of the embedding extractor on a single domain adaptation scenario: we selected the hyperparameter lambda, which sets the compromise between the two loss functions, based on performance on the target domain. This value, selected once, is then applied to all the other domain adaptation scenarios.
This is important because, in a real domain adaptation scenario, we cannot tune the hyperparameter with labeled data from the target domain, so this value of lambda has to be robust. Then we have to choose the source domain. We always use the telephone channel as the source for this task, since most language recognition corpora are telephone corpora. And the target domain is each of the eight radio channels in turn; we only have unlabeled data from this domain.
So first, we have to select the hyperparameter lambda. We trained embedding extractors with different values of lambda, corresponding to the curves of this plot, and we show the value of the regularization loss function on a validation set, computed between the x-vectors of both domains. At the beginning of training, the maximum mean discrepancy is close to zero, since the network is randomly initialized and the distributions of both domains are similar. Then it increases during training, because the classification objective pushes the network to make a difference between the domains; and the value it reaches is controlled by the value of the regularization parameter lambda. Now let us look at the classification loss functions, the cross-entropy. In these plots, we have the classification loss function on the source domain in solid lines and on the target domain in dashed lines. The dashed lines, corresponding to the cross-entropy computed on the target domain, are not used during training; we only computed them to understand what happens in the system.
When the regularization hyperparameter lambda is small, as for the red curve here, the cross-entropy is reduced on the source domain but explodes on the target domain. By increasing the value of lambda, we manage to reduce the gap between both domains, as shown by the intermediate curves. But it slows down training on the source domain: for a very high value of lambda, lambda equal to one hundred here, we are no longer able to converge, even on the source domain. So the choice of lambda is a compromise between reducing the gap between domains and achieving good convergence on the source domain. We selected the value lambda equal to ten.
We then use this value of lambda for all the domain adaptation scenarios, meaning the telephone channel as source domain and each of the eight radio channels as target. In this table, we do not present performance for all eight domain adaptation scenarios: for each training method, we present the best and the worst performance on the target domains, and the average performance over the eight channels. But the results for all channels are consistent. So first, let us recall the performance of the baselines. The system trained only on the source domain performs really poorly on the target domains, with an average equal error rate of forty percent, while training on the target domain achieves an equal error rate of twelve percent.
Then we have the systems trained with the baseline domain adaptation methods. First, the feature-based domain adaptation method: when it is applied to the classifier, we go from forty to thirty-nine percent average equal error rate, so we achieve only a slight improvement. We also see that the feature-based domain adaptation method is more efficient when applied to the embedding extractor, supporting our idea that adaptation of the embedding extractor is more useful than adaptation of the classifier. And once the embedding extractor is adapted, adding a feature-based adaptation of the classifier does not improve performance further.
Finally, the supervised training of the classifier on the target domain, with embeddings trained only on the source domain, achieves good performance, but it is still significantly worse than a system fully trained on target-domain data. This means that embeddings trained on the source domain are not perfectly suited for the target domain, and it shows the potential gain of adapting the embedding extractor: domain adaptation of the classifier alone cannot compensate for the domain mismatch in the space of embeddings.
And finally, we can look at the results of the maximum mean discrepancy regularization of the embedding extractor. First, when the backend classifier is trained on the source domain, which is the unsupervised domain adaptation experiment, for seven of the eight channels of the corpus we achieve better performance than the system whose classifier is trained in a supervised way on the target domain; there is only one channel for which this is not the case. This is a very good result, showing that invariance in the space of embeddings is useful, and that it is acquired thanks to the regularization. The last line of the table shows that if we additionally train the backend classifier on the target domain, we are still able to improve performance on top of the invariant embeddings. This means that these embeddings are not completely invariant, and that we could work to further improve the invariance of this embedding space with an unsupervised domain adaptation method.
So in this paper, we studied the issue of transmission channel mismatch for a language identification system, and proposed an unsupervised domain adaptation method for such a system. The proposed method is to add a regularization loss function to the loss of the embedding extractor, and the chosen function is the maximum mean discrepancy. We showed that this method approaches the performance of supervised training of the whole system on the target domain, and we bring experimental evidence that adaptation of the embedding extractor is more efficient than adaptation of the classifier in an x-vector based language identification system.
thank you