Thank you very much.
Thanks to the organisers for the chance to present our work,
which we are still trying to complete with some post-evaluation analyses.
Due to some issues a colleague couldn't come here, so I'm going to try to present it.
I am going to give you a brief overview of our LRE submission:
our system, some hypotheses we had that I would like to show you,
how we worked with the development dataset,
the evaluation results, some post-evaluation analyses, and the lessons we learned from this.
Okay. So, very briefly, the LRE evaluation was focused on the development
of language recognition systems
for very closely related languages.
We have twenty target languages split across
six different clusters, and the participants had to devise their own development set.
There were two main channels, telephone speech and broadcast speech.
And here we have the six different clusters: Arabic, Chinese, English, French, Slavic and Iberian.
The performance metric was the average of the performance within each cluster,
so this allowed the development of six different, separate systems, one for each cluster,
since we only have to detect the language within each cluster.
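As a rough illustration of the metric described here, the sketch below averages one cost value per cluster. The function name `average_cluster_cost` and the numbers in `example_costs` are hypothetical placeholders, not results from the talk, and the per-cluster cost itself is assumed to be computed elsewhere.

```python
from statistics import mean


def average_cluster_cost(cluster_costs):
    """Average a per-cluster cost (one scalar per cluster) across clusters."""
    if not cluster_costs:
        raise ValueError("need at least one cluster")
    return mean(cluster_costs.values())


# Hypothetical per-cluster costs, for illustration only (not from the talk).
example_costs = {
    "arabic": 0.20,
    "chinese": 0.10,
    "english": 0.10,
    "french": 0.30,
    "slavic": 0.05,
    "iberian": 0.15,
}

overall_cost = average_cluster_cost(example_costs)
```

Because every cluster carries the same weight, a weak cluster cannot be hidden by strong ones, which is one reason separate per-cluster systems are a natural fit.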
Okay. So before the LRE we had some hypotheses.
The first one was that there would be limited mismatch between the training data and the test set,
as we had seen in previous LREs. But of course we were wrong,
as I will show you.
The second one was that the bottleneck features were
good features for this kind of task,
and we were right on this hypothesis.
The third hypothesis was that fusion of multiple systems
was a nice approach to increase performance,
and we were wrong.
And the last one was that a good development dataset design would be crucial,
and we were right.
We had three main objectives here. The first one was to design a
development dataset.
The second was to develop innovative approaches to dialect ID.
And the third one was to select a robust fusion coming from a variety of complementary
bottleneck features,
features that we were already developing under the
DARPA RATS program,
and also
fusion with different backend classifiers.
So first, we split the data into eighty percent for training and twenty percent
for development.
As was mentioned in the last question, it was a bad decision,
as I will show you;
it could have been better.
We had at least ten audio files per language in each split.
We prevented having the same telephone conversation in both training and dev.
And here we included an equal proportion of telephone speech and
broadcast speech in each split.
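A minimal sketch of an eighty/twenty split that keeps the telephone and broadcast proportions the same in both parts, assuming each file is tagged with a language and a channel. The function name `split_train_dev`, the tuple layout and the fixed seed are illustrative assumptions, not the tooling used for the submission, and keeping conversations together is not handled here.

```python
import random
from collections import defaultdict


def split_train_dev(files, train_fraction=0.8, seed=0):
    """Split files into train and dev lists.

    files: list of (file_id, language, channel) tuples (assumed layout).
    The split is done inside each (language, channel) group, so both
    parts keep the same channel proportion for every language.
    """
    groups = defaultdict(list)
    for file_id, language, channel in files:
        groups[(language, channel)].append(file_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, dev = [], []
    for key in sorted(groups):
        ids = sorted(groups[key])
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train.extend(ids[:cut])
        dev.extend(ids[cut:])
    return train, dev
```

Changing `train_fraction` to 0.6 gives the sixty/forty split that comes up later in the talk.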
And we excluded Switchboard 1 and 2, basically because
our first experiments didn't show a great impact from them,
probably because we
didn't expect this huge mismatch.
And so, with that data, we chunked the audio into
different segments, as short as three seconds, to cover short durations.
At the end we had around one hundred k used for the UBM and
i-vector extractor training, and the chunked training data was used for the backend
classifiers.
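The chunking step can be sketched as cutting each recording into consecutive fixed-length pieces. The helper name `chunk_bounds` is hypothetical, and dropping the trailing remainder is an assumption made for this sketch; the talk does not say how leftovers or overlaps were handled.

```python
def chunk_bounds(total_seconds, chunk_seconds):
    """Return (start, end) times of consecutive non-overlapping chunks.

    A trailing remainder shorter than chunk_seconds is dropped
    (an assumption of this sketch, not something stated in the talk).
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    count = int(total_seconds // chunk_seconds)
    return [(i * chunk_seconds, (i + 1) * chunk_seconds) for i in range(count)]


# Example: a 10-second file cut into 3-second chunks.
three_second_chunks = chunk_bounds(10, 3)
```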
We contextualized features with different methods, like SDC,
deltas and double deltas, and PCA-DCT. Also,
we fused different i-vector systems from traditional features. And at the end the
bottlenecks were trained with these combinations of different
traditional features with different contextualizations.
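As an illustration of one of the contextualization methods named here, the sketch below appends deltas and double deltas using the common regression formula over a window of plus or minus two frames. The window size, the edge handling (repeating the first and last frame) and the function names are assumptions; the talk does not give the exact configuration.

```python
def deltas(frames, window=2):
    """Regression-based deltas for a list of per-frame feature vectors."""
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))

    def frame_at(t):
        # Repeat the edge frames outside the valid range (assumption).
        return frames[min(max(t, 0), num_frames - 1)]

    out = []
    for t in range(num_frames):
        vec = []
        for d in range(len(frames[t])):
            num = sum(
                n * (frame_at(t + n)[d] - frame_at(t - n)[d])
                for n in range(1, window + 1)
            )
            vec.append(num / denom)
        out.append(vec)
    return out


def add_deltas(frames, window=2):
    """Return frames with deltas and double deltas appended to each vector."""
    first = deltas(frames, window)
    second = deltas(first, window)
    return [list(f) + a + b for f, a, b in zip(frames, first, second)]
```

For a feature that rises by one per frame, the delta away from the edges is one and the double delta is zero, which is a quick sanity check for the formula.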
For the backend classifiers we used the Gaussian backend and neural networks;
both methods are very well known in the community.
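For readers less familiar with it, a Gaussian backend models each language with its own mean and a covariance shared across languages, and scores a test vector by its log-likelihood under each language. The sketch below uses a diagonal shared covariance to stay dependency-free, which is a simplification (a full shared covariance is more usual); the function names and the variance floor are assumptions.

```python
import math
from collections import defaultdict


def train_gaussian_backend(vectors, labels, var_floor=1e-6):
    """Estimate per-class means and one shared diagonal variance."""
    dim = len(vectors[0])
    by_class = defaultdict(list)
    for vec, label in zip(vectors, labels):
        by_class[label].append(vec)
    means = {
        label: [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        for label, vecs in by_class.items()
    }
    # Pooled within-class variance, shared by all classes.
    var = [0.0] * dim
    for vec, label in zip(vectors, labels):
        for d in range(dim):
            var[d] += (vec[d] - means[label][d]) ** 2
    var = [max(v / len(vectors), var_floor) for v in var]
    return means, var


def score_gaussian_backend(model, vector):
    """Return a dict of class -> Gaussian log-likelihood for one vector."""
    means, var = model
    const = -0.5 * sum(math.log(2 * math.pi * s) for s in var)
    return {
        label: const
        - 0.5 * sum((x - m) ** 2 / s for x, m, s in zip(vector, mu, var))
        for label, mu in means.items()
    }
```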
We also used two methods for an adaptive Gaussian backend, which aims to better
cope with mismatched conditions.
Basically, based on the test i-vector, we try to select some i-vectors
from the training data to train the Gaussian backend.
And also the multi-resolution neural network, which
is a new method that we propose here.
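The selection step of the adaptive backend can be sketched as picking, for each language, the training i-vectors closest to the test i-vector and training the backend only on those. The talk does not state the selection criterion, so cosine similarity, the `per_class` count and the function names below are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_support(train_vectors, train_labels, test_vector, per_class=2):
    """Indices of the per_class most similar training vectors per label."""
    by_label = {}
    for index, (vec, label) in enumerate(zip(train_vectors, train_labels)):
        by_label.setdefault(label, []).append((cosine(vec, test_vector), index))
    chosen = []
    for label in sorted(by_label):
        ranked = sorted(by_label[label], key=lambda pair: pair[0], reverse=True)
        chosen.extend(index for _, index in ranked[:per_class])
    return sorted(chosen)
```

The selected subset would then be passed to whatever Gaussian backend training routine is in use, once per test segment.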
It aims to exploit the short dialect differences that we can capture
with the phonetic information.
So we have different chunk durations, from short durations to thirty-two-second
chunks, and the full segment, and we have different weights
for each chunk.
Okay, and here we have a comparison
of all five
backend systems that we had.
The multi-resolution neural network performed the best. We were using the best
single bottleneck features and the non-bottleneck features. In the case of the multi-resolution
neural network we were using just the bottleneck features, because
we need phonetic information, so it makes sense to use the bottleneck features,
since our bottleneck features were trained with senones.
Another thing is that the adaptive Gaussian backend approaches were more complementary
with the non-bottleneck i-vectors
in these systems, as we can see here on our data.
And here
what I would like to show you is that the
bottleneck features clearly work much better than the non-bottleneck features
for all the backends.
Okay, so this is,
in general, the pipeline of our system.
At the end, for the submissions, we used fusion of some of these
systems, like five or six of them,
with either cluster-specific fusion or fusion over all the data. And
with the scores, we did the log-likelihood ratio conversion, also either within the
cluster or globally.
At the end, these are the four
systems that we presented.
For our primary system we used a five-way cluster-based fusion and
cluster-based log-likelihood conversion.
The second one was a two-system fusion with cluster-based conversion.
The third one used the Gaussian backend only, with a five-way cluster-based fusion.
And the fourth one was the same as the second one,
but with global conversion to log-likelihood ratios.
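To make the difference between cluster-based and global conversion concrete, here is a sketch that turns per-language log-likelihoods into detection log-likelihood ratios with a flat prior over the competing languages. Restricting the competitors to the same cluster gives the cluster-based variant; using all languages gives the global one. The formula is the commonly used flat-prior conversion and the function names are assumptions; the exact conversion used in the submission is not given in the talk.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def to_llrs(log_likelihoods):
    """Convert dict language -> log-likelihood into detection LLRs.

    Each language is compared against the average likelihood of all
    the other languages in the dict (flat prior over competitors).
    """
    languages = list(log_likelihoods)
    if len(languages) < 2:
        raise ValueError("need at least two languages")
    llrs = {}
    for lang in languages:
        others = [log_likelihoods[o] for o in languages if o != lang]
        llrs[lang] = log_likelihoods[lang] - (
            log_sum_exp(others) - math.log(len(others))
        )
    return llrs


def cluster_llrs(log_likelihoods, clusters):
    """Cluster-based variant: competitors come only from the same cluster."""
    out = {}
    for members in clusters.values():
        out.update(to_llrs({lang: log_likelihoods[lang] for lang in members}))
    return out
```

Calling `to_llrs` on the full set of languages corresponds to the global conversion; `cluster_llrs` keeps a language from being penalised by high scores in an unrelated cluster.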
Okay, so some evaluation analysis.
When we got the
test data, we could see the huge difference between our development results and
the test: we went from
three percent to twenty-three percent.
It is huge.
And of course we had questions: what happened, right?
so this is a round also for it the core to compare the data under
As we can see here, this is our primary system.
I think it's relevant to say that there is a thirty-five
percent relative gain over the best single system
on our own test data,
while we got an eight percent loss on the evaluation.
Okay, so
for us the question was what is more important in this evaluation:
to use different
algorithms that you have to develop, or to use a good development setup?
Given this severe mismatch, what is more important, the algorithms or the use of
the data?
And we ran some analyses to try to get some answers to this,
using an mfcc
plus deltas and double and the task weights at the nn out a gaussian backend
is that sixty nine twenty here
So, after
some good discussions with some people at the evaluation workshop, there are several factors
in the development lists.
The chunking didn't help at all,
so we did some experiments just removing the chunks
altogether.
Also the different split:
most of the teams were using sixty-forty, sixty percent for
training and forty percent for development.
We would like to thank the guys who provided the lists that
we were using.
And also using all the data for the final backend training and calibration
was a key
thing to do,
and using a uniform speech duration for the dev segments.
And also we ran some augmentation of the data and some algorithms.
Okay, so here are the post-evaluation results. As we can see, we
went from our primary system at twenty-three point three
to twenty-one point nine without the fusion, with just one system.
And we keep
improving if we modify the training and dev split, if we use
all the data for training the UBM, the backend systems and
the fusion. And also,
if we are not chunking, we also improve
the performance. So in the end we could have a fifteen percent relative gain.
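For reference, a relative gain is the reduction in cost divided by the starting cost. The small helper below (the name `relative_gain` is an assumption) applies that to the two figures quoted above; the result is plain arithmetic on those figures, not an additional reported number.

```python
def relative_gain(baseline, improved):
    """Relative reduction of a cost, as a percentage of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - improved) / baseline


# Going from 23.3 to 21.9 is a relative gain of about 6 percent.
single_system_gain = relative_gain(23.3, 21.9)
```

By the same arithmetic, a fifteen percent relative gain over 23.3 would correspond to a cost of roughly 19.8.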
So that shows that the development data was crucial in this evaluation.
Also, since
another team said they were using a different UBM system for each cluster,
we wanted to also
try this solution, and we also
could see some improvement.
Thanks to those guys for that.
We also wanted to study how sensitive the different
blocks in our pipeline are to this mismatch. So we took
some data from the test set and put it in the development set; we created a
four-fold cross-validation of the test data to add some data to the different parts of
our pipeline.
Basically, we can say that the backend and the i-vector extractor are
significantly impacted by the mismatch, because we can see
relative gains as large as sixty percent in
these steps.
So some take-home messages.
For us, the fusion and chunking the training data for
the classifiers didn't work.
What worked,
and I guess it also worked for the rest of the groups, were the bottleneck features
and the Gaussian and neural network backends.
And also,
as I showed you, having a good development set was
something very important for this.
Okay, so, thank you.
We have time for questions.
all the channels cz getting they segments that we have and lead segment a speeding
very short segment
from the second two seconds
for the backend was used for the work
Another question?
I guess this is a commonality with others, but we did in fact find that
we could be successful with an eighty-twenty split and with varying segment durations
for all classifier training.
figure two no
so we are
is not the ones for this okay good to know
Could you share the split lists?
Yes, I think we could. We had augmentations in it too, so we would have
to talk about that part of it.
Could you put up the slide where you did the eighty-twenty
and then went down to the sixty-forty split?
It was really nice to see that, because I think most groups we
saw were using sixty-forty and then did a retrain, right? We didn't have the
CPU cycles for another training, so we actually stuck
with sixty, which was probably what hurt us.
But I think most folks who started with the eighty, even if they didn't do
a retrain, probably
did okay.
but i think that's actually showed really nice improvement on where exactly so when you
do all
you did is then all test
that is the you that is the and
Any other questions?
Okay, well, let's thank the speaker again.