Thank you.
So the Language Recognition i-Vector Challenge had three main goals. First, to attract people from outside the regular community and to make this work that we do more accessible to them. The idea behind that was for people to explore new approaches and methods from machine learning in language recognition, with the overall goal of improving language recognition performance.
The task was open-set language identification: given an audio segment, which language is the audio segment spoken in, or is it an unknown language?
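As a rough illustration of what open-set identification means operationally, here is a minimal sketch. The language names, score values, and threshold below are all made up for illustration; the idea is simply that a system may decline to name any known language.

```python
# Hypothetical per-language scores for one audio segment; names, values,
# and the threshold are illustrative, not from the actual challenge.
scores = {"arabic": 2.1, "mandarin": -0.4, "spanish": 0.7}
threshold = 1.0  # below this, we decline to name any known language

# Open-set rule: pick the best-scoring language, unless even the best
# score is too weak, in which case answer "out_of_set".
best_lang = max(scores, key=scores.get)
decision = best_lang if scores[best_lang] >= threshold else "out_of_set"
print(decision)  # prints "arabic" since 2.1 >= 1.0
```

The same rule with all scores below the threshold would instead return "out_of_set", which is what distinguishes this from closed-set identification.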
The data used was from previous NIST LREs as well as from the IARPA Babel program. The data was selected in such a manner that multiple sources were used for each language, in order to reduce the source-language effect, and languages were also selected so that highly confusable languages were included in the dataset.
As for the size of the data, there were fifty languages in train and sixty-five in dev and test, with about three hundred segments per language in the training and about a hundred in the dev and test. We see the total number of segments in the rightmost column: about fifteen thousand for training, about sixty-four hundred for dev, and about sixty-five hundred for test.
The training set did not include any out-of-set data. The development set included unlabeled out-of-set data, and the test set was divided into progress and evaluation subsets, which we'll cover in just a moment. Participants were able to upload their system outputs and receive some feedback on how they did, and that was done using the progress set. Then at the end of the evaluation period, feedback was given on the evaluation set; it was a partition, so there's no overlap.
Here we see data sources for each language. I'm sure it's noisy and hard to see, but on the right-hand side you can see the different corpora labels. I think at a high level we can say blue is conversational telephone speech, green includes broadcast narrowband speech, and yellow is a combination of the two. One thing to say is that if you look across the training data, which is I guess the leftmost column, the dev data, which is in the middle, and the test data, which is on the right, the distribution across sources is very similar per language; there are a few exceptions. And as we mentioned, there was no out-of-set data in the training.
And here we see speech duration, in train, dev, and test. Training is green and test is blue, and we see again a similar distribution. One item of interest: this distribution was log-normal.
The performance metric was error rate, split into out-of-set languages and in-set languages, where the prior probability of an out-of-set language was 0.23.
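If the metric works the way a simple blend of error rates would, it can be sketched as follows. This is my reading of the description, not the official scoring code, and the example error rates are invented.

```python
def challenge_cost(in_set_error_rates, oos_error_rate, p_oos=0.23):
    """Weighted blend of the average in-set error rate and the
    out-of-set error rate, using the stated out-of-set prior of 0.23.
    A sketch of the metric as described, not the official scorer."""
    avg_in_set = sum(in_set_error_rates) / len(in_set_error_rates)
    return (1 - p_oos) * avg_in_set + p_oos * oos_error_rate

# Invented example: three in-set languages at 10%, 20%, and 30% error,
# and a 40% error rate on out-of-set segments -> about 0.246.
print(challenge_cost([0.10, 0.20, 0.30], 0.40))
```

The 0.23 prior means roughly a quarter of the weight rides on out-of-set behavior, so a system that ignores the unknown-language case pays a real penalty.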
Participation was wonderful, more than what we'll typically see in an LRE. It came from international sites across six continents and thirty-one countries. About eighty participants downloaded the data, and a little over fifty-five percent submitted results, from forty-four unique organizations. During the evaluation period a little over thirty-seven hundred submissions were made, and that number continues to grow afterwards.
As mentioned, we had more participation in the i-vector challenge than we typically see with an LRE, and we can see some other comparisons. As I said, one of the main differences between the i-vector challenge and a traditional LRE is in the data that we distribute: in the traditional LRE we send audio segments as input to systems, and in the i-vector challenge we send i-vectors instead.
The task was different: in the i-vector challenge it was open-set identification instead of detection. In the i-vector challenge the cost was based on a kind of total error rate per language, while in the traditional LRE it's based on miss and false alarm rates. There was a larger number of target languages and a different distribution of speech duration; as mentioned, it was log-normal in the i-vector challenge, whereas in the traditional LRE it's three-, ten-, and thirty-second bins. The i-vector challenge lasted much longer than a traditional LRE. Also, in the i-vector challenge feedback on results was given during the challenge period, which is not something we do in traditional evaluations.
And last, there was an evaluation platform that was online, and this was something that we focused on for the i-vector challenge. In particular, the goal was to facilitate the evaluation process with limited human involvement. All evaluation activities were conducted via this platform, including receiving the data, uploading submissions, and being able to see how things went.
Now looking at some results: on the y-axis we see cost and on the x-axis time. The first dip, I think, is around May seventeenth, and the second large dip is on May twenty-first. So roughly half of the progress made during the evaluation took place during the first two or three weeks or so, and then the rest of the progress was made during the remaining four months.
Here we also see cost on the y-axis, and on the x-axis we see participant ID, so these are discrete, sorted by the best cost obtained on the evaluation subset. We see most of the sites beat the baseline, and a few sites beat the oracle system. Speaking to both of these: the baseline, I believe, is a simple system that used cosine distance, and the oracle system used PLDA. It's called oracle because unlabeled data were distributed to the participants, but the oracle system used those labels.
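A cosine-distance baseline of the kind described can be sketched in a few lines. The vectors below are tiny made-up examples (real i-vectors have a few hundred dimensions), and scoring against a mean i-vector per language is an assumption about how such a baseline is typically built, not a description of the actual system.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "i-vectors"; the language means and the test vector
# are made-up illustrative values.
language_means = {"lang_a": [1.0, 0.0, 0.0], "lang_b": [0.0, 1.0, 0.0]}
test_ivector = [0.9, 0.1, 0.0]

scores = {lang: cosine(test_ivector, m) for lang, m in language_means.items()}
print(max(scores, key=scores.get))  # prints "lang_a"
```

PLDA replaces this geometric similarity with a probabilistic model of within- and between-language variability, which is why labeled data helps it.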
And here we see the number of submissions per participant. In general, participants who did well submitted more systems, but there were a few exceptions. I think now is a reasonable time to mention the distinction between participant IDs and site IDs: a participant is someone who signed up, and there may have been multiple participants per site, so participant IDs are not necessarily unrelated; for example, two different participant IDs may belong to the same site.
And here we see results by target language: we have error rate on the y-axis and language on the x-axis. The lowest error rate was received on [unintelligible] and the highest on Hindi. What was surprising was that English also had a high error rate, second from the right actually. And the blue bar, the out-of-set languages, was somewhere in the middle of the pack.
And here we see results by speech duration. I guess it's no surprise that as you get more audio, you tend to do better. One thing that may also be interesting is that there seem to be diminishing marginal returns: if, for example, you had three seconds and you could get ten, you'd do maybe a point or two better, but if you went from ten to twenty, the difference is not so great, just as an example.
So, some lessons learned. Wonderful participation; we're all very grateful to those of you in the audience who participated, as without you we couldn't have done this. A number of systems beat the baseline, and surprisingly, six were actually better than the oracle system; we're hoping to learn more about those. About half of the improvement was made early on, which may lead us to reconsider the timeline. Surprisingly, top systems did not all do so well on English. Performance on the out-of-set languages also was not what we might have expected. We did not receive many system descriptions, so it's unclear what approaches many of the participants attempted, although later in the session we'll hear from the team that created the top system, which did develop novel techniques, and we'll see more of that. And the web platform is still up, so please feel free to visit, participate in the challenge now, and see how you're doing.
And a quick plug for upcoming activities: there's SRE sixteen and its workshop, where the task is speaker detection on telephone speech recorded over a variety of handsets. Similar to LRE fifteen, there's now a fixed training condition as well as an open condition; you can see some other details about the evaluation there. There's also a twenty-sixteen LRE analysis workshop, and all of this will be co-located with SLT sixteen.
So it looks like we have time for questions.