I would say let's get the session started.
Yeah, I'm very happy to introduce our next invited speaker, Marcello Federico. He is from what used to be called ITC-irst but is now, I think, FBK, the Fondazione Bruno Kessler. It is an independent research institute located near the University of Trento. Over there he heads the Human Language Technology effort; he is the co-director of that.
You probably know him from many of his papers. I personally know him from a summer workshop in two thousand seven at Johns Hopkins, where he and a number of people, including Philipp Koehn, developed a lot of very useful software for machine translation that was sort of the genesis of the Moses toolkit. And for those of you who don't know, Moses is to machine translation what HTK is to the speech recognition people.
It's very widely used, so that's how I got to know him, but of course he has many other long-standing accomplishments; I will not list all of them. Just to point out that he has been associate editor for the ACM Transactions on Speech and Language Processing and for Foundations and Trends in Information Retrieval. And he's also, I think, an officer of the ACM counterpart of the IEEE speech and language technical committee.
And he's going to talk to us today about something that he's also very well known for: he's been running the workshop on spoken language translation, which has been very useful in fostering a lot of collaboration and discussion on this important problem. That would be the IWSLT workshop, the International Workshop on Spoken Language Translation. So he's also well known for that. Please welcome him.
Okay, thanks for the kind introduction.
So, as an outline of my talk: I will introduce IWSLT, for those who do not know it, and in particular this talk will focus on the TED talk translation task that we started this year. I will introduce the research agenda behind this track and describe how we organised an evaluation on the talk translation: the language resources we provided, the evaluation conditions we set, and the participants that took part in the workshop that was held recently in San Francisco. I will briefly describe how we ran the subjective evaluation for machine translation, which is a quite tricky but important aspect, give an overview of the results and findings of this exercise, give some outlook on what we plan for next year, and draw some conclusions.
So, IWSLT is the International Workshop on Spoken Language Translation. It consists of an evaluation campaign, which is run before the workshop, and a scientific workshop. IWSLT has been running now for eight years, and the main organisers, beside FBK, are the Karlsruhe Institute of Technology and the National Institute of Information and Communications Technology in Japan.
About the evaluation campaign: its features are that it is around spoken language translation, so this is something which is peculiar to IWSLT and is not covered elsewhere by other evaluations. Another aspect is that the language resources are organised and collected by the organisers and are provided for free to the participants. It's an open evaluation, in the sense that we develop these benchmarks for everyone who wants to work on them. And for all these evaluations we carry out both objective and subjective evaluations, which is not free of charge for us, of course, but it is for the participants.
Concerning the scientific workshop, this is used as a venue to present research papers on spoken language translation and machine translation in general, and of course it's a venue for presenting the evaluation results and for the participants of the evaluation to present their system papers describing their systems. We also have invited talks and discussions.
So if you look at the venues: we started in two thousand four in Kyoto, then we had Pittsburgh, again Kyoto, then Trento, Hawaii, Tokyo and Paris, and one week ago we were in San Francisco.
So if you look at the participants over all these years, we count fifty-two different research groups that took part. Of course, not all of them took part in all evaluations, so we have, let me say, a core group of around fourteen participants that took part in at least four evaluations, and we have around twenty sites that participated in only one of the events. So you can figure out that the most prominent research groups working on machine translation are there, but also several small groups, and companies as well.
The aspect of small groups is important, because we also try to propose somehow affordable evaluation tracks, which do not require intensive computational power or large groups to be run.
So, those were the figures about the participants; now an overview about the evolution of our tasks.
Until two thousand ten we put a lot of effort on the so-called BTEC travel domain, for which we organised separate evaluations, and just recently, this year and part of last year, we started to cover the TED talk domain. Concerning the BTEC domain: we started in two thousand four and provided just an evaluation of text translation over this BTEC corpus, which is a collection of travel expressions collected from the phrasebooks that tourists, for instance, use to try to communicate abroad.
So we started with Chinese to English and Japanese to English. In two thousand five we had a track starting from speech, but indeed we provided basically the transcripts from speech recognition engines, and this was really an exercise with read speech, so people read these expressions, these sentences. And we covered again Chinese-English, but also English to Chinese, Japanese-English, Arabic-English and Korean to English.
In two thousand six we tried to launch a new task, but we returned to BTEC: in two thousand seven we ran Arabic-English and Japanese-English, then in two thousand eight Arabic-English, Chinese-English and Chinese-Spanish. And this time we also proposed a so-called pivot translation task, so Chinese to Spanish had to go through English; that is, Chinese to English and then English to Spanish.
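Just to make the pivot setup concrete, here is a minimal sketch in Python; the two translate functions are hypothetical stand-ins for whatever Chinese-English and English-Spanish systems a participant had trained, since the campaign did not prescribe any particular engine:

```python
def translate_zh_en(sentence):
    # Hypothetical stand-in for a trained Chinese-to-English system.
    return "<English translation of: {}>".format(sentence)

def translate_en_es(sentence):
    # Hypothetical stand-in for a trained English-to-Spanish system.
    return "<Spanish translation of: {}>".format(sentence)

def pivot_translate(zh_sentence):
    """Pivot translation: route Chinese to Spanish through English,
    the only language paired with both sides in the training data."""
    english = translate_zh_en(zh_sentence)
    return translate_en_es(english)

print(pivot_translate("你好"))
```

The design point is simply that no direct Chinese-Spanish parallel corpus is needed; errors, of course, compound across the two steps.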
Then we went on with Arabic-English and Chinese-English, and we added new languages; almost every year we add new languages, so we had Turkish, a Turkish to English translation task. And the year after we repeated Arabic to English, but we added French. So that was, let me say, the stream of BTEC tasks. As explained, we also added some more complex tasks, always around travel expressions.
We started modeling dialogues in two thousand six, with Arabic-English, Chinese-English, Japanese-English and Italian-English; the following year we repeated the experiment. Then we moved to human/machine-mediated dialogues, which really reflect a translation task, while the former were basically translations of monolingual dialogues between humans, where the translations were produced afterwards.
The language directions we worked on were English-Japanese and Chinese-Japanese, the following year Chinese and English, and again Chinese-English in two thousand ten.
Then, in two thousand ten, we ran for the first time an exercise, without a real evaluation, on the TED talks. We started by providing output from speech recognition, and the translation direction was English to French. And the following year, this year, we provided both machine translation tracks, from Arabic to English, from Chinese to English and from English to French, and a full end-to-end evaluation from speech, so providing audio files, from English to French.
Okay. So we started with the TED talks, but indeed this is not really new stuff for us as organisers.
We did past work on speech recognition of lectures within the European project FAME, from two thousand one to two thousand five, and there are some papers. And it's funny that, I mean, our first work on language modeling for talk transcription was on the TED corpus; but here it's another acronym, because it is a database of lectures recorded at Eurospeech ninety-three, and this database was released by LDC, I think, in two thousand two. So this TED stands for Translanguage English Database. In that project we worked on this database as well, and people from Karlsruhe worked on lectures they collected on their own.
Concerning spoken language translation of lectures or speeches: as I mentioned, there was the European project TC-STAR, which was a big effort from two thousand four to two thousand seven, in which, I see, many participants here also took part; we had IBM, the Karlsruhe Institute of Technology, LIMSI and UPC taking part, and there are several papers about translation of speeches. You have here a couple of examples.
So, in two thousand ten, as I stated, we started a new track in IWSLT on TED talks, and in particular we focused on this domain of TED talk translation. So what is TED? Maybe you know it: it is a non-profit organisation in the US that organises every year two conferences and hosts many, I would say, short and brilliant talks over a variety of topics. All these talks are recorded, and there is a website, maintained by TED, which collects all the videos of the talks, the transcripts, and also many translations. And all this material is provided under a Creative Commons license, so you can basically download it and use it.
If you look at the translations I mentioned, there is a community behind this site which helps with the translations: there are many volunteers who provide translations. Here I show you a plot that compares, for, let me say, the most popular languages into which talks are translated, the number of talks translated up to November two thousand ten and up to November two thousand eleven. And you see that there are many languages for which you have around a thousand talks translated. If you look at the right side, you have the global figures:
the talks recorded in English at the conferences and transcribed were eight hundred in two thousand ten and over a thousand in two thousand eleven, so there are about two hundred fifty to three hundred talks processed every year; the languages which are covered by these volunteers moved from eighty-two to eighty-three; the number of these volunteers, of these translators, moved from four thousand to almost seven thousand; and the number of translations globally provided, which are many more than those you can find in this plot, which covers around twenty languages, moved from twelve thousand to twenty-four thousand. So it is really a large number. And, as you can see, many languages are covered for which you usually do not have many language resources available, especially in terms of parallel corpora.
So let's see, from the point of view of these translators, how we can describe the task behind preparing these translations of talks. Typically the audio is partitioned, because you might have music, background noise, applause; so you detect the speech segments, you split the speech into sentences, and these are transcribed. The translation then works on these segmented transcripts, and as ideal translation units the translators should focus on the single captions.
Actually, here is an example. Ideally the translators should keep synchronicity among the captions, like you see in this example: the same sentence is translated in exactly the same way into French and Italian. Of course, if you look a bit deeper, you can see that for some languages they allow for some reordering across captions, for instance German, where you have these long verb movements, so you might have movement across the captions; but the sentence boundaries are preserved.
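To make the data layout concrete, here is a minimal sketch, with hypothetical file names, of how such caption-synchronised subtitles can be paired into a sentence-aligned parallel corpus; it assumes the timed subtitle files have already been reduced to one caption per line:

```python
def load_captions(path):
    # Assumes one caption per line, in talk order.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Hypothetical caption files for the same talk in two languages.
french = load_captions("talk0123.fr.txt")
italian = load_captions("talk0123.it.txt")

# Because translators preserve caption boundaries, the two lists
# align one-to-one; zipping them yields parallel segments.
assert len(french) == len(italian)
parallel = list(zip(french, italian))
```

This caption-level synchrony is what makes the TED translations directly usable as training data without a separate sentence-alignment step.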
So now let's give a look at the talks. I'll show you some videos, nothing scary, and, like before, some audio.
[plays video excerpt] "I'm a performer. ... And I'm also diagnosed bipolar. I reframe that as a positive, because the crazier I get on stage, the more entertaining I become. When I was sixteen in San Francisco, I had my breakthrough manic episode, in which I thought I was Jesus Christ. You thought that was ..."
So that was an example, and, by the way, from our real test set of this year.
So if you try to compare this kind of content with the previous task, and also with the very popular news translation task, which has been covered by other evaluations, this table somehow summarises it: it looks at travel, TED talks and news. From the communication perspective, we move from dialogue to monologue communication. The register is informal for the travel task; in the travel task we have the usual tourist asking for information from people on the street, while in the TED talks it is, I would say, semi-formal; sometimes, I mean, there is even some interaction with the audience.
News uses a different, formal register. The aim is informative for the travel task, and for the news I would say it is to convey information or to ask for information, while for the TED talks I would say the aim is more persuasive: these speakers are, to my view, trying to convince you about something, to sell you an idea. The style is conversational for the travel task, while it is, I would say, entertaining in the TED talks,
and for news it is a more formal, written style. With respect to the domain: it is limited for travel, focusing on the information requests a tourist may have, so "travel" is the general term; while for TED talks and news it is really open: you have a real variety of possible topics.
With respect to the lexicon, this might be surprising: for travel it is for sure small, the lexicon was always around five thousand words, ten thousand at maximum. For TED talks I would say it is medium, because during a TED talk, I mean, the goal is to convey something, and they do it using rather plain language; they use lots of colloquial expressions, and they are not looking for eloquent, I mean, elaborate expressions, unless you look at some very technical talk. So it is smaller than, and different from, the vocabulary that you find in news.
Concerning the syntax, the complexity of the sentences in terms of structure: you have a very simple structure in the travel task; we had an average length of seven or eight words, which is very short. In news you may have very long sentences, while the TED talk sentences are typically short, around fifteen words, and also the structure is quite linear, let's say: you do not have many nested clauses.
Concerning the challenges that you face with this task: from the language modeling point of view, you have of course limited in-domain training data; think that the corpus is a couple of million words, which is not the size you expect for modeling language. Then you have variability of topics and styles, so each talk is different from the others and has its own topic and maybe also its own style.
For acoustic modeling, a variety of speakers: there are many speakers, and you may have speakers with different accents, for instance non-native speakers; you have different fluency, speaking rate and style, so there is not one speaker but many. And you have to cope with noise: you have laughter that may cover the speech, applause, and also music, like before, when the guy was playing.
Well, with respect to translation modeling, with this collection we can work with under-resourced languages. Arabic and Chinese, I would not say they are under-resourced, because LDC collected lots of data, but there are several languages for which probably very little parallel data is around. We can also work with distant languages, so languages which have very different structures, like we did this year with Chinese, and we can deal with morphologically rich languages: they are well covered here.
Concerning speech translation specifically, the task that we designed requires going from spontaneous speech to punctuated text, which means that you have to provide a polished text, with capitalisation and punctuation, and this is an aspect usually not treated when starting from speech. Then you have tasks like the detection and annotation of non-speech events. And finally, I think the ultimate goal here would be to provide subtitling and translation in real time, while the talk is given.
Of course, we did not attack all these challenges; for the time being we basically focused on the first ones.
So the tracks we proposed for two thousand eleven were, for the first time, an automatic speech recognition track, where we asked participants to provide transcriptions of talks, from audio to text, in English; and a spoken language translation track, which requires automatic translation of talks from audio, or from the ASR outputs we provided, into text, from English to French. Keep in mind that the TED talks are recorded in English.
And then we had the machine translation tracks, this time starting from text, from English to French, from Arabic to English and from Chinese to English. Notice that for the last two translation directions we basically started from the human translations and tried to translate back to the original.
You might think that this is not the best thing you can do, because it has been shown that some artifacts may occur: a text carries traces of being written originally in a language, or of being translated from some other language. But from our point of view this kind of artifact is really not important with respect to the quality that you can achieve nowadays with machine translation, so it is better to have some data, even if not the ideal data, and to use them as they are.
Okay, and finally, as additional tracks, we proposed system combination tracks, both for ASR output and for MT output, where the participants were given all the primary system outputs collected during the evaluation.
Now the resources, which are an important aspect: the language resources. For speech we did not provide data, but allowed the use of any publicly available recordings dated before the thirty-first of December two thousand ten; and that's good, because the evaluation data were collected after that date. As parallel data we provided a parallel text corpus of TED talks of about two million words for English-French, Chinese-English and Arabic-English. Then we made available the so-called MultiUN United Nations corpus, which is around two hundred million running words for English-French, Chinese-English and Arabic-English;
this is, I would say, a large out-of-domain corpus. And then there were all the data made available by the Workshop on Statistical Machine Translation, in particular the Giga corpus of English-French crawled from the web, which makes up to eight hundred million words, so a very large parallel corpus.
As monolingual texts, besides the monolingual part of the parallel data, we provided all the transcripts of the English talks, so more than those for which translations were available, and, as we also allowed two years ago, the Google Books n-gram collection, for both English and French. Then we provided development sets for ASR, SLT, MT and system combination; these data were carefully collected and checked.
About the specifications, the conditions we set: we decided to go for a pre-segmented input this time. For speech recognition it means that we provided just the segments with speech, so segments with non-speech events were simply not considered this time. And the same segments were used for speech recognition, for speech translation and also for machine translation, so they were perfectly aligned. The reason for this is also that it provided better means for the system combination, with participants providing outputs for exactly the same segments.
The input was cased and punctuated for machine translation only. The output was not required to be cased and punctuated for speech recognition, but it was for the machine translation systems; so the output of MT, and of spoken language translation as well, had to come with punctuation and case information.
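Restoring case and punctuation on raw ASR output is a small task of its own. As a purely illustrative sketch, and not how any participant actually did it, one simple approach learns the most frequent surface form of each word from cased, punctuated text, such as the TED transcripts:

```python
from collections import Counter, defaultdict

def train_truecaser(cased_sentences):
    # Count the casings observed for each word in cased training text.
    forms = defaultdict(Counter)
    for sent in cased_sentences:
        # Skip the sentence-initial word: its capitalisation is
        # positional, not lexical.
        for word in sent.split()[1:]:
            forms[word.lower()][word] += 1
    # Map each lowercased word to its most frequent surface form.
    return {w: c.most_common(1)[0][0] for w, c in forms.items()}

def truecase(asr_words, table):
    out = [table.get(w, w) for w in asr_words]
    if out:
        out[0] = out[0].capitalize()  # sentence-initial uppercase
    return " ".join(out) + "."        # naive final punctuation only

train = ["We went to San Francisco .", "He lives in San Diego ."]
table = train_truecaser(train)
print(truecase("we flew to san francisco".split(), table))
# -> "We flew to San Francisco."
```

Real systems typically go further and predict intra-sentence punctuation with a language model over punctuated text, but the sketch shows the shape of the problem.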
We ran automatic evaluations on all the tracks, and we ran human evaluation for the machine translation and spoken language translation tracks. Here is the list of the metrics we used.
About the schedule: the timeline shows by when we provided the training data and the development data, by the end of June, and when we provided the data for system combination. So we basically asked participants to work first on the dev sets, and their runs were then collected and put on the website for the participants working on system combination. And we had a very tight schedule in September, in which we ran, one after the other, the ASR evaluation, the ASR system combination, the SLT and machine translation evaluations, and finally the machine translation system combination.
We allowed participants to submit one primary run and multiple secondary runs. For this test set the references were not released, so the evaluation was done through an evaluation server, and we are going to keep this test set as a progress test set for next year. What is good is that the benchmark is available on our website, and the evaluation server is also going to be available, so everyone can give it a try, and participants can work further to improve their systems.
As for the participants, we had eleven teams; we had fifteen at the beginning, but four withdrew after a few months. Probably, I mean for sure, the task was more difficult than the one of the previous years. So we had DCU, with the Centre for Next Generation Localisation; in Germany, the Karlsruhe Institute of Technology; LIG from Grenoble; LIUM from Le Mans; FBK; MIT together with the Air Force Research Laboratory; Microsoft Research in the US; the National Institute of Information and Communications Technology in Japan; and RWTH Aachen in Germany.
About the submissions we received: we had five submissions for ASR and five for SLT; the English-French machine translation was the most popular track, with seven participants; then we had four each for Arabic-English and Chinese-English, and a couple of submissions for system combination.
So if you look at the results for ASR: on the bottom line you have the best result of last year, which had a word error rate of around twenty-two or twenty-three percent. This year we had two significant improvements in terms of performance, and you see that system combination also helped quite a lot, so we moved from the best single system, at fifteen point four percent word error rate, down to thirteen point three.
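For reference, word error rate is just the word-level edit distance between hypothesis and reference, divided by the reference length; a minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One deletion and one substitution against six reference words: 0.33.
print(word_error_rate("try something new for thirty days",
                      "try something for thirty day"))
```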
If you give a look, if you remember the excerpt we have seen, you see what the best ASR transcription provides. So we have a rather good performance; but remember that the guy was performing into the microphone at the beginning, and that was not regular speech. If you look at the performance we have over the talks provided, it is quite uniform, let's say: there are few talks for which you are over twenty percent with the best system, and fortunately with the system combination you are almost always below twenty percent. Our most difficult talk was the one seventy-eight, which is at around fifteen percent for system combination.
I'll show you the transcript for just one talk, the one eighty-three. [plays audio] And here is the corresponding transcript:
"A few years ago, I felt like I was stuck in a rut, so I decided to follow in the footsteps of the great American philosopher Morgan Spurlock and try something new for thirty days. The idea is actually pretty simple: think about something you've always wanted to add to your life and try it for the next thirty days. It turns out thirty days is just about the right amount of time to add a new habit, or subtract a habit, like watching the news, from your life. There's a few things that I learned while doing these thirty-day challenges. The first was, instead of the months flying by, forgotten, the time was much more memorable." So really, if you look, you have a very good transcription here.
Now, this is for what concerns speech recognition.
I'll tell you now briefly about the subjective evaluation for MT. As you might know, we have automatic metrics, like the BLEU score, which is the most known one, but there are others, like NIST, METEOR, word error rate, or position-independent error rate. All these metrics basically try to match the MT outputs against one or more reference translations.
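As an illustration of this family of metrics, here is a minimal, smoothed sentence-level BLEU sketch; it is not the official corpus-level implementation used in the campaign, just the n-gram-precision idea:

```python
import math
from collections import Counter

def sentence_bleu(hypothesis, reference, max_n=4):
    """Toy smoothed BLEU: geometric mean of 1..4-gram precisions
    times a brevity penalty. Illustrative only."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n])
                             for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n])
                             for i in range(len(ref) - n + 1))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        # Add-one smoothing so one empty n-gram order cannot zero the score.
        log_prec += math.log((overlap + 1) / (total + 1))
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(log_prec / max_n)

print(sentence_bleu("the cat sat on the mat", "the cat sat on a mat"))
```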
As you know, these metrics are far from being perfect. If you want to measure, or rank, or compare system outputs, you need to rely on subjective evaluation, which is of course more expensive and slower to carry out. This is why we run such evaluations: because once in a while you need to evaluate your systems with subjective evaluations.
Subjective evaluations have been carried out by recruiting some experts and asking them either to judge, in absolute terms, the quality of the machine translation, or, better, and more focused if in the end you want to rank the outputs, to judge which of two outputs is the better one, comparing them directly.
What we did this year, with respect to previous years, is that instead of hiring experts we ran the evaluation by crowdsourcing. It is not a new methodology, because Chris Callison-Burch launched this approach a couple of years ago at WMT, the Workshop on Machine Translation; so we applied it, together with some new ideas about running subjective evaluation, which are described in this paper.
So I'll briefly tell you what it is about. The core evaluation is on one sentence pair: we compare the outputs of just two systems, and we provide to each of several judges, here three, a reference translation and the outputs of the two systems. And we ask the judges to rate which is the best one: they are allowed to say that the translations are equally good or equally bad, or to indicate which is the better translation, like in this case. You have three judges; two of them chose system two as the best one, and one said that they are equally bad. From this atomic evaluation we can say that the winner, in this case, is system two. Okay.
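A minimal sketch of this atomic comparison, assuming each judge's verdict is encoded as "1", "2" or "tie":

```python
from collections import Counter

def atomic_winner(judgements):
    """Decide one sentence-level comparison between system 1 and
    system 2 from a list of judge verdicts: '1', '2' or 'tie'."""
    votes = Counter(judgements)
    if votes["1"] > votes["2"]:
        return "1"
    if votes["2"] > votes["1"]:
        return "2"
    return "tie"

# Two judges prefer system 2, one calls it a tie: system 2 wins.
print(atomic_winner(["2", "2", "tie"]))
```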
Of course, this is just one sentence. What we can do is to repeat this evaluation for all sentences of the whole test set, so for sentence one, sentence two and so on, always between system one and system two, and we collect all the judgements and compute final statistics about how many wins go to system one, how many to system two, and how many ties there are. Then, looking at these statistics, we can decide who the winner of this head-to-head comparison is.
This comparison is run for just one pair of systems; if you have more systems taking part in the evaluation, we organise a round-robin tournament among all systems. What you see in this table is that you have all the systems on the top, and you have boxes in which you put the win and loss statistics; so we have a table which shows all the pairwise comparisons that you need to carry out, depending of course on the translation direction. For each of these boxes you run one of these evaluations over the full test set, and you report the number of test-set pairwise wins and losses in the table.
From this machinery we can extract some meaningful statistics for the comparison, and we use these quite standard scores. The first score used is "better than others": you report the percentage of test sentences on which a given system was ranked better than any other system; for each system we compute this. There is also the other metric, "better than or equal to others", which also counts the ties collected by the system. And finally we have the head-to-head results, which count the number of test-set pairwise rankings won by the system.
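Putting the pieces together, here is a small sketch of how these tournament statistics can be computed from per-sentence verdicts; the data layout is my assumption, not the campaign's actual format:

```python
# verdicts[(A, B)] lists per-sentence outcomes for the pair (A, B),
# from A's point of view: "win" (A better), "loss" (B better), "tie".
verdicts = {
    ("sys1", "sys2"): ["win", "win", "tie", "loss"],
    ("sys1", "sys3"): ["win", "tie", "tie", "loss"],
    ("sys2", "sys3"): ["loss", "loss", "win", "tie"],
}
systems = ["sys1", "sys2", "sys3"]

def pair_outcomes(a, b):
    # Normalise so the outcome is always from a's point of view.
    if (a, b) in verdicts:
        return verdicts[(a, b)]
    flip = {"win": "loss", "loss": "win", "tie": "tie"}
    return [flip[o] for o in verdicts[(b, a)]]

for s in systems:
    wins = ties = total = head_to_head = 0
    for other in systems:
        if other == s:
            continue
        outcomes = pair_outcomes(s, other)
        w = outcomes.count("win")
        wins += w
        ties += outcomes.count("tie")
        total += len(outcomes)
        if w > outcomes.count("loss"):
            head_to_head += 1  # s wins this pairwise ranking overall
    print(f"{s}: better-than {wins / total:.2f}, "
          f"better-or-equal {(wins + ties) / total:.2f}, "
          f"head-to-head wins {head_to_head}")
```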
If you look at the figures of this year, you can appreciate the importance of running subjective evaluations, because we report both the automatic metrics and the subjective metrics. As you know, they correlate well, but you might have some surprises, especially with systems whose automatic scores are rather close: for instance, you see here two systems with very close automatic metrics, but the rankings may change with the subjective evaluation.
So what we see is that, on one side, we had an improvement in terms of BLEU score with respect to the previous year, when we ran an exercise on the same translation direction, and we had a maximum BLEU score of sixteen point five for this SLT task; these are results of machine translation starting from speech, okay, and ending with punctuated, capitalised text. So we basically doubled the BLEU score, which for sure means real progress.
Moreover, for machine translation English-French we had a similar behavior, though the ranking given by BLEU was not fully confirmed: we have a slightly different ranking, but of course the correlation is good.
If you look at machine translation Arabic-English, you see here that in this case the ranking is confirmed: you have more significant differences among the systems in the BLEU score, and if you have such a large difference, it is very likely that the subjective ranking is confirmed. Unfortunately, you see that system combination did not really help for machine translation here: the system combination ended up second. Okay.
For Chinese-English we have again the result confirmed, as for the Arabic-English: the ranking of BLEU is confirmed, with some slight differences in the bottom part, and this time the system combination provided the best result in terms of the head-to-head comparison. You see on the bottom line the figure four: it means that the system combination won four head-to-head matches against the other systems.
Now, briefly, about the outputs. We can compare here the output from SLT, which is translation from English to French. We have again the example given by the guy affected by bipolar disorder that we saw before. You might be surprised about something: what about "San Francisco"? Because this is translation starting from speech recognition, and you remember that in the ASR excerpt before, "San Francisco" was not recognised. So I was also suspicious, and I looked into the ASR output behind the best system, and indeed it got "San Francisco". So it means that the system combination output, which reaches the lowest word error rate, was wrong on "San Francisco", while the single system from which the best SLT translation was produced was right. The quality is reasonable; I think you understand what's going on, but it can be improved.
can be improved
different stories if you look at machine translation output so from
perfect transcript
clean transcripts
from english into french yeah you have a
rather
oops
translation
i show you know another door
which belongs to the other test sets
[plays a video excerpt and shows the translation output]
Okay.
Now let's look at the machine translation from Arabic into English of the same talk; I wanted to show you this one, because otherwise this slide would have remained unexplainable. So here is the output from Arabic; it is not really good, especially at the beginning, nothing to show you. Again, you can get an idea. As you know, Chinese is much more difficult than Arabic, but okay, look at this output from the best system: there is another coloured word introduced here, which means "we need", as far as I can tell. Again, at last, I mean, it is reasonable.
I'll overview now, briefly, the main findings of this evaluation. We surveyed all the system papers by the participants and tried to figure out what the optimal configurations were, and maybe, ideally, to derive some guidelines for future participants or researchers that would like to approach this task. So if you look at the ASR systems, from the acoustic modeling perspective, participants typically downloaded the TED talks, which can be downloaded, tried to automatically align the manual transcripts with the audio, a quite straightforward procedure, got around one hundred fifty hours, and then used these hundred and fifty hours for training acoustic models.
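In outline, that data harvesting step looks something like the sketch below, which reuses the word_error_rate function from earlier; align and recognize are hypothetical stand-ins for a forced aligner and a seed recogniser, since the talk does not name specific tools, and the agreement filter is a common lightly-supervised variant rather than something the participants are stated to have used:

```python
def harvest_training_data(talks, align, recognize, max_wer=0.15):
    """Turn (audio, manual transcript) pairs into acoustic training
    segments. `align` is a hypothetical forced aligner yielding
    (audio_segment, text) pairs; `recognize` is a hypothetical seed
    ASR system used to keep only segments where the transcript and
    the recogniser agree reasonably well."""
    kept = []
    for audio, transcript in talks:
        for segment, text in align(audio, transcript):
            hyp = recognize(segment)
            if word_error_rate(text, hyp) <= max_wer:
                kept.append((segment, text))  # trusted training segment
    return kept
```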
KIT instead used other data, from the Quaero project, speech lectures of their own, and the news, to get a larger amount of hours.
About acoustic and linguistic features: participants used up to third-order derivatives of the acoustic features, and large vectors, reduced with LDA or HLDA. Acoustic model training was done, by the best three systems, with discriminative training, under the MMI and minimum phone error criteria. Concerning language models, four-gram interpolated language models were employed, combining in-domain data and out-of-domain data.
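Linear interpolation of an in-domain and an out-of-domain language model is the classic recipe here; below is a toy sketch with the mixture weight tuned by EM on held-out data, where p_in and p_out are stand-ins for real n-gram models:

```python
def tune_interpolation_weight(dev_words, p_in, p_out, iters=20):
    """EM for the weight w of p = w*p_in + (1-w)*p_out that
    maximises held-out likelihood. p_in/p_out map a word (with its
    context left implicit) to a probability; stand-ins for real LMs."""
    w = 0.5
    for _ in range(iters):
        posterior_sum = 0.0
        for word in dev_words:
            pi, po = w * p_in(word), (1 - w) * p_out(word)
            posterior_sum += pi / (pi + po)  # E-step: P(in-domain | word)
        w = posterior_sum / len(dev_words)   # M-step: new mixture weight
    return w

# Toy unigram "models" over a two-word vocabulary.
p_in = {"talk": 0.8, "news": 0.2}.get
p_out = {"talk": 0.3, "news": 0.7}.get
print(tune_interpolation_weight(["talk", "talk", "news"], p_in, p_out))
```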
About multi-pass decoding: almost all the participants performed multi-pass decoding, using models of increasing resolution, from speaker-independent to speaker-adaptively trained acoustic models, and from trigram to four-gram language models. They also applied different acoustic models in the process to do some cross-system adaptation; so they had to use different acoustic features, for instance employing neural-network-based features, or use different lexica.
Concerning MT: people worked on parallel data selection criteria. We provided a lot of out-of-domain data, very large collections, like the eight hundred million words of parallel French-English data; you cannot use all of it in a system, you run out of memory, so the best you can do is to extract the meaningful data from it, and they used cross-entropy or alignment score criteria.
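The cross-entropy criterion is usually applied in the style of Moore and Lewis: score each out-of-domain sentence by the difference of its per-word cross-entropy under an in-domain and an out-of-domain language model, and keep the lowest-scoring ones. A toy sketch, where the two log-probability functions stand in for real language models:

```python
def select_data(candidates, logprob_in, logprob_out, keep_fraction=0.2):
    """Moore-Lewis-style selection: rank out-of-domain sentences by
    per-word cross-entropy difference H_in(s) - H_out(s) and keep
    the fraction that looks most in-domain (lowest difference)."""
    def score(sentence):
        n = max(len(sentence.split()), 1)
        h_in = -logprob_in(sentence) / n
        h_out = -logprob_out(sentence) / n
        return h_in - h_out
    ranked = sorted(candidates, key=score)
    return ranked[:int(len(ranked) * keep_fraction)]

# Stand-in scorers; real systems would use n-gram LMs trained on the
# TED data (in-domain) and on a sample of the big corpus (out-of-domain).
logprob_in = lambda s: -2.0 * len(s.split())
logprob_out = lambda s: -3.0 * len(s.split())
pool = ["so here is the idea", "the quarterly report was filed"]
print(select_data(pool, logprob_in, logprob_out, keep_fraction=0.5))
```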
People worked on multiple word segmentations for Arabic, on different alignment techniques, and on adding model features. They worked on adaptation of the translation tables and of the language models, by using linear interpolation, log-linear interpolation, or fill-up interpolation of the phrase tables. Discriminative training of the translation model was done by Microsoft Research, as was the development of topic-specific translation tables. There were language models based on neural networks, class-based language models, by KIT, to model the style of talks, and syntax-based models based on categorial grammar, by DCU. And then, I would say, concerning the comparison between phrase-based and hierarchical phrase-based SMT, nothing definite came out: some labs compared them, some of them found one better than the other, and others found them in the middle. So there are no clear results here.
About IWSLT two thousand twelve, to anticipate what is going on for next year. About the venue: we decided it will be, maybe, in Hong Kong, in December. And some anticipations about what we are going to plan: we are going to confirm the TED talk task, so the ASR track will be again on English, and this time we will also allow a contrastive condition without using the given segmentation, so you have the challenge of recognising the speech and the pauses; but the primary run will be on the segmented data. The SLT track will be English to French.
We are going to repeat the Arabic-English track, we are now thinking about repeating Chinese-English, and we plan to add a new MT exercise, which still has to be worked out: we want to support some longer-term effort on specific languages, so the idea is that people should choose their own preferred language and have the possibility to work repeatedly on that language, like we do, for instance, for Italian. We are going to provide several translation directions here, and we will provide baselines, and people will be able to test their systems on these different languages. And we don't really care about having comparisons against each other; rather, people will compare against the baselines, and we will try to do some comparisons across the different languages; we have some ideas about that.
And as we lost a lot of the smaller players, smaller labs and students for instance, we are introducing a new small-domain task, on the Olympics corpus, kindly provided by NICT, Japan. It is a corpus of around sixty thousand sentences; the domain is traveling, traffic, business, dining and sport, and it was collected for the Beijing Olympic Games. The other tracks run unchanged.
Some conclusions. The IWSLT TED task is basically a subtitling and translation task; we added the ASR and system combination tracks this year, and, as you have seen, the data have been publicly released, both language resources and benchmarks; you can find them on the website, together with the subjective evaluation results. We had eleven participants that ran their systems on our data, and I must say we saw an impressive effort and high-quality research on this track; this is witnessed by the papers you find in the proceedings, and by significant improvements over the benchmarks of each task.
So, my take on these TED tasks, if you want to know more about them: I think there are good, interesting ideas from the participants about how to cope with this problem; the system descriptions will be online soon, as the proceedings are going to be published online. We showed the importance of subjective evaluation, run, right now, by crowdsourcing; and we have to further analyse these results, because they are fresh ones. And take my invitation to try this task, and eventually join us at IWSLT.
Here are some references for my talk, and finally some credits to the people, especially those who prepared the data.
We have time for a couple of quick questions before we go to the next part of the session.
Thank you very much for a very interesting overview of IWSLT. One of the things, I guess, that is probably very relevant for the community here is that there is an ongoing debate as to what is the way to improve speech-to-speech translation: whether the speech people should talk to the translation people, or whether they both do their own stuff, slamming the two components together every once in a while, and keep their distance from each other. I'm wondering if you have any comments about the impact of having the speech people interact more or less closely with the MT people, as far as advancing the state of the art in this area.
[The speaker's answer, on the interaction between work on speech recognition and translation, is largely unintelligible in the recording.]
Actually, I had a question. Maybe three years ago we had this somewhat disappointing discovery, when we were doing some of the GALE research, that even if the speech group managed to improve accuracy to a hundred percent, the translation wasn't good enough for us to meet the objectives of the program at the time. And so, in some sense, we cut down our speech effort tremendously and put more of our energies into translation, and the hope was that one of these days translation would get good enough that we could start paying attention to speech again. Has the IWSLT experience been different? Do people actually measure what difference it would make if they used the reference transcripts on the test data? Have you looked at that as an evaluation question?
Yes, we ran the evaluation with the reference transcripts as well [the answer is partially unintelligible], and I think that with a word error rate of around five percent, of course, machine translation from speech is still more difficult in a sense; actually we get very readable results, so we are now spotting errors here and there, while for some languages it is still far behind the level, maybe, of the goals set for machine translation.
All right, that is good to know, because I think in GALE we were seeing that there was no difference even when the word error rate was fifteen, maybe not twenty, but higher than fifteen percent. So it's good to know that you are already starting to see a difference, so there is a reason to make the speech better. Other questions? So let's thank our speaker once again.