I would say let's get the session started.

Yeah, I'm very happy to introduce our next invited speaker, Marcello Federico.

He is from what used to be called ITC-irst but is now the Fondazione Bruno Kessler (FBK), an independent research institute located near the University of Trento.

Over there he leads the speech and language effort; he is the co-director of the Human Language Technology unit.

You probably know him from many of his papers.

I personally know him from a summer workshop in two thousand seven at Johns Hopkins, where he and a number of people, including Philipp Koehn, developed a lot of very useful software for machine translation that was sort of the genesis of the Moses toolkit.

And for those of you who don't know, Moses is to machine translation what HTK is to the speech recognition people.

It's very widely used, so that's how I got to know him, but of course he has many other accomplishments. I will not list all of them, just point out that he has been an associate editor for the ACM Transactions on Speech and Language Processing and for Foundations and Trends in Information Retrieval.

And he's also, I think, a member of the body that is the ACM's counterpart to the IEEE technical committees.

And he's going to talk to us today about something that he's also very well known for: he has been running these workshops on spoken language translation, which have been very useful in fostering a lot of collaboration and discussion on this important problem, the IWSLT workshop, the International Workshop on Spoken Language Translation.

So he's also well known for that.

Okay, thanks for the kind introduction.

So, an outline of my talk: I will introduce IWSLT for those who do not know it, and in particular this talk will focus on the TED talk translation task that we started this year.

I will introduce the research interest behind this track, and describe how we organised an evaluation on talk translation: the language resources we provided, the evaluation conditions we set, and the participants that took part in the workshop, which was held recently in San Francisco.

I will briefly describe how we ran the subjective evaluation for machine translation, which is quite a tricky but important aspect.

I'll give an overview of the results and findings of this exercise, give some outlook on what we plan for next year, and draw some conclusions.

So IWSLT is the International Workshop on Spoken Language Translation. It consists of an evaluation campaign, which is run before the workshop, and a scientific workshop.

IWSLT has been running now for eight years, and the main organisers are FBK, the Karlsruhe Institute of Technology, and the National Institute of Information and Communications Technology (NICT) in Japan.

About the evaluation campaign: its distinguishing feature is that it is around spoken language translation, so this is something peculiar to IWSLT that is not covered elsewhere by other evaluations.

Another aspect is that language resources are organised and collected by the organisers and are provided for free to the participants.

It's an open evaluation in the sense that we develop these benchmarks for everyone who wants to work on them.

And we carry out, for all these tracks, both objective and subjective evaluations, which is of course not free for us, but it is for the participants.

Concerning the scientific workshop, this is used as a venue to present research papers on spoken language translation and machine translation in general.

Of course it's also the venue for presenting the evaluation results, and for participants of the evaluation to present their system papers describing their systems.

We also have invited talks and discussions.

So if you look at the venues: we started in two thousand four in Kyoto, then the workshop moved around, including Trento, and one week ago we were in San Francisco.

So if you look at the participants over all these years, we count fifty-two different research groups that took part. Of course not all of them took part in all the evaluations, so we have, let me say, a core group of around fourteen participants that took part in at least four evaluations, and we have around twenty sites that participated in only one of the events.

So you can figure out that the most prominent research groups working on machine translation are there, but also several small groups, and companies as well.

The aspect of small groups is important, because we also try to propose somehow affordable evaluation tracks, which do not require intensive computation power or large groups to be run.

So these are the figures about the participants.

Now, an overview of the evolution of our tasks.

Until two thousand ten we put a lot of effort on the so-called BTEC travelling domain, for which we organised separate evaluations, and just recently, at the end of last year, we started to cover the TED talk domain.

Concerning the BTEC task: we started in two thousand four and provided just an evaluation for text translation over the BTEC corpus, which is a collection of travelling expressions collected from the phrase books that tourists, for instance, use to try to communicate abroad.

So we started with Chinese to English and Japanese to English.

In two thousand five we added a track starting from speech, but indeed we provided basically the transcripts from speech recognition engines, and this was really an exercise with read speech, so people read these expressions, these sentences.

We covered again Chinese-English, but also English to Chinese, Japanese-English, Arabic-English, and Korean to English.

In two thousand six we tried to launch a new task, but we returned to BTEC in two thousand seven with Arabic-English and Japanese-English; then in two thousand eight we had Arabic-English, Chinese-English, and Chinese-Spanish, and this time we also proposed a pivot translation task: Chinese to Spanish had to go through English, so Chinese to English and then English to Spanish.

From there we went on with Arabic-English and Chinese-English, and we added new languages, so almost every year we add new languages: then we had Turkish, so Turkish to English translation, and the year after we repeated Arabic to English but added French.

So beside, let me say, this stream of BTEC tasks, as I explained we added some more complex tasks, always around travelling expressions, and we started modeling whole dialogues: in two thousand six over Arabic-English, Chinese-English, Japanese-English, and Italian-English.

The following year we repeated the experiment.

Then we moved to human/machine-mediated dialogues, which really reflect a translation task, while the former were basically translations of monolingual dialogues between humans, and the translations were produced afterwards.

The language directions we worked on were English-Japanese and Chinese-Japanese, the following year Chinese and English, and again Chinese-English in two thousand ten.

Then in two thousand ten we ran for the first time an exercise, without really an evaluation, on TED talks, and we started by providing output from speech recognition; the translation direction was English to French.

The following year, this year, we provided also machine translation tracks, from Arabic to English, Chinese to English, and English to French, and a full end-to-end evaluation from speech, so providing audio files, from English to French.

Okay, so we started with the TED talks, but indeed this is not really new stuff for IWSLT organisers.

We had past work on speech recognition of lectures within the European project FAME, from two thousand one to two thousand five, and there are some papers. It's funny that our first work on language modeling for talk transcription was on the TED corpus, but there the acronym stands for something else: it's a database of lectures recorded at Eurospeech ninety-three, and this database was released by LDC in two thousand two.

So that TED stands for Translanguage English Database.

In that project we worked on this database as well, and people from Karlsruhe worked on lectures they collected on their own.

Concerning spoken language translation of lectures or speeches, as I mentioned, there was the European project TC-STAR, which was a big effort from two thousand four to two thousand seven, and in which I see many participants here; we also had IBM, the Karlsruhe Institute of Technology, LIMSI, and UPC taking part, and there are several papers about translation of speeches.

You have here a couple of examples.

So in two thousand ten, as I stated, we started a new track in IWSLT on TED talks, and particularly we focused on this domain of TED talk translation. So what is TED? Maybe you know: TED is a non-profit organisation in the US that organises two conferences every year, and hosts many, I would say, short and often brilliant talks over a variety of topics.

All these talks are recorded, and there is a website, maintained by TED, which collects all the videos of the talks, the transcripts, and also many translations.

All this material is provided under a Creative Commons license, so you can basically download it and use it.

So if you look at the translations I mentioned, there is a community behind TED that helps with the translations: there are many volunteers who provide translations, and here I show you a plot that compares, for, let me say, the most popular target languages, the number of talks translated up to November two thousand ten and up to November two thousand eleven.

You see that there are many languages for which around a thousand talks are translated, and if you look at the right side you have the global figures: the talks recorded in English at the conferences and transcribed were eight hundred in two thousand ten and a thousand in two thousand eleven, so there are about two hundred fifty to three hundred talks processed every year.

The number of languages covered by these volunteers moved from eighty-two to eighty-three, and the number of these volunteers, these translators, moved from four thousand to almost seven thousand.

And the number of translations globally provided, which are many more than the ones you can find in this plot, which covers around twenty languages, moved from twelve thousand to twenty-four thousand.

So it's really a large number, and as you can see, many languages are covered for which you usually do not have many language resources available, especially in terms of parallel corpora.

So let's see, from the point of view of these translators, how we can describe the task behind preparing these translations of talks.

Typically the audio is first partitioned, because you might have music or background applause, so you detect the speech segments, you split the speech into sentences, and these are transcribed.

Translation then works on these segmented transcripts, and since the segmented transcript already defines the units of the translation task, translators should focus on the single captions.

Here is an example: ideally the translators should keep synchronicity among the captions, like you see in this example, so the same sentence is translated in exactly the same position in French and Italian. Of course, if you look a bit deeper, you can see that for some languages some reordering across captions is allowed, for instance German, where you have these long-distance verb movements, so you might have movement across captions, but of course the sentence boundaries are preserved.
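The synchronicity constraint just described, that caption i in every language should translate caption i of the English transcript, can be checked mechanically. A minimal sketch in Python (the helper name and the sample captions are illustrative, not part of the actual TED tooling):

```python
# Sketch: check that per-language caption lists for one talk stay
# index-aligned with the English transcript (illustrative data only).

def check_caption_sync(captions_by_lang):
    """captions_by_lang: dict mapping language code -> list of captions.
    Returns a list of problems (empty if all languages are aligned)."""
    problems = []
    lengths = {lang: len(caps) for lang, caps in captions_by_lang.items()}
    reference = lengths.get("en")
    for lang, n in lengths.items():
        if n != reference:
            problems.append(f"{lang}: {n} captions, expected {reference}")
    return problems

talk = {
    "en": ["I'm a performer.", "I'm also diagnosed bipolar."],
    "fr": ["Je suis un artiste.", "On m'a aussi diagnostiqué bipolaire."],
    "de": ["Ich bin ein Künstler."],  # one caption short: reordering merged two
}
print(check_caption_sync(talk))  # -> ["de: 1 captions, expected 2"]
```

A real checker would also compare caption timestamps, but the index alignment above is the core of the constraint.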

So let's now have a look at the talks; I'll show you some videos, nothing scary, like before, with some audio.

[plays video]

"I'm a performer. I'm also diagnosed bipolar. I reframe that as a positive, because the crazier I get on stage, the more entertaining I become. When I was sixteen in San Francisco, I had my breakthrough manic episode in which I thought I was Jesus Christ. Maybe you thought that was scary..."

So that was an example, and by the way, it comes from our real test set of this year.

So if you try to compare this kind of content with the previous tasks, and also with the very popular news translation task, which has been covered by other evaluations, this table somehow summarises it.

So look at travelling, TED talks, and news from the communication perspective: we move from dialogue to monologue communication.

The situation is informal for the travelling task: there we usually have a tourist asking for information from people on the street; in the TED talks I would say it's semiformal, sometimes there is even some interaction with the audience; while news uses a formal register.

The aim is informative for the travelling task and for the news, I would say: convey information, ask for information. For TED talks I would say the aim is more persuasive: these people are, in my view, trying to convince you of something, selling you an idea.

The style is also different: conversational for travelling, I would say entertaining in the TED talks, and formal for news.

With respect to the domain: for travelling it is limited, focused on the information requests you may have while travelling, that's the general topic; while for TED talks and news it is really open, you have a wide variety of possible topics.

With respect to the lexicon, this might be surprising: for travelling it is for sure small, the vocabulary was always around five thousand words, ten thousand at maximum.

For TED talks I would say it's medium, because during a TED talk the goal is to convey something, and the speakers do it using rather plain language: they use lots of colloquial expressions, and they are not looking for elaborate wording unless you pick some very technical talk.

So it's smaller than, and different from, the vocabulary that you find in news.

Concerning the syntax, that is, the complexity of the sentences in terms of structure: you have a very simple structure in the travelling task, where we had an average length of seven or eight words, which is very short. In news you may have very long sentences, while TED talk sentences are typically short, say around fifteen words, and the structure is quite linear: you do not have many nested clauses.
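Statistics like these, vocabulary size and average sentence length, are straightforward to compute for any of the three corpora. A minimal sketch, with a made-up two-sentence corpus standing in for BTEC:

```python
# Sketch: vocabulary size and average sentence length of a corpus,
# as used in the travelling/TED/news comparison (toy corpus only).

def corpus_stats(sentences):
    tokens = [tok for sent in sentences for tok in sent.split()]
    vocab = set(tokens)                      # unique word types
    avg_len = len(tokens) / len(sentences)   # mean words per sentence
    return len(vocab), avg_len

btec_like = ["where is the station", "how much is this"]
vocab_size, avg_len = corpus_stats(btec_like)
print(vocab_size, avg_len)  # 7 unique words, 4.0 words per sentence
```

On real data one would tokenize and lowercase consistently first, which changes the counts.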

Concerning the challenges that you face with this task: from the language modeling point of view you have of course limited in-domain training data; consider that the corpora are a couple of million words, which is not the size you would expect for modeling language. And then you have variability of topics and styles, so each talk is different from the others and has its own topic, and maybe also its own style.

For acoustic modeling you have many speakers, and you may have speakers with different accents, for instance non-native speakers; you have different fluency, speaking rate, and style.

Also, there is not just one speaker but many, and you have to cope with noise: you have applause that may cover the speech, pauses, laughs, and also music, like before, when the guy was playing.

With respect to translation modeling, we can work with this collection on under-resourced languages. Arabic, for instance, I would not say is really under-resourced, because LDC collected lots of data, but there are several languages for which probably very little parallel data is around.

And there are also distant languages, languages with very different structures, like we did this year with Chinese, and you can deal with morphologically rich languages, which are well covered here.

Concerning speech translation specifically: the task as we designed it requires going from spontaneous speech to a polished text, which means that you have to provide a polished text with capitalisation and punctuation, which is not trivial starting from speech.

Then you have tasks like detection and annotation of non-speech events.

And finally, I think the ultimate goal here would be to provide subtitling and translation in real time, while the talk is given.

Of course we did not attack all these challenges; for the time being we basically focused on the main ones.

So the tracks we proposed for two thousand eleven were, for the first time, an automatic speech recognition track: we asked participants to provide transcriptions of talks, from audio to text, in English.

We had a spoken language translation track, which requires automatic translation of talks from audio, or from the ASR outputs we provided, into text, from English to French; keep in mind that the talks are recorded in English.

And then we had machine translation tracks, this time starting from text, from English to French,

from Arabic to English, and from Chinese to English. Notice that for the last two translation directions we basically started from the human translations and translated back towards the original.

You might think that this is not the best thing you can do, because it has been shown that some artifacts may occur: text reads differently depending on whether it was written natively or translated from another language. But from our point of view, this kind of artifact is really not important with respect to the quality that you can achieve nowadays with machine translation, so it's better to have some data, even if not ideal, and to use them as they are.

Okay, and finally, again as in past years, we proposed a system combination track, both for ASR output and for MT output, and the participants were given all the system outputs collected during the evaluation.

Language resources are an important aspect. For speech we did not provide data, but allowed the use of any publicly available recordings dated before the thirty-first of December two thousand ten; that's safe, because the evaluation data were collected after that date.

As parallel data we provided a parallel text of TED talks of about two million words for English-French, Chinese-English, and Arabic-English; then we made available the so-called MultiUN United Nations corpus, which is around two hundred million running words for English-French, Chinese-English, and Arabic-English. This is, I would say, a large out-of-domain corpus.

And then all the data made available by the Workshop on Machine Translation (WMT), in particular the large English-French corpus crawled from the web, which makes up to eight hundred million words, so a very large parallel corpus.

As monolingual texts, besides the monolingual part of the parallel data, we provided all the transcripts of the English talks, as many as we could, and we also allowed, as in past years, a large monolingual collection for both English and French.

Then we provided the dev and test sets for ASR, SLT, and system combination; these data were collected and checked by the organisers.

Concerning the specifications and conditions: we decided to go for presegmented input this time. For speech recognition it means that we provided just the segments with speech; segments of non-speech events were simply not considered this time.

The same segments were used for speech recognition, speech translation, and also for machine translation, so they were perfectly aligned.

The reason for this is also that it gave better means to the system combination participants: all systems provided output for exactly the same segments.

Input was cased and punctuated for machine translation only. Output was not required to be cased and punctuated for speech recognition, but it was for the machine translation systems, and for spoken language translation the machine translation output had to carry punctuation and case information.

We ran automatic evaluations on all the tracks, and we ran human evaluation of the machine translation and spoken language translation outputs.

As for metrics, these are the metrics we used.

About the schedule: the timeline shows when we provided the training data, and the dev data, by the end of June; later we provided the data for system combination, so we basically asked participants to work first on the dev sets and then submit their runs, which were put on the website for the participants working on system combination.

We had a very packed schedule in September, in which we ran, one after the other, the ASR evaluation, ASR system combination, the SLT and machine translation evaluations, and finally machine translation system combination.

We allowed participants to submit one primary run and multiple secondary runs.

The test set references were not released, so the evaluation was done through an evaluation server, and we are going to keep this test set as a progress test set for next year.

What is good is that the benchmarks are available on our website, and the evaluation server is also going to stay open, so everyone can give it a try, and participants can work further on improving their systems.

Concerning participants, we had eleven teams; we had fifteen at the beginning, but four withdrew after a few months. Probably, I mean for sure, the task was more difficult than the one of the previous years.

So we had DCU, the Centre for Next Generation Localisation; the Karlsruhe Institute of Technology in Germany; FBK; LIG of Grenoble; LIUM of Le Mans; MIT together with the Air Force Research Laboratory; Microsoft Research in the US; the National Institute of Information and Communications Technology in Japan; and RWTH Aachen in Germany.

Concerning the submissions we received: we had five submissions for ASR and five for SLT; English-French machine translation was the most popular track, with seven participants; then we had four each for Arabic-English and Chinese-English, and a couple of submissions for system combination.

So if you look at the results for ASR on the test set: at the bottom line we have the best system of last year, which had a word error rate of around twenty-two or twenty-three percent.

This year we had two significant improvements in terms of performance, and you see that system combination also helped quite a lot, moving from the best single system's word error rate of around fifteen percent down to around thirteen percent.
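Word error rate, the metric behind these numbers, is the word-level edit distance between hypothesis and reference divided by the reference length. A real evaluation would use standard scoring tools such as NIST's sclite; this is just a minimal sketch to make the computation concrete:

```python
# Sketch: word error rate as Levenshtein distance over words,
# normalised by the reference length.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# one substitution plus one insertion over a 6-word reference: 2/6
print(wer("i had my breakthrough manic episode",
          "i had my break through manic episode"))
```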

If you look back at the excerpt we have just seen, you see that even the best ASR transcription still has errors, so it's not perfect. We have rather good performance overall, but remember that the speaker was moving around the microphone at the beginning and was not a typical speaker.

If you look at the performance over the individual talks, the scores are fairly uniform; there are talks for which you are over twenty percent word error rate with the best system, but fortunately with system combination you are almost always below twenty percent. Our most difficult talk was the one numbered seventy-eight, which is at around fifteen percent for system combination.

So I'll show you now the transcript for one of the talks.

I'll play the audio and show you the corresponding transcript.

[plays audio]

"A few years ago, I felt like I was stuck in a rut, so I decided to follow in the footsteps of the great American philosopher Morgan Spurlock and try something new for thirty days. The idea is actually pretty simple: think about something you've always wanted to add to your life, and try it for the next thirty days. It turns out thirty days is just about the right amount of time to add a new habit, or subtract a habit, like watching the news, from your life. There's a few things that I learned while doing these thirty-day challenges. The first was, instead of the months flying by, forgotten, the time was much more memorable."

So really, as you can see, you have a very good transcription here.

So this is for what concerns speech recognition.

Let me now tell you briefly about subjective evaluation for MT. As you might know, we have automatic metrics for machine translation: the BLEU score is the most well-known one, but there are others, like NIST, METEOR, TER, word error rate, and position-independent error rate.

All these metrics basically try to match the MT outputs against one or more reference translations.

As you know, these metrics are far from being perfect.
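To make concrete how such metrics match an MT output against a reference, here is a sketch of a sentence-level BLEU with add-one smoothing; official evaluations use the standard corpus-level BLEU implementation, so the exact numbers differ:

```python
# Sketch: sentence-level BLEU (n-gram precisions up to 4, add-one
# smoothing, brevity penalty). Not the official corpus-level metric.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, reference, max_n=4):
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        matches = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        total = max(sum(hyp_ngrams.values()), 1)
        log_prec += math.log((matches + 1) / (total + 1))  # smoothed precision
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))  # brevity penalty
    return brevity * math.exp(log_prec / max_n)

print(sentence_bleu("try something new for thirty days",
                    "try something new for thirty days"))  # -> 1.0
```

An imperfect hypothesis scores strictly between zero and one, which is exactly why small score differences between systems may not reflect real quality differences.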

If you want to measure, rank, or compare system outputs, you need to rely on subjective evaluation, which is of course more expensive and slower to carry out. This is why we run subjective evaluations only once in a while, when you need to assess your systems.

Subjective evaluations have traditionally been carried out by recruiting some experts and asking them either to judge in absolute terms the quality of a machine translation or, better, and more focused on what you finally want, to rank the outputs: to say which one is better given the reference.

What we did this year, with respect to the previous year, is that we replaced the expert-based evaluation and ran the evaluation by crowdsourcing.

It's not a new methodology: Chris Callison-Burch launched this kind of approach a couple of years ago with WMT, the Workshop on Machine Translation; so we applied it ourselves, with some new ideas about the subjective evaluation scheme, which are described in the overview paper.

So let me briefly tell you what it is about.

Our core evaluation is on one sentence pair: we compare the outputs of just two systems. We provide to each of the judges a reference translation and the outputs of the two systems, and we ask the judges to rate which is the best one; they are allowed to say that the translations are equally good or equally bad, or to indicate which is the best translation.

Like in this case: you have three judges, two of them choose system two as the best one, and one said that they are equally bad. From this atomic evaluation we can say that the winner in this case is system two.

Okay.
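The atomic decision just described, a handful of judgements over one sentence pair, reduces to a majority count. A minimal sketch (the function name and labels are illustrative):

```python
# Sketch: decide the winner of one sentence pair from crowd judgements,
# each judgement being 'A', 'B', or 'tie'.
from collections import Counter

def sentence_winner(judgements):
    counts = Counter(judgements)
    if counts["A"] > counts["B"]:
        return "A"
    if counts["B"] > counts["A"]:
        return "B"
    return "tie"

# three judges: two prefer system B, one calls it a tie
print(sentence_winner(["B", "tie", "B"]))  # -> B
```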

Of course this is just one sentence. What we can do is repeat this evaluation for all sentences of the test set, always between system one and system two, collect all the judgements, and compute the final statistics: how many wins by system one, how many by system two, and how many ties.

From these statistics we can decide who the winner is; here it is system two, because it wins more comparisons.

This comparison is run for just a pair of systems; if more systems take part in the evaluation, we organise a round-robin tournament among all systems.

What you see in this table is that you have all the systems on the top, and you have boxes in which you put the win and loss statistics; the table shows all the pairwise comparisons that you need to carry out, which of course depend on the translation direction.

For each of these boxes you run one of these evaluations over the full test set, and you report in the table all the pairwise wins and losses over the test set.

From this machinery we can extract some meaningful statistics for the comparison, and we use these quite standard scores.

The first score used is "better than others": you report the percentage of test sentences for which a given system was ranked better than any other system; we compute this for each system.

Then there is another metric, "better than or equal to others", which also includes the ties collected by a system.

And finally we have the head-to-head results, which count the number of test-set pairwise rankings won by the system.
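The three scores can be computed directly from the pairwise win/loss/tie counts of the round-robin table. A sketch, with made-up counts for three systems:

```python
# Sketch: the three ranking scores from pairwise round-robin results.
# results[(a, b)] holds (wins_a, wins_b, ties) over the test set;
# the numbers below are invented for illustration.

def ranking_scores(results, system):
    wins = ties = total = head_to_head = 0
    for (a, b), (wa, wb, t) in results.items():
        if system == a:
            wins += wa; ties += t; total += wa + wb + t
            head_to_head += wa > wb      # won this pairwise match
        elif system == b:
            wins += wb; ties += t; total += wa + wb + t
            head_to_head += wb > wa
    return {"better": wins / total,                   # "better than others"
            "better_or_equal": (wins + ties) / total, # includes ties
            "head_to_head": head_to_head}             # matches won

results = {("s1", "s2"): (40, 30, 30),
           ("s1", "s3"): (20, 50, 30),
           ("s2", "s3"): (35, 35, 30)}
print(ranking_scores(results, "s1"))
```

For s1 this gives 60 wins out of 200 judged sentences (0.30 "better", 0.60 "better or equal") and one head-to-head match won.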

So if you look at the figures of this year, you can appreciate the importance of running subjective evaluations, because we report both automatic metrics and subjective metrics. As you know they correlate well, but you might have some surprises, especially with systems whose automatic scores are rather close: for instance, you see here two systems with very close automatic metrics, but the rankings may change under subjective evaluation.

What we see is that, on one side, we had an improvement in terms of BLEU score with respect to the previous year's exercise on the same translation direction, with a maximum BLEU score of sixteen point five for the SLT task. These are results of machine translation starting from speech and ending with polished text, with punctuation and capitalisation.

So we basically doubled the BLEU score, which for sure means substantial progress.

Moreover, for machine translation English-French we had a similar behavior: the ranking given by BLEU was not fully confirmed, so we have a slightly different ranking, but of course the correlation is good.

For machine translation Arabic-English, you see that in this case the ranking is confirmed: you have more significant differences among the systems in the BLEU score, and with such a large difference it's very likely that the subjective ranking is confirmed.

Unfortunately, you see that system combination did not really help here for machine translation: the system combination ended up second.

okay

For Chinese-English we have again the result confirmed, as for Arabic-English: the ranking by BLEU holds, with some slight differences in the bottom part, and this time the system combination provided the best result in terms of head-to-head comparisons.

You see on the bottom line the figure four, which means that the system combination won four head-to-head matches, one against each of the other systems.

Now briefly about the outputs. We can compare here the output from SLT, which is translation from English to French; we have again an excerpt from the talk by the performer with bipolar disorder we have seen before.

You might be surprised about something: "San Francisco" appears here, because this is a translation starting from speech recognition, and you remember that in the ASR excerpt before, "San Francisco" was not recognised. I was suspicious too, so I looked into the ASR output of the best system, and indeed it got "San Francisco". So it means that the system combination output reaches the lowest word error rate but was wrong in recognising "San Francisco", while the single-system output used here, from the best SLT submission, was right.

The quality is reasonable: I think you can understand what's going on, but it can be improved.

It's a different story if you look at the machine translation output, so starting from perfect, clean transcripts, from English into French: here you have a rather good translation.

I'll show you now another talk, which belongs to the other test set.

[plays video excerpt]

Okay.

Now let's look at machine translation from Arabic into English on the TED talks. I wanted to show you this, because otherwise this example would remain unexplainable.

So here is the output from Arabic. It is not great, especially at the beginning; nothing special to show you. Again you find some errors, one of them on the vocabulary, but you can get an idea. As you know, Chinese is much more difficult than Arabic.

But okay, look at this output from the best system: there is another Chinese word that was introduced, which means "we need", as far as I know. Again, you can get the meaning; it is a reasonable output from the system.

Now I will briefly overview the main findings of these evaluations. I surveyed all the system papers by the participants and tried to figure out what the optimal configurations were, ideally to provide some guidelines for future participants and researchers who would like to approach this task.

If you look at the ASR systems from the acoustic-data perspective, participants typically downloaded the TED talks, which are freely downloadable, and tried to automatically align the manual transcripts with the audio, a straightforward procedure, obtaining around one hundred and fifty hours; they then used these one hundred and fifty hours to train acoustic models. One participant instead used other data, from the Quaero project, speech lectures of their own, and broadcast news, to obtain a larger amount of hours.

What about acoustic and linguistic features? Participants used up to third-order derivatives of the acoustic features, and large feature vectors reduced with LDA or HLDA. Acoustic model training was done by the best three systems with discriminative training, under MMI and minimum phone error criteria.

Concerning language models, four-gram interpolated language models were employed, combining the in-domain TED data with the other available text.

On multi-pass decoding: almost all of the participants performed multi-pass decoding, using models of increasing resolution, from speaker-independent to speaker-adaptively trained acoustic models, and from trigram to four-gram language models. They also applied different acoustic models in the process to do some cross-system adaptation: for instance, they employed different acoustic features, such as neural-network-based bottleneck features, or different lexicons.

Concerning MT, people worked on parallel data selection criteria. We provided a lot of out-of-domain data, very large collections, like the eight hundred million words of parallel French-English data; you cannot use all of it in a system, you run out of memory, so the best you can do is to extract the most useful data from it. Participants used entropy-based and alignment-score criteria for this.
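A common instance of such entropy-based selection is cross-entropy difference scoring in the style of Moore and Lewis: score each out-of-domain sentence by its cross-entropy under an in-domain language model minus its cross-entropy under a general model, and keep the lowest-scoring sentences. A minimal sketch with toy add-one-smoothed unigram models and hypothetical corpora (the actual systems used proper n-gram models):

```python
import math
from collections import Counter

def unigram_lm(corpus):
    """Add-one-smoothed unigram model estimated from a list of sentences."""
    counts = Counter(w for s in corpus for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen words
    return lambda w: (counts[w] + 1) / (total + vocab)

def cross_entropy(lm, sentence):
    """Per-word cross-entropy (bits) of a sentence under a unigram model."""
    words = sentence.split()
    return -sum(math.log2(lm(w)) for w in words) / len(words)

def select_in_domain(pool, in_domain, general, keep=2):
    """Rank pool sentences by the Moore-Lewis score H_in(s) - H_gen(s);
    the lowest values look most in-domain."""
    lm_in, lm_gen = unigram_lm(in_domain), unigram_lm(general)
    scored = sorted(pool, key=lambda s: cross_entropy(lm_in, s)
                                        - cross_entropy(lm_gen, s))
    return scored[:keep]

# Hypothetical toy corpora: TED-style in-domain text vs. general news.
in_domain = ["the talk is about machine translation", "the speaker gives a talk"]
general = ["the market fell sharply today", "the government passed a new law"]
pool = [
    "the speaker talk about translation",
    "the market passed a new law",
    "a talk about the speaker",
]
print(select_in_domain(pool, in_domain, general))
```

The news-like sentence in the pool gets a positive score and is discarded, while the two talk-like sentences are kept, which is the effect the participants were after when filtering the large out-of-domain collections.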

People worked on multiple word segmentations for Arabic-English, on different alignment techniques, and on adding model features. They worked on adaptation of the translation tables and language models, by using linear interpolation, log-linear interpolation, or fill-up of the phrase table. Discriminative training of the translation model was done by Microsoft Research, who developed topic-specific translation tables.
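Linear interpolation of the kind mentioned here typically tunes the mixture weight to maximize held-out likelihood via EM. A minimal sketch for two language models, with hypothetical word probabilities (real systems tune weights over full n-gram models with toolkits such as SRILM):

```python
def em_interpolation_weight(p_in, p_out, heldout, iters=50):
    """Tune lam in P(w) = lam*P_in(w) + (1-lam)*P_out(w) by EM,
    maximizing the likelihood of a held-out word sequence."""
    lam = 0.5
    for _ in range(iters):
        # E-step: posterior that each held-out word came from the in-domain model
        post = [lam * p_in(w) / (lam * p_in(w) + (1 - lam) * p_out(w))
                for w in heldout]
        # M-step: the new weight is the average posterior
        lam = sum(post) / len(post)
    return lam

# Hypothetical unigram probabilities for a few words under each model.
p_in = lambda w: {"talk": 0.4, "the": 0.3, "market": 0.05}.get(w, 0.01)
p_out = lambda w: {"talk": 0.05, "the": 0.3, "market": 0.4}.get(w, 0.01)

# Held-out text dominated by in-domain vocabulary -> lam well above 0.5.
lam = em_interpolation_weight(p_in, p_out, ["talk", "the", "talk", "the", "market"])
print(round(lam, 3))
```

Each EM update can only increase the held-out likelihood, so the weight settles where the mixture best explains the adaptation data; fill-up, by contrast, keeps the in-domain phrase table intact and only adds out-of-domain entries that are missing from it.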

There were language models based on neural networks; class-based language models to model the style of the talks; and syntax-based models built on categorial grammar. Concerning the comparison between phrase-based and hierarchical phrase-based SMT, nothing definite came out: several labs compared them, some found one better than the other, others found them on a par. So no clear results here.

Now about IWSLT 2012, to introduce what is going on next year. About the venue: it has been decided to be, most likely, in Hong Kong, in December.

Some anticipation about what we are planning: we are going to confirm the TED task. The ASR track will again be on English, and this time there will also be a contrastive condition without given segmentation, so you have the challenge of recognizing the speech and also segmenting it; but the primary run will be on the segmented data. The SLT track will again be English to French.

For MT, we are going to repeat Arabic-English, we are now thinking about repeating Chinese-English, and we plan to add a new exercise, which still has to be worked out. We want to support longer-term effort on specific languages: people should be able to choose their own preferred language and have the possibility to work repeatedly on that language, as we would for instance for Italian, and as our friends elsewhere would like to do for their own languages. So we are going to provide several translation directions here, we will provide baselines, and people will be able to work on these different languages. We do not really care about having comparisons against each other; rather, people should compare against the baseline, and we will try to do some comparisons across the different languages; we have some ideas about that.

And since we lost a lot of the smaller players, I mean smaller labs, students for instance, we are introducing a new small-domain task using the Olympics corpus, kindly provided by NICT, Japan. This is a corpus of around sixty thousand sentences; the domain is traveling, traffic, business, dining and sport, and it was collected for the Beijing Olympic Games. We will run a track on this data.

Some conclusions. The IWSLT TED task is basically a subtitling translation task; this year we added the ASR and system combination tracks. The data has been publicly released: the language resources and benchmarks can be found on the website, together with the subjective evaluations. In this round eleven participants ran their systems on our data in the evaluations, and I must say I saw an impressive effort in high-quality research on this track, as witnessed by the papers you will find in the proceedings, and significant improvements on each task.

So, as a take-away about this TED task, if you are new to it: I think there are good, interesting ideas from the participants about how to cope with this problem, and the details will be available soon, as the proceedings are going to be published online. We also showed the importance of subjective evaluation, which we carried out through crowdsourcing; we still have to further analyze these results, because they are fresh ones.

So, take my invitation to try this task, and eventually join IWSLT. Here are some references for my talk, and finally some credits to the people who especially worked on preparing the data.

We have time for a couple of quick questions before we go to the next part of the session.

Thank you very much for a very interesting overview of IWSLT. One of the things that is probably very relevant for the community here is that there is an ongoing debate as to what is the way to improve speech-to-speech translation: whether the speech people should talk to the translation people, or whether they should both do their own stuff, slamming the two components together every once in a while, and keep their distance from each other. I am wondering if you have any comments about the impact of having the speech people interact more or less closely with the NLP people, as far as advancing the state of the art in this area.

Right... well, the work on speech recognition... [the rest of the answer is unintelligible in the recording].

Actually, I had a question. Maybe three years ago we had this somewhat disappointing discovery, when we were doing some of the GALE research, that even if the speech group managed to improve accuracy to a hundred percent, the translation wasn't good enough for us to meet the objectives of the program at the time. And so, in some sense, we cut down our speech effort tremendously and put our energies into translation, and the hope was that one of these days translation would get good enough that we could start paying attention to speech again. Has the IWSLT experience been different? Do people actually measure what difference it would make if they used the reference transcripts on the test data? Have you looked at that as an evaluation question?

Yes, in the evaluation we also translate the reference transcripts, and we... I think if the word error rate is around five percent... of course for machine translation it is more difficult in a sense, but actually you get a very readable result, so we are not far: you spot errors here and there. For some languages, though, I would refrain from saying we are there; it is still far behind the level, maybe, of the goal set for machine translation.

All right, this is good to know, because I think in GALE we were seeing that there was no difference even when the word error rate was fifteen, maybe not twenty, but higher than fifteen percent. So it is good to know that you are already starting to see a difference, and that there is a reason to make the speech better.

Other questions?

So let's thank our speaker once again.