0:00:13 | Let's get the session started. |
---|
0:00:17 | Yeah, I'm very happy to introduce our next invited speaker, Marcello Federico. |
---|
0:00:23 | He is from |
---|
0:00:25 | what used to be called ITC-irst but is now |
---|
0:00:28 | the Fondazione Bruno Kessler, an independent research institute associated with, or located near, the University of Trento. |
---|
0:00:36 | There he heads the speech and language, or human language technology, effort; he is its co-director. |
---|
0:00:43 | You probably know him from many of his papers. |
---|
0:00:47 | I personally know him from a summer workshop in 2007 at Johns Hopkins, where he and a |
---|
0:00:52 | number of people, including Philipp Koehn, |
---|
0:00:55 | wrote a lot of very useful software for machine translation that was sort of the genesis |
---|
0:01:01 | of the Moses toolkit. |
---|
0:01:02 | For those of you who don't know, Moses is to the machine translation people what HTK is to the |
---|
0:01:07 | speech recognition people. |
---|
0:01:09 | It's very widely used, so that's how I got to know him, but of course he has many other |
---|
0:01:13 | accomplishments. I will not list all of them, just point out that he has been an |
---|
0:01:18 | associate editor for the ACM Transactions on Speech and Language Processing and for Foundations and Trends in Information Retrieval. |
---|
0:01:26 | He's also, I think, |
---|
0:01:28 | an officer of [unclear], which is like the ACM counterpart |
---|
0:01:34 | of the IEEE technical committees. |
---|
0:01:36 | He's going to talk to us today about something he's also very well known for: he's been running these |
---|
0:01:43 | workshops on spoken language translation, which have been very useful in fostering a lot of collaboration and discussion on this |
---|
0:01:49 | important problem: the IWSLT workshops, the International Workshops on Spoken Language Translation. |
---|
0:01:56 | So he's also well known for that. |
---|
0:02:01 | Okay, thanks for the kind introduction. |
---|
0:02:08 | So, an outline of my talk: I will introduce IWSLT for those who do not know it, and in particular |
---|
0:02:15 | this talk will focus on the TED talk translation task that we started this year. |
---|
0:02:20 | I will introduce the research background behind this track |
---|
0:02:25 | and describe how we organised an evaluation |
---|
0:02:29 | on TED talk translation: |
---|
0:02:31 | the language resources we provided, the evaluation conditions we set, |
---|
0:02:37 | the participants that took part in the workshop, |
---|
0:02:41 | which was held recently in San Francisco. |
---|
0:02:45 | I will briefly describe |
---|
0:02:48 | how we ran the subjective evaluation for machine translation, which is quite a tricky and important aspect. |
---|
0:02:55 | I'll give an overview of the results and findings of this exercise, |
---|
0:03:00 | and give some outlook on what we plan for next year, and some conclusions. |
---|
0:03:10 | So, IWSLT is the International Workshop on Spoken Language Translation. It consists of an evaluation campaign, which is run before |
---|
0:03:19 | the workshop, |
---|
0:03:21 | and the scientific workshop. |
---|
0:03:24 | IWSLT has been running now for eight years, |
---|
0:03:29 | and the main organisers are FBK, Karlsruhe Institute of Technology, and the National |
---|
0:03:37 | Institute of Information and Communications Technology in Japan. |
---|
0:03:40 | About the evaluation campaign: |
---|
0:03:43 | its features are that it's about spoken language translation, so this is |
---|
0:03:48 | something which is peculiar to IWSLT, as it is not covered elsewhere by other evaluations. |
---|
0:03:56 | Another aspect is that language resources are |
---|
0:04:00 | organised and collected by the organisers and are provided for free to the participants. |
---|
0:04:05 | It's an open evaluation, in the sense that we |
---|
0:04:11 | develop these benchmarks |
---|
0:04:12 | for everyone who wants to work on them. |
---|
0:04:17 | And for all these evaluations we carry out both objective and subjective evaluations, |
---|
0:04:24 | which of course is not free for us, but it's free for the participants. |
---|
0:04:29 | Concerning the scientific workshop, |
---|
0:04:33 | this is used as a venue to present research papers on |
---|
0:04:37 | spoken language translation and machine translation in general, |
---|
0:04:40 | and of course it's a venue for presenting the evaluation results and for participants in the evaluation to |
---|
0:04:47 | present their system papers |
---|
0:04:49 | describing their systems. |
---|
0:04:51 | We also have invited talks and discussions. |
---|
0:04:58 | So if you look at the venues, we started in |
---|
0:05:01 | 2004 in Kyoto; then we had Pittsburgh, |
---|
0:05:04 | again Kyoto, Trento, Hawaii, |
---|
0:05:07 | Tokyo, Paris, and |
---|
0:05:09 | one week ago we were in San Francisco. |
---|
0:05:14 | If you look at the participants over all these years, |
---|
0:05:18 | we count |
---|
0:05:21 | fifty-two different research groups that took part. Of course not all of them took part in all the |
---|
0:05:28 | evaluations: we have, |
---|
0:05:30 | let me say, a core group of around fourteen participants that took part in at least four evaluations, |
---|
0:05:36 | and around twenty sites that participated in only |
---|
0:05:42 | one of the events. |
---|
0:05:46 | So you can find there |
---|
0:05:48 | the most prominent research groups working on machine translation, but also several |
---|
0:05:53 | small groups |
---|
0:05:55 | and companies as well. |
---|
0:05:58 | The aspect of small groups is important because we also try to propose affordable evaluation tracks, which |
---|
0:06:07 | do not require |
---|
0:06:08 | intensive computation power or |
---|
0:06:12 | large teams to be run. |
---|
0:06:16 | So, these are the figures about |
---|
0:06:25 | the participants. |
---|
0:06:29 | Now, an overview of |
---|
0:06:32 | the evolution of our |
---|
0:06:36 | tasks. |
---|
0:06:39 | Until 2010 we put a lot of effort into the so-called BTEC travelling domain, |
---|
0:06:48 | for which we organised separate evaluations, |
---|
0:06:50 | and just recently, this year and partly last year, we started to cover the TED talk domain. |
---|
0:07:00 | So, concerning |
---|
0:07:04 | the BTEC domain: we started in 2004 and provided just an evaluation for text translation |
---|
0:07:13 | over this BTEC corpus, which is a collection of travelling expressions |
---|
0:07:17 | collected from phrasebooks that tourists, for instance, use to try to communicate |
---|
0:07:24 | abroad. |
---|
0:07:25 | We started with Chinese to English and Japanese to English. |
---|
0:07:29 | In 2005 we had a track using |
---|
0:07:34 | speech, but indeed we basically provided |
---|
0:07:38 | transcripts from speech recognition engines, |
---|
0:07:42 | and this was really an exercise with read speech: people read these expressions, these sentences. |
---|
0:07:49 | We covered again Chinese-English, |
---|
0:07:52 | and also English to Chinese, Japanese-English, |
---|
0:07:57 | Arabic-English and Korean to English. |
---|
0:08:03 | In 2005 we tried to launch a new task, [unclear] BTEC. In |
---|
0:08:10 | 2007 we |
---|
0:08:12 | had Arabic-English and Japanese-English, then in 2008 Arabic-English, Chinese-English and |
---|
0:08:21 | Chinese |
---|
0:08:22 | to Spanish, and this time we proposed a so-called pivot translation task: |
---|
0:08:27 | Chinese-Spanish had to go through English, |
---|
0:08:35 | so Chinese to English and English to Spanish. |
---|
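The pivot condition can be pictured as a cascade of two systems: Chinese to English, then English to Spanish. The sketch below only illustrates that composition; the toy lexicons `ZH_EN` and `EN_ES` (with pinyin tokens) and the function names are hypothetical stand-ins for real translation engines, and word-by-word lookup replaces actual decoding.

```python
# Hypothetical toy lexicons standing in for two full translation systems.
ZH_EN = {"nihao": "hello", "pengyou": "friend"}
EN_ES = {"hello": "hola", "friend": "amigo"}


def translate(tokens, table):
    """Word-by-word lookup; unknown tokens pass through unchanged."""
    return [table.get(tok, tok) for tok in tokens]


def pivot_translate(tokens, src_to_pivot, pivot_to_tgt):
    """Cascade: translate into the pivot language first, then into the target."""
    return translate(translate(tokens, src_to_pivot), pivot_to_tgt)


print(pivot_translate(["nihao", "pengyou"], ZH_EN, EN_ES))  # ['hola', 'amigo']
```

Any error made in the first step is passed on to the second, since the second system only ever sees the pivot-language output.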
0:08:39 | In 2009 we went on with Arabic-English and Chinese-English, and we added a new language; |
---|
0:08:46 | almost every year we add new languages, and that year we had Turkish, so Turkish to English |
---|
0:08:52 | translation. |
---|
0:08:53 | And the year after, |
---|
0:08:57 | we repeated Arabic to English but added French. |
---|
0:09:05 | Besides, let me say, this stream of BTEC |
---|
0:09:11 | tasks, |
---|
0:09:12 | as explained, we added some more complex tasks, |
---|
0:09:17 | always around travelling expressions, |
---|
0:09:19 | and we started |
---|
0:09:22 | with monolingual dialogues |
---|
0:09:23 | in 2006, over Arabic-English, Chinese-English, Japanese-English and |
---|
0:09:30 | Italian-English. |
---|
0:09:33 | The following year we repeated the experiment, |
---|
0:09:39 | and then we moved to |
---|
0:09:42 | human-to-human, machine-mediated dialogues, |
---|
0:09:48 | which really reflect |
---|
0:09:52 | a translation task, |
---|
0:09:54 | while the former were basically translations of monolingual dialogues between humans, |
---|
0:10:01 | and the translations were |
---|
0:10:03 | produced afterwards. |
---|
0:10:06 | The language directions we worked on were |
---|
0:10:11 | English-Japanese and Chinese-Japanese, |
---|
0:10:14 | the following year Chinese and English, |
---|
0:10:17 | and again Chinese-English in [unclear]. |
---|
0:10:21 | Then in 2010 we ran for the first time an exercise [unclear], and this was on TED |
---|
0:10:29 | talks. |
---|
0:10:30 | We started by providing output from speech recognition, and the translation direction was English to French. |
---|
0:10:40 | The following year, this year, we provided both machine translation tracks, |
---|
0:10:46 | from Arabic |
---|
0:10:48 | to English, Chinese to English and English to French, and the full end-to-end evaluation from speech, |
---|
0:10:55 | that is, providing audio files, from English to French. |
---|
0:11:02 | Okay, we started with TED talks, but indeed this is not really new stuff |
---|
0:11:09 | for us as organisers. |
---|
0:11:12 | We had past work on speech recognition of lectures within the European project FAME, |
---|
0:11:19 | from 2001 to 2005, with some papers. And it's funny that our |
---|
0:11:25 | first work on language modelling for talk transcription was on the TED corpus, |
---|
0:11:31 | but here it's another acronym, because |
---|
0:11:37 | it's a database of lectures recorded at Eurospeech '93, and this database was released by LDC in |
---|
0:11:46 | 2002. |
---|
0:11:47 | TED stands for Translanguage English Database. |
---|
0:11:51 | In the same project we worked on this database as well, and people from Karlsruhe worked on |
---|
0:11:59 | lectures they collected on their own. |
---|
0:12:03 | Concerning spoken language translation of lectures or speeches, I mention the European project TC-STAR, which was a big effort |
---|
0:12:13 | from 2004 to 2007, |
---|
0:12:15 | in which many IWSLT participants also took part: we had IBM, Karlsruhe Institute of Technology, LIMSI, [unclear] |
---|
0:12:27 | and UPC taking part, and there are several papers about |
---|
0:12:32 | translation of speeches. |
---|
0:12:36 | Here you have a couple of examples. |
---|
0:12:42 | So in 2010, as I stated, we started a new IWSLT track on TED talks, |
---|
0:12:49 | and particularly we focused on this |
---|
0:12:53 | domain, TED talk translation. So what is TED? Maybe you know: |
---|
0:13:01 | it is a non-profit organisation in the US that organises two conferences every year |
---|
0:13:10 | and hosts many, I would say, short and brilliant talks over a variety of topics, |
---|
0:13:18 | and all these talks are recorded. |
---|
0:13:23 | There is a web site maintained by TED |
---|
0:13:27 | which collects all the videos of the talks, the transcripts and also many translations, |
---|
0:13:33 | and all this material is provided under a Creative Commons licence, |
---|
0:13:38 | so you can basically download it and use it. |
---|
0:13:44 | If you look at the translations I mentioned, |
---|
0:13:48 | there is a |
---|
0:13:51 | community behind these |
---|
0:13:54 | talks which |
---|
0:13:56 | helps |
---|
0:13:57 | with the translations: there are many volunteers who provide translations. Here I show you a plot |
---|
0:14:04 | that compares, |
---|
0:14:09 | for, let me say, the most popular translation languages, |
---|
0:14:14 | the number of talks translated |
---|
0:14:17 | up to November 2010 and up to November 2011. |
---|
0:14:22 | You see that |
---|
0:14:26 | there are many languages for which you have around a thousand talks translated. If you look at the right side, you have |
---|
0:14:32 | the global figures: |
---|
0:14:34 | the talks recorded |
---|
0:14:37 | in English at the conferences and transcribed were eight hundred in 2010 and about a thousand [unclear] |
---|
0:14:46 | in 2011, so there are about two hundred fifty to three hundred talks |
---|
0:14:51 | processed every year. |
---|
0:14:53 | The languages covered by these volunteers moved from eighty-two to eighty-three, |
---|
0:15:00 | the number of volunteer translators moved from four thousand to almost seven thousand, |
---|
0:15:07 | and the number of translations globally provided, which are many more than those you can find in the |
---|
0:15:12 | plot, |
---|
0:15:14 | which covers around twenty languages, moved from |
---|
0:15:17 | twelve thousand |
---|
0:15:20 | to twenty-four thousand. |
---|
0:15:22 | So it's really a large number. |
---|
0:15:25 | And as you can see, you have many languages covered for which you usually do |
---|
0:15:32 | not have many language resources available, |
---|
0:15:35 | especially in terms of parallel corpora. |
---|
0:15:42 | So let's see, from the point of view of these translators, how we can describe the task |
---|
0:15:50 | behind preparing these talks, |
---|
0:15:53 | sorry, preparing these translations of talks. |
---|
0:15:56 | Typically the audio is partitioned, |
---|
0:15:58 | because you might have music, background noise, applause; |
---|
0:16:02 | so you |
---|
0:16:03 | detect the speech segments, and |
---|
0:16:08 | you split the speech into sentences, and these are transcribed. |
---|
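Speech/non-speech partitioning is described above only at the level of the workflow. As a rough illustration of the idea, here is a toy energy-threshold detector; the per-frame energies, the `threshold` and `min_frames` parameters and the function name are all hypothetical, and this is not the procedure used by the TED volunteers or by the IWSLT organisers.

```python
def speech_segments(energies, threshold, min_frames=3):
    """Return (start, end) frame ranges, end exclusive, where the energy stays
    above `threshold` for at least `min_frames` consecutive frames.

    `energies` must be a list of per-frame energy values. Shorter bursts are
    treated as noise and dropped.
    """
    segments, start = [], None
    # A -inf sentinel closes a segment that runs to the end of the input.
    for i, e in enumerate(energies + [float("-inf")]):
        if e > threshold and start is None:
            start = i  # speech onset
        elif e <= threshold and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    return segments


frames = [0.1, 0.9, 0.8, 0.7, 0.1, 0.9, 0.1, 0.6, 0.7, 0.8, 0.9]
print(speech_segments(frames, threshold=0.5))  # [(1, 4), (7, 11)]
```

The single loud frame at index 5 is discarded because it is shorter than `min_frames`, which is the kind of choice (applause spike or short word?) a real segmenter has to make more carefully.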
0:16:15 | The translation works on the segmented transcripts, |
---|
0:16:20 | and ideally the transcripts are also the units of the translation task: |
---|
0:16:26 | translators should focus on the single caption. |
---|
0:16:32 | Here is an example. |
---|
0:16:35 | Ideally, the translators should keep synchronicity among |
---|
0:16:43 | the captions, like you see in this example: the same sentence is translated exactly the same |
---|
0:16:49 | way in French and Italian. Of course, if you look a bit deeper, you can see |
---|
0:16:55 | that some languages allow for some reordering across captions, for instance German, where you have these longer |
---|
0:17:03 | movements, so you might also have movement across the captions, but of course |
---|
0:17:08 | the sentence boundaries are preserved. |
---|
0:17:15 | So let's give a look at the talks. I'll show you some video, nothing |
---|
0:17:22 | scary |
---|
0:17:23 | like before. |
---|
0:17:31 | Some audio: |
---|
0:17:43 | Yeah, there's... |
---|
0:17:45 | I'm a performer. |
---|
0:18:07 | I'm also |
---|
0:18:09 | diagnosed |
---|
0:18:10 | bipolar, |
---|
0:18:17 | which I find is a positive, because the crazier I get on stage, |
---|
0:18:21 | the more entertaining I become. |
---|
0:18:25 | But at sixteen, in San Francisco, I had my breakthrough manic episode, in which I thought I was Jesus Christ. |
---|
0:18:32 | You thought that was [unclear] |
---|
0:18:35 | So that was an example, |
---|
0:18:38 | and by the way it is from our real test set of this year. |
---|
0:18:41 | So let's try to compare this kind of content with the previous task and also |
---|
0:18:51 | with the very popular news translation task, |
---|
0:18:54 | which has been covered by other evaluations. |
---|
0:18:59 | This table somehow summarises it: |
---|
0:19:03 | BTEC travelling, TED talks and news. From the communication perspective, we move from |
---|
0:19:10 | dialogue to monologue communication. |
---|
0:19:15 | The situation is |
---|
0:19:17 | informal for travelling: |
---|
0:19:19 | in the travelling task we usually have a tourist asking for information from |
---|
0:19:24 | people on the street, |
---|
0:19:26 | while TED talks, I would say, are semi-formal; sometimes |
---|
0:19:32 | there is even some interaction with the audience, |
---|
0:19:35 | and news uses a different, formal register. |
---|
0:19:38 | The aim is |
---|
0:19:40 | informative for travelling and for news, I would say: |
---|
0:19:45 | convey information or ask for information, |
---|
0:19:48 | while for TED, I would say, |
---|
0:19:52 | the aim is more persuasive: these people are, |
---|
0:19:56 | in my view, trying to convince you about something, selling you an idea. |
---|
0:20:01 | The style is |
---|
0:20:03 | different: conversational |
---|
0:20:04 | for travelling, I would say entertaining for TED, |
---|
0:20:10 | while |
---|
0:20:12 | formal for news. |
---|
0:20:14 | Concerning domain restriction, |
---|
0:20:19 | travelling is limited: it focuses on information requests [unclear]; |
---|
0:20:30 | for TED talks and news it's really open: you have a real variety of possible topics. |
---|
0:20:36 | With respect to the lexicon, |
---|
0:20:39 | this might be surprising. Travelling is for sure small: the |
---|
0:20:44 | lexicon is always around five [unclear], |
---|
0:20:47 | [unclear] at maximum. |
---|
0:20:50 | For TED talks I would say it's medium, because |
---|
0:20:54 | [unclear] |
---|
0:21:00 | the goal is to convey something, and they do it using rather plain language, so they use lots of |
---|
0:21:06 | colloquial expressions; they're not looking for elegant |
---|
0:21:13 | expressions, unless you look at some very technical talk. |
---|
0:21:16 | So it's |
---|
0:21:18 | definitely smaller than the vocabulary that you find in news. |
---|
0:21:21 | Concerning the syntax, the complexity of the sentences in terms of structure, |
---|
0:21:27 | you have a very simple structure in the |
---|
0:21:31 | travelling task: |
---|
0:21:32 | we had an average length of seven or eight words, which is |
---|
0:21:37 | very short. |
---|
0:21:39 | In news you may have very long sentences, while TED talk sentences are typically short, |
---|
0:21:46 | around fifteen words, |
---|
0:21:49 | and also the structure is quite |
---|
0:21:52 | linear, let's say: you do not have many |
---|
0:21:55 | nested clauses. |
---|
0:22:01 | Concerning the challenges |
---|
0:22:04 | that you face with this task: |
---|
0:22:06 | from the language modelling point of view, |
---|
0:22:10 | we have, of course, limited in-domain training data: |
---|
0:22:15 | you see that |
---|
0:22:16 | the corpora are a couple of million words, which is not the usual size you expect for modelling |
---|
0:22:22 | language. And then you have variability of topics and styles, so each talk |
---|
0:22:29 | is different from the others |
---|
0:22:30 | and has its own topic, and maybe also its own |
---|
0:22:34 | style. |
---|
0:22:37 | For acoustic modelling, there is variability of speakers: |
---|
0:22:40 | many speakers, and you may have speakers with different accents; you may have, |
---|
0:22:48 | for instance, |
---|
0:22:49 | non-native speakers; |
---|
0:22:51 | you have different fluency, speaking rate and style; also, sometimes |
---|
0:22:58 | there is not one speaker but two; |
---|
0:23:02 | and you have to cope with noise, |
---|
0:23:05 | so you have applause, laughter [unclear], |
---|
0:23:10 | and also music, like before, where |
---|
0:23:12 | the guy was playing. |
---|
0:23:15 | With respect to translation modelling, |
---|
0:23:19 | we can work with this collection |
---|
0:23:23 | on under-resourced languages. |
---|
0:23:28 | Arabic and Chinese I would not say are under-resourced, because LDC collected lots of data, |
---|
0:23:32 | but there are several languages |
---|
0:23:35 | for which there is probably very little parallel data around, |
---|
0:23:39 | and also distant languages, languages with very different structures, like we did this year |
---|
0:23:45 | with Chinese. |
---|
0:23:49 | You can also deal with morphologically rich languages, which are well covered here. |
---|
0:23:54 | Concerning speech translation specifically, |
---|
0:23:57 | the task |
---|
0:23:59 | that we designed |
---|
0:24:03 | requires going from spontaneous speech to polished text, |
---|
0:24:11 | which means |
---|
0:24:13 | that you have to provide polished text with capitalisation and punctuation, |
---|
0:24:18 | which is |
---|
0:24:19 | not trivial |
---|
0:24:21 | starting from speech. |
---|
0:24:22 | Then you have tasks like detection and annotation of non-speech events, |
---|
0:24:32 | and finally, I think, the ultimate goal here would be to provide subtitling and translation in real time. |
---|
0:24:39 | Well, |
---|
0:24:40 | [unclear], |
---|
0:24:42 | of course; |
---|
0:24:43 | we did not tackle all these challenges now, so for this time we basically |
---|
0:24:51 | focused on some of the challenges. |
---|
0:25:02 | So the tracks we proposed for 2011 were, for the first time, an automatic speech recognition track: |
---|
0:25:11 | we |
---|
0:25:13 | asked participants to provide transcriptions of talks, |
---|
0:25:17 | from audio to text, |
---|
0:25:18 | in English. |
---|
0:25:20 | We had a spoken language translation track, |
---|
0:25:23 | which requires automatic |
---|
0:25:25 | translation of talks from audio, |
---|
0:25:27 | or from the ASR outputs we provided, into text, |
---|
0:25:34 | from English to French; |
---|
0:25:36 | keep in mind that the talks are recorded in English. |
---|
0:25:40 | And then the machine translation tracks, |
---|
0:25:44 | this time |
---|
0:25:46 | starting from text: |
---|
0:25:50 | from English to French, |
---|
0:25:52 | from Arabic to English and from Chinese to English. Notice that for the last two translation directions |
---|
0:26:01 | we basically started from the human translations |
---|
0:26:06 | and tried to translate back to the |
---|
0:26:10 | original. |
---|
0:26:12 | You might |
---|
0:26:16 | think that |
---|
0:26:17 | it's not the best thing you can do, because |
---|
0:26:20 | it has been shown that some artefacts might arise |
---|
0:26:26 | [unclear] |
---|
0:26:35 | either because you write some text or because you translate some text from some other language. But from our |
---|
0:26:41 | point of view, this kind of artefact is really not |
---|
0:26:45 | important with respect to the quality that you can achieve nowadays with machine translation, so it's better to have some |
---|
0:26:51 | data, even if not the ideal data, and to use them as they are. |
---|
0:27:01 | Okay, and finally, |
---|
0:27:03 | again as [unclear], we provided a system combination track, |
---|
0:27:09 | both for ASR output and for MT output, |
---|
0:27:13 | and the participants were given all |
---|
0:27:18 | the system outputs collected |
---|
0:27:22 | during the evaluation. |
---|
0:27:26 | Resources are an important aspect. |
---|
0:27:30 | Language resources: for speech we did not provide data, |
---|
0:27:36 | but allowed the use of any publicly available recordings |
---|
0:27:40 | dated before the thirty-first of December |
---|
0:27:43 | 2010, |
---|
0:27:45 | and that's good because the evaluation data were collected after that date. |
---|
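The training-data rule for speech amounts to a date filter: anything recorded after the cutoff stays out of training, so it cannot overlap the later-collected evaluation talks. A minimal sketch, assuming each recording carries a hypothetical `published` date field; the talk does not say whether recordings from 31 December 2010 itself were allowed, and this version reads "before" strictly.

```python
from datetime import date

# Cutoff stated in the talk; excluding the day itself is an assumption.
CUTOFF = date(2010, 12, 31)


def allowed_for_training(recordings, cutoff=CUTOFF):
    """Keep recordings published strictly before the cutoff date."""
    return [r for r in recordings if r["published"] < cutoff]


recs = [
    {"id": "talk_a", "published": date(2009, 7, 1)},   # hypothetical entries
    {"id": "talk_b", "published": date(2011, 3, 15)},
]
print([r["id"] for r in allowed_for_training(recs)])  # ['talk_a']
```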
0:27:53 | As parallel data, we provided TED parallel texts |
---|
0:27:59 | of about two million words for English-French, Chinese-English and Arabic-English. Then we made available the so-called |
---|
0:28:08 | MultiUN, the United Nations corpora, |
---|
0:28:10 | which |
---|
0:28:12 | are around two hundred million running words |
---|
0:28:17 | for English-French, Chinese-English and Arabic-English. |
---|
0:28:20 | This is, I would say, a large out-of-domain corpus. And then all the data made available by |
---|
0:28:27 | the Workshop on Statistical Machine Translation, |
---|
0:28:31 | in particular the |
---|
0:28:33 | [unclear] corpus of |
---|
0:28:34 | English-French crawled from the web, |
---|
0:28:37 | which makes up to eight hundred million words, |
---|
0:28:42 | so it's a very large parallel corpus. |
---|
0:28:45 | As monolingual texts, besides the monolingual part of the parallel data, |
---|
0:28:51 | we provided the transcripts of the English talks [unclear], |
---|
0:28:59 | and we also allowed [unclear] a book collection |
---|
0:29:06 | for English and French. |
---|
0:29:09 | Then we provided development data for ASR, SLT and system combination; |
---|
0:29:17 | these |
---|
0:29:18 | data were collected and checked by different [unclear]. |
---|
0:29:24 | Concerning the specifications and conditions, |
---|
0:29:30 | we decided to go for presegmented input this time for speech recognition. It means that |
---|
0:29:37 | we provided |
---|
0:29:40 | just the segments with speech, |
---|
0:29:45 | so segments with non-speech events were just |
---|
0:29:48 | not considered |
---|
0:29:49 | this time, |
---|
0:29:53 | and the same segments were used for speech recognition, speech translation and also machine translation, so they were perfectly aligned |
---|
0:29:59 | in this sense. |
---|
0:30:00 | The reason for this is also that |
---|
0:30:02 | it makes life a lot easier for system combination, as participants |
---|
0:30:07 | provide outputs for exactly the |
---|
0:30:09 | same segments. |
---|
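Because every track used the same segments, outputs from different systems can be compared segment by segment, which is what combination methods rely on. The sketch shows only the voting step of a ROVER-style combination and assumes the hypotheses have already been aligned word by word (real combiners first build that alignment, for example with dynamic programming). The function name and the empty-string deletion marker are illustrative choices, not the method used by the IWSLT participants.

```python
from collections import Counter


def vote_combine(aligned_hyps):
    """Majority vote per position over hypotheses that are ALREADY aligned.

    aligned_hyps: list of equal-length token lists, one per system; "" marks a
    deletion slot. Ties are broken in favour of the earliest system listed.
    """
    combined = []
    for column in zip(*aligned_hyps):
        counts = Counter(column)
        best = max(counts.values())
        # First token, in system order, that reaches the top count.
        winner = next(tok for tok in column if counts[tok] == best)
        if winner:  # a winning deletion slot contributes no word
            combined.append(winner)
    return combined


hyps = [
    ["try", "something", "new", "for", "thirty", "days"],
    ["try", "some", "thing", "for", "thirty", "days"],
    ["try", "something", "new", "for", "dirty", "days"],
]
print(" ".join(vote_combine(hyps)))  # try something new for thirty days
```

Each system makes a different error, and the vote recovers the correct word at every position because the other two systems agree there.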
0:30:12 | The input was cased and punctuated |
---|
0:30:17 | for machine translation only. |
---|
0:30:21 | The output |
---|
0:30:25 | was not required to be cased and punctuated for speech recognition, but it was for machine translation systems: |
---|
0:30:34 | so for machine translation, |
---|
0:30:37 | and also for spoken language translation, the output had to come with punctuation and case information. |
---|
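These output requirements mean that ASR hypotheses are compared without case or punctuation, while MT and SLT outputs keep both. A minimal sketch of such a normalisation step before scoring, assuming ASCII punctuation and whitespace tokenisation; `normalize` and its flags are hypothetical and are not the official IWSLT scoring tools.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text, keep_case=False, keep_punct=False):
    """Tokenise a hypothesis or reference for scoring.

    With the defaults, text is lowercased and stripped of punctuation, which
    mimics a case- and punctuation-insensitive (ASR-style) comparison.
    """
    if not keep_punct:
        text = text.translate(_PUNCT)
    if not keep_case:
        text = text.lower()
    return text.split()


print(normalize("Hello, World!"))                                   # ['hello', 'world']
print(normalize("Hello, World!", keep_case=True, keep_punct=True))  # ['Hello,', 'World!']
```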
0:30:44 | We ran automatic evaluations on all the tracks, |
---|
0:30:48 | and we did human evaluation of machine translation |
---|
0:30:53 | and spoken language translation. |
---|
0:30:54 | As metrics, here are the metrics we |
---|
0:30:58 | are using. |
---|
0:31:05 | About the schedule: |
---|
0:31:08 | [unclear] we provided the training data, |
---|
0:31:13 | the development data by the end of June, and |
---|
0:31:18 | [unclear] we provided data for system combination: we basically asked participants to run their systems first |
---|
0:31:27 | on the dev sets and return their runs, |
---|
0:31:31 | which were then put on the website |
---|
0:31:34 | for the participants working on system combination. And we had a very busy schedule in September, in which we ran, one |
---|
0:31:41 | after the other, the ASR evaluation, |
---|
0:31:44 | ASR system combination, |
---|
0:31:47 | the SLT and machine translation evaluation, and finally |
---|
0:31:52 | machine translation system combination. |
---|
0:31:54 | We allowed participants to submit one primary run |
---|
0:31:58 | and multiple secondary runs. |
---|
0:32:03 | The test set references were not released, |
---|
0:32:07 | so the evaluation was |
---|
0:32:09 | done through an evaluation server, and we are going to keep this test set as a progress test set for |
---|
0:32:15 | next year. |
---|
0:32:16 | What is good is that the benchmark is |
---|
0:32:19 | available on our website, |
---|
0:32:21 | and the evaluation server is also going to be available, so |
---|
0:32:25 | everyone can give it a try, |
---|
0:32:27 | and participants and whoever is interested |
---|
0:32:30 | can work further to improve their systems. |
---|
0:32:35 | Participants: we had eleven teams. We had fifteen at the beginning, but four withdrew after |
---|
0:32:43 | a few months; |
---|
0:32:44 | probably, I mean for sure, the task |
---|
0:32:47 | was more difficult than the one of the previous year. |
---|
0:32:50 | So we had |
---|
0:32:55 | DCU, the Centre for Next Generation Localisation at Dublin City University; DFKI |
---|
0:33:02 | in Germany; [unclear]; Karlsruhe Institute of Technology; the Laboratoire d'Informatique de Grenoble; [unclear]; |
---|
0:33:14 | [unclear]; |
---|
0:33:15 | MIT and Air Force Research; |
---|
0:33:19 | Microsoft Research US; |
---|
0:33:23 | the National Institute of Information and Communications |
---|
0:33:25 | Technology; and RWTH, Germany. |
---|
0:33:33 | As for the submissions we received, we had five submissions |
---|
0:33:38 | for ASR and five for SLT; English-French machine translation was the most popular track, with seven participants, |
---|
0:33:48 | and then we had four for Arabic-English and Chinese-English, |
---|
0:33:52 | and a couple of submissions for system |
---|
0:33:56 | combination. |
---|
0:33:57 | So if you look at the |
---|
0:34:00 | results for ASR, here they are. |
---|
0:34:07 | If you look at the bottom line, we had the baseline of last year, which had a word |
---|
0:34:12 | error rate of around twenty-two or twenty-three |
---|
0:34:15 | percent. |
---|
0:34:17 | This year we had significant improvements |
---|
0:34:21 | in terms of performance, |
---|
0:34:25 | and you see that system combination also helped quite a lot: it moved from fifteen point four |
---|
0:34:32 | percent word error rate for the best system to [unclear]. |
---|
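Word error rate, the ASR figure quoted here, is the word-level edit distance (substitutions, deletions and insertions) between hypothesis and reference, divided by the number of reference words. A minimal Python sketch; the function names are illustrative, and `corpus_wer` pools errors over all talks before dividing, which is a common way to report a test-set figure but may differ in detail from the scorer used in the evaluation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word error rate of one hypothesis against one reference."""
    return edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs):
    """Pool errors over all (reference, hypothesis) pairs, then divide by the
    total number of reference words, so longer talks weigh more."""
    errors = sum(edit_distance(r, h) for r, h in pairs)
    words = sum(len(r) for r, _ in pairs)
    return errors / words


ref = "try something new for thirty days".split()
hyp = "try some thing new for dirty days".split()
print(wer(ref, hyp))  # 0.5  (2 substitutions + 1 insertion over 6 words)
print(corpus_wer([(ref, hyp), (["thirty", "days"], ["thirty", "days"])]))  # 0.375
```

The per-talk rates in this toy example are 0.5 and 0.0, but the pooled figure is 3/8 rather than their plain average, which is why per-talk results, such as the spread across talks discussed next, can look quite different from the overall number.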
0:34:37 | if you want to give a look at the |
---|
0:34:40 | if you reminder |
---|
0:34:42 | excerpt of to what we have seen |
---|
0:34:44 | you see but |
---|
0:34:47 | the best transcription asr transcription provides |
---|
0:34:50 | which i |
---|
0:34:51 | so it's not really |
---|
0:34:53 | thus |
---|
0:34:54 | so we have a rather good performance but |
---|
0:34:59 | you remind that |
---|
0:35:01 | the guy is it dated between the microphone at the beginning and |
---|
0:35:06 | it's not |
---|
0:35:07 | was not |
---|
0:35:08 | speaker. So if you look at the
---|
0:35:11 | performance we have
---|
0:35:15 | over the
---|
0:35:18 | talks provided,
---|
0:35:20 | it's quite non-uniform, so
---|
0:35:23 | you see there are
---|
0:35:25 | talks for which you are over
---|
0:35:28 | twenty percent with the best system; fortunately, with system combination you are almost always below twenty percent.
---|
0:35:37 | so our difficult or was the one seventy eight |
---|
0:35:43 | which is around |
---|
0:35:45 | fifteen percent for |
---|
0:35:49 | system combination. So let me give you a
---|
0:35:53 | let me show you the transcripts for the
---|
0:35:56 | you just one the one |
---|
0:35:58 | eight three |
---|
0:36:01 | i |
---|
0:36:02 | because of the audio |
---|
0:36:04 | the corresponding you |
---|
0:36:07 | A few years ago,
---|
0:36:09 | I felt like I was stuck in a rut,
---|
0:36:12 | so I decided to follow in the footsteps of the great American philosopher Morgan Spurlock
---|
0:36:17 | and try something new for thirty days.
---|
0:36:20 | The idea is actually pretty simple.
---|
0:36:22 | Think about something you've always wanted to add to your life
---|
0:36:26 | and try it
---|
0:36:27 | for the next thirty days.
---|
0:36:29 | It turns out thirty days is just about the right amount of time to add
---|
0:36:33 | or subtract a habit,
---|
0:36:35 | like watching the news,
---|
0:36:36 | from your life.
---|
0:36:38 | There's a few things that I learned while doing these thirty-day challenges.
---|
0:36:41 | The first was, instead of the months flying by, forgotten,
---|
0:36:46 | the time was much more memorable.
---|
0:36:50 | so it's |
---|
0:36:51 | really if you |
---|
0:36:54 | so you have a very good transcription.
---|
0:37:00 | now |
---|
0:37:02 | this is |
---|
0:37:03 | for what |
---|
0:37:04 | concerns |
---|
0:37:05 | speech recognition |
---|
0:37:07 | Let me tell you now briefly about subjective evaluation for MT. As you might know, we have automatic metrics
---|
0:37:14 | for |
---|
0:37:16 | MT, like
---|
0:37:17 | the BLEU score, which is the best
---|
0:37:19 | known one, but there are others like NIST, METEOR,
---|
0:37:25 | TER,
---|
0:37:26 | word error rate,
---|
0:37:28 | position-independent error rate.
---|
0:37:30 | All these metrics basically try to
---|
0:37:35 | compare, or match, the MT output
---|
0:37:37 | against one or more reference
---|
0:37:41 | translations.
---|
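To make "matching the MT output against one or more reference translations" concrete, here is a simplified corpus-level BLEU sketch in Python. It assumes pre-tokenized input, uniform weights over 1- to 4-grams, n-gram counts clipped by the maximum count in any reference, and a brevity penalty based on the closest reference length; official scorers add their own tokenization and smoothing conventions, which are not reproduced here.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with uniform n-gram weights.

    hypotheses: list of token lists, one per segment
    references: list of lists of token lists (one or more references per segment)
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length (shorter one wins ties).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)  # per-n-gram maximum count over references
            # Clipped matches: an n-gram is credited at most as often as a reference contains it.
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if some n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    ref = "the time was much more memorable".split()
    print(corpus_bleu([ref], [[ref]]))  # 1.0 for an exact match
    print(corpus_bleu(["the time was more memorable".split()], [[ref]]))
```

Scores are computed over the whole test set rather than averaged per sentence, which is why the sketch accumulates matches and lengths before combining them.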
0:37:44 | It is well known that these metrics are far from being perfect.
---|
0:37:49 | If you want to measure, or
---|
0:37:52 | you want to rank or compare system outputs, you need to rely on subjective evaluation, which is of
---|
0:37:59 | course
---|
0:38:00 | more expensive and slower to carry out. This is why we run evaluations:
---|
0:38:05 | because
---|
0:38:06 | once in a while you need to evaluate your systems
---|
0:38:09 | and
---|
0:38:10 | subjective evaluations
---|
0:38:13 | have been carried out by
---|
0:38:17 | and
---|
0:38:18 | hiring some experts and asking them either to judge in absolute terms the quality of
---|
0:38:23 | machine translation or, better,
---|
0:38:26 | which is
---|
0:38:27 | more focused on the final goal, to rank the outputs
---|
0:38:31 | and tell which is
---|
0:38:32 | better |
---|
0:38:36 | but considering the right |
---|
0:38:39 | So what we did this year, with respect to previous years, is that
---|
0:38:43 | instead of hiring experts, we
---|
0:38:47 | ran the evaluation by crowdsourcing.
---|
0:38:50 | And
---|
0:38:51 | it's not a new methodology, because Chris Callison-Burch started it a couple of years ago with
---|
0:38:59 | WMT, the Workshop on Machine Translation. So we applied to IWSLT some new ideas
---|
0:39:05 | about
---|
0:39:07 | round-robin subjective evaluation,
---|
0:39:10 | which are described in this |
---|
0:39:12 | design |
---|
0:39:14 | So I'll briefly tell you what it's about.
---|
0:39:20 | So our
---|
0:39:21 | core evaluation
---|
0:39:24 | is done
---|
0:39:27 | on sentence pairs, so we compare the output of
---|
0:39:30 | just two systems,
---|
0:39:32 | and
---|
0:39:33 | we provide to each of these
---|
0:39:39 | non-expert judges
---|
0:39:41 | a reference translation and the outputs of two systems,
---|
0:39:45 | and then we ask the judges to rate which is the best one. So they are allowed to say
---|
0:39:52 | whether they
---|
0:39:54 | find the translations equally good or equally bad, or to indicate which is the best translation. Like in this
---|
0:40:01 | case,
---|
0:40:01 | you have
---|
0:40:03 | three judges;
---|
0:40:04 | two of them chose
---|
0:40:07 | system two as the best one, and one said that they are equally bad.
---|
0:40:16 | From this atomic
---|
0:40:20 | evaluation we can say that
---|
0:40:24 | the winner in this case is system two.
---|
0:40:26 | okay |
---|
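The per-sentence decision described here can be sketched as a plurality vote over the judges' labels. This is a reading of the procedure, not the official scoring code: it assumes each judgement is recorded as `"sys1"`, `"sys2"` or `"tie"` (both "equally good" and "equally bad" map to `"tie"`), and that a split vote between the top labels counts as a tie.

```python
from collections import Counter


def sentence_winner(judgements):
    """Plurality vote over one sentence's judgements.

    Each judgement is "sys1", "sys2" or "tie" ("equally good" and "equally bad"
    are both recorded as "tie"). If the most frequent labels are themselves
    tied, the sentence is scored as a tie (an assumption of this sketch).
    """
    if not judgements:
        raise ValueError("need at least one judgement")
    ranked = Counter(judgements).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]


# The example from the talk: two judges prefer system two, one says "equally bad".
print(sentence_winner(["sys2", "sys2", "tie"]))  # sys2
```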
0:40:29 | Of course, this is just one sentence.
---|
0:40:31 | What we can do is repeat this evaluation for all sentences of the test set
---|
0:40:38 | and repeat this every time, so for sentence one, sentence two and so on, we always compare
---|
0:40:43 | system one and system
---|
0:40:45 | two,
---|
0:40:46 | and we collect all the judges'
---|
0:40:51 | judgements, and
---|
0:40:53 | collect
---|
0:40:54 | final statistics about
---|
0:40:57 | how many wins by system one, how many by system two, and how many ties.
---|
0:41:03 | By looking at these statistics we can decide that
---|
0:41:07 | here the winner is
---|
0:41:09 | system two.
---|
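Aggregating the sentence-level outcomes for one pair of systems is then a simple count. The sketch below assumes the outcomes are already available as one label per test sentence and declares the pair's winner by the larger number of wins, ignoring ties; the talk does not say whether a significance test is applied on top, so none is included.

```python
from collections import Counter


def compare_pair(sentence_outcomes):
    """Aggregate per-sentence outcomes for one system pair over a test set.

    sentence_outcomes holds one label per test sentence: "sys1", "sys2" or "tie".
    Returns the win/tie counts and the overall winner (more wins; equal wins -> "tie").
    """
    counts = Counter(sentence_outcomes)
    wins1, wins2, ties = counts["sys1"], counts["sys2"], counts["tie"]
    if wins1 > wins2:
        winner = "sys1"
    elif wins2 > wins1:
        winner = "sys2"
    else:
        winner = "tie"
    return {"sys1": wins1, "sys2": wins2, "tie": ties, "winner": winner}


# Hypothetical outcomes for a five-sentence test set.
print(compare_pair(["sys2", "tie", "sys1", "sys2", "sys2"]))
# {'sys1': 1, 'sys2': 3, 'tie': 1, 'winner': 'sys2'}
```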
0:41:12 | So this comparison
---|
0:41:14 | is run just for a couple of systems; if you have more systems
---|
0:41:18 | taking part in the evaluation, we organise a
---|
0:41:24 | round-robin tournament
---|
0:41:27 | among all systems.
---|
0:41:29 | So
---|
0:41:30 | what you see in this table is that you have all the systems on the top,
---|
0:41:36 | and you have boxes in which you put win and loss
---|
0:41:40 | statistics,
---|
0:41:41 | and so we have a table which shows you all the pairwise comparisons that you need to carry out,
---|
0:41:46 | of course
---|
0:41:51 | depending on the direction,
---|
0:41:52 | and
---|
0:41:56 | for each of these boxes you run one of these
---|
0:41:58 | evaluations over the full test set,
---|
0:42:02 | and you report the number of test-set pairwise wins
---|
0:42:08 | and
---|
0:42:09 | losses
---|
0:42:11 | in the table.
---|
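A round-robin organisation means every unordered pair of systems is compared once, i.e. n(n-1)/2 pairwise evaluations for n systems, and each evaluation fills two cells of the wins table. In this sketch, `compare` is a hypothetical callback standing in for one full pairwise evaluation over the test set; it returns `(wins_a, wins_b, ties)`. The counts in the example are invented.

```python
from itertools import combinations


def round_robin_table(systems, compare):
    """Run every pairwise comparison once and collect a wins table.

    compare(a, b) -> (wins_a, wins_b, ties) is a hypothetical callback that
    evaluates systems a and b over the full test set.
    table[a][b] is the number of sentences on which a beat b;
    ties[frozenset({a, b})] is the number of tied sentences for that pair.
    """
    table = {a: {b: 0 for b in systems if b != a} for a in systems}
    ties = {}
    for a, b in combinations(systems, 2):  # each unordered pair exactly once
        wins_a, wins_b, n_ties = compare(a, b)
        table[a][b] = wins_a
        table[b][a] = wins_b
        ties[frozenset((a, b))] = n_ties
    return table, ties


if __name__ == "__main__":
    # Invented counts for three systems on a 20-sentence test set.
    fake = {("A", "B"): (10, 6, 4), ("A", "C"): (7, 9, 4), ("B", "C"): (5, 5, 10)}
    table, ties = round_robin_table(["A", "B", "C"], lambda a, b: fake[(a, b)])
    print(table["C"]["A"], ties[frozenset(("B", "C"))])  # 9 10
```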
0:42:12 | So from this machinery
---|
0:42:16 | we can extract
---|
0:42:18 | some meaningful statistics for the comparison, and we use these quite standard
---|
0:42:24 | scores.
---|
0:42:25 | So there is
---|
0:42:27 | the first score used, ">others",
---|
0:42:30 | where you report the percentage of test sentences in which a given system was ranked
---|
0:42:36 | better than any other system.
---|
0:42:38 | So for each system we compute this,
---|
0:42:43 | as well as the other metric, which is
---|
0:42:46 | ">=others", which includes both the wins and the ties
---|
0:42:54 | collected by the system.
---|
0:42:56 | And finally we have these head-to-head
---|
0:43:00 | results,
---|
0:43:01 | which count the number of test-set pairwise rankings won by the system.
---|
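Given the pairwise counts, the three scores can be computed as below. This sketch follows our own reading of the definitions in the talk: ">others" as the share of a system's sentence-level comparisons that it won, ">=others" as the share it won or tied, and head-to-head as the number of system pairings in which it collected more sentence wins than its opponent. The official IWSLT definitions may treat ties differently. `results` maps each pair of systems to invented `(wins_a, wins_b, ties)` counts.

```python
from itertools import combinations


def ranking_scores(systems, results):
    """Compute >others, >=others and head-to-head wins from pairwise counts.

    results[(a, b)] = (wins_a, wins_b, ties) for each unordered pair, stored
    under either key order.
    """
    stats = {s: {"wins": 0, "ties": 0, "total": 0, "h2h": 0} for s in systems}
    for a, b in combinations(systems, 2):
        if (a, b) in results:
            wins_a, wins_b, ties = results[(a, b)]
        else:
            wins_b, wins_a, ties = results[(b, a)]
        n = wins_a + wins_b + ties
        for s, wins in ((a, wins_a), (b, wins_b)):
            stats[s]["wins"] += wins
            stats[s]["ties"] += ties
            stats[s]["total"] += n
        if wins_a > wins_b:
            stats[a]["h2h"] += 1
        elif wins_b > wins_a:
            stats[b]["h2h"] += 1
        # equal wins: the pairing counts for neither system
    return {
        s: {
            "better": st["wins"] / st["total"],                          # ">others"
            "better_or_equal": (st["wins"] + st["ties"]) / st["total"],  # ">=others"
            "head_to_head": st["h2h"],
        }
        for s, st in stats.items()
    }


if __name__ == "__main__":
    fake = {("A", "B"): (10, 6, 4), ("A", "C"): (7, 9, 4), ("B", "C"): (5, 5, 10)}
    for system, score in ranking_scores(["A", "B", "C"], fake).items():
        print(system, score)
```

Keeping the three scores separate matters because they can disagree: in the invented data, system A has the highest ">others" score while system C has the highest ">=others" score, and both win one head-to-head pairing.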
0:43:07 | so if you look at the |
---|
0:43:09 | figures of this year, you
---|
0:43:12 | can appreciate the importance of running subjective evaluations, because we report both
---|
0:43:17 | automatic metrics and subjective metrics.
---|
0:43:22 | As you know, they correlate well, but
---|
0:43:25 | you might have some surprises, especially with systems whose
---|
0:43:29 | scores are
---|
0:43:30 | rather close with automatic metrics.
---|
0:43:33 | For instance, you see two systems with very close metrics, but
---|
0:43:39 | the
---|
0:43:42 | rankings may change with subjective evaluation.
---|
0:43:46 | so |
---|
0:43:47 | What we see is that, on one side,
---|
0:43:52 | we had an improvement in terms of BLEU score with respect to previous years,
---|
0:43:57 | on an exercise in the same translation direction, and we had a maximal BLEU score of sixteen to fifty
---|
0:44:03 | for this SLT task. So here, these are results
---|
0:44:07 | of machine translation
---|
0:44:09 | starting from speech, okay,
---|
0:44:11 | and ending with punctuated text,
---|
0:44:14 | with punctuation and capitalisation.
---|
0:44:19 | So we basically doubled
---|
0:44:21 | the BLEU score, which
---|
0:44:22 | which for sure means that |
---|
0:44:27 | Moreover, for machine translation English-French we had a similar behavior:
---|
0:44:37 | so
---|
0:44:38 | the BLEU ranking was not
---|
0:44:42 | confirmed, so we have a slightly different ranking, though of course the correlation is good, so
---|
0:44:50 | you can write |
---|
0:44:57 | Machine translation Arabic-English:
---|
0:45:00 | you see here that in this case the ranking is confirmed; you have
---|
0:45:07 | more significant differences
---|
0:45:09 | among the systems
---|
0:45:11 | in BLEU score,
---|
0:45:12 | so if you have
---|
0:45:14 | a large difference,
---|
0:45:15 | it's very likely that the
---|
0:45:17 | subjective ranking is confirmed.
---|
0:45:20 | Unfortunately, you see that system combination does not really
---|
0:45:25 | help
---|
0:45:28 | for machine translation,
---|
0:45:30 | so
---|
0:45:32 | system combination ended up second.
---|
0:45:41 | okay |
---|
0:45:43 | For Chinese-English
---|
0:45:46 | we again have a result like Arabic-English, so the BLEU ranking is
---|
0:45:54 | confirmed
---|
0:45:55 | here,
---|
0:45:57 | with some slight differences in the
---|
0:45:59 | bottom part.
---|
0:46:02 | And this time
---|
0:46:04 | the
---|
0:46:07 | basically the system combination provided the
---|
0:46:10 | best result in terms of head-to-head comparison.
---|
0:46:13 | So
---|
0:46:14 | you see on the bottom line a figure of four, which means that the
---|
0:46:20 | system combination won four
---|
0:46:23 | matches;
---|
0:46:24 | he was |
---|
0:46:26 | applied to |
---|
0:46:27 | you one |
---|
0:46:28 | against each of the other four systems.
---|
0:46:32 | Now, briefly, about the
---|
0:46:36 | results: we can compare here
---|
0:46:41 | the outcome from SLT,
---|
0:46:44 | which is translation from English to French, so here
---|
0:46:49 | we have again a simple of given by D is a guy affected the body people a reason |
---|
0:46:55 | So you might be surprised
---|
0:46:58 | about something:
---|
0:47:00 | what about
---|
0:47:03 | San Francisco?
---|
0:47:04 | Because this is a translation starting from speech recognition, and you remember that
---|
0:47:08 | in the ASR output shown before, San Francisco was not recognised, so I was also
---|
0:47:15 | wondering about it,
---|
0:47:17 | suspicious, so I looked into the ASR output
---|
0:47:22 | of the
---|
0:47:24 | best system, and indeed it got San Francisco.
---|
0:47:30 | So it means that the system combination output, which reaches the lowest word error rate, got it wrong and did not
---|
0:47:39 | recognize San Francisco, while the
---|
0:47:42 | single-system output here,
---|
0:47:44 | from the best SLT evaluation run, was right.
---|
0:47:51 | The quality is reasonable here; I think you understand what's going on, but it
---|
0:47:56 | can be improved.
---|
0:47:59 | It's a different story if you look at machine translation output, so from a
---|
0:48:04 | perfect transcript,
---|
0:48:05 | clean transcripts,
---|
0:48:07 | from English into French, here you have a
---|
0:48:09 | rather |
---|
0:48:11 | oops |
---|
0:48:12 | translation |
---|
0:48:17 | I'll show you now another talk,
---|
0:48:21 | which belongs to the other test set.
---|
0:48:24 | sh |
---|
0:48:25 | i |
---|
0:48:41 | i don't |
---|
0:49:05 | i |
---|
0:49:07 | this is what we call this |
---|
0:49:10 | but |
---|
0:49:13 | and everybody agrees with this on the wall of the spectrum |
---|
0:49:23 | for tracing over the |
---|
0:49:27 | you want to |
---|
0:49:29 | right and on a good writer good |
---|
0:49:37 | the right |
---|
0:49:38 | yeah |
---|
0:49:40 | okay |
---|
0:49:42 | Let's look at machine translation from Arabic into English of this talk. I wanted to show it to you because
---|
0:49:48 | otherwise this slide
---|
0:49:50 | would be
---|
0:49:52 | unexplainable.
---|
0:49:55 | So here is the output from Arabic.
---|
0:50:01 | it's not really especially the beginning nothing to show you |
---|
0:50:07 | i again you |
---|
0:50:11 | your grass |
---|
0:50:13 | and |
---|
0:50:17 | one word was out of vocabulary,
---|
0:50:23 | but you can get an idea.
---|
0:50:25 | As you know, Chinese is much more difficult than Arabic,
---|
0:50:29 | but okay, look at this output from the best system:
---|
0:50:40 | i |
---|
0:50:41 | to apply because |
---|
0:50:47 | so there is the another colour word which is introduced which is this to do what one |
---|
0:50:54 | which means the we need |
---|
0:50:56 | as far as i |
---|
0:50:57 | just to |
---|
0:51:04 | again you last meeting it's |
---|
0:51:08 | it's a reasonable |
---|
0:51:11 | i'm from the future |
---|
0:51:17 | Now I'll briefly overview the main findings of these evaluations. I surveyed all the system papers
---|
0:51:25 | by the participants and tried to figure out what the optimal configurations were, and ideally provide some guidelines for
---|
0:51:33 | future participants or researchers who would like to approach
---|
0:51:38 | this task. So if you look at
---|
0:51:40 | ASR systems from the acoustic training perspective,
---|
0:51:45 | participants typically downloaded
---|
0:51:49 | the talks,
---|
0:51:50 | which can be downloaded,
---|
0:51:52 | and tried to automatically align the manual transcripts with the audio
---|
0:51:57 | with straightforward procedures,
---|
0:52:00 | and got around a hundred fifty hours, and then used these hundred fifty hours for training acoustic models,
---|
0:52:08 | as we saw. The Karlsruhe Institute of Technology
---|
0:52:10 | instead used other data from the Quaero project, speech lectures they own,
---|
0:52:17 | and news,
---|
0:52:18 | to find a larger amount of hours.
---|
0:52:22 | As for acoustic and linguistic features, participants used up to third-order acoustic features
---|
0:52:32 | and
---|
0:52:33 | large vectors,
---|
0:52:35 | reduced with LDA or HLDA.
---|
0:52:37 | Acoustic model training was done by the best three systems with discriminative training, using MMI and
---|
0:52:44 | minimum phone error
---|
0:52:46 | criteria.
---|
0:52:47 | Concerning language models, interpolations of four-gram language models were employed, combining TED data and news data.
---|
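The talk does not say how the interpolation weights were set. One common recipe, shown here only as an assumed illustration, is linear interpolation P(w|h) = sum over k of lambda_k * P_k(w|h), with the lambda_k estimated by EM to maximise the likelihood of a held-out development set. The sketch takes precomputed per-token probabilities from each component model (for example a TED model and an out-of-domain model); the numbers in the example are invented.

```python
import math


def em_interpolation_weights(component_probs, iterations=50):
    """Estimate linear-interpolation weights by EM on a development set.

    component_probs[k][i] is P_k(w_i | h_i), the probability that component
    language model k assigns to the i-th development-set token.
    """
    num_components = len(component_probs)
    num_tokens = len(component_probs[0])
    weights = [1.0 / num_components] * num_components  # start from uniform weights
    for _ in range(iterations):
        expected = [0.0] * num_components
        for i in range(num_tokens):
            mixture = sum(w * probs[i] for w, probs in zip(weights, component_probs))
            # E-step: posterior responsibility of each component for token i
            for k in range(num_components):
                expected[k] += weights[k] * component_probs[k][i] / mixture
        # M-step: new weight = average responsibility over all tokens
        weights = [e / num_tokens for e in expected]
    return weights


def perplexity(component_probs, weights):
    """Perplexity of the interpolated model on the same development tokens."""
    num_tokens = len(component_probs[0])
    log_likelihood = sum(
        math.log(sum(w * probs[i] for w, probs in zip(weights, component_probs)))
        for i in range(num_tokens)
    )
    return math.exp(-log_likelihood / num_tokens)


if __name__ == "__main__":
    # Invented per-token probabilities from a TED model and an out-of-domain model.
    ted = [0.20, 0.10, 0.30, 0.05]
    news = [0.01, 0.02, 0.01, 0.08]
    lambdas = em_interpolation_weights([ted, news])
    print(lambdas, perplexity([ted, news], lambdas))
```

Because each EM step never decreases the development-set likelihood, the tuned weights give a perplexity no higher than the uniform starting point.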
0:52:56 | As for multi-pass decoding, all the participants performed multi-pass decoding,
---|
0:53:04 | using models of increasing resolution, from simpler models to speaker-adaptively trained acoustic models,
---|
0:53:11 | from trigram to four-gram language models,
---|
0:53:14 | and also applied
---|
0:53:16 | acoustic models in the process to do some |
---|
0:53:21 | courses |
---|
0:53:24 | They also used different acoustic features, like neural-network-based
---|
0:53:29 | features, and different lexica,
---|
0:53:33 | sorry, the use of different
---|
0:53:36 | lexica.
---|
0:53:40 | Concerning MT,
---|
0:53:43 | people worked on parallel data selection criteria, since we provided a lot of
---|
0:53:49 | out-of-domain data, very large collections like this eight-hundred-million-word
---|
0:53:54 | parallel corpus, French-English.
---|
0:53:56 | You cannot use it
---|
0:53:58 | in a system, you run out of memory, so the best you can do is
---|
0:54:02 | to extract meaningful data from it,
---|
0:54:05 | and
---|
0:54:09 | they used cross-entropy or alignment
---|
0:54:10 | score criteria.
---|
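One well-known entropy-based selection criterion is the cross-entropy difference of Moore and Lewis (2010): score each sentence of the large pool by H_in(s) - H_general(s) and keep the lowest-scoring ones. The talk does not detail which variant each participant used, so this is an illustrative sketch only. It uses add-one smoothed unigram models for brevity (real systems use n-gram models), and the `general` model would normally be trained on a random sample of the pool itself.

```python
import math
from collections import Counter


def unigram_lm(sentences):
    """Add-one smoothed unigram model; one extra vocabulary slot covers unseen words."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab_size)


def cross_entropy(sentence, lm):
    """Per-word cross-entropy (bits) of a sentence under a language model."""
    words = sentence.split()
    return -sum(math.log2(lm(w)) for w in words) / len(words)


def select_by_entropy_difference(pool, in_domain, general, keep):
    """Keep the `keep` pool sentences that look most in-domain.

    Score = H_in(s) - H_general(s): low when the in-domain model finds the
    sentence much less surprising than the general-domain model does.
    """
    lm_in = unigram_lm(in_domain)
    lm_gen = unigram_lm(general)
    candidates = [s for s in pool if s.split()]  # skip empty lines
    candidates.sort(key=lambda s: cross_entropy(s, lm_in) - cross_entropy(s, lm_gen))
    return candidates[:keep]


if __name__ == "__main__":
    ted = ["i felt like i was stuck", "try something new for thirty days"]
    sample = ["the parliament adopted the resolution", "the committee approved the budget"]
    pool = ["try something new", "the parliament approved the budget"]
    print(select_by_entropy_difference(pool, ted, sample, keep=1))
```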
0:54:14 | People worked on multiple word segmentations for Arabic-English, different alignment techniques,
---|
0:54:19 | thus ending model features |
---|
0:54:21 | they worked on
---|
0:54:23 | adaptation
---|
0:54:25 | of translation tables and language models, using linear interpolation, log-linear interpolation or fill-up
---|
0:54:31 | of
---|
0:54:32 | the phrase tables; discriminative training of the
---|
0:54:36 | translation model was done by Microsoft Research;
---|
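Fill-up, one of the adaptation methods mentioned, merges an in-domain phrase table with a background one. A minimal sketch under our reading of the method: in-domain entries are kept as they are, background entries are added only for phrase pairs missing from the in-domain table, and an extra binary provenance feature marks where each entry came from, so that tuning can learn how much to trust background-only pairs. Tables are modelled here as hypothetical dicts from `(source, target)` phrase pairs to score lists; published variants differ in details such as how the provenance feature is encoded.

```python
def fill_up(in_domain, background):
    """Fill-up combination of two phrase tables (sketch).

    Both tables map (source_phrase, target_phrase) -> list of feature scores.
    In-domain entries are kept unchanged; background entries are added only
    for phrase pairs missing from the in-domain table. Every entry gets an
    extra provenance feature: 0.0 for in-domain, 1.0 for background-only.
    """
    combined = {pair: scores + [0.0] for pair, scores in in_domain.items()}
    for pair, scores in background.items():
        if pair not in combined:
            combined[pair] = scores + [1.0]
    return combined


if __name__ == "__main__":
    in_dom = {("the house", "la maison"): [0.6, 0.5]}
    bg = {("the house", "la maison"): [0.4, 0.3], ("the home", "le foyer"): [0.2, 0.1]}
    print(fill_up(in_dom, bg))
    # {('the house', 'la maison'): [0.6, 0.5, 0.0], ('the home', 'le foyer'): [0.2, 0.1, 1.0]}
```

Unlike linear interpolation, fill-up never changes the in-domain scores; the background table only contributes coverage for phrase pairs the in-domain data lacks.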
0:54:40 | developing topic-specific translation tables;
---|
0:54:46 | whose
---|
0:54:48 | language models based on neural networks;
---|
0:54:51 | class-based language models by KIT to model the style of talks;
---|
0:54:56 | syntax-based models based on categorial grammar by DCU.
---|
0:55:01 | And then, I would say, concerning the comparison between phrase-
---|
0:55:05 | based and hierarchical phrase-based SMT, nothing definite can be said; some labs compared them:
---|
0:55:13 | some of them found one was better than the other, others found
---|
0:55:17 | similar
---|
0:55:18 | results |
---|
0:55:23 | About IWSLT 2012, to introduce what's going on
---|
0:55:27 | for next year:
---|
0:55:30 | so, about the venue,
---|
0:55:32 | it's decided to be
---|
0:55:34 | maybe in Hong Kong
---|
0:55:36 | in December,
---|
0:55:37 | and here is
---|
0:55:40 | some anticipation about what we are going to plan.
---|
0:55:42 | We are going to continue with the TED task,
---|
0:55:47 | so the a soundtrack will be again on english and B is time will be lower contrast in france |
---|
0:55:53 | without using segmentation so you have the challenge recognise the speech and blouses |
---|
0:55:59 | but the primary run will be on the segmented data,
---|
0:56:04 | so English to French.
---|
0:56:06 | We're going to repeat Arabic-English, we are
---|
0:56:09 | now thinking about repeating Chinese-English, and we plan to add a new exercise that
---|
0:56:17 | has to be worked out, so we want to support some
---|
0:56:21 | longer-term effort on specific languages, so people could choose their own preferred language and have
---|
0:56:26 | the possibility to work repeatedly on this language, like we do, for instance, for Italian. But
---|
0:56:32 | so our friends in
---|
0:56:34 | Turkey, okay,
---|
0:56:35 | would like to work on Turkish, so we're going to provide several translation directions here, and we'll provide baselines, and
---|
0:56:42 | people will be able to submit runs on these different languages.
---|
0:56:47 | We don't really care about having comparisons against each other, but we'll try to compare against the baseline, and we'll
---|
0:56:54 | try to do some comparisons across different languages; we have some ideas about that.
---|
0:57:00 | And as we lost a lot of the smaller players, I mean
---|
0:57:05 | smaller labs, students for instance, we are introducing a new small-domain task, the Olympics
---|
0:57:13 | corpus, kindly provided by NICT,
---|
0:57:18 | Japan. It's a corpus of around sixty thousand sentences;
---|
0:57:22 | the domain is travelling, traffic, business, dining and sports, and it was collected for the Beijing
---|
0:57:29 | Olympic Games.
---|
0:57:32 | Some conclusions:
---|
0:57:34 | the IWSLT TED task is basically a subtitling and translation task; we added ASR and system combination here, and
---|
0:57:42 | you see the data has been publicly released as language resources and benchmarks; you can find them on
---|
0:57:51 | the website |
---|
0:57:52 | and which also was subjectivity |
---|
0:57:56 | what is it once we have eleven partners |
---|
0:57:59 | random evaluations random story |
---|
0:58:04 | systems on our data. I must say I saw an impressive effort and high-quality research on this
---|
0:58:11 | track, and this is witnessed by the research papers you'll find
---|
0:58:14 | in the proceedings,
---|
0:58:15 | and significant improvements on the French
---|
0:58:19 | TED task.
---|
0:58:20 | so what to take on a at these detectors if you're not sure |
---|
0:58:25 | to you knew about |
---|
0:58:28 | I think there are good, interesting ideas
---|
0:58:31 | by the participants about how to cope with this problem;
---|
0:58:36 | it will be online soon:
---|
0:58:38 | the proceedings are going to be published online.
---|
0:58:41 | We showed the importance of subjective evaluation,
---|
0:58:45 | right |
---|
0:58:45 | right |
---|
0:58:46 | crowd sourcing |
---|
0:58:47 | and we have to further analyze these results, because they are fresh.
---|
0:58:52 | the |
---|
0:58:54 | Take my invitation to try
---|
0:58:56 | this
---|
0:58:57 | task
---|
0:58:58 | and eventually join IWSLT.
---|
0:59:00 | our |
---|
0:59:02 | Here are some references
---|
0:59:04 | for my talk, and
---|
0:59:08 | finally some credits.
---|
0:59:12 | why the data |
---|
0:59:14 | people especially wood |
---|
0:59:16 | setting |
---|
0:59:26 | we have time for a couple of quick questions |
---|
0:59:28 | before we go to the next part of the session |
---|
0:59:33 | Oh, Marcello, thank you very much for a very interesting overview of IWSLT.
---|
0:59:39 | One of the things, I guess, that's probably very relevant for the community here is that there is
---|
0:59:44 | an ongoing debate as to
---|
0:59:49 | what's the way to improve speech-to-speech translation:
---|
0:59:52 | whether the speech people should talk to the translation people, or whether they're both better off doing
---|
0:59:58 | their own stuff and, you know, sort of slamming the two components together every once in a while, and
---|
1:00:04 | keeping their distance from each other.
---|
1:00:06 | I'm wondering if you had any comments about the impact of having the speech people interact
---|
1:00:14 | more or less closely with the MT people, as far as advancing the state of the art in this area.
---|
1:00:23 | as |
---|
1:00:24 | right |
---|
1:00:25 | oh |
---|
1:00:30 | but |
---|
1:00:31 | you |
---|
1:00:32 | the work on the |
---|
1:00:35 | okay |
---|
1:00:35 | speech recognition |
---|
1:00:37 | she just |
---|
1:00:38 | so that yeah |
---|
1:00:39 | yeah |
---|
1:00:42 | or |
---|
1:00:43 | we do this |
---|
1:00:45 | yeah |
---|
1:00:46 | for |
---|
1:00:47 | she |
---|
1:00:48 | that is |
---|
1:00:51 | so |
---|
1:00:55 | right |
---|
1:00:55 | the |
---|
1:00:56 | i don't |
---|
1:00:58 | people |
---|
1:01:01 | that is |
---|
1:01:03 | stop |
---|
1:01:04 | work |
---|
1:01:17 | Actually, I had a question. Maybe three years ago we had this somewhat disappointing discovery, when we were
---|
1:01:23 | doing some of the GALE
---|
1:01:25 | research, that even if the speech group managed to improve accuracy to a hundred percent,
---|
1:01:31 | the translation wasn't good enough for us to meet the objectives of the program at the time,
---|
1:01:36 | and so in some sense we cut down our speech effort tremendously and put a lot more energy into translation,
---|
1:01:42 | and the hope was that one of these days translation would get good enough that we could start paying attention to
---|
1:01:47 | speech again.
---|
1:01:48 | Has the IWSLT experience been different? Do people actually measure what difference it
---|
1:01:54 | would make if
---|
1:01:56 | they used the reference transcript on the test data? Have you looked at that as an evaluation question?
---|
1:02:01 | Yes, we ran evaluations
---|
1:02:05 | but with |
---|
1:02:07 | transcript and we |
---|
1:02:20 | i think |
---|
1:02:22 | if the war |
---|
1:02:23 | course |
---|
1:02:25 | five percent |
---|
1:02:28 | start |
---|
1:02:33 | like |
---|
1:02:36 | but |
---|
1:02:37 | well |
---|
1:02:38 | of course |
---|
1:02:41 | machine translation |
---|
1:02:45 | it's more difficult |
---|
1:02:46 | sense |
---|
1:02:48 | actually |
---|
1:02:50 | very readable result |
---|
1:02:52 | so we are not |
---|
1:02:54 | spot in errors here and there |
---|
1:02:57 | some languages |
---|
1:02:59 | with |
---|
1:03:00 | frames before saying |
---|
1:03:04 | it's far behind |
---|
1:03:06 | the level |
---|
1:03:10 | maybe |
---|
1:03:11 | iteration |
---|
1:03:12 | the goal set for machine translation work |
---|
1:03:15 | a beach |
---|
1:03:17 | All right, now this is good to know, because I think in GALE we were seeing that there was no
---|
1:03:20 | difference even when the word error rate was fifteen to
---|
1:03:24 | maybe not twenty, but higher than fifteen percent. So it's good to know that you're already starting to see a difference,
---|
1:03:28 | so there's a reason to
---|
1:03:30 | make the speech better.
---|
1:03:32 | Other questions?
---|
1:03:34 | So let's thank our speaker once again.
---|