0:00:14 | she thank you also |
---|
0:00:18 | so the language recognition i-vector challenge had three main goals |
---|
0:00:26 | first to attract people from outside the regular community |
---|
0:00:32 | and to make |
---|
0:00:35 | this |
---|
0:00:37 | work that we do more accessible to them |
---|
0:00:39 | and the idea behind that was for people to explore new approaches and methods |
---|
0:00:47 | from machine learning in language recognition with the overall goal of improving performance in language |
---|
0:00:52 | recognition |
---|
0:00:55 | the task was open set language identification so given an audio segment which of the target |
---|
0:01:00 | languages was the audio segment spoken in or whether it was an |
---|
0:01:06 | unknown language |
---|
0:01:09 | the data used was from previous nist lres as well as |
---|
0:01:14 | from the iarpa babel program |
---|
0:01:17 | and the data was selected in such a manner such that multiple sources were used |
---|
0:01:25 | for each language in order to reduce |
---|
0:01:27 | the source language effect |
---|
0:01:30 | and were also selected in order to have highly confusable languages included in the |
---|
0:01:37 | dataset |
---|
0:01:40 | looking at the size of the data there were fifty languages in train and sixty five |
---|
0:01:44 | in dev and test |
---|
0:01:47 | about three hundred segments per language in the training and about a |
---|
0:01:53 | hundred |
---|
0:01:53 | in the dev and test |
---|
0:01:55 | and we see the total number of segments all the way in the right hand column |
---|
0:02:01 | fifteen thousand for training about sixty four hundred for dev and about sixty five |
---|
0:02:05 | hundred for test |
---|
0:02:06 | and the training set did not include data that was from out of set |
---|
0:02:12 | the development set included unlabeled out of set data |
---|
0:02:15 | and the test set was divided into progress and evaluation subsets which we'll |
---|
0:02:21 | cover in just a moment |
---|
0:02:23 | people were able to upload their system outputs and receive some feedback on how that |
---|
0:02:29 | went and that was done using the progress set |
---|
0:02:32 | and then at the end of the evaluation period |
---|
0:02:36 | feedback was given on the evaluation set and it was a partition so there's |
---|
0:02:40 | no overlap |
---|
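A minimal sketch of the progress and evaluation partition described above, assuming a simple random split of the test segments. The function name, the 30% progress fraction, and the seed are illustrative assumptions; the talk only states that the two subsets do not overlap.

```python
import random


def split_progress_evaluation(segment_ids, progress_fraction=0.3, seed=0):
    """Partition test segment ids into disjoint progress and evaluation subsets.

    progress_fraction and seed are hypothetical values chosen for illustration.
    """
    ids = list(segment_ids)
    # Shuffle with a fixed seed so the partition is reproducible.
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * progress_fraction)
    progress = set(ids[:cut])      # scored during the challenge for feedback
    evaluation = set(ids[cut:])    # scored only at the end of the period
    return progress, evaluation


progress, evaluation = split_progress_evaluation(range(6500))
```

Because every segment lands in exactly one subset, scores reported on the progress subset during the challenge say nothing directly about individual evaluation segments.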
0:02:44 | here we see data sources for each language |
---|
0:02:48 | on the |
---|
0:02:50 | right hand side i'm not sure how easy that is to see |
---|
0:02:53 | you can see different corpora labels i think at a high level we can say |
---|
0:02:58 | blue is conversational telephone speech green is |
---|
0:03:04 | broadcast narrowband speech and yellow is a combination of the two |
---|
0:03:09 | i think |
---|
0:03:10 | one thing to say is that if you look across |
---|
0:03:13 | the training data which is i guess your leftmost column |
---|
0:03:17 | the dev data which is in the middle and the test data which is on |
---|
0:03:20 | the right |
---|
0:03:21 | the distribution across sources is very similar per language there are a few exceptions |
---|
0:03:27 | and as we mentioned there was no out of set |
---|
0:03:29 | data in the training |
---|
0:03:36 | and here we see speech duration |
---|
0:03:41 | in train dev and test |
---|
0:03:43 | training is beige dev is green and test is blue |
---|
0:03:48 | and we see again a similar distribution among train dev and test |
---|
0:03:53 | this was log normal |
---|
0:03:59 | the performance metric was error rates split into out of set languages and within set |
---|
0:04:04 | languages |
---|
0:04:06 | where the prior probability of out of set languages was point two three |
---|
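One way to read the metric described above is as a prior-weighted combination of per-class error rates. The sketch below assumes that form: the average error rate over the target languages weighted by one minus the out-of-set prior, plus the out-of-set error rate weighted by the prior of 0.23 quoted in the talk. The exact weighting, the function name, and the "oos" label are assumptions, not the challenge's official scoring code.

```python
from collections import defaultdict

P_OOS = 0.23  # prior probability of the out-of-set class, as quoted in the talk


def challenge_cost(truth, predicted, target_languages, p_oos=P_OOS, oos_label="oos"):
    """Prior-weighted average of per-class error rates (assumed form)."""
    trials = defaultdict(int)
    errors = defaultdict(int)
    for ref, hyp in zip(truth, predicted):
        trials[ref] += 1
        if hyp != ref:
            errors[ref] += 1

    def error_rate(label):
        # Classes with no trials contribute zero error.
        return errors[label] / trials[label] if trials[label] else 0.0

    # Mean error rate over the in-set (target) languages.
    in_set = sum(error_rate(lang) for lang in target_languages) / len(target_languages)
    return (1.0 - p_oos) * in_set + p_oos * error_rate(oos_label)


truth = ["a", "a", "b", "b", "oos", "oos"]
predicted = ["a", "b", "b", "b", "a", "oos"]
cost = challenge_cost(truth, predicted, target_languages=["a", "b"])
```

In this toy example language "a" has an error rate of 0.5, language "b" has 0, and the out-of-set class has 0.5, so the cost is 0.77 * 0.25 + 0.23 * 0.5.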
0:04:15 | participation was |
---|
0:04:18 | wonderful more than what we'll typically see in an lre |
---|
0:04:23 | it was from international sites six continents and thirty one countries |
---|
0:04:30 | about eighty participants downloaded the data and a little over fifty five participants submitted |
---|
0:04:34 | results |
---|
0:04:36 | from |
---|
0:04:37 | forty four unique organisations |
---|
0:04:41 | during the evaluation period a little over seventy i'm sorry thirty seven hundred valid submissions |
---|
0:04:46 | were submitted |
---|
0:04:49 | and that number continues to grow |
---|
0:04:54 | after which |
---|
0:04:59 | i mentioned that we |
---|
0:05:01 | had more participation in the i-vector challenge than we typically do with a regular |
---|
0:05:05 | lre and we can see some other comparisons |
---|
0:05:09 | i guess i would have said one of the main differences between the i-vector challenge |
---|
0:05:15 | and a traditional lre is in the data that we distribute |
---|
0:05:19 | in the traditional lre we send audio segments as input to systems and in the i-vector |
---|
0:05:26 | challenge we send i-vectors instead |
---|
0:05:30 | the task was different the i-vector challenge was open set identification instead of detection |
---|
0:05:37 | in the i-vector challenge the cost was based on a kind of total error rate per |
---|
0:05:43 | language and in the traditional lre it's on miss and false alarm rates |
---|
0:05:48 | a larger number of target languages a different |
---|
0:05:52 | distribution of speech duration as mentioned that was log normal in the i-vector challenge in the |
---|
0:05:57 | traditional lre it's three ten and thirty second bins |
---|
0:06:02 | the i-vector challenge lasted much longer than a traditional lre |
---|
0:06:07 | and it |
---|
0:06:08 | but also the i-vector challenge results were |
---|
0:06:12 | feedback was given during the challenge period which is also not something we |
---|
0:06:16 | do in traditional evaluations |
---|
0:06:19 | and last there was an evaluation platform that was online |
---|
0:06:27 | and this was something that we |
---|
0:06:30 | focused on for the i-vector challenge |
---|
0:06:33 | in particular the goal was to facilitate |
---|
0:06:36 | the evaluation process with limited human involvement |
---|
0:06:40 | all evaluation activities were conducted via this platform including receiving the data |
---|
0:06:47 | uploading submissions and being able to see how things went |
---|
0:06:56 | and now looking at some results on the y-axis we see |
---|
0:07:01 | cost |
---|
0:07:03 | and on the x-axis time |
---|
0:07:06 | the first |
---|
0:07:07 | first dip i think is around may seventeenth the choice certainly first |
---|
0:07:12 | and the second |
---|
0:07:14 | large dip is on may twenty first so |
---|
0:07:18 | roughly half of the progress made during the evaluation took place during |
---|
0:07:25 | the first |
---|
0:07:25 | two or three weeks or so |
---|
0:07:28 | and then during the remainder of the four months the rest of the progress was made |
---|
0:07:37 | here we also see cost on the y-axis and on the x-axis we see |
---|
0:07:43 | participant id so these are really discrete it's sorted by best cost |
---|
0:07:49 | obtained on the evaluation |
---|
0:07:50 | subset |
---|
0:07:52 | and so we see most of the sites beat the baseline |
---|
0:07:59 | which is trained and a few sites beat an oracle system so i guess speaking |
---|
0:08:03 | of speaking to both of these the baseline i believe is a simple |
---|
0:08:13 | a simple |
---|
0:08:17 | system that used cosine distance and the oracle system used plda |
---|
0:08:25 | so it's called oracle because there were unlabeled data that were distributed to the participants |
---|
0:08:30 | but the oracle system used the labels |
---|
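A minimal sketch of what a cosine-distance baseline over i-vectors could look like, assuming one averaged, length-normalized model vector per language and a decision by highest cosine similarity. The function names are hypothetical, and the talk does not say whether the actual baseline applied whitening or any out-of-set handling, so neither is included here.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def train_language_models(ivectors, labels):
    """Build one unit-length model per language from its training i-vectors."""
    sums = {}
    for vec, lang in zip(ivectors, labels):
        unit = _normalize(vec)
        if lang not in sums:
            sums[lang] = [0.0] * len(unit)
        sums[lang] = [s + u for s, u in zip(sums[lang], unit)]
    # Normalizing the sum gives the same direction as normalizing the mean.
    return {lang: _normalize(total) for lang, total in sums.items()}


def classify(ivector, models):
    """Return the language whose model has the highest cosine similarity."""
    unit = _normalize(ivector)
    scores = {
        lang: sum(a * b for a, b in zip(unit, model))
        for lang, model in models.items()
    }
    return max(scores, key=scores.get)


models = train_language_models(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    ["lang_a", "lang_a", "lang_b", "lang_b"],
)
guess = classify([0.8, 0.2], models)
```

A system like this uses only the labeled training data, which is one reason an oracle with access to the development labels would be expected to score better.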
0:08:38 | and here we see the number of submissions per participant |
---|
0:08:42 | in general |
---|
0:08:43 | participants who did well submitted more systems but there were |
---|
0:08:48 | a few exceptions i think now is a reasonable time to mention the |
---|
0:08:54 | participant id and |
---|
0:08:56 | site id the distinction between participant and site so |
---|
0:09:02 | a participant is someone who signed up and maybe there were multiple participants per site so ids |
---|
0:09:08 | are not necessarily unrelated for example participant three may have also been participant thirty |
---|
0:09:15 | just |
---|
0:09:20 | and here we see results by target language we have error rate on the y-axis |
---|
0:09:27 | on the x-axis we see language the lowest error rate was received on |
---|
0:09:39 | parameters and highest on hindi |
---|
0:09:42 | what was surprising was english also had a high error rate |
---|
0:09:47 | it's actually second from the right |
---|
0:09:51 | and in blue was the out of set languages somewhere in the middle of the pack |
---|
0:09:58 | and here we see results by speech duration i guess no surprise that as you |
---|
0:10:04 | get more audio |
---|
0:10:08 | you tend to do better |
---|
0:10:10 | one thing that |
---|
0:10:12 | is also maybe interesting is there seem to be some diminishing marginal returns |
---|
0:10:17 | so if for example you had three seconds and you could get ten you'd do |
---|
0:10:26 | maybe |
---|
0:10:27 | we |
---|
0:10:28 | point two better but if you went from |
---|
0:10:34 | ten to twenty |
---|
0:10:36 | the difference is not so great |
---|
0:10:38 | just as an example |
---|
0:10:42 | so some lessons learned |
---|
0:10:44 | wonderful participation we're all very grateful to you in the audience who participated |
---|
0:10:51 | as well as those who couldn't join us today |
---|
0:10:55 | a number of systems beat the baseline and surprisingly six were actually better than the oracle |
---|
0:10:59 | system which we're hoping to learn more about |
---|
0:11:03 | half of the improvement was made early on which may suggest we reconsider |
---|
0:11:09 | the timeline |
---|
0:11:11 | surprisingly top systems do not all do so well on english |
---|
0:11:18 | performance on out of set languages also was not as poor as we might have |
---|
0:11:22 | expected |
---|
0:11:25 | we did not receive many system descriptions so it's unclear how many of the participants |
---|
0:11:32 | attempted novel methods although |
---|
0:11:34 | later in the session we'll hear from |
---|
0:11:38 | a team that created a top system |
---|
0:11:44 | that did develop novel techniques and we'll see more of that |
---|
0:11:48 | and the web platform remains up so please feel free to visit and participate in the |
---|
0:11:54 | challenge now |
---|
0:11:57 | and see how see how you're doing |
---|
0:12:00 | and a quick plug for upcoming activities there's sre sixteen and a workshop |
---|
0:12:06 | where the task is speaker detection on telephone speech recorded over a variety of handsets |
---|
0:12:13 | similar to lre fifteen for those who are familiar there's now a fixed training condition as |
---|
0:12:17 | well as an open condition |
---|
0:12:20 | you can see some other dates there for the evaluation and there's also a twenty sixteen |
---|
0:12:25 | lre analysis workshop and all of this will be co-located with slt sixteen and |
---|
0:12:30 | send |
---|
0:12:32 | so it looks like we have time for |
---|
0:12:35 | for questions |
---|