0:00:15 | Thank you very much. |
---|
0:00:17 | Thanks to the organisers for the chance to present our work, |
---|
0:00:24 | which we are still trying to complement with some post-evaluation analyses. |
---|
0:00:34 | Due to some difficulties, the main author couldn't come here, so I'm |
---|
0:00:42 | going to try to present it. |
---|
0:00:44 | So, |
---|
0:00:45 | I will now present a brief overview of our LRE submission |
---|
0:00:52 | and our system. |
---|
0:00:55 | We had some hypotheses at the outset that I would like to show you, |
---|
0:01:00 | how we worked with the development dataset, |
---|
0:01:07 | the evaluation results with some of the systems and configurations, and the lessons we |
---|
0:01:13 | learned from this. |
---|
0:01:16 | Okay, so, |
---|
0:01:18 | very briefly, the LRE evaluation was focused on the development |
---|
0:01:23 | of language recognition systems |
---|
0:01:26 | for very closely related languages. |
---|
0:01:30 | We had twenty target languages split across |
---|
0:01:35 | six different clusters, and the participants had to devise their own development set. |
---|
0:01:42 | So |
---|
0:01:43 | there were mainly two channels, telephone speech and broadcast speech, |
---|
0:01:50 | and here we have the six different clusters: Arabic, Chinese, English, French, Slavic, |
---|
0:01:56 | and Iberian. |
---|
0:01:58 | The performance metric was the average of the performance within each cluster, so |
---|
0:02:04 | this allowed the development |
---|
0:02:06 | of six separate systems, one for |
---|
0:02:11 | each cluster, |
---|
0:02:13 | since we only had to sort out the languages within each cluster. |
---|
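As a minimal sketch of what such a cluster-averaged metric looks like, the snippet below simply averages an illustrative per-cluster cost over the six clusters; the cost values and cluster names are placeholders, not the official LRE cost function.

```python
import numpy as np

def cluster_averaged_cost(costs_by_cluster):
    """Average a per-cluster cost over clusters, as in the cluster-averaged LRE metric.
    costs_by_cluster: dict mapping cluster name -> cost measured within that cluster."""
    return float(np.mean(list(costs_by_cluster.values())))

# Illustrative per-cluster costs (stand-ins for the official per-cluster values).
costs = {"arabic": 0.21, "chinese": 0.18, "english": 0.25,
         "french": 0.19, "slavic": 0.22, "iberian": 0.30}
print(cluster_averaged_cost(costs))  # 0.225
```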
0:02:18 | Okay, so |
---|
0:02:20 | before the LRE we had some hypotheses. The first one was that |
---|
0:02:27 | there would be only a limited mismatch between the development data and the |
---|
0:02:33 | test set, |
---|
0:02:36 | as we had seen in previous LREs; but of course, |
---|
0:02:41 | as I'll show you, we were wrong. |
---|
0:02:43 | The second one was that bottleneck features were |
---|
0:02:47 | good features for this kind of task, |
---|
0:02:50 | and you will see that |
---|
0:02:52 | we were right with this hypothesis |
---|
0:02:55 | later. |
---|
0:02:57 | Our third hypothesis was that the fusion of multiple systems |
---|
0:03:02 | was a nice approach to increase |
---|
0:03:06 | robustness, |
---|
0:03:07 | and we were wrong. |
---|
0:03:10 | Finally, |
---|
0:03:12 | that a good development dataset design would be crucial, |
---|
0:03:15 | and we were right. |
---|
0:03:17 | So |
---|
0:03:19 | we had mainly three objectives here. The first one was to design a |
---|
0:03:23 | development dataset. |
---|
0:03:25 | The second was to develop innovative approaches to dialect ID. |
---|
0:03:31 | And the third one was to select a robust fusion, built from a variety of complementary |
---|
0:03:36 | bottleneck features |
---|
0:03:40 | that we had been developing under the |
---|
0:03:43 | DARPA RATS program, |
---|
0:03:44 | and also |
---|
0:03:46 | a fusion of different backend classifiers. |
---|
0:03:51 | Okay. |
---|
0:03:52 | So first we split the data into eighty percent for training and twenty percent |
---|
0:03:56 | for development. |
---|
0:03:58 | As was mentioned in the last question, this was perhaps not the best decision, |
---|
0:04:05 | and it could have been better. |
---|
0:04:09 | We had ten audio files per language in each split, |
---|
0:04:17 | we avoided having the same telephone conversation speakers appearing in both parts, |
---|
0:04:23 | and we included an equal proportion of telephone speech and |
---|
0:04:29 | broadcast speech in each split. |
---|
0:04:33 | And we excluded Switchboard 1 and 2, basically because |
---|
0:04:38 | our first experiments didn't show a great impact from them, |
---|
0:04:43 | probably because we |
---|
0:04:45 | didn't expect this huge mismatch. |
---|
0:04:48 | So |
---|
0:04:50 | then, |
---|
0:04:53 | from that audio, we chunked the recordings into |
---|
0:04:59 | segments of three seconds to address the short durations. |
---|
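Below is a minimal sketch of the kind of per-language 80/20 split and fixed-length chunking described above; the file names, the 8 kHz sample rate, and the helper names are illustrative assumptions, not the actual tooling used.

```python
import numpy as np

def split_80_20(files_by_language, seed=0):
    """Split each language's file list into 80% train / 20% dev."""
    rng = np.random.default_rng(seed)
    train, dev = {}, {}
    for lang, files in files_by_language.items():
        files = list(files)
        rng.shuffle(files)
        cut = int(0.8 * len(files))
        train[lang], dev[lang] = files[:cut], files[cut:]
    return train, dev

def chunk_waveform(samples, sample_rate=8000, chunk_seconds=3.0):
    """Cut a waveform into non-overlapping fixed-length chunks, dropping the tail."""
    n = int(sample_rate * chunk_seconds)
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]

# Toy usage: 10 hypothetical files per language, one 10-second waveform.
files = {"qsl-pol": [f"pol_{i}.sph" for i in range(10)],
         "qsl-rus": [f"rus_{i}.sph" for i in range(10)]}
train, dev = split_80_20(files)
chunks = chunk_waveform(np.zeros(8000 * 10))
print(len(train["qsl-pol"]), len(dev["qsl-pol"]), len(chunks))  # 8 2 3
```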
0:05:04 | So, |
---|
0:05:06 | at the end we had around 100k segments used for the UBM and |
---|
0:05:10 | i-vector extractor training, and the remaining training data was used for the |
---|
0:05:17 | backend classifiers. |
---|
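As a rough illustration of the UBM training step mentioned here, the sketch below fits a diagonal-covariance GMM on pooled frame-level features, using scikit-learn as a stand-in for the actual toolkit; the sizes and features are toy values.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical pooled frame-level features (e.g. bottleneck or SDC frames)
# from the subset reserved for UBM / i-vector extractor training.
rng = np.random.default_rng(0)
ubm_frames = rng.normal(size=(50_000, 60))   # (num_frames, feat_dim), toy sizes

# Diagonal-covariance GMM-UBM; real systems typically use 1024-2048 components.
ubm = GaussianMixture(n_components=64, covariance_type="diag",
                      max_iter=20, random_state=0)
ubm.fit(ubm_frames)

# Per-frame posteriors over UBM components, the sufficient statistics
# an i-vector extractor would be trained on.
posteriors = ubm.predict_proba(ubm_frames[:10])
print(posteriors.shape)  # (10, 64)
```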
0:05:21 | We contextualized the features with different methods, like SDC, |
---|
0:05:26 | deltas and double deltas, and PCA/DCT, and |
---|
0:05:32 | we fused different i-vector systems built from these traditional features; in the end, the |
---|
0:05:40 | bottleneck networks were trained on combinations of different |
---|
0:05:44 | base features with different contextualizations. |
---|
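A minimal sketch of two of the contextualization methods mentioned (deltas/double deltas and shifted delta cepstra), assuming MFCC-like frames in a NumPy array; the 1-3-7 SDC configuration on 7 coefficients is the common textbook setting, not necessarily the one used in this system.

```python
import numpy as np

def deltas(feats, d=2):
    """Simple delta features: c[t+d] - c[t-d], with edge padding.
    feats: (num_frames, num_coeffs) array, e.g. MFCCs."""
    padded = np.pad(feats, ((d, d), (0, 0)), mode="edge")
    return padded[2 * d:] - padded[:-2 * d]

def sdc(feats, d=1, P=3, k=7):
    """Shifted delta cepstra: stack k delta vectors taken every P frames."""
    dfeats = deltas(feats, d)
    T, _ = dfeats.shape
    padded = np.pad(dfeats, ((0, (k - 1) * P), (0, 0)), mode="edge")
    return np.hstack([padded[i * P:i * P + T] for i in range(k)])

# Toy usage: 100 frames of 7-dim "MFCCs".
mfcc = np.random.randn(100, 7)
feats = np.hstack([mfcc, deltas(mfcc), deltas(deltas(mfcc))])  # MFCC + delta + double delta
sdc_feats = sdc(mfcc)
print(feats.shape, sdc_feats.shape)  # (100, 21) (100, 49)
```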
0:05:52 | For the backend classifiers we used the Gaussian backend and neural networks; |
---|
0:06:00 | both methods are very well known in the community. |
---|
0:06:05 | And two more methods: first, the adapted Gaussian backend, which aims to better |
---|
0:06:10 | cope with mismatched conditions; |
---|
0:06:13 | basically, based on the test i-vectors, we try to select some i-vectors |
---|
0:06:19 | from the training data to train the Gaussian backends. |
---|
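For reference, here is a minimal sketch of a plain Gaussian backend over i-vectors (one mean per language, shared covariance); the adapted variant described above would additionally select training i-vectors close to the test data before this fit. The data and dimensions are toy values.

```python
import numpy as np

class GaussianBackend:
    """Minimal Gaussian backend: one Gaussian per language with a shared
    (pooled) covariance, scoring i-vectors with per-class log-likelihoods."""

    def fit(self, ivecs, labels):
        self.classes_ = np.unique(labels)
        self.means_ = np.stack([ivecs[labels == c].mean(axis=0) for c in self.classes_])
        centered = np.concatenate([ivecs[labels == c] - m
                                   for c, m in zip(self.classes_, self.means_)])
        cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(ivecs.shape[1])
        self.prec_ = np.linalg.inv(cov)
        return self

    def loglik(self, ivecs):
        # log N(x | mu_c, Sigma) up to a class-independent constant
        diffs = ivecs[:, None, :] - self.means_[None, :, :]      # (N, C, D)
        return -0.5 * np.einsum("ncd,de,nce->nc", diffs, self.prec_, diffs)

# Toy usage with random 10-dim "i-vectors" for 3 languages.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10)) + np.repeat(np.arange(3)[:, None], 100, axis=0)
y = np.repeat(np.arange(3), 100)
gb = GaussianBackend().fit(X, y)
print(gb.loglik(X[:5]).argmax(axis=1))
```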
0:06:24 | And second, the multi-resolution neural network, |
---|
0:06:29 | which was a new method that we proposed here |
---|
0:06:32 | and which aims to exploit the short-duration dialect differences that we can capture |
---|
0:06:39 | with the phonetic information. |
---|
0:06:42 | So we have different chunk durations, from short chunks up to thirty-two-second |
---|
0:06:51 | chunks and down to the phone segments, and we have a different weight |
---|
0:06:59 | for each chunk duration. |
---|
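A minimal sketch of the weighting idea only: scores obtained at several chunk resolutions are combined with per-resolution weights. The resolutions and weights below are illustrative, not the values used in the submitted multi-resolution network.

```python
import numpy as np

# Hypothetical per-resolution language scores for one test utterance:
# rows are chunk resolutions, columns are languages.
scores_per_resolution = np.array([
    [0.1, 0.7, 0.2],   # phone-level chunks
    [0.2, 0.5, 0.3],   # 3-second chunks
    [0.3, 0.4, 0.3],   # 10-second chunks
    [0.2, 0.6, 0.2],   # 32-second chunks
])
# Illustrative per-resolution weights (not the ones used by the authors).
weights = np.array([0.4, 0.3, 0.2, 0.1])

combined = weights @ scores_per_resolution   # weighted sum over resolutions
print(combined, combined.argmax())
```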
0:07:01 | Okay, and here we have a comparison |
---|
0:07:05 | of all these five |
---|
0:07:07 | backend systems that we had. |
---|
0:07:10 | The multi-resolution neural network performed the best. The other solutions were using the best |
---|
0:07:20 | single bottleneck features and the non-bottleneck features; in the case of the multi-resolution |
---|
0:07:25 | neural network we were using just the bottleneck features, because |
---|
0:07:29 | we need phonetic information, so it only makes sense to use the bottleneck features, |
---|
0:07:37 | since our bottleneck networks were trained with senones as targets. |
---|
0:07:42 | Another thing is that the adapted Gaussian backend approaches were more complementary |
---|
0:07:49 | with the non-bottleneck i-vectors. |
---|
0:07:54 | We ran all of these systems, as you can see here, on our data. |
---|
0:07:59 | And here, |
---|
0:08:00 | what I would like to show you is that the |
---|
0:08:04 | bottleneck features clearly work much better than the non-bottleneck features |
---|
0:08:10 | as input features for the backends. |
---|
0:08:14 | Okay, so this is, |
---|
0:08:15 | in general, the overall diagram of our system. |
---|
0:08:20 | At the end, for the submissions, we fused some of these |
---|
0:08:26 | systems, five or six of them, |
---|
0:08:34 | with either a cluster-specific fusion or a fusion over all the data, and |
---|
0:08:41 | with those scores we did the log-likelihood ratio conversion either within each |
---|
0:08:45 | cluster or with a global |
---|
0:08:47 | conversion. At the end, these are the four |
---|
0:08:51 | systems that we submitted. |
---|
0:08:55 | So our primary system used a five-way cluster-based fusion with |
---|
0:09:02 | cluster-based log-likelihood conversion. |
---|
0:09:05 | The second one was a two-system fusion with cluster-based conversion. The third |
---|
0:09:10 | one was using the Gaussian backend only, with a five-way cluster-based fusion. |
---|
0:09:16 | And the fourth one was the same as the second one, |
---|
0:09:20 | but with a global conversion of the log-likelihood ratios. |
---|
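A minimal sketch of a score fusion and calibration stage of this kind, using multi-class logistic regression from scikit-learn as a stand-in for the actual fusion/calibration tool; the per-system score matrices and labels are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-system score matrices on the dev set: each is
# (num_segments, num_languages); labels are the true language indices.
rng = np.random.default_rng(0)
num_segs, num_langs = 500, 4
labels = rng.integers(0, num_langs, size=num_segs)
system_scores = [rng.normal(size=(num_segs, num_langs)) +
                 2.0 * np.eye(num_langs)[labels] for _ in range(3)]

# Stack the per-system scores as features and train a multi-class
# logistic regression as the fusion/calibration stage.
fusion_in = np.hstack(system_scores)
fuser = LogisticRegression(max_iter=1000).fit(fusion_in, labels)

# Fused, calibrated log-posteriors for new segments (here: the dev data itself).
fused_llk = fuser.predict_log_proba(fusion_in)
print(fused_llk.shape, (fused_llk.argmax(axis=1) == labels).mean())
```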
0:09:24 | Okay, so some evaluation analyses |
---|
0:09:29 | here. |
---|
0:09:30 | After |
---|
0:09:32 | we got the |
---|
0:09:33 | test data, we could see the huge mismatch, the difference between the dev |
---|
0:09:38 | data |
---|
0:09:39 | and the test: we went from |
---|
0:09:41 | three percent to twenty-three percent. |
---|
0:09:45 | It is huge, |
---|
0:09:47 | and of course we had questions: what happened, right? |
---|
0:09:51 | So here are the results for the dev data and the |
---|
0:09:56 | test, to compare them. |
---|
0:09:58 | As we can see here, this is our primary system, |
---|
0:10:01 | so I think it's fair to say that there is a three |
---|
0:10:06 | to five percent relative gain over the best single system, |
---|
0:10:12 | but |
---|
0:10:13 | on the test |
---|
0:10:14 | we got an eight percent loss in the evaluation. |
---|
0:10:19 | Okay, so |
---|
0:10:22 | for us, the question was what was more important in this evaluation: |
---|
0:10:25 | the different |
---|
0:10:27 | algorithms that we had to develop, or a good development setup? |
---|
0:10:38 | Given this severe mismatch, what is more important, the algorithms or the use of |
---|
0:10:42 | the data? |
---|
0:10:44 | And we ran some analyses to try to get some answers to these |
---|
0:10:50 | questions, |
---|
0:10:51 | using an MFCC |
---|
0:10:53 | plus deltas and double deltas system with a DNN and a Gaussian backend |
---|
0:10:58 | classifier, |
---|
0:11:00 | which is this one here. |
---|
0:11:04 | So after |
---|
0:11:07 | some good discussions with the other sites at the evaluation workshop, there are several factors |
---|
0:11:13 | in our development setup. |
---|
0:11:15 | So, |
---|
0:11:16 | first of all, |
---|
0:11:17 | the chunking didn't help at all, |
---|
0:11:21 | so we ran some experiments just removing the chunks |
---|
0:11:27 | altogether. |
---|
0:11:30 | Also, the data split: |
---|
0:11:34 | most of the teams, as we have seen, used sixty percent for |
---|
0:11:39 | training and forty percent for development. |
---|
0:11:42 | We |
---|
0:11:44 | would like to thank the MIT guys for providing the lists that |
---|
0:11:48 | we are now using. |
---|
0:11:51 | And also, using all the data for the final backend training and calibration |
---|
0:11:56 | was also a key |
---|
0:11:58 | thing to do, |
---|
0:12:01 | as well as using a uniform speech-duration distribution for the dev segments. |
---|
0:12:06 | And we also ran some augmentation of the data and some other algorithms that we |
---|
0:12:11 | liked. |
---|
0:12:13 | Okay, so here are the post-evaluation results. As we can see, we |
---|
0:12:20 | went from our primary system at twenty-three point three |
---|
0:12:25 | to the same fusion system at twenty-one point nine by retraining just the fusion, |
---|
0:12:31 | and we keep |
---|
0:12:35 | improving if we modify the training and dev split and use |
---|
0:12:40 | all the data for training the UBM, the backend systems and |
---|
0:12:46 | the fusion; and also, |
---|
0:12:49 | if we do not chunk, we also improve |
---|
0:12:53 | the performance, so in the end we could get a fifteen percent relative gain. |
---|
0:13:01 | So |
---|
0:13:03 | that shows that the development data was crucial in this evaluation. |
---|
0:13:09 | Also, since |
---|
0:13:12 | another site said they were using a different UBM system for each cluster, |
---|
0:13:17 | we wanted to also |
---|
0:13:19 | try this solution, and we also |
---|
0:13:22 | could see some improvement; |
---|
0:13:25 | thanks to those guys for that. |
---|
0:13:30 | Then we wanted to study how sensitive the different |
---|
0:13:36 | blocks in our pipeline are to this mismatch. So we took |
---|
0:13:42 | some data from the test set, put it in the development set, created |
---|
0:13:46 | four variations of the split, and injected that data into the different parts of |
---|
0:13:51 | our pipeline. |
---|
0:13:54 | So |
---|
0:13:55 | we can clearly say that the backend and the i-vector extractor are |
---|
0:14:02 | significantly impacted by the mismatch, because we can see there is a few |
---|
0:14:07 | percent relative gain and a sixty percent relative gain in |
---|
0:14:13 | both |
---|
0:14:16 | steps, respectively. |
---|
0:14:18 | So, some take-home messages: |
---|
0:14:23 | for us, the fusion and the chunking of the training data for the |
---|
0:14:30 | classifiers |
---|
0:14:32 | didn't work. |
---|
0:14:34 | What did work, and I guess it also worked for the rest of the groups, were the bottleneck features |
---|
0:14:39 | and the Gaussian and neural network backends. |
---|
0:14:45 | And it also showed |
---|
0:14:48 | that having a good development set was |
---|
0:14:54 | something very important for this evaluation. |
---|
0:14:57 | Okay, that's all, thank you. |
---|
0:15:05 | We have time for a couple of questions. |
---|
0:15:12 | The chunking is taking the segments that we have and splitting each segment into |
---|
0:15:20 | very short chunks, |
---|
0:15:22 | of a couple of seconds, |
---|
0:15:27 | which were used to train the backend. |
---|
0:15:37 | Any other questions? |
---|
0:15:46 | I guess this is a commonality for everybody, but we found that |
---|
0:15:51 | we could not be successful with an eighty/twenty split and with those segment durations |
---|
0:15:58 | for all the classifier training. |
---|
0:16:02 | Really? Good to know. |
---|
0:16:04 | So we are |
---|
0:16:06 | not the only ones with this; okay, good to know. |
---|
0:16:09 | Could you share the split lists? |
---|
0:16:12 | Yes, I think we could; we had augmentations in it too, so we would have |
---|
0:16:17 | to talk about that part of it. |
---|
0:16:19 | Okay. |
---|
0:16:23 | Could you put up the slide again, the one where you did the |
---|
0:16:27 | eighty/twenty and then went down to the sixty/forty split? |
---|
0:16:33 | It was really nice to see that, because I think most groups, as we |
---|
0:16:37 | saw, were using sixty/forty and then did the retrain on all the data, right? We didn't have |
---|
0:16:43 | enough compute cycles to redo our training, so we actually stayed |
---|
0:16:47 | at sixty, which was what hurt us. |
---|
0:16:50 | But I think most folks, if they started with the eighty and didn't do |
---|
0:16:54 | a retrain, probably |
---|
0:16:56 | did okay. |
---|
0:16:58 | But I think that actually showed a really nice improvement. Where exactly, so when you |
---|
0:17:03 | use all the data, |
---|
0:17:05 | did you test on the dev and on the test? |
---|
0:17:09 | That is the dev, and that is the eval. |
---|
0:17:11 | Okay. |
---|
0:17:16 | Any other questions? |
---|
0:17:23 | Okay, well, let's thank the speaker again. Thank you. |
---|