0:00:15 | sorry in this talk i'll give a bit of the background |
---|
0:00:19 | briefly describe the system |
---|
0:00:20 | talk about the data we're playing with fourteen distinct conditions we're trying to calibrate |
---|
0:00:26 | a little bit on existing calibration methods and then on the proposed |
---|
0:00:31 | trial-based calibration |
---|
0:00:34 | so we've had a very good background already with the other talks but even very |
---|
0:00:37 | accurate sid systems may not be well calibrated this means that you might have |
---|
0:00:42 | very low equal error rates for conditions evaluated independently |
---|
0:00:46 | but once you pool those together a single threshold fails to reach that operating point when |
---|
0:00:52 | applied |
---|
0:00:53 | illustrating this problem here on the right the blue are distributions of target trials the |
---|
0:00:59 | red ones are impostor trials you can see the yellow threshold at the bottom of |
---|
0:01:03 | the fourteen conditions varies quite a bit |
---|
0:01:07 | so calibrating correctly for each condition helps us to reduce this threshold variability |
---|
0:01:13 | among many other benefits |
---|
0:01:17 | you probably don't need a refresher after the other talks |
---|
0:01:19 | but |
---|
0:01:21 | what we want essentially is calibrated scores that can indicate the weight of |
---|
0:01:24 | evidence for a given trial that is the likelihood ratio this is |
---|
0:01:30 | is this the person of interest or not whatever the application forensic evidence in |
---|
0:01:35 | court |
---|
0:01:38 | so subsequently if we have calibrated scores we can make confident thresholded bayes decisions |
---|
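(A minimal sketch, not part of the talk: given a calibrated log-likelihood-ratio score, the Bayes-optimal threshold follows directly from the effective target prior and error costs. Function names here are illustrative.)

```python
import math

def bayes_threshold(p_target, c_miss=1.0, c_fa=1.0):
    """Bayes-optimal threshold on a calibrated log-likelihood ratio.

    Accept the target hypothesis when llr > threshold. With equal
    costs and p_target = 0.5 the threshold is 0, i.e. llr > 0 means
    the evidence favours the target hypothesis.
    """
    return math.log((c_fa * (1.0 - p_target)) / (c_miss * p_target))

def decide(llr, p_target, c_miss=1.0, c_fa=1.0):
    """Threshold a calibrated llr at the Bayes-optimal point."""
    return llr > bayes_threshold(p_target, c_miss, c_fa)
```

The point made in the talk is that this only works when the scores really are calibrated llrs; with miscalibrated scores the computed threshold lands in the wrong place.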
0:01:43 | and this isn't trivial as we've heard without a representative calibration set and it's |
---|
0:01:48 | difficult to handle the various conditions with a single calibration model we'll also see later |
---|
0:01:54 | in this talk we'll be measuring system performance with a number of metrics mainly |
---|
0:01:59 | focusing on calibration loss |
---|
0:02:01 | so this indicates how close we are to performing the best we can |
---|
0:02:06 | for a particular operating point |
---|
0:02:09 | and in this work we're focusing on equal costs between misses and false alarms set |
---|
0:02:13 | around the equal-error point |
---|
0:02:15 | another metric we're using is cllr loss this is a more stringent criterion looking |
---|
0:02:20 | at how well calibrated we are across all points on the operating |
---|
0:02:25 | curve |
---|
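(As a sketch of that more stringent metric: the log-likelihood-ratio cost Cllr, in its standard textbook form, averages a logarithmic penalty over target and impostor trials. This is the generic definition, not code from the talk's system.)

```python
import math

def cllr(target_llrs, impostor_llrs):
    """Log-likelihood-ratio cost (Cllr), in bits.

    0 for a perfect system; 1 bit for a useless-but-calibrated
    system that outputs llr = 0 for every trial. Miscalibration
    pushes it above the discrimination-limited minimum.
    """
    c_tgt = sum(math.log2(1.0 + math.exp(-s)) for s in target_llrs) / len(target_llrs)
    c_imp = sum(math.log2(1.0 + math.exp(s)) for s in impostor_llrs) / len(impostor_llrs)
    return 0.5 * (c_tgt + c_imp)
```

Unlike a cost at one operating point, every trial's llr contributes at all thresholds, which is why the talk calls it the more stringent criterion.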
0:02:27 | and we're also looking at the average equal error rate across the fourteen conditions so |
---|
0:02:30 | we want to make sure that in calibrating the system we're not losing speaker discriminability |
---|
0:02:35 | and of course for all metrics low is better |
---|
0:02:39 | now our aim is to calibrate scores across those fourteen conditions such that the calibration loss |
---|
0:02:44 | is minimal |
---|
0:02:45 | with a single system |
---|
0:02:48 | a brief |
---|
0:02:49 | flow diagram of the system we're using in this study a pretty standard i-vector setup large ubm large |
---|
0:02:54 | i-vectors trained as in prior systems you can look at the paper for references on that |
---|
0:02:59 | the two areas we're focusing on in this work are the orange boxes |
---|
0:03:04 | calibration in particular obviously and then there's a box called universal audio characterization this |
---|
0:03:09 | is a way of extracting meta-information or side-information automatically from your i-vectors |
---|
0:03:16 | the evaluation dataset is a pooled-condition dataset given to us by the f |
---|
0:03:22 | b i it's sourced from many different sources and i'll go across it in detail later |
---|
0:03:28 | and it's got ninety nine |
---|
0:03:30 | there are a number of different raw conditions here that map to attribute conditions they've got |
---|
0:03:34 | cross-language cross-channel a mix of both clean and noisy speech and a variety of |
---|
0:03:39 | durations |
---|
0:03:41 | there's more detail in the paper in terms of speaker breakdown |
---|
0:03:45 | language breakdown and |
---|
0:03:47 | on the right hand side we've got the equal error rate for the baseline system |
---|
0:03:50 | and just to show you the difficulty increases as we go through |
---|
0:03:55 | the conditions |
---|
0:03:58 | since we need calibration datasets we put together three different datasets for |
---|
0:04:04 | this study |
---|
0:04:05 | the first one is called eval this is essentially taking that f b i dataset |
---|
0:04:10 | and doing cross-validation so training the calibration model with one half testing on the |
---|
0:04:15 | other half and then the other way around |
---|
0:04:16 | and doing it again before pooling the results and getting the metrics |
---|
0:04:20 | the second dataset we labeled matched now this isn't exactly matched to the data but |
---|
0:04:24 | we've done the best we can from the sre and fisher data |
---|
0:04:28 | trying to match languages |
---|
0:04:31 | and |
---|
0:04:32 | trying to get cross-channel and cross-language trials however we were lacking in cross-language cross |
---|
0:04:38 | channel |
---|
0:04:38 | trials the mixture of both and a few languages weren't in there either |
---|
0:04:43 | finally the large variability dataset we actually didn't put emphasis on trying to collect |
---|
0:04:48 | data that looked like the evaluation data |
---|
0:04:51 | we simply took a nice variation of sre data noised sre data that is |
---|
0:04:57 | reverbed data and rats clean data from the darpa rats project so this is |
---|
0:05:03 | five languages of interest from the program you can look at the paper for details on that |
---|
0:05:08 | so that large variability dataset was meant to be kind of let's just |
---|
0:05:14 | throw what we can at the calibration model and we'll use those three sets in the |
---|
0:05:18 | evaluation |
---|
0:05:19 | we're going to be looking at three different calibration training schemes the first is global which |
---|
0:05:24 | is generative calibration or logistic regression the standard approach many of us probably |
---|
0:05:29 | use |
---|
0:05:30 | then there's metadata-based calibration which is implemented with discriminative plda and universal audio characterization this |
---|
0:05:36 | is something that's been very prominent in past sri evaluations with the bolt and darpa |
---|
0:05:42 | rats programs very useful there |
---|
0:05:45 | and finally we're proposing trial-based calibration |
---|
0:05:49 | and that's also based on universal audio characterization to provide metadata |
---|
0:05:54 | let's talk about the existing methods of calibration look at some results and their shortcomings |
---|
0:06:01 | so global or generative calibration here we learn a single shift and scale for converting a |
---|
0:06:08 | raw score to a likelihood ratio |
---|
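(A hedged sketch of what global calibration amounts to: fitting one scale and one shift by logistic regression on labeled target/impostor scores. The talk's system presumably uses a standard toolkit; this toy gradient-descent version is only illustrative.)

```python
import math

def train_global_calibration(tgt, imp, epochs=2000, lr=0.01):
    """Learn scale a and shift b so that a*score + b behaves like a
    log-likelihood ratio, by minimizing cross-entropy (logistic
    regression with the raw score as the single feature)."""
    a, b = 1.0, 0.0
    data = [(s, 1.0) for s in tgt] + [(s, 0.0) for s in imp]
    for _ in range(epochs):
        ga = gb = 0.0
        for s, y in data:
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))  # predicted P(target)
            ga += (p - y) * s                          # gradient wrt scale
            gb += (p - y)                              # gradient wrt shift
        a -= lr * ga / len(data)
        b -= lr * gb / len(data)
    return a, b
```

One (a, b) pair for the whole score space is exactly what makes this scheme struggle when the pooled data mixes conditions with different score distributions.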
0:06:11 | just on the side here you can see what happens when you've got the score distributions |
---|
0:06:15 | without calibration and after applying global calibration to the fourteen conditions |
---|
0:06:20 | so we're shifting the score distributions around the zero mark and improving the calibration loss |
---|
0:06:26 | because we're targeting around that area |
---|
0:06:29 | so this calibration technique as niko explained is effective for a single condition |
---|
0:06:35 | but once you put multiple conditions in the mix |
---|
0:06:38 | you're not actually reducing the variability of your threshold |
---|
0:06:42 | and that's a problem when you've got pooled-condition data |
---|
0:06:48 | a quick description of metadata-based calibration |
---|
0:06:51 | this takes into account side information or metadata |
---|
0:06:55 | from each side of the trial that's the enrollment side and the test side |
---|
0:07:00 | the bilinear form of how we combine that into a likelihood ratio is |
---|
0:07:03 | shown here and you can look at the paper for more details on that one |
---|
0:07:08 | it's trained with discriminative plda which is used to jointly minimize |
---|
0:07:13 | a cross entropy objective |
---|
0:07:15 | learning those parameters at the bottom there |
---|
0:07:17 | m e and m t represent the u a c vectors more on that on the |
---|
0:07:22 | next |
---|
0:07:23 | slide |
---|
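(One plausible reading of the bilinear form, sketched with hypothetical parameter names alpha, W, v, b; the exact parameterization is in the cited paper. The side-info vectors m_e and m_t are the UAC vectors for the enrollment and test sides, and all parameters would be trained jointly by minimizing cross-entropy.)

```python
def metadata_llr(s, m_e, m_t, alpha, W, v, b):
    """Hypothetical bilinear calibration mapping: combine the raw
    score s with enrollment/test side-information vectors m_e, m_t.
    alpha scales the score, W couples the two sides, v handles
    per-side effects, b is the shift."""
    # bilinear term: m_e^T W m_t
    bilinear = sum(m_e[i] * sum(W[i][j] * m_t[j] for j in range(len(m_t)))
                   for i in range(len(m_e)))
    # symmetric linear side-info term: v^T (m_e + m_t)
    side = sum(v[i] * (m_e[i] + m_t[i]) for i in range(len(v)))
    return alpha * s + bilinear + side + b
```

The key difference from global calibration is that the effective shift now depends on the predicted conditions of both trial sides rather than being a single constant.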
0:07:26 | this was proposed i think at the odyssey conference a few |
---|
0:07:29 | years back universal audio characterization is a very simple process |
---|
0:07:34 | take a training dataset divide it into classes of interest such as language channel snr |
---|
0:07:39 | gender |
---|
0:07:41 | and for each of those classes model it with a gaussian so it's a gaussian |
---|
0:07:44 | backend |
---|
0:07:45 | then when a test sample comes in |
---|
0:07:48 | you get the posteriors from each of those gaussians and end up with a vector |
---|
0:07:50 | on the right hand side |
---|
0:07:52 | so say for instance that you |
---|
0:07:54 | trained the system on french and english to distinguish those two languages and you get |
---|
0:07:58 | a spanish test segment coming in |
---|
0:08:00 | our hypothesis is that the system will say well this sounds like eighty percent french |
---|
0:08:05 | twenty percent english and kind of reflect that in the posteriors |
---|
0:08:09 | that's the idea |
---|
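(A toy sketch of the Gaussian backend just described, using spherical unit-variance Gaussians over 2-D features instead of i-vectors; class names and dimensions are invented for illustration.)

```python
import math

def fit_uac(training):
    """Fit one spherical Gaussian per class by storing the class mean
    (a minimal stand-in for the Gaussian backend on i-vectors)."""
    models = {}
    for label, vecs in training.items():
        dim = len(vecs[0])
        models[label] = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    return models

def uac_posteriors(models, x, var=1.0):
    """Class posteriors for sample x: this vector is the UAC output.
    An off-menu input (e.g. Spanish against French/English models)
    simply yields soft posteriors over the trained classes."""
    loglik = {label: -0.5 * sum((xi - mi) ** 2 for xi, mi in zip(x, mean)) / var
              for label, mean in models.items()}
    m = max(loglik.values())
    expd = {k: math.exp(v - m) for k, v in loglik.items()}  # stable softmax
    z = sum(expd.values())
    return {k: e / z for k, e in expd.items()}
```

The posterior vector, not a hard class decision, is what feeds the downstream calibration stages.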
0:08:11 | let's take a look at the definition of the classes so what we want to |
---|
0:08:15 | do here is |
---|
0:08:16 | given we had |
---|
0:08:18 | an oracle experiment so we actually took the f b i data via cross-validation here |
---|
0:08:22 | and trained the universal audio characterization |
---|
0:08:26 | so we picked out three different classes snr language and channel |
---|
0:08:31 | and we asked what calibration loss improvement we're going to get compared to global calibration |
---|
0:08:36 | and that's what's listed here and the bottom row is what happens if you |
---|
0:08:41 | take each of those fourteen conditions |
---|
0:08:42 | calibrate each one independently |
---|
0:08:45 | and simply pool the results as shown in the tables |
---|
0:08:48 | so that's essentially the best we could do |
---|
0:08:51 | so what we've done here is look at the potential of metadata-based calibration with |
---|
0:08:56 | two conditions |
---|
0:08:57 | again this is an oracle in terms of the source of the training data |
---|
0:09:01 | so we've chosen here language and channel |
---|
0:09:03 | for the side information |
---|
0:09:05 | let's look at the sensitivity of the universal audio characterization and the training set used |
---|
0:09:10 | for the calibration model |
---|
0:09:12 | the top two lines are what happens when we're using an oracle experiment again the |
---|
0:09:16 | detail is in the paper |
---|
0:09:18 | and we're comparing global and metadata-based calibration |
---|
0:09:22 | basically what you can see here is that |
---|
0:09:25 | particularly with the cllr loss |
---|
0:09:27 | metadata-based calibration improves the cllr and we're getting a slight reduction in equal |
---|
0:09:33 | error rate and the calibration loss is improving a little bit as well |
---|
0:09:37 | sorry |
---|
0:09:39 | so it's able to do something there which is nice to see |
---|
0:09:42 | if we then look at what happens when we bring in the matched dataset remember |
---|
0:09:46 | this is sre fisher data that's meant to try and be similar to the f |
---|
0:09:50 | b i data conditions |
---|
0:09:52 | we see something interesting |
---|
0:09:54 | with global calibration if we train the model on the matched data we're |
---|
0:09:58 | directly degrading calibration loss severely compared to the oracle i guess that's expected in a |
---|
0:10:03 | sense because we don't always have the data that we're evaluating on |
---|
0:10:07 | but once we look at metadata-based calibration |
---|
0:10:11 | if we use the matched data to train the universal audio characterization and then use |
---|
0:10:16 | the actual f b i data to train the calibration model we're not doing too |
---|
0:10:19 | bad |
---|
0:10:20 | we're getting a subtle improvement in cllr |
---|
0:10:22 | the problem occurs once we start using the matched data for the calibration model that's |
---|
0:10:26 | the discriminative plda |
---|
0:10:29 | we start to really degrade performance in calibration and the average |
---|
0:10:35 | equal error rate starts to blow out |
---|
0:10:37 | so we've got a high sensitivity to the calibration training set here |
---|
0:10:44 | one hypothesis that we've got down there is this may be due |
---|
0:10:47 | to the lack of cross-language and cross-channel conditions in the linear discriminant space |
---|
0:10:53 | so how do we handle unseen trial conditions |
---|
0:10:59 | so looking at how forensic experts do it we asked can we implement that |
---|
0:11:02 | can we select a representative calibration training set for each individual trial |
---|
0:11:08 | now with two point eight million trials in this database that's not an easy thing that |
---|
0:11:11 | can be done |
---|
0:11:13 | this is trial-based calibration |
---|
0:11:16 | so it's modeled on the approach of forensic experts and it wasn't meant to replace them |
---|
0:11:20 | by any means |
---|
0:11:22 | but that was the motivation |
---|
0:11:24 | so what it does is the system delays the choice of calibration training data until it |
---|
0:11:28 | knows the conditions of the trial |
---|
0:11:31 | so given a trial we select a dataset representative of the enrollment |
---|
0:11:36 | sample then we construct trials against data that's representative of the test sample |
---|
0:11:42 | as well |
---|
0:11:44 | so the challenge here is how do we find that representative data |
---|
0:11:49 | i'm going to work through the boxes here showing the process we use for selecting |
---|
0:11:53 | for each individual trial a small subset of a thousand target trials |
---|
0:11:59 | and however many impostor trials come with it |
---|
0:12:02 | the first thing we do is to extract the u a c vectors from the |
---|
0:12:06 | enrollment side and the test side this is predicting the conditions essentially on both |
---|
0:12:11 | sides of the trial |
---|
0:12:13 | then we rank normalize those u a c vectors against the calibration u a c vectors so |
---|
0:12:18 | we've got this candidate calibration dataset which could be the three sets i explained |
---|
0:12:22 | earlier |
---|
0:12:23 | we extracted u a c vectors for each of those so we already know the conditions |
---|
0:12:27 | of the calibration data from the system's perspective |
---|
0:12:31 | we're doing rank normalization |
---|
0:12:34 | for those who don't know rank normalization is a very simple process where you simply replace the |
---|
0:12:38 | actual value |
---|
0:12:39 | in a given dimensional vector |
---|
0:12:41 | with the rank |
---|
0:12:43 | against everything in the calibration set so you need a |
---|
0:12:46 | reference set that you rank against |
---|
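(The rank normalization just described, as a sketch: each dimension's value is replaced by its rank against the same dimension in the calibration set, here scaled to [0, 1]; the exact scaling in the talk's system may differ.)

```python
def rank_normalize(vec, reference):
    """Replace each dimension's value with its rank among that
    dimension's values in the reference (calibration) set,
    scaled to [0, 1]."""
    out = []
    for d, val in enumerate(vec):
        rank = sum(1 for r in reference if r[d] < val)  # how many fall below
        out.append(rank / len(reference))
    return out
```

This makes the subsequent distance computation insensitive to the raw scale of each UAC posterior dimension.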
0:12:50 | more details on this are in the paper |
---|
0:12:52 | the similarity measure is a very simple euclidean distance |
---|
0:12:55 | between the rank-normalized u a c vectors |
---|
0:12:59 | which here have all been rank normalized already |
---|
0:13:03 | this allows us to find the most representative calibration segments for both the enrollment and the |
---|
0:13:08 | test |
---|
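(The similarity measure as a sketch: plain Euclidean distance between rank-normalized vectors, used to rank candidate calibration segments against one side of the trial; the function name is illustrative.)

```python
import math

def nearest_segments(query, candidates, k):
    """Rank candidate calibration segments (rank-normalized UAC
    vectors) by Euclidean distance to the query side of the trial;
    return the indices of the k most representative, closest first."""
    dist = [(math.dist(query, c), i) for i, c in enumerate(candidates)]
    return [i for _, i in sorted(dist)[:k]]
```

Run once with the enrollment-side UAC vector as the query and once with the test-side vector, giving the two rankings used in the sorting step.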
0:13:10 | then there's the sorting process so what we've done before we got to this point is |
---|
0:13:14 | actually taken the candidate calibration segments and done an exhaustive score comparison using the sid system |
---|
0:13:20 | we get a calibration score matrix and now what we're doing is sorting the rows |
---|
0:13:24 | by similarity to the enrollment |
---|
0:13:26 | and then the columns by similarity to the test |
---|
0:13:29 | what we end up with is the upper left corner being |
---|
0:13:32 | most representative of the trial that's been given to us here |
---|
0:13:38 | selection involves trying to get a thousand target trials |
---|
0:13:41 | and we simply add candidates to the |
---|
0:13:44 | selected calibration set until we get there |
---|
0:13:48 | finding the next most representative |
---|
0:13:51 | from the enrollment side or the test side whichever scores closest based on the similarity |
---|
0:13:55 | measure |
---|
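(A sketch of the selection loop: alternately take the next most representative segment from the enrollment-side and test-side rankings until the selected segments contain at least the target-trial quota. This toy version omits the exclusion of the trial's own segments mentioned next in the talk, and the example below uses a tiny quota instead of one thousand.)

```python
def count_targets(selected, speakers):
    """Number of same-speaker (target) pairs among selected segments."""
    n = 0
    for i in range(len(selected)):
        for j in range(i + 1, len(selected)):
            if speakers[selected[i]] == speakers[selected[j]]:
                n += 1
    return n

def select_calibration_set(enroll_ranked, test_ranked, speakers, min_targets=1000):
    """Grow the per-trial calibration set, alternating between the
    enrollment-side and test-side similarity rankings, until the
    selected segments yield at least min_targets target trials.
    speakers maps segment index -> speaker label."""
    selected = []
    queues = [list(enroll_ranked), list(test_ranked)]
    side = 0
    while count_targets(selected, speakers) < min_targets:
        while queues[side] and queues[side][0] in selected:
            queues[side].pop(0)                 # skip already-selected segments
        if not queues[side] and not queues[1 - side]:
            break                               # ran out of candidates
        if queues[side]:
            selected.append(queues[side].pop(0))
        side = 1 - side                         # alternate sides
    return selected
```

The quota on target trials is what guards against learning the calibration model from too few errors, which was the failure mode of metadata-based calibration.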
0:13:58 | an important thing to note here is that the segments of the actual trial as you're going |
---|
0:14:02 | through this process are excluded otherwise you might have cross-database impostor trials which are |
---|
0:14:08 | actually quite easy and that could bias the calibration model |
---|
0:14:14 | a few things to note about this |
---|
0:14:16 | the intention is first of all to overcome the shortcomings of metadata-based calibration by |
---|
0:14:21 | selecting the most representative trials |
---|
0:14:24 | and then learning the calibration model from those |
---|
0:14:26 | representativeness is not guaranteed |
---|
0:14:29 | that is we're not saying that |
---|
0:14:30 | just because we've got this pool of data doesn't mean we can actually find something that |
---|
0:14:33 | is representative we may not find a matched representative set |
---|
0:14:36 | in the case that there is nothing like the trial that's come across it |
---|
0:14:41 | will probably revert to something more like a general |
---|
0:14:44 | a randomly selected calibration model |
---|
0:14:47 | and we suppose that's better than overfitting |
---|
0:14:51 | so this is suitable for evaluation scenarios where you've got to have a decision for everything |
---|
0:14:55 | if you've got speech from both sides of a trial you need to produce a score |
---|
0:14:59 | for the evaluation that does not represent what forensic experts would do |
---|
0:15:03 | if for instance they lacked data for a given trial they might simply say |
---|
0:15:07 | rendering a decision is impossible without it |
---|
0:15:10 | you know and not just give a score |
---|
0:15:12 | those are just a few things to keep in mind |
---|
0:15:16 | let's look at the results this is on the matched data here results first for the |
---|
0:15:20 | global calibration technique |
---|
0:15:22 | across all fourteen conditions we're getting a nice improvement an average of thirty five |
---|
0:15:27 | percent reduction in calibration loss |
---|
0:15:29 | and not shown on the slide but in the paper there's a twenty percent |
---|
0:15:33 | reduction in cllr the more stringent metric |
---|
0:15:39 | so if we compare the three approaches now on the large variability data so this |
---|
0:15:42 | is the one pooled from many different sources just to challenge the system |
---|
0:15:47 | we see that metadata-based calibration |
---|
0:15:50 | actually reduces the average calibration loss |
---|
0:15:54 | at the given operating point |
---|
0:15:56 | but unfortunately increases cllr and equal error rate as well |
---|
0:16:02 | so again this is probably coming down to the overfitting issue or the lack of |
---|
0:16:06 | trials in certain conditions |
---|
0:16:08 | where for instance |
---|
0:16:10 | if a condition coming into the metadata-based calibration technique had only been seen in |
---|
0:16:15 | a few trials or with few errors |
---|
0:16:18 | it might be pretty confident that this is the way we should calibrate when |
---|
0:16:23 | in fact it's quite mismatched to the data that's coming in |
---|
0:16:26 | trial-based calibration however improved the calibration metrics in both cases and also improved the |
---|
0:16:33 | discrimination power of the system and this again is probably something that should be expected |
---|
0:16:38 | given that you're trying to apply a single threshold to get the equal error rate |
---|
0:16:42 | point |
---|
0:16:43 | across fourteen conditions |
---|
0:16:48 | pictorially i found this kind of interesting just how the thresholds vary between |
---|
0:16:54 | the different conditions here trained on the large variability data and you can see |
---|
0:16:58 | basically with metadata-based calibration and global calibration there's quite a spread across the |
---|
0:17:02 | thresholds there trial-based calibration at the bottom is |
---|
0:17:07 | starting to cluster them close to zero obviously it's not perfect we haven't |
---|
0:17:13 | succeeded in getting to where we need to be |
---|
0:17:17 | but it's a step in the right direction i suppose |
---|
0:17:22 | in conclusion we can say that |
---|
0:17:24 | well it's difficult to calibrate over a wide range of conditions |
---|
0:17:28 | metadata-based calibration we showed was struggling |
---|
0:17:31 | when we haven't seen the training conditions or have very few so we proposed trial |
---|
0:17:36 | based calibration to address that shortcoming |
---|
0:17:38 | and what this does is select the calibration training set at test time |
---|
0:17:43 | it avoids overfitting to limited trials by requiring a minimum number of target trials the one thousand |
---|
0:17:47 | target trials we used |
---|
0:17:49 | and it reverts to a more general calibration model if the conditions aren't |
---|
0:17:53 | seen |
---|
0:17:55 | future work there's a lot of future work here |
---|
0:17:57 | removing the computational bottleneck of |
---|
0:18:00 | calibrating two point eight million trials independently |
---|
0:18:03 | so one option may be the closed-form solution that was presented earlier |
---|
0:18:10 | another thing that jeff actually mentioned is |
---|
0:18:14 | some metric giving |
---|
0:18:15 | an indication of how representative the selected calibration set is for the trial |
---|
0:18:20 | for instance if we failed to select a matched representative set if that set |
---|
0:18:25 | is in fact something forensic experts wouldn't have chosen |
---|
0:18:29 | the user would want to know |
---|
0:18:33 | can we incorporate phonetic information |
---|
0:18:35 | relevant to joe's talk earlier |
---|
0:18:38 | is there in the i-vector framework something suitable |
---|
0:18:43 | and finally can we actually learn a way of approximating calibration shift |
---|
0:18:48 | and scale using just the u a c vectors |
---|
0:18:52 | and that concludes my talk i'll just leave you with the |
---|
0:18:55 | flow diagram in case there are questions |
---|
0:19:08 | so in your trial-based calibration there's two components you have to |
---|
0:19:15 | train the |
---|
0:19:16 | universal audio characterization and the calibration model right that's correct |
---|
0:19:21 | in the results you're presenting did both of those use |
---|
0:19:25 | the matched dataset |
---|
0:19:30 | well yes that's it |
---|
0:19:35 | so when both were matched |
---|
0:19:39 | it was quite bad actually as one of your slides showed i think there's a line where |
---|
0:19:42 | you can see that in the results |
---|
0:19:45 | for the cllr |
---|
0:19:47 | so do you think the issue then is just with the u a |
---|
0:19:51 | c that fails because of the dataset the matched dataset used to train |
---|
0:19:57 | it |
---|
0:19:58 | so you still have a matched set but it's the u a c that needs to see it |
---|
0:20:03 | so do you believe here it's just the u a c that's the issue |
---|
0:20:07 | well when we look at this |
---|
0:20:09 | the u a c is obviously playing a part in the cllr and also the |
---|
0:20:16 | average equal error rate but if you want to use the matched data for training the u |
---|
0:20:20 | a c but use the actual evaluation data for the calibration set |
---|
0:20:26 | we |
---|
0:20:27 | we're doing |
---|
0:20:29 | compared to global in the same condition we're actually |
---|
0:20:33 | not benefiting too much from having that side information |
---|
0:20:46 | yes also in your future work you said you're still thinking about measuring how representative |
---|
0:20:54 | the set really is do you have some ideas there because in my limited mathematical mind i |
---|
0:20:58 | would think some sort of outlier detection |
---|
0:21:01 | for that case |
---|
0:21:02 | but |
---|
0:21:04 | we haven't really thought about it at this point to be honest |
---|
0:21:08 | but we know it would be something |
---|
0:21:11 | i |
---|
0:21:11 | definitely of interest equally this is a tool to go alongside a forensic |
---|
0:21:15 | expert you know where we know that automatic tools can be used to aid |
---|
0:21:21 | in certain decisions and to have a system that |
---|
0:21:24 | can dynamically calibrate and provide a better decision to the expert that's already a benefit |
---|
0:21:24 | but to have confidence in the system's calibration is also important |
---|