0:00:13 | okay |
---|
0:00:22 | so this morning i will explain what we mean by classifier fusion |
---|
0:00:31 | classifier fusion is applicable whenever we have some ensemble of experts and we need to come to some final decision |
---|
0:00:45 | furthermore in this example we assume that those experts are able to give us soft decisions in the form of some confidence value |
---|
0:00:57 | so perhaps the simplest and also mostly working method to fuse those scores would be just to average those confidence values |
---|
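A minimal sketch of this simplest rule, averaging the confidence values over the ensemble; the array contents are made-up example data, not from the talk:

```python
import numpy as np

# scores[i, j]: confidence given by expert i for trial j (made-up example values)
scores = np.array([
    [0.9, 0.2, 0.6],   # expert 1
    [0.8, 0.4, 0.5],   # expert 2
    [0.7, 0.1, 0.9],   # expert 3
])

# average fusion: the fused confidence is the unweighted mean over the experts
fused = scores.mean(axis=0)
print(fused)  # one fused confidence value per trial
```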
0:01:09 | but sometimes we have some prior information about the experts and about how well they performed in the past |
---|
0:01:20 | so we would like to exploit this information to make a better fusion |
---|
0:01:32 | so the task of classifier fusion is to take the outputs of the n base classifiers and produce one output score which ideally gives better performance than any single base classifier |
---|
0:01:58 | here we assume so called linear fusion which is a very simple method but one that is also used in state of the art tools like the focal toolkit |
---|
0:02:22 | so linear fusion is just a weighted sum of the input scores where the weights are trained on previous trials with known ground truth |
---|
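A sketch of linear fusion as just described: the fused score is a weighted sum of the input scores, with the weights trained on past trials with known labels. Training the weights by logistic regression is one common choice and an assumption of this example, not something the talk specifies:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# past trials with known ground truth: rows are trials, columns are the
# base classifiers' scores (made-up example data)
train_scores = np.random.randn(1000, 3)
train_labels = np.random.randint(0, 2, size=1000)  # 1 = target, 0 = non-target

# fit the fusion weights; the fused score is w . s + b
fusion = LogisticRegression().fit(train_scores, train_labels)
w, b = fusion.coef_[0], fusion.intercept_[0]

# apply the trained linear fusion to new trials
test_scores = np.random.randn(10, 3)
fused = test_scores @ w + b
```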
0:02:41 | what we mean by subset fusion is that we first select only certain classifiers from the full set and only those are then fed to the fusion training and the fusion itself |
---|
0:03:06 | what could be the motivation for such a setup |
---|
0:03:10 | first the traditional approach with the full set is the most commonly used method it is straightforward and computationally efficient since you don't have to do the subset selection |
---|
0:03:30 | but when we have a large number of classifiers we could possibly simply be overtraining the fusion |
---|
0:03:42 | whereas in the subset case we might possibly do better |
---|
0:03:51 | of course this method relies on a good subset selection |
---|
0:03:59 | so the question is can subset fusion give better performance than the full set |
---|
0:04:10 | now for the system overview on the input we have speech typically two utterances |
---|
0:04:24 | those are classified by several classifiers that we selected from the full set of classifiers |
---|
0:04:37 | and the scores of the classifiers that were selected are then fused |
---|
0:04:47 | more in detail how we do it is that we first train the s-cal mapping for each of the base classifiers' scores |
---|
0:05:01 | the s-cal mapping maps the scores into well calibrated log likelihood ratios |
---|
0:05:11 | in the first formula you can see the s-cal mapping and the second one is the cost function cllr which we minimize for the mapped scores |
---|
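For reference, the cllr cost mentioned here has a standard closed form; a sketch of it, assuming the scores are already log-likelihood ratios (the function and variable names are mine):

```python
import numpy as np

def cllr(tar_llrs, non_llrs):
    """Average cost of log-likelihood-ratio scores, in bits:
    0 for perfect scores, 1 for a useless but calibrated system."""
    cost_miss = np.mean(np.log2(1 + np.exp(-tar_llrs)))  # target trials
    cost_fa = np.mean(np.log2(1 + np.exp(non_llrs)))     # non-target trials
    return 0.5 * (cost_miss + cost_fa)
```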
0:05:31 | okay then for each of the subsets in the power set two to the power of n minus one of them we train a linear fusion with the weighted cllr objective function the same as in the focal toolkit |
---|
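A sketch of the loop over the power set: all 2^12 − 1 = 4095 non-empty subsets of the twelve classifiers, with one linear fusion trained per subset. The two helper functions only stand in for the cllr-based fusion training and scoring described in the talk; they are hypothetical placeholders:

```python
from itertools import combinations

def train_fusion(subset):
    """Stand-in for training the linear fusion on this subset's scores."""
    return subset  # hypothetical placeholder

def evaluate_min_dcf(fusion):
    """Stand-in for evaluating the minimum decision cost function."""
    return len(fusion)  # hypothetical placeholder

classifiers = range(12)  # the talk uses twelve base classifiers

results = {}
for size in range(1, 13):
    for subset in combinations(classifiers, size):
        results[subset] = evaluate_min_dcf(train_fusion(subset))

# 2**12 - 1 = 4095 candidate subsets; keep the one with the smallest min dcf
best_subset = min(results, key=results.get)
```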
0:05:52 | as you can see in the first formula the prior with which the weighted cllr function is weighted comes from the cost function |
---|
0:06:07 | for the cost function we use the nist function with a cost of miss of one a cost of false alarm of one and a probability of a target trial of zero point zero zero one |
---|
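With these numbers the effective prior that weights the objective works out to 0.001; a quick check of the standard effective-prior formula, which I am assuming is the one meant here:

```python
c_miss, c_fa, p_tar = 1.0, 1.0, 0.001  # the nist cost parameters quoted above

# effective prior: folds the two costs into one equivalent target prior
p_eff = (p_tar * c_miss) / (p_tar * c_miss + (1 - p_tar) * c_fa)
print(p_eff)  # 0.001: with equal costs the effective prior equals p_tar
```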
0:06:26 | okay then after we fuse all the possible subsets we select the subset with the smallest minimum decision cost function |
---|
0:06:38 | the decision cost function is a function of the threshold and of the cost function parameters |
---|
0:06:49 | so we pick the subset with the minimum decision cost function over all possible thresholds |
---|
0:07:04 | and finally we evaluate the actual decision cost function which is the cost function at the log likelihood ratio threshold trained on the training set and which therefore also includes the calibration error |
---|
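A sketch of the two quantities being compared: the minimum dcf sweeps the decision threshold, while the actual dcf fixes it at the theoretical log-likelihood-ratio threshold implied by the cost parameters, so the gap between the two is the calibration error. Function and variable names are illustrative:

```python
import numpy as np

def dcf(tar_llrs, non_llrs, threshold, p_tar=0.001, c_miss=1.0, c_fa=1.0):
    """Normalized decision cost at one threshold."""
    p_miss = np.mean(tar_llrs < threshold)
    p_fa = np.mean(non_llrs >= threshold)
    cost = p_tar * c_miss * p_miss + (1 - p_tar) * c_fa * p_fa
    return cost / min(p_tar * c_miss, (1 - p_tar) * c_fa)

def min_dcf(tar_llrs, non_llrs):
    """Best cost over all thresholds; blind to calibration."""
    thresholds = np.concatenate([tar_llrs, non_llrs, [np.inf]])
    return min(dcf(tar_llrs, non_llrs, t) for t in thresholds)

def actual_dcf(tar_llrs, non_llrs, p_tar=0.001):
    """Cost at the theoretical llr threshold; includes calibration error."""
    bayes_threshold = np.log((1 - p_tar) / p_tar)  # about 6.9 for p_tar = 0.001
    return dcf(tar_llrs, non_llrs, bayes_threshold)
```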
0:07:27 | as our base classifiers we had twelve different classifiers which were used in the i4u submission for the nist two thousand ten evaluation |
---|
0:07:42 | we used three different sets of scores |
---|
0:07:46 | the so called train set and devel set one which are from the extended nist sre two thousand eight trial set and have very similar score distributions |
---|
0:08:03 | and then for something different we also have devel set two which is the official nist two thousand ten evaluation trial set |
---|
0:08:20 | so for the results we divided all the possible subsets by size from one to twelve since we had twelve classifiers and measured the improvement we can get by selecting a good subset |
---|
0:08:44 | the three most important points in this plot are the worst individual subsystem and the best individual subsystem those are the subsets of size one a single system with no fusion |
---|
0:09:03 | and the baseline which is the full ensemble fusion where all twelve classifiers are fused |
---|
0:09:16 | first the blue line shows the non cheating realistic use case where we predict the best subset from the training set and then evaluate on devel set one |
---|
0:09:34 | for this one unfortunately we cannot get a better result than the full set fusion but sometimes for instance around subset size seven we can get a very similar result |
---|
0:09:53 | the best subset selection oracle shows the performance of the best subset if we knew how to select it |
---|
0:10:07 | the worst subset selection oracle shows the case where we select the worst possible subset from the power set |
---|
0:10:19 | so those two are the upper and lower bounds |
---|
0:10:27 | okay this is the same case only not for the actual dcf but for the minimum dcf and the equal error rate |
---|
0:10:38 | so you can see we can still get a better minimum dcf or equal error rate by not doing the full set fusion but selecting a subset |
---|
0:10:55 | and finally this is the performance on devel set two the nist two thousand ten evaluation set |
---|
0:11:06 | we can also see that for most of the conditions interview interview interview telephone and telephone telephone the best subset gives better performance than the full ensemble |
---|
0:11:21 | only in the mic mic condition is there something wrong here even the full ensemble gives worse results than the best individual system |
---|
0:11:49 | the conclusion of this research is that subset fusion has the potential to outperform the full set fusion of course only if we knew how to select the best subset |
---|
0:12:04 | therefore further study should focus on subset selection methods |
---|
0:12:14 | i think that's it |
---|
0:12:23 | okay we have time for questions |
---|
0:12:28 | yes at the back please |
---|
0:12:31 | i'd like to ask if you used the same subset for all the trials or different subsets for the trials |
---|
0:12:37 | you mean in one of the plots or |
---|
0:12:43 | in general so this is the system and you put a lot of trials into it |
---|
0:12:49 | yeah |
---|
0:12:50 | do you select a different subset for each trial no no no |
---|
0:12:55 | okay |
---|
0:12:56 | so just one subset for all the trials |
---|
0:12:59 | okay |
---|
0:13:09 | did you compare your solution with a random selection of the subset of classifiers |
---|
0:13:15 | what do you mean by random |
---|
0:13:20 | can you show the plot so in this plot you have the two bounds |
---|
0:13:29 | okay |
---|
0:13:30 | a random selection would be somewhere in between |
---|
0:13:38 | when you pick randomly you end up with a performance between those two bounds |
---|
0:13:45 | and it could be interesting to know where it lies maybe |
---|
0:13:50 | we did not try the random selection but you would probably like to see a distribution |
---|
0:13:57 | oh okay |
---|
0:14:10 | okay let's thank the speaker |
---|