0:00:18 | Good morning. My name is Raymond, and we are from the Chinese University of Hong Kong and the Institute for Infocomm Research (I2R) in Singapore. |
---|
0:00:24 | The topic of today's presentation is "Score Fusion and Calibration in Multiple Language Detection with Large Performance Variation". |
---|
0:00:36 | Score fusion and calibration are not clearly defined, to the best of our knowledge, but in this paper we define them to be a process which combines or adjusts the numerical values of scores from one or multiple detection systems for language detection. |
---|
0:00:54 | To be more concrete, think of it this way: we have multi-dimensional score vectors from different detection systems, or even from different language detectors, |
---|
0:01:03 | and what we want is to combine these multi-dimensional vectors in some way to obtain a scalar decision for the detection of a particular language. |
---|
0:01:15 | So the questions involved include how to adjust or combine the numerical values of the scores, and whether or not we need some criteria to guide these adjustments. |
---|
0:01:27 | To name a few common approaches to fusion and calibration: we can take, say, two detection systems and combine their scores linearly with some chosen weights. |
---|
0:01:39 | Linear discriminant analysis with a Gaussian back-end is another popular approach, which assumes the multi-dimensional score vectors of the different detection classes follow a normal distribution. |
---|
0:01:52 | Another popular approach is the logistic regression back-end, which combines the detection scores under the maximum posterior probability criterion. |
---|
0:02:05 | Many of these methods can be approximated by an affine, or linear, transformation. |
---|
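To make that affine view concrete, here is a minimal sketch; the function and variable names are our own illustrations, not from the paper.

```python
# Minimal sketch: each back-end above reduces to an affine map s' = A s + b.
# All names here are illustrative assumptions, not from the paper.
import numpy as np

def fuse_affine(s: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Map a multi-dimensional score vector to calibrated scores: s' = A s + b."""
    return A @ s + b

# Example: equal-weight linear fusion of two 3-language detectors.
s_phonotactic = np.array([1.2, -0.5, 0.3])
s_prosodic    = np.array([0.8, -0.1, -0.4])
s = np.concatenate([s_phonotactic, s_prosodic])   # 6-dim score vector
A = 0.5 * np.hstack([np.eye(3), np.eye(3)])       # average the two systems
b = np.zeros(3)
print(fuse_affine(s, A, b))                       # fused 3-dim score vector
```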
0:02:11 | In this paper we are going to focus on performance variation. It is not a formally defined term, but generally we cover the cases where we face performance variation among different detection systems, or performance variation among different language detectors. |
---|
0:02:28 | In the following, we use multi-class logistic regression to deal with the situation of variation among systems, and erroneous deviation calibration to deal with the situation of variation among different languages. |
---|
0:02:44 | We tested this with the NIST LRE 2009, and what we have is one phonotactic system and one prosodic system. |
---|
0:02:55 | We can see a huge performance gap between these two systems: for the prosodic system, the EER across languages ranges from about six percent to twenty-seven percent. |
---|
0:03:06 | Intuitively, we would hope to use the prosodic detectors which are more reliable, I mean those languages which have low errors, and put more weight on them, because they should be more reliable. |
---|
0:03:20 | So we want to investigate this problem by putting it in the common multi-class logistic regression setting, and we are going to demonstrate a reduction of the C_avg score of the fused system. |
---|
0:03:35 | So this is the setup. We have two language detection systems: PH, the phonotactic one, and PR, the prosodic one. |
---|
0:03:43 | We have the log-likelihood scores for each trial k and each target language t. |
---|
0:03:50 | We take a linear combination of the two systems' scores, and what we want to find are the linear combination weights alpha and the bias vector beta. |
---|
0:04:02 | In the MLR we consider this equation, and we optimize the alpha and beta parameters under the maximum posterior probability criterion in this equation. |
---|
0:04:14 | The term P_t is the posterior probability of the class t, and we use a selection function to choose all the in-class data. |
---|
0:04:25 | Finally, we combine the posterior probabilities of all the different classes to obtain an overall posterior probability for maximization. |
---|
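A minimal sketch of this MAP-trained fusion, assuming a flat prior and a log-softmax posterior; this is our own illustration, not the actual implementation.

```python
# Sketch of MLR fusion trained with the maximum posterior probability
# criterion (our own illustration, not the paper's code). s_ph and s_pr are
# (n_trials, n_languages) score matrices; labels gives each trial's true class.
import numpy as np
from scipy.special import logsumexp
from scipy.optimize import minimize

def neg_log_map(params, s_ph, s_pr, labels):
    a_ph, a_pr, beta = params                     # fusion weights + common bias
    fused = a_ph * s_ph + a_pr * s_pr + beta      # affine fusion per language
    # log-softmax posterior over the target languages (flat prior assumed)
    log_post = fused - logsumexp(fused, axis=1, keepdims=True)
    # the "selection function": keep only each trial's in-class posterior
    return -log_post[np.arange(len(labels)), labels].sum()

def train_mlr(s_ph, s_pr, labels):
    x0 = np.array([1.0, 1.0, 0.0])                # [alpha_PH, alpha_PR, beta]
    return minimize(neg_log_map, x0, args=(s_ph, s_pr, labels)).x
```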
0:04:33 | To cope with the large performance variation, we make only a very slight change to the algorithm. |
---|
0:04:39 | We test a language-specific beta, which means we use a different bias for different target languages, because we believe there are some languages which perform well in the prosodic system and some language pairs which have high error in the prosodic system. |
---|
0:04:58 | We also try removing the global bias beta, to see whether there is any effect in the case where we have a weakly performing prosodic system. |
---|
0:05:12 | In the implementation, we simply follow the FoCal toolkit and make a slight modification in the code; a sketch of the modification follows. |
---|
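As a hedged sketch of that modification (reusing the imports of the previous snippet; again an assumed realization, not the paper's code), the shared bias simply becomes a per-language vector, which can also be dropped entirely to remove the global bias.

```python
# Language-specific variant (assumed realization): one bias per target
# language instead of a single shared offset. Reuses numpy/scipy from above.
def neg_log_map_langdep(params, s_ph, s_pr, labels):
    n_lang = s_ph.shape[1]
    a_ph, a_pr = params[:2]
    beta = params[2:2 + n_lang]                   # one bias per language
    fused = a_ph * s_ph + a_pr * s_pr + beta      # beta broadcasts over trials
    log_post = fused - logsumexp(fused, axis=1, keepdims=True)
    return -log_post[np.arange(len(labels)), labels].sum()
```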
0:05:20 | Next we come to the erroneous deviation calibration, for dealing with the problem of variation among different target languages. |
---|
0:05:29 | In LRE 2009 there are pairs of closely related languages, and these pairs of languages become a bottleneck for the detection of the different languages. |
---|
0:05:42 | To be more concrete: the EER is generally around four percent, but if we focus on particular languages, say Bosnian, we have an error of twenty percent. This is the error probability of the combined fusion system. |
---|
0:05:57 | The confusion between particular language pairs, say Bosnian and Croatian, can be as high as twenty-four percent, and the situation is the same for Hindi: we find serious confusion between Hindi and Urdu. |
---|
0:06:11 | So a calibration algorithm based on minimum erroneous deviation is proposed here. |
---|
0:06:18 | In this algorithm we hypothesize that there are pairs of detectors which contain similar and complementary information, because, as we have said, there is serious confusion between pairs of languages. |
---|
0:06:31 | So we look at the likelihood ratio between one target language and one of the confusable languages, which we call the related language. |
---|
0:06:39 | We work on top of the MLR: that means we already have the results after MLR, and then we do a second-stage calibration and find the optimal alpha parameter. |
---|
0:06:54 | This transformation is of the same kind as we have seen in the MLR; it is also affine. |
---|
0:07:00 | As we have been talking about confusion between particular pairs of languages, we confine our calibration to selected data subsets. |
---|
0:07:09 | Of course, it is not possible to know in advance whether a particular trial belongs to these related language pairs or not. |
---|
0:07:19 | So we use a heuristic: we choose the top two scores in the multi-class score vector and make a guess, to obtain the estimated trials for our calibration; a sketch follows. |
---|
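A sketch of that heuristic, with our own illustrative names: take the two best-scoring languages of each trial and route the trial to the calibrator of the matching pair, if any.

```python
# Heuristic trial selection (illustrative names): since the true language is
# unknown at test time, guess that a trial involves a related pair when its
# two highest scores belong to exactly that pair of languages.
import numpy as np

def select_trials_for_pair(scores, pair):
    """scores: (n_trials, n_languages); pair: (target_idx, related_idx).
    Returns indices of trials whose top-two scoring languages match the pair."""
    top2 = np.argsort(scores, axis=1)[:, -2:]     # the two best-scoring classes
    wanted = set(pair)
    return np.array([i for i, t in enumerate(top2) if set(t) == wanted])
```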
0:07:36 | So this is the optimization equation for finding the optimal parameter alpha. |
---|
0:07:43 | Let us start with the difference term: lambda_k minus theta is the deviation of the log-likelihood ratio from the reference theta, which is actually the detection threshold. |
---|
0:07:55 | And this y_k acts like a sign function: it has a positive value for all impostor data and a negative value for the in-class data. |
---|
0:08:05 | By taking the product of this y_k and the difference term, we obtain a positive value for an erroneous detection and a negative value for a correct detection. |
---|
0:08:18 | Our optimization is only concerned with the positive values here, so we optimize towards the minimum total erroneous deviation, not counting the trials with correct detections. |
---|
0:08:30 | We have two further parameters, epsilon and eta, which are application-dependent: adjusting epsilon just shifts the detection threshold, while adjusting eta scales the importance of detection misses versus false alarms. |
---|
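The exact affine form is in the paper; as a hedged sketch, we assume here that the calibrated score shifts the target log-likelihood ratio by alpha times the target-versus-related ratio, and we minimize only the positive (erroneous) deviations.

```python
# Sketch of the minimum erroneous deviation objective. The affine form of the
# calibrated score (llr_t shifted by alpha times the target-vs-related ratio)
# is our assumption; theta, epsilon and eta follow the description above.
import numpy as np
from scipy.optimize import minimize_scalar

def erroneous_deviation(alpha, llr_t, llr_r, y, theta=0.0, epsilon=0.0, eta=1.0):
    lam = llr_t + alpha * (llr_t - llr_r)         # assumed affine calibration
    dev = y * (lam - (theta + epsilon))           # > 0 only for erroneous trials
    weight = np.where(y < 0, eta, 1.0)            # eta re-weights the misses
    return np.sum(weight * np.maximum(dev, 0.0))  # total erroneous deviation

def calibrate_pair(llr_t, llr_r, y):
    """y = +1 for impostor trials, -1 for in-class trials."""
    return minimize_scalar(lambda a: erroneous_deviation(a, llr_t, llr_r, y)).x
```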
0:08:53 | Here is a brief comparison between the MLR and our proposed calibration algorithm. |
---|
0:08:59 | They are the same in that both algorithms are affine transformations of the scores. |
---|
0:09:05 | The MLR is driven by the maximum posterior probability criterion, whereas our algorithm optimizes towards the minimum erroneous deviation. |
---|
0:09:18 | The MLR uses the whole data set, while in our implementation we select data subsets. |
---|
0:09:25 | The MLR is a standalone process; our calibration algorithm operates on top of the MLR. |
---|
0:09:31 | The MLR is application-independent, whereas ours has application-specific parameter settings, epsilon and eta, for the relative importance of misses against false alarms. |
---|
0:09:42 | A shortcoming of our proposed calibration algorithm is that the target languages to be calibrated, which means the target-language and related-language pairs, have to be predetermined in advance. |
---|
0:09:54 | We want to enhance the calibration algorithm by allowing on-the-fly selection of the target languages for calibration, such that this kind of algorithm can work in the general situation. |
---|
0:10:04 | We go back to our original hypothesis that the log-likelihood ratios for related languages contain similar and complementary information, and we do a post-hoc analysis of the scores of the twenty-three detectors in LRE 2009. |
---|
0:10:19 | We enumerate all pairs from the twenty-three detectors, and we plot the likelihood scores of the target class t against those of another class. |
---|
0:10:30 | We have an interesting finding: for the related languages, which means, say, Hindi and Urdu, you can see a very strong correlation of scores for the impostor-class data. |
---|
0:10:43 | This actually matches our original hypothesis that if we want to find pairs of language detectors to calibrate, they have to contain similar and complementary information. |
---|
0:10:57 | So this is indirect evidence for our original hypothesis. |
---|
0:11:01 | And this is the case where the two language detectors are not similar: we do not see very high correlations in the impostor-class data. |
---|
0:11:10 | So here we propose two simple heuristics. First, we impose a minimum correlation of 0.9 between two detectors before the calibration mechanism can be invoked. |
---|
0:11:21 | And for every target class t, we find the language with the highest correlation and have it act as the paired detector for calibration. |
---|
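A sketch of these two heuristics; treating all trials outside the target class as its impostor trials is our assumption.

```python
# Sketch of the two heuristics (illustrative): require a minimum impostor-
# score correlation of 0.9, and pair each target with its single most
# correlated fellow detector.
import numpy as np

def select_pairs(scores, labels, min_corr=0.9):
    """scores: (n_trials, n_languages) LLRs; labels: true class per trial.
    Returns {target: related} for targets whose best correlation >= min_corr."""
    n_lang = scores.shape[1]
    pairs = {}
    for t in range(n_lang):
        imp = labels != t                         # impostor trials for target t
        best_r, best_c = None, min_corr
        for r in range(n_lang):
            if r == t:
                continue
            c = np.corrcoef(scores[imp, t], scores[imp, r])[0, 1]
            if c >= best_c:
                best_r, best_c = r, c
        if best_r is not None:
            pairs[t] = best_r
    return pairs
```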
0:11:30 | So here are the experiments. We run them on the NIST Language Recognition Evaluation 2009, the closed-set thirty-second language detection task. |
---|
0:11:41 | We start with the phonotactic SVM system, with a C_avg of 4.69 percent. |
---|
0:11:49 | We then apply the MLR as a baseline, and after that our minimum erroneous deviation calibration algorithm. |
---|
0:11:58 | We carry out four sets of experiments: first, we try different MLR settings; second, we try the on-the-fly selection of the target language pairs; we also look at the result of this on-the-fly selection combined with the minimum erroneous deviation calibration; and finally we analyze the calibration results. |
---|
0:12:18 | Here we see the MLR results for different parameter settings. As a baseline we have the 4.69 percent C_avg score. |
---|
0:12:25 | We found that the language-dependent beta, which is the second row here, only gives a marginal error reduction, which is actually not what we had expected. |
---|
0:12:38 | The best result we have is here, with the language-dependent beta and also with the bias vector component present: that is about a 10.5 percent relative reduction of the C_avg with this setting. |
---|
0:13:00 | Then we use our correlation method to find the on-the-fly pairs for the calibration. These are the twenty-three pairs we found. |
---|
0:13:10 | The highlighted entries are the pairs of related languages listed by the LRE 2009 specification. |
---|
0:13:20 | We found that the correlation method recovers all the language pairs which are specified as mutually intelligible, except for Russian and Ukrainian. |
---|
0:13:28 | In fact, even if we use the Ukrainian scores to calibrate Russian, we found that the error did not improve, which means that, in terms of the data, the two language detectors are not that similar. |
---|
0:13:45 | A high correlation on the impostor data is a necessary but not a sufficient condition for the algorithm to work effectively, as we see in the following slides. |
---|
0:13:54 | With these twenty-three pairs of similar languages, we carry out the minimum erroneous deviation calibration. |
---|
0:14:03 | The C_avg is reduced from 4.2 percent to 3.31 percent. |
---|
0:14:09 | As we have said, there may be some language pairs which are not really useful, so we look into the error statistics of each specific language. |
---|
0:14:20 | We decompose the C_avg function into the individual C_det terms, and also the P_miss and P_FA of the different target languages. |
---|
0:14:28 | We enumerate the best three languages in the first table, and the worst three languages in the table at the bottom. |
---|
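For reference, a sketch of the NIST LRE-style C_avg decomposition, assuming the usual settings C_miss = C_FA = 1 and P_target = 0.5; it makes explicit the relative weighting of misses and false alarms discussed next.

```python
# Sketch of the C_avg decomposition (NIST LRE-style; C_miss = C_fa = 1,
# P_target = 0.5). Each target's miss rate carries weight 0.5, while each
# pairwise false-alarm rate carries only 0.5 / (N - 1).
import numpy as np

def c_avg(p_miss, p_fa, p_target=0.5):
    """p_miss: (N,) per-target miss rates; p_fa: (N, N) matrix where
    p_fa[t, n] is the rate of accepting non-target n as target t."""
    n = len(p_miss)
    fa_sum = p_fa.sum(axis=1) - np.diag(p_fa)     # exclude the diagonal
    c_det = p_target * p_miss + (1 - p_target) / (n - 1) * fa_sum
    return c_det.mean()                           # average over target languages
```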
0:14:39 | We have a very interesting finding: whether the alpha parameter is positive or negative actually corresponds to the preference we induce towards misses or false alarms. |
---|
0:14:53 | If we have a positive alpha, the minimum erroneous deviation calibration gives us a smaller P_miss; and if we have a negative alpha, we will have fewer false alarms. |
---|
0:15:08 | Looking back at the error metric equation, the P_miss actually has a larger weight in the overall C_avg equation, so we decided to prefer fewer misses, and we impose another constraint for the alpha to be positive, as sketched below. |
---|
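On the earlier calibration sketch, this constraint amounts to restricting the search to non-negative alpha; the upper bound below is an arbitrary illustration, not from the paper.

```python
# Constrained variant of the earlier sketch: force alpha >= 0 so the
# calibration always trades towards fewer misses. The upper bound of 10.0
# is an arbitrary choice for the bounded search, not from the paper.
from scipy.optimize import minimize_scalar

def calibrate_pair_nonneg(llr_t, llr_r, y):
    obj = lambda a: erroneous_deviation(a, llr_t, llr_r, y)
    return minimize_scalar(obj, bounds=(0.0, 10.0), method='bounded').x
```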
0:15:31 | So this is the final result we have. At the bottom, you see that with this final constraint of forcing alpha to be positive, the C_avg is further reduced to 3.1 percent. |
---|
0:15:43 | The DET curves at the different stages of the calibration we have introduced are shown here. |
---|
0:15:51 | It is interesting to look at the stage just before we carry out the minimum erroneous deviation calibration: in the region of high false alarms of this DET curve, we can see that there are also high misses, which means that in this region there are some in-class data whose log-likelihood scores are very negative. |
---|
0:16:14 | The minimum erroneous deviation calibration kind of rescues these very negative scores, and the final DET curve appears more symmetric than the original curves. |
---|
0:16:33 | So here is the conclusion of today's presentation. We have evaluated different problem settings for the multi-class logistic regression with variation among detection systems. |
---|
0:16:46 | We have also enhanced the minimum erroneous deviation calibration algorithm, such that there is on-the-fly selection of related language pairs, and we have added an extra optimization constraint in the calibration algorithm to express a detection bias. |
---|
0:17:00 | This work is important in the sense that it brings this calibration algorithm towards more general applicability. |
---|
0:17:11 | We have tested the algorithm on the LRE 2007 dataset, where we did not expect a large performance variation among the detectors, and it indeed works in the sense that it does not hurt performance. |
---|
0:17:26 | We are going to extend this algorithm to the situation where we consider multiple related languages. In this scenario, the correlation method, and also the choice of which pairs or which data subsets to calibrate, will become difficult and very different, and we will work on that in the future. |
---|
0:17:48 | That is the end of today's presentation. Thank you very much. |
---|
0:17:56 | [inaudible audience question] |
---|
0:18:19 | No, I mean the prosodic system's performance varies a lot between different languages, so there are some languages in the prosodic system which we think should be more reliable than other languages in the prosodic system. |
---|
0:18:32 | [inaudible audience question] |
---|
0:18:47 | So actually it is the 4.2 percent. |
---|
0:19:00 | Yes, yes. |
---|
0:19:05 | We didn't show that. The reason is that with it we have less than a five percent relative improvement, I think, for the prosodic system. |
---|
0:19:26 | [inaudible audience question] |
---|
0:19:51 | The situation would become very different in the multi-class case because, as I have shown here, with two classes we can have a very clear correlation between the two detectors. |
---|
0:20:03 | But in the multi-class case you see different tails, because the data is then multi-dimensional and we cannot find the correlation anymore. |
---|
0:20:15 | So the situation is quite different, and more complex than this one. |
---|
0:20:23 | We are now at the stage of working on that, and hopefully you will see the results at a later date. |
---|
0:20:35 | [inaudible audience question] |
---|
0:20:50 | That was the 4.2 percent you have seen. |
---|
0:21:03 | Oh, this one? Yes, that is multi-class; that is before we find the pairs. |
---|
0:21:18 | [inaudible audience question] |
---|
0:21:40 | That is exactly the same as what we said at the beginning: we view score calibration or fusion as the problem of combining a multi-dimensional score vector into a scalar decision. |
---|
0:21:53 | So we do not care whether that multi-dimensional score vector comprises scores from different detection systems or scores from different language detectors; we treat it as a generic multi-dimensional score vector and try to find ways to combine its components. |
---|