0:00:18 Thank you. My name is Raymond, and we are from the Chinese University of Hong Kong and the Institute for Infocomm Research in Singapore. The topic of today's presentation is score fusion and calibration in multiple language detection with large performance variation.
0:00:36 Score fusion and calibration are not clearly defined terms, to the best of our knowledge, but in this paper we define them to be a process which combines or adjusts the numerical values of scores from one or multiple detection systems for the detection task.
0:00:54 To be more concrete, think of it this way: we have multi-dimensional score vectors from different detection systems, or even from different language detectors, and what we want is to combine these multi-dimensional vectors in some way to obtain a scalar decision score for the detection of a particular language. The questions involved include how to adjust or combine the numerical values of the scores, and whether we need some criteria to guide these adjustments.
0:01:27 To name a few common approaches to fusion and calibration: we can take two detection systems and combine their scores linearly with empirically chosen weights. Linear discriminant analysis with a Gaussian backend is another popular approach, which assumes that the multi-dimensional score vectors of the different detection classes follow a normal distribution. Yet another popular approach is the logistic regression backend, which combines the detection scores under a maximum posterior probability criterion. Many of these methods can be approximated by an affine or linear transformation.
0:02:11 In this paper we focus on performance variation. It is not a formally defined term, but generally we cover the cases where we face performance variation among different detection systems, or performance variation among different language detectors. In the following, we use multi-class logistic regression to deal with variation among detection systems, and minimum erroneous deviation calibration to deal with variation among different languages.
0:02:44 We tested with the NIST LRE 2009 data. We have one phonotactic system and one prosodic system, and we can see a huge performance gap between the two. For the prosodic system, the per-language EERs range from six percent to twenty-seven percent. Intuitively, we would hope to rely on the prosodic detectors which are more reliable, meaning the languages with low errors, and put more weight on them. So we want to investigate different problem settings within the common multi-class logistic regression framework, and we will demonstrate the resulting reduction of the C_avg score of the fused system.
0:03:35 This is the setup. We have two language detection systems: PH, the phonotactic one, and PR, the prosodic one. We have the likelihood scores of each trial t for each target language l, and we form a linear combination of the two systems' scores. What we want to find are the combination weights beta and the bias vector gamma. In the MLR framework, we consider this equation and optimize the parameters beta and gamma under a maximum posterior probability criterion, shown in this equation. The term P_l is the posterior probability of the target class l, and we use a selection function to choose the in-class data. Finally, we combine the posterior probabilities of all the different classes into the overall posterior objective for maximization.
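The affine fusion and the maximum-posterior objective described above can be sketched as follows. This is a minimal illustration, not the actual implementation used in the paper; the function names and the toy data are made up for the example.

```python
import numpy as np

def fuse(scores_ph, scores_pr, beta, gamma):
    """Affine fusion: one weight per system plus a per-language bias gamma."""
    # scores_*: (n_trials, n_languages) log-likelihood scores
    return beta[0] * scores_ph + beta[1] * scores_pr + gamma

def map_objective(fused, labels):
    """Sum of log posteriors of the true class of each trial; MLR maximizes this."""
    # softmax over the language axis turns fused scores into class posteriors
    logp = fused - np.logaddexp.reduce(fused, axis=1, keepdims=True)
    return logp[np.arange(len(labels)), labels].sum()

rng = np.random.default_rng(0)
ph = rng.normal(size=(4, 3))       # 4 trials, 3 target languages
pr = rng.normal(size=(4, 3))
labels = np.array([0, 1, 2, 0])    # true language of each trial
obj = map_objective(fuse(ph, pr, np.array([1.0, 0.5]), np.zeros(3)), labels)
```

In practice beta and gamma would be found by maximizing this objective over a training set, e.g. with a gradient-based optimizer.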
0:04:33 To cope with large performance variation, we make only slight changes to the MLR framework. We test language-specific betas, which means we use different weights for different target languages, because we believe there are some languages for which the prosodic system performs well and others for which it performs poorly. We also try removing the bias gamma, to see whether it has any effect in the case where we have a poorly performing prosodic system. In implementation, we follow the well-known FoCal toolkit and make slight modifications to the code.
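The language-specific variant simply promotes each scalar system weight to a vector with one entry per target language; a sketch, with hypothetical names:

```python
import numpy as np

def fuse_language_specific(scores_ph, scores_pr, beta_ph, beta_pr, gamma=None):
    """Per-language fusion weights: languages for which the prosodic system
    is reliable can receive a larger beta_pr entry."""
    fused = beta_ph * scores_ph + beta_pr * scores_pr  # broadcast over trials
    if gamma is not None:          # gamma may be dropped to test its effect
        fused = fused + gamma
    return fused

ph = np.array([[1.0, -0.5], [0.2, 0.8]])   # 2 trials, 2 languages
pr = np.array([[0.3, 0.1], [-0.2, 0.4]])
# trust the prosodic system for language 0 much more than for language 1
fused = fuse_language_specific(ph, pr, np.array([1.0, 1.0]), np.array([0.8, 0.1]))
```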
0:05:20 We now move to minimum erroneous deviation calibration, for dealing with the problem of variation among different target languages. In LRE 2009 there are pairs of closely related languages, and these pairs become a bottleneck for the detection of the languages involved. To be more concrete, the overall EER is around four percent, but if we focus on a particular language, say Bosnian, we have an error of twenty percent; this is the error obtained from the phonotactic system and from the combined fusion system. The confusion between particular language pairs, say Bosnian and Croatian, can be as high as twenty-four percent, and the situation is the same for the serious confusion we find between Hindi and Urdu.
0:06:11 A calibration algorithm based on minimum erroneous deviation was proposed here. In this algorithm, we hypothesize that there are pairs of detectors which contain similar and complementary information, because, as we have said, there is serious confusion between pairs of languages. So we look at the likelihood ratio between one target language and one of its confusable languages, which we call related languages. We work on top of MLR: we start from the results after MLR, which already give the optimal combination weights beta, and then we perform a second-stage calibration to find the optimal alpha parameter. This transformation has the same form as the one we have seen in MLR; it is also affine.
0:07:00 As we have been talking about confusion between particular pairs of languages, we confine our calibration to selected data subsets. Of course it is not possible to know in advance whether a particular trial belongs to one of these related language pairs or not, so we use a heuristic: we choose the top two scores in the multi-class score vector, and use them to obtain the estimated trials for our calibration.
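The top-two heuristic just described can be sketched like this: for each trial, the two highest-scoring languages are taken as the candidate pair, and the calibration registered for that pair, if any, is applied. The names here are illustrative only.

```python
import numpy as np

def top_two(score_vector):
    """Return the indices of the two highest scores in a multi-class score
    vector; these are treated as the estimated trial languages."""
    order = np.argsort(score_vector)[::-1]   # descending by score
    return int(order[0]), int(order[1])

scores = np.array([0.1, 2.3, -0.4, 1.9])     # e.g. 4 target languages
best, second = top_two(scores)               # candidate pair for calibration
```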
0:07:36 This is the optimization equation for finding the optimal parameter alpha. Start with the difference term: lambda_l minus theta is the deviation of the score lambda_l from the reference theta, which is actually the detection threshold. Psi is a sign-like function: it takes a positive value for out-of-class (impostor) data and a negative value for in-class data. By taking the product of psi and the difference term, we obtain a positive value for an erroneous detection and a negative value for a correct detection. The optimization is only concerned with the positive values here, so we optimize towards a minimum total erroneous deviation, ignoring the trials with correct detections.
0:08:30 We have two parameters, epsilon and eta, which are application-dependent: adjusting epsilon shifts the detection threshold, while adjusting eta scales the importance of detection misses versus false alarms.
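The objective just described can be sketched numerically as follows. This is our reading of the description, with assumed names: psi is +1 for impostor (out-of-class) trials and -1 for in-class trials, theta is the detection threshold, epsilon shifts that threshold, and eta weights misses against false alarms; only positive (erroneous) deviations contribute to the total.

```python
import numpy as np

def erroneous_deviation(scores, in_class, theta=0.0, epsilon=0.0, eta=1.0):
    """Total erroneous deviation of calibrated scores from the threshold.
    in_class: boolean array, True for target-language trials."""
    psi = np.where(in_class, -1.0, 1.0)        # sign-like function
    dev = psi * (scores - (theta + epsilon))   # positive only for errors
    weight = np.where(in_class, eta, 1.0)      # eta scales the cost of misses
    return np.sum(weight * np.maximum(dev, 0.0))

scores = np.array([1.2, -0.3, 0.5, -0.8])
in_class = np.array([True, True, False, False])
total = erroneous_deviation(scores, in_class)  # only the miss at -0.3 and
                                               # the false alarm at 0.5 count
```

Calibration then searches for the alpha of the affine transform that minimizes this quantity on the selected data subset.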
0:08:53 Here is a brief comparison between MLR and our proposed calibration algorithm. They are the same in that both algorithms apply an affine transformation to the scores. MLR uses a maximum posterior probability criterion, whereas our algorithm optimizes towards a minimum erroneous deviation. MLR uses the whole data set, while in our implementation we use selected data subsets. MLR is a standalone process; our calibration algorithm operates on top of MLR. MLR is application-independent, whereas ours has application-specific settings, epsilon and eta, weighing the importance of misses against false alarms.
0:09:42 A shortcoming of our proposed calibration algorithm is that the target languages to be calibrated, that is, the target-language and related-language pairs, have to be predetermined in advance. We want to enhance the calibration algorithm by allowing on-the-fly selection of the target languages for calibration, such that the enhanced algorithm can work in the general situation. We go back to our original hypothesis that the likelihood ratios for related languages contain similar and complementary information, and we do a post-hoc analysis of the scores of the twenty-three detectors in LRE 2009.
0:10:19 We enumerate all pairs from the twenty-three detectors, and we plot the likelihood scores of the target class l against those of the other class m. We have an interesting finding: for related languages, say Hindi and Urdu, we can see a very strong correlation of the scores for the impostor-class data. That actually matches our original hypothesis that if we want to find pairs of language detectors to calibrate, they have to contain similar and complementary information; it is indirect evidence for the hypothesis. This second plot is the case where the two language detectors are not related, and there we do not see very high correlations in the impostor-class data.
0:11:10 Here we propose two simple heuristics. First, we impose a minimum correlation of 0.9 between two detectors before the calibration mechanism can be invoked. Second, for every target class l, we find the language with the highest correlation and let the two act as a pair of detectors for calibration.
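These two heuristics might be implemented as sketched below: compute the correlation of each pair of detectors' scores on impostor data and keep the best partner per target class only when it exceeds the 0.9 threshold. Variable names and the synthetic data are illustrative.

```python
import numpy as np

def select_pairs(impostor_scores, min_corr=0.9):
    """impostor_scores: (n_trials, n_languages) scores on impostor data.
    Returns {target: partner} for targets whose best correlation >= min_corr."""
    corr = np.corrcoef(impostor_scores.T)   # language-by-language correlations
    np.fill_diagonal(corr, -np.inf)         # exclude pairing a language with itself
    pairs = {}
    for target in range(corr.shape[0]):
        partner = int(np.argmax(corr[target]))
        if corr[target, partner] >= min_corr:
            pairs[target] = partner
    return pairs

rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
# languages 0 and 1 strongly correlated (a "related" pair); 2 independent
scores = np.hstack([base, base + 0.05 * rng.normal(size=(200, 1)),
                    rng.normal(size=(200, 1))])
pairs = select_pairs(scores)   # language 2 gets no calibration partner
```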
0:11:30 Here are the experiments. We work with the NIST Language Recognition Evaluation 2009, on the closed-set thirty-second language detection task. We start with a phonotactic baseline system with a C_avg of 4.69 percent, apply the MLR fusion, and afterwards run our minimum erroneous deviation calibration algorithm. We carry out four groups of experiments: first, we try different MLR settings; second, we try the on-the-fly selection of the target language pairs; third, we look at the results of this on-the-fly selection combined with the minimum erroneous deviation calibration; and finally we analyze the calibration results.
0:12:18 Here are the MLR results for different parameter settings. Generally we have a 4.69 percent C_avg score. We found that the language-dependent gamma, the second row here, only gives a marginal error reduction, which is actually not what we had expected. The best result we have is here, with language-dependent betas and with the bias vector gamma present; that is a relative reduction of the C_avg of about nine and a half percent compared with this setting.
0:13:00 Then we use our correlation method to find the detector pairs for the calibration. These are the twenty-three pairs we found. The bold entries highlight the pairs of related languages listed by the LRE 2009 specification. We found that the correlation method recovers all the language pairs specified as mutually intelligible, except for Russian and Ukrainian. In fact, even if we use the Ukrainian scores to calibrate Russian, we found that the error does not improve, which means that, in terms of the data, the two language detectors are not related after all. A high correlation on impostor data is thus a necessary but not sufficient condition for the algorithm to work effectively, as we will see in the following slides.
0:13:54 With these twenty-three pairs of similar languages, we carry out the minimum erroneous deviation calibration. The C_avg is reduced from 4.2 percent to 3.31 percent.
0:14:09 As we have said, there may be some language pairs which are not really useful, so we look into the error statistics of each specific language. We decompose the C_avg function into its constituent terms, the P_miss and P_false-alarm of the different target languages, and we enumerate the best three languages in the first table and the worst three languages in the table at the bottom.
0:14:39 We have a very interesting finding: whether the alpha parameter comes out positive or negative actually corresponds to the preference towards misses or false alarms. If we have a positive alpha, the minimum erroneous deviation calibration gives us a smaller P_miss; if we have a negative alpha, we end up with fewer false alarms. Looking back at the error metric equation, P_miss actually carries a larger weight in the overall C_avg equation, so we decided to prefer fewer misses and to impose an additional constraint forcing alpha to be positive.
0:15:31 This is the final result we have. At the bottom you see that with this final constraint of forcing alpha to be positive, the C_avg finally reduces to 3.31 percent. The DET curves at the different stages of the calibration we have introduced are shown here.
0:15:51 It is interesting to see that at the stage just before we carry out the minimum erroneous deviation calibration, if we look at the DET curve in the region of high false alarms, there are also high misses. This means that in this region there are some in-class data whose likelihood scores are very negative. The minimum erroneous deviation calibration effectively rescues these very negative scores, and the final DET curve appears more symmetric than the original curves.
0:16:33 To conclude today's presentation: we have evaluated different problem settings for multi-class logistic regression with variation among detection systems, and we have enhanced the minimum erroneous deviation calibration algorithm with on-the-fly selection of the related language pairs. We have also added an extra optimization constraint in the calibration algorithm to express a preference for detection misses. This work is important in the sense that it brings the calibration algorithm towards a more general applicability. We have also tested the algorithm with the LRE 2007 dataset, where we did not expect large performance variation among detectors, and it indeed behaves well in the sense that it does not degrade performance. We are going to extend the algorithm to the situation where we consider multiple related languages; in that scenario the correlation method, and the choice of which pairs or which data subsets to calibrate, become difficult and very different, and we are going to work on that in the future. That's the end of today's presentation. Thank you very much.
0:18:19 No, I mean the prosodic system's performance varies a lot between different languages, so there are some languages for which we think the prosodic system should be more reliable than for other languages.
0:18:47 Actually it is the 4.2 percent result.
0:19:05 We didn't show that. The reason is that with the prosodic system we have less than a five percent relative improvement.
0:19:51 The situation becomes very different in the multi-class case. As I have shown here, when it is two-class we can have a very clear correlation between the two detectors, but when it is multi-class you see different tails. Because the data is then multi-dimensional, we cannot find the correlation anymore, so the situation is quite different and more complex than this one. We are now in the stage of doing that, and hopefully you will see the results at a later date.
0:20:50 Yes, that was the 4.2 percent result you've seen.
0:21:03 This one? Yes, that is before we form the pairs.
0:21:40 That's exactly the same as what we said at the beginning: we view score calibration or fusion as a problem of combining a multi-dimensional score vector into a scalar decision. So we don't care whether the multi-dimensional score vector comprises scores from different detection systems or scores from different language detectors; we treat it as a generic multi-dimensional score vector and try to find ways to combine them.