0:00:13"'kay"
0:00:13So, um, my name is julie it can, and I will present
0:00:18what we did for the HASR submission,
0:00:22and
0:00:23also what we did after the submission,
0:00:29because this submission raised a lot of
0:00:31questions.
0:00:32So this work was done with
0:00:34and nicholas as of where
0:00:36uh us not of that all is a seven arsed
0:00:38and, uh, we come from France.
0:00:42Okay.
0:00:43So,
0:00:43the goal of HASR
0:00:46is to
0:00:47analyze how human experts can contribute to the
0:00:51use of automatic speaker recognition technologies,
0:00:54and how we can bring the two communities together. So
0:01:00it was a very great experience, the first time, and so
0:01:04for us it was
0:01:07just a first experience, to try to do something
0:01:10for this submission.
0:01:12And
0:01:14the task was
0:01:15a very classical verification task. We have two and a half minutes
0:01:21of samples for each speaker; they come from SRE 10, and they were very difficult trials.
0:01:28These trials were
0:01:30chosen by NIST,
0:01:33and the choice of the trials
0:01:37was made from the results of a particular system.
0:01:40And there were two sets,
0:01:43HASR1
0:01:44and HASR2, and we took part in HASR2, because the samples,
0:01:51the trials
0:01:53of HASR2 include those of HASR1. So
0:01:58we did the whole task.
0:02:00So what
0:02:02we proposed was very simple, because LIA is more of a computer science
0:02:09laboratory, and the LIG too. So
0:02:13we just took three naive French listeners,
0:02:17and
0:02:18they were two females and one male,
0:02:22and we allowed them to examine a spectrogram and to change the band-pass
0:02:28filtering of the signals, so
0:02:31they could do what they wanted.
0:02:33And they had to decide whether it was
0:02:37the same speaker speaking, or
0:02:39whether it was
0:02:40two different speakers,
0:02:42and give a confidence score. If they gave zero, that means they
0:02:48are not confident in their decision, and if they gave
0:02:51five,
0:02:52that means they are very confident.
0:02:55Then, for the submission
0:02:57to
0:02:58NIST,
0:02:59we chose to use majority voting, because with
0:03:03three people it is easy to have a majority. So
0:03:07we chose to use this to make the decision.
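The majority vote over three listeners described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the speakers' actual code: the function name `majority_vote` and the labels `"same"` / `"different"` are hypothetical.

```python
from collections import Counter

def majority_vote(decisions):
    """Return the decision given by most listeners on one trial.

    Assumes two possible labels and an odd number of listeners,
    so a tie cannot occur (three listeners in the talk).
    """
    if len(decisions) % 2 == 0:
        raise ValueError("an odd number of listeners is needed to avoid ties")
    label, _count = Counter(decisions).most_common(1)[0]
    return label

# Hypothetical panel of three listeners judging one trial.
panel = ["same", "different", "same"]
submitted = majority_vote(panel)
print(submitted)  # same
```

With an odd panel and a binary decision, no tie-breaking rule is needed, which matches the remark that three people make a majority easy to obtain.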
0:03:15And for the score, we chose
0:03:20to do a mapping
0:03:22between the
0:03:23human decision and the score we get from our GMM system,
0:03:29so as to be able to compare,
0:03:30and to try to derive a
0:03:34score afterwards
0:03:35for the comparison.
0:03:38If you have questions on the mapping I can answer, but I think it is
0:03:42more interesting to look at the results
0:03:46and see other things.
0:03:47So,
0:03:48um,
0:03:49the fact is that NIST provided us with very long samples. So,
0:03:56because we decided to do a perceptual test, and
0:03:59to listen to lots of
0:04:02the samples,
0:04:04we decided to
0:04:06change
0:04:06things a little bit, and not to give the listeners
0:04:10the full two minutes
0:04:12of
0:04:13speech for each speaker.
0:04:15So
0:04:17we decided to cut the signal and to select
0:04:22the parts with the most energy, where we are sure that there
0:04:27is a lot of speech
0:04:28and
0:04:29maybe a lot of information.
0:04:32So
0:04:33we selected short segments, around six seconds for each sample,
0:04:40because in a lot of perceptual tests, and
0:04:45um
0:04:47particularly in psychology, they use this kind of duration. That's why we chose
0:04:53this kind.
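Selecting a short, high-energy excerpt from a long recording can be sketched as below. The talk does not say how the energy was computed or how many excerpts were kept, so this is only one plausible reading: it picks a single fixed-length window with the largest sum of squared samples. The function name, the parameters, and the toy signal are all assumptions.

```python
def highest_energy_window(samples, sample_rate, window_seconds=6.0):
    """Return (start, end) sample indices of the window with the largest energy."""
    width = int(window_seconds * sample_rate)
    if width <= 0 or width > len(samples):
        raise ValueError("window must be non-empty and fit inside the signal")
    # Energy of the first window: sum of squared amplitudes.
    energy = sum(x * x for x in samples[:width])
    best_energy, best_start = energy, 0
    for start in range(1, len(samples) - width + 1):
        # Slide by one sample: add the newest squared value, drop the oldest.
        energy += samples[start + width - 1] ** 2 - samples[start - 1] ** 2
        if energy > best_energy:
            best_energy, best_start = energy, start
    return best_start, best_start + width

# Toy signal at 1 sample per second: silence, six loud samples, silence.
signal = [0.0] * 10 + [1.0] * 6 + [0.0] * 10
start, end = highest_energy_window(signal, sample_rate=1, window_seconds=6.0)
print(start, end)  # 10 16
```

The running-sum update keeps the search linear in the signal length, which matters for recordings of about two and a half minutes at a realistic sampling rate.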
0:04:55And I have some examples,
0:04:58so that you can see
0:05:01and hear what
0:05:02I am talking about.
0:05:04The idea was to have a beep between each sample, so that you know that we are changing
0:05:11sample; there is a change of sample.
0:05:15So that's my first example.
0:05:17(audio example plays)
0:05:42So:
0:05:43same speaker, or different speakers?
0:05:46What do you think?
0:05:50Okay.
0:05:51And it's not the same.
0:05:54It's always the same thing; I mean, these were chosen
0:05:57for their difficulty, so
0:05:59yes, it's not the same. But you have different samples, and you can
0:06:04have...
0:06:05you
0:06:06cannot memorize, because you don't have enough time to memorize. But with two minutes it's exactly the same thing: you cannot
0:06:11keep a voice in memory for two minutes. So
0:06:14it's something you can compare
0:06:16very quickly, and try to make a decision.
0:06:20And the consequence of this kind of
0:06:22
0:06:23stimulus is that our listeners make a decision very quickly:
0:06:28um,
0:06:29in
0:06:30around thirty seconds they make their decision. So
0:06:34that's why
0:06:36we chose this.
0:06:37Okay, so I come back here.
0:06:45So yes,
0:06:47they could use the tools,
0:06:49and they could try to listen again, but
0:06:53usually they just make the decision very quickly.
0:06:58These are the results; you will see the other sites' too, and
0:07:01I think that
0:07:02what
0:07:03is very interesting is that
0:07:05our
0:07:06automatic system is
0:07:08better than the decisions made
0:07:11by humans.
0:07:14And
0:07:15our first question was
0:07:17why, because
0:07:19the question of human performance is a very important one. So
0:07:23a first thing that could give us good information on whether we can
0:07:29have confidence in the decisions of humans
0:07:32is to see whether they agree, and what happens when they agree: do they
0:07:38make
0:07:39the
0:07:40right decision, or are they wrong?
0:07:43So
0:07:44you can see that
0:07:46here,
0:07:48yes,
0:07:48we can't.
0:07:49No: whether they agree or not is not
0:07:53a good indication of whether
0:07:57we can have confidence in their decision, because
0:08:01they
0:08:02do, um...
0:08:03Yes, here you have...
0:08:05this is the good,
0:08:07um, the good answer, okay,
0:08:09the correct answer, and here are the trials. So you can see that
0:08:15here
0:08:16it seems that they make the correct decision, but here,
0:08:20on...
0:08:20when they are working on target trials,
0:08:24uh,
0:08:26we can't know whether it's good or not, because here you have exactly the same proportion.
0:08:31And
0:08:33the confidence score they gave
0:08:35is not a good indication either. So you can't trust people when they
0:08:41say "okay, I'm sure that it is the same"; that's not a good indication to say "okay, we can
0:08:46trust them".
0:08:47So
0:08:48that's a problem.
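One way to check whether listener agreement predicts correctness, as discussed above, is to compare accuracy on unanimous trials with accuracy on split trials. The sketch below is illustrative only: the function name and the data layout are assumptions, and the toy trials are invented, not results from the talk.

```python
from collections import Counter

def accuracy_by_agreement(trials):
    """trials: list of (listener_decisions, truth) pairs.

    Returns the accuracy of the majority decision, separately for
    unanimous panels and for split panels (None if a group is empty).
    Assumes an odd number of listeners and two labels, so no ties.
    """
    stats = {"unanimous": [0, 0], "split": [0, 0]}  # [correct, total]
    for decisions, truth in trials:
        counts = Counter(decisions)
        winner = counts.most_common(1)[0][0]
        key = "unanimous" if len(counts) == 1 else "split"
        stats[key][0] += winner == truth
        stats[key][1] += 1
    return {k: (c / t if t else None) for k, (c, t) in stats.items()}

# Invented toy data: agreement does not help here (0.5 in both groups).
toy_trials = [
    (["same", "same", "same"], "same"),
    (["same", "same", "same"], "different"),
    (["same", "different", "same"], "same"),
    (["different", "different", "same"], "same"),
]
print(accuracy_by_agreement(toy_trials))
```

If unanimous panels are no more accurate than split ones, agreement carries no usable information about correctness, which is the situation the speaker describes for target trials.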
0:08:52And
0:08:53for the, um,
0:08:55
0:08:56yes, we have some discussion of the HASR protocol,
0:09:02because the first thing is that the listeners
0:09:05had the feeling that it was more an evaluation
0:09:09of whether they could compensate for the two channels
0:09:13than an evaluation of the proximity of the voices of two speakers. But that is just
0:09:19not what they
0:09:22usually do, so
0:09:24yes, it was
0:09:25difficult.
0:09:26Um,
0:09:27and also,
0:09:31our submission is more a perceptual test than an acoustic analysis, because they
0:09:37don't use... they just filter when there are very different channels and things like that, but they don't use
0:09:44some
0:09:45part of the signal to make their decision. It is just
0:09:51perceptual.
0:09:52And for the limitations of the protocol, the question is
0:09:55whether we have enough to do statistics and know exactly what happened.
0:10:01And
0:10:01what is very important is that we can't
0:10:04randomize
0:10:05the trials. That means that all the listeners
0:10:09hear, in the same order,
0:10:12the set of trials, and it's
0:10:14clear that
0:10:15you don't have the same attention when you start
0:10:18and when it is the
0:10:20hundredth trial you are listening to. So that's important, and in psychology we always do that,
0:10:27to
0:10:28
0:10:30randomize the stimuli.
0:10:31So
0:10:33that's...
0:10:33After that, we had a lot of questions about this submission. Our first question was: okay,
0:10:39what is the influence of the number of listeners? Because we had only three listeners, so what happens
0:10:45if we increase the number of listeners?
0:10:47And what is the difference between experienced and inexperienced listeners, if we have some
0:10:55sort of experts?
0:10:57And what is the complementarity between the human and the system decisions?
0:11:01Because
0:11:02we just submitted the decisions of humans.
0:11:05So we changed the HASR protocol a little bit.
0:11:08We had more listeners:
0:11:11thirty inexperienced and ten experienced listeners.
0:11:14We randomized: because we had all the trials, we randomized the trials.
0:11:19And we balanced the number of non-target and target trials,
0:11:22because the first time, the idea of the listeners was: "okay, I have to...
0:11:28
0:11:29so yes, it's balanced, so I will take
0:11:31
0:11:32
0:11:33zero point five as
0:11:35my natural prior".
0:11:37And so
0:11:38we only allowed them to listen
0:11:43once
0:11:44to the trials, and not to repeat the trials.
0:11:47And so
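The revised protocol above (balanced target and non-target trials, presented in a different random order to each listener) could be set up along these lines. This is a sketch under assumptions: `build_session`, the per-listener seed, and the toy trial identifiers are hypothetical, and the talk does not describe how the balancing or shuffling was actually implemented.

```python
import random

def build_session(target_trials, nontarget_trials, seed):
    """Balance the two trial types, then shuffle them for one listener."""
    rng = random.Random(seed)  # one seed per listener gives each a different order
    n = min(len(target_trials), len(nontarget_trials))
    # Keep the same number of trials of each type.
    session = rng.sample(target_trials, n) + rng.sample(nontarget_trials, n)
    rng.shuffle(session)
    return session

# Hypothetical trial identifiers: 5 target and 8 non-target trials.
targets = [("target", i) for i in range(5)]
nontargets = [("nontarget", i) for i in range(8)]
session = build_session(targets, nontargets, seed=1)
print(len(session))  # 10
```

Seeding per listener keeps each presentation order reproducible for later analysis, while still avoiding the fixed order that makes attention effects fall on the same trials for everyone.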
0:11:48what are the results?
0:11:51We have, only for the inexperienced listeners, a performance
0:11:56above chance level. And if you look,
0:11:59uh,
0:11:59here, it is exactly the same thing for the majority of the listeners.
0:12:04But
0:12:05what is
0:12:06very interesting for us is that you have a very large gap in performance depending on the trials.
0:12:13You have some trials where
0:12:15ninety percent of the listeners
0:12:18are right, give the correct answer,
0:12:21and other trials where only three percent of the listeners gave the correct
0:12:27answer. So
0:12:29we didn't find a difference between the male and the female trials; it's
0:12:33exactly the same thing.
0:12:35And
0:12:35we have different behaviours: we have
0:12:39some listeners who always say "yes, yes, yes, it's the same, it's the same", and others who
0:12:44are always saying "no, it's not the same, it's not the same".
0:12:47So we observed that for the listeners.
0:12:50And we find a correlation between the performance
0:12:54and the English level of the listeners, because here they are French
0:12:58and not native English speakers. So
0:13:02yes, we find that.
0:13:03So
0:13:04the last question was the complementarity between the humans and the system, and
0:13:09
0:13:10what we find is that, for non-target trials,
0:13:14the GMM has
0:13:16
0:13:17a lot of correct answers, and they are correct only for the GMM and not for
0:13:22the humans.
0:13:23But
0:13:23it's the contrary for the humans: we have a big proportion, oh
0:13:28sorry,
0:13:30we have a big proportion here
0:13:32of
0:13:34correct answers only for the humans. So maybe we can find a complementarity.
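The complementarity analysis described above amounts to counting, trial by trial, who is correct: both, only the humans, only the system, or neither. A minimal sketch follows; the function name, the labels, and the toy data are assumptions for illustration and do not reproduce the figures shown in the talk.

```python
def complementarity(human, system, truth):
    """Count trials by who gives the correct decision."""
    counts = {"both": 0, "human_only": 0, "system_only": 0, "neither": 0}
    for h, s, t in zip(human, system, truth):
        if h == t and s == t:
            counts["both"] += 1
        elif h == t:
            counts["human_only"] += 1
        elif s == t:
            counts["system_only"] += 1
        else:
            counts["neither"] += 1
    return counts

# Invented toy decisions for four trials.
truth = ["target", "target", "nontarget", "nontarget"]
human = ["target", "target", "target", "nontarget"]
system = ["target", "nontarget", "nontarget", "target"]
print(complementarity(human, system, truth))
# {'both': 1, 'human_only': 2, 'system_only': 1, 'neither': 0}
```

Large `human_only` and `system_only` counts on different kinds of trials are what would suggest that combining the two sources could help.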
0:13:40And, uh,
0:13:43yes...
0:13:45That's it. So
0:13:46and
0:13:47after that we ran another experiment, with ten
0:13:50experienced listeners,
0:13:52and we didn't find a difference in performance between the inexperienced and the experienced listeners. Yes,
0:14:00it's exactly the same thing; you have
0:14:02the...
0:14:04So for the discussion and the future work, there are
0:14:08more questions than anything else.
0:14:11Yes.
0:14:12Because the first
0:14:13question is how humans can help the system. So
0:14:16maybe we have to examine the trials
0:14:20with scores that are near the threshold of the system, because we observed in the complementarity
0:14:26that
0:14:27these are the
0:14:30trials
0:14:31where the human is right and the system is wrong. So
0:14:36maybe it's
0:14:37something we can do.
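Picking out the trials whose system score lies close to the decision threshold, as suggested above, can be sketched like this. The function name, the `margin` parameter, and the toy scores are hypothetical; the talk does not specify how "near the threshold" would be defined.

```python
def trials_near_threshold(scores, threshold, margin):
    """Return ids of trials whose system score is within `margin` of the threshold."""
    return [trial_id for trial_id, score in scores.items()
            if abs(score - threshold) <= margin]

# Invented system scores for four trials.
system_scores = {"t1": -2.0, "t2": 0.1, "t3": -0.3, "t4": 1.5}
to_review = trials_near_threshold(system_scores, threshold=0.0, margin=0.5)
print(to_review)  # ['t2', 't3']
```

Only the selected trials would then be passed to human listeners, on the assumption that these are the ones where the system is least reliable.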
0:14:38And
0:14:39the second question is: okay,
0:14:42I have some trials which are very easy and others which are very difficult for humans; what is the difference
0:14:48between
0:14:49those trials?
0:14:50And
0:14:51it's clear that it's important to replicate this kind of experiment.
0:14:55We do not have enough listeners, but I'm sure that Joe, or
0:14:58the next paper, will answer this question.
0:15:01Thank you.
0:15:27You describe performance with experienced and inexperienced listeners.
0:15:31What makes someone experienced? That's the question.
0:15:35An experienced listener is a phonetician, but
0:15:40one who doesn't
0:15:41do identification; they
0:15:43don't work on forensics, of course; they are not interested in the speaker; they
0:15:50work on the language, and so they
0:15:53are not naive, everyday listeners.
0:15:56They are very... yes, they...
0:16:00but they are not experts in forensics or something, because
0:16:03in France we don't have
0:16:04this kind of people.
0:16:05
0:16:06Yes.
0:16:16So with a lot of people just always saying yes or always saying no, it makes me wonder
0:16:21whether it would be possible to have each human judge
0:16:24just give a number, as might come from a system model,
0:16:28and do a post-facto calibration of each human, to set the threshold that gives the best performance.
0:16:35We could do that, but the problem is that, for the first study
0:16:39we did with the three people,
0:16:42we don't have a correlation between correct answers and
0:16:47the confidence score,
0:16:48which would be good. So we can't use
0:16:50the...
0:16:52you see, we can't
0:16:53trust the listeners: it's not because they say "I'm sure
0:16:58of my decision" that
0:17:03it signifies that
0:17:04they are right.
0:17:06So
0:17:07it's difficult to do a calibration
0:17:09and to use this confidence score.
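Whether a confidence score is usable for calibration can be checked by computing accuracy at each confidence level: if confidence were informative, accuracy should rise with it. The sketch below assumes responses are stored as `(confidence, is_correct)` pairs; that layout, the function name, and the toy responses are assumptions, not data from the study.

```python
from collections import defaultdict

def accuracy_per_confidence(responses):
    """responses: iterable of (confidence, is_correct) pairs.

    Returns {confidence: accuracy}, ordered by confidence level.
    """
    totals = defaultdict(lambda: [0, 0])  # confidence -> [correct, count]
    for confidence, is_correct in responses:
        totals[confidence][0] += int(is_correct)
        totals[confidence][1] += 1
    return {c: correct / n for c, (correct, n) in sorted(totals.items())}

# Invented responses: high confidence (5) is no more accurate than low (1).
toy_responses = [(5, True), (5, False), (1, True), (1, False), (3, True)]
print(accuracy_per_confidence(toy_responses))
# {1: 0.5, 3: 1.0, 5: 0.5}
```

A flat or erratic profile like the toy one above is the case the speaker describes, where being sure of a decision does not mean the decision is right.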