0:00:14 | uh figure out much check |
---|
0:00:16 | um low |
---|
0:00:17 | thank you all very much for coming |
---|
0:00:19 | uh i was strongly encouraged to be brief in order to allow time for questions |
---|
0:00:24 | but first i'd like to begin by acknowledging my fellow authors jack godfrey george doddington |
---|
0:00:29 | and alvin martin |
---|
0:00:31 | as well as the hasr participants |
---|
0:00:34 | many of whom are in this room |
---|
0:00:36 | for their |
---|
0:00:38 | hard work and effort in conducting hasr research |
---|
0:00:43 | so the question we're trying to address is |
---|
0:00:45 | how can human experts effectively |
---|
0:00:47 | um |
---|
0:00:49 | utilize automatic speaker recognition technology |
---|
0:00:52 | uh to our knowledge this is still an open question |
---|
0:00:55 | uh so we included a small pilot test in the twenty ten nist speaker recognition evaluation |
---|
0:01:02 | um |
---|
0:01:04 | the |
---|
0:01:05 | task in hasr was to determine whether two different speech segments were both spoken by the same speaker |
---|
0:01:11 | the hasr evaluation included two tests |
---|
0:01:14 | the first called hasr one consisted of fifteen trials that is fifteen pairs of speech segments |
---|
0:01:19 | and the second hasr two consisted of a hundred and fifty trials the first fifteen |
---|
0:01:24 | of which |
---|
0:01:25 | were the hasr one trials |
---|
0:01:27 | hasr systems could use human listeners or machines or both |
---|
0:01:31 | and anyone who wished to participate was welcome |
---|
0:01:37 | again uh each trial consisted of two speech segments |
---|
0:01:41 | uh and the task is to determine whether they were spoken by the same speaker |
---|
0:01:45 | uh there was no time limit on the amount of the scheme presented |
---|
0:01:48 | but it was required that trials be processed separately and independently one at a time and in sequence |
---|
0:01:55 | for each trial |
---|
0:01:56 | each system provided a same speaker or different speaker decision |
---|
0:02:00 | uh as well as a numeric score |
---|
0:02:03 | where a higher score indicated greater confidence |
---|
0:02:05 | in a |
---|
0:02:06 | same speaker decision |
---|
0:02:09 | because of the limited number of trials the evaluation metric consisted of simply tallying the number of misses |
---|
0:02:15 | and false alarms |
---|
0:02:17 | let me note that a miss is deciding the segments were spoken by different speakers when in fact they were spoken |
---|
0:02:22 | by the same speaker |
---|
0:02:23 | and a false alarm is deciding segments were spoken by the same speaker when in fact they |
---|
0:02:28 | were spoken by different speakers |
---|
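The tallying metric just described can be sketched in a few lines of Python. This is only an illustration: the function name `tally_errors` and the five example trials are hypothetical and are not taken from the evaluation data.

```python
def tally_errors(decisions, truths):
    """Count misses and false alarms.

    decisions[i] is True when the system said "same speaker" for trial i;
    truths[i] is True when the two segments really share a speaker.
    """
    if len(decisions) != len(truths):
        raise ValueError("decisions and truths must have the same length")
    # Miss: the system said "different" on a true same-speaker trial.
    misses = sum(1 for d, t in zip(decisions, truths) if t and not d)
    # False alarm: the system said "same" on a true different-speaker trial.
    false_alarms = sum(1 for d, t in zip(decisions, truths) if d and not t)
    return misses, false_alarms


# Hypothetical answer key and system output for five trials.
truths = [True, True, False, False, True]
decisions = [True, False, True, False, True]
misses, false_alarms = tally_errors(decisions, truths)
print(misses, false_alarms)
```

With so few trials, raw counts like these are reported directly rather than being folded into a cost function.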
0:02:35 | due to the limited number of trials it was necessary to select challenging segment pairs |
---|
0:02:41 | in each case one of the segments was a three minute recording of an interview over one |
---|
0:02:45 | of several different microphones |
---|
0:02:47 | and the other |
---|
0:02:49 | segment was a five minute call recorded over a telephone channel |
---|
0:02:54 | for hasr one segment pair similarity was determined using an automatic system and the most similar different speaker |
---|
0:03:01 | pairs |
---|
0:03:02 | were selected for |
---|
0:03:04 | different speaker trials and the least |
---|
0:03:06 | similar same speaker segments were chosen for |
---|
0:03:09 | same speaker trials |
---|
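The automatic part of this selection step might look roughly like the sketch below. It assumes each candidate pair already carries a similarity score from some automatic system. The tuple layout, the function name `select_hard_trials` and the scores are all hypothetical, and the human screening that followed is not modeled.

```python
def select_hard_trials(scored_pairs, n_same, n_diff):
    """Pick the hardest candidate trials by automatic similarity score.

    scored_pairs: list of (pair_id, same_speaker, similarity) tuples, where
    a higher similarity means the automatic system finds the voices more alike.
    """
    # Same-speaker pairs that sound least alike are the hardest to accept.
    same = sorted((p for p in scored_pairs if p[1]), key=lambda p: p[2])
    # Different-speaker pairs that sound most alike are the hardest to reject.
    diff = sorted((p for p in scored_pairs if not p[1]),
                  key=lambda p: p[2], reverse=True)
    return same[:n_same], diff[:n_diff]


# Hypothetical candidate pairs with made-up similarity scores.
pairs = [
    ("a", True, 0.9),
    ("b", True, 0.2),
    ("c", False, 0.8),
    ("d", False, 0.1),
    ("e", True, 0.5),
]
hard_same, hard_diff = select_hard_trials(pairs, n_same=1, n_diff=1)
print(hard_same, hard_diff)
```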
0:03:12 | these pairs were then screened by humans |
---|
0:03:14 | to select the most difficult trials and to eliminate any content cues |
---|
0:03:18 | hasr two was selected in the same way the only difference being the screening |
---|
0:03:23 | was |
---|
0:03:28 | all right |
---|
0:03:29 | now that we know all about the hasr evaluation |
---|
0:03:31 | uh let's play a game |
---|
0:03:33 | it's called same speaker different speaker |
---|
0:03:35 | and it's played by listening to speech segments and voting whether they were spoken by |
---|
0:03:41 | the same speaker |
---|
0:03:46 | one |
---|
0:03:50 | [audio: first pair of speech segments played] |
---|
0:04:23 | okay how many people believe that was the same speaker |
---|
0:04:27 | okay and how many different speakers |
---|
0:04:29 | okay overwhelmingly same but some different |
---|
0:04:32 | okay and the second one |
---|
0:04:35 | [audio: second pair of speech segments played] |
---|
0:05:16 | all right how many people think same speaker |
---|
0:05:19 | uh just a couple |
---|
0:05:21 | how many different speaker |
---|
0:05:24 | um well |
---|
0:05:27 | you |
---|
0:05:28 | there's a set of a little differently yeah but you may be surprised to learn that the first one was |
---|
0:05:31 | different speaker |
---|
0:05:33 | and the second was same speaker |
---|
0:05:35 | um |
---|
0:05:37 | yeah it's true it's absolutely true |
---|
0:05:39 | and let us note that these were the trials in hasr one that had the most misses and false |
---|
0:05:42 | alarms |
---|
0:05:54 | okay so let's see how the hasr one systems did |
---|
0:05:57 | on the top are same speaker trials and on the bottom different speaker trials |
---|
0:06:02 | uh there were twenty systems that participated from fifteen sites uh in six different countries |
---|
0:06:07 | uh the green portion of the bars represents correct decisions |
---|
0:06:11 | the blue misses |
---|
0:06:12 | and the red false alarms |
---|
0:06:14 | as we look from left to right we see trials of increasing difficulty for the |
---|
0:06:19 | systems |
---|
0:06:20 | and we just listened to |
---|
0:06:23 | this trial |
---|
0:06:24 | and this one |
---|
0:06:31 | here we see individual system performance on the hasr one trials |
---|
0:06:36 | each bar represents the total number of errors divided by the total number of trials that's fifteen in |
---|
0:06:42 | this case |
---|
0:06:43 | again blue indicates misses and red false alarms |
---|
0:06:46 | the system with the fewest |
---|
0:06:48 | errors |
---|
0:06:49 | had two misses and no false alarms |
---|
0:06:52 | and the system with the most |
---|
0:06:54 | had four misses and seven false alarms |
---|
0:07:07 | i |
---|
0:07:09 | okay here we consider the performance of systems from the sites that participated in both hasr |
---|
0:07:16 | one and hasr two |
---|
0:07:18 | the bar on the left for each system represents errors on hasr one trials and the one on |
---|
0:07:23 | the right represents errors on hasr two trials |
---|
0:07:27 | uh sorry |
---|
0:07:28 | left hasr one |
---|
0:07:31 | and then right |
---|
0:07:32 | hasr two |
---|
0:07:34 | again blue is misses and red is false alarms |
---|
0:07:37 | and on average |
---|
0:07:40 | um |
---|
0:07:42 | the hasr one trials |
---|
0:07:43 | proved more challenging than the hasr two trials |
---|
0:07:47 | now if you took your time and carefully read the fine print |
---|
0:07:51 | of the sre ten evaluation plan |
---|
0:07:54 | you would discover that we embedded the hasr trials in the automatic system evaluation |
---|
0:08:00 | to see how the automatic systems do |
---|
0:08:05 | so when we look at the three leading systems in the main evaluation and look at how they |
---|
0:08:10 | did on the |
---|
0:08:12 | hasr trials |
---|
0:08:13 | this is what we see here on the right |
---|
0:08:16 | one thing we should note on this is that |
---|
0:08:19 | the actual decisions |
---|
0:08:21 | are being displayed here for the hasr systems |
---|
0:08:23 | but we were not able to do that for the |
---|
0:08:26 | automatic systems due to a thousand to one different speaker to same speaker prior probability given in |
---|
0:08:33 | the main evaluation |
---|
0:08:34 | so we |
---|
0:08:37 | adjusted the decision threshold |
---|
0:08:39 | of the automatic systems so as to produce equal counts of misses and false alarms |
---|
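One way to picture this threshold adjustment is a sweep over candidate thresholds that keeps the one where the two error counts are closest. The sketch below is an assumption about how such an adjustment could be done, not the procedure actually used. The function name `balanced_threshold` and all scores are hypothetical, and the score lists are assumed to be non-empty.

```python
def balanced_threshold(target_scores, nontarget_scores):
    """Find a threshold where misses and false alarms are as equal as possible.

    A trial is accepted as "same speaker" when its score is >= the threshold.
    Returns (threshold, misses, false_alarms). Ties keep the lowest threshold.
    """
    candidates = sorted(set(target_scores) | set(nontarget_scores))
    # Also try a threshold above every score (rejects everything).
    candidates.append(candidates[-1] + 1.0)
    best = None
    for thr in candidates:
        misses = sum(s < thr for s in target_scores)
        false_alarms = sum(s >= thr for s in nontarget_scores)
        gap = abs(misses - false_alarms)
        if best is None or gap < best[0]:
            best = (gap, thr, misses, false_alarms)
    return best[1], best[2], best[3]


# Hypothetical scores for same-speaker (target) and different-speaker trials.
target = [2.0, 1.5, 0.3, 1.1]
nontarget = [0.1, 0.5, 1.2, -0.4]
thr, n_miss, n_fa = balanced_threshold(target, nontarget)
print(thr, n_miss, n_fa)
```

Balancing the counts this way sidesteps the skewed prior of the main evaluation, under which a system tuned for that prior would rarely say "same speaker" at all.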
0:08:48 | so we saw that leading automatic systems had noticeably fewer errors than the hasr systems and the |
---|
0:08:54 | tests proved quite challenging |
---|
0:08:57 | i |
---|
0:08:58 | in fact half the systems got more trials right than wrong in hasr one |
---|
0:09:04 | yes thank you |
---|
0:09:05 | i |
---|
0:09:07 | um |
---|
0:09:08 | so uh we leave you um with |
---|
0:09:11 | uh a couple questions |
---|
0:09:13 | first was this data appropriate for supporting hasr research |
---|
0:09:17 | um and where do we go from here |
---|
0:09:22 | we are planning another hasr evaluation to be held in conjunction with sre twelve |
---|
0:09:27 | we expect there to be two tests |
---|
0:09:29 | the first with twenty trials and the second with two hundred |
---|
0:09:33 | and the trial selection process is planned to be similar to that in hasr ten |
---|
0:09:37 | but hopefully with less human screening |
---|
0:09:40 | the data will still be in english only |
---|
0:09:43 | and the evaluation period is planned to be four months |
---|
0:09:46 | which is much longer than the automatic system evaluation is typically |
---|
0:09:52 | we are looking for your feedback |
---|
0:09:55 | so please email |
---|
0:09:56 | or speak with us |
---|
0:09:58 | um |
---|
0:10:00 | i should note that statistical significance is of great importance |
---|
0:10:04 | to nist |
---|
0:10:05 | so it is of interest to us |
---|
0:10:07 | but with so few trials none can be assigned |
---|
0:10:10 | to these results |
---|
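To give a feel for why so few trials support no significance claims, the sketch below computes a 95% Wilson score interval for an error rate measured on fifteen trials. The Wilson interval is a choice made here for illustration and is not a method attributed to the evaluation. The function name and the count of two errors are likewise only examples.

```python
import math


def wilson_interval(errors, trials, z=1.96):
    """Wilson score interval for a binomial error rate (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= errors <= trials:
        raise ValueError("need trials > 0 and 0 <= errors <= trials")
    p = errors / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Illustrative case: 2 errors observed on 15 trials.
low, high = wilson_interval(2, 15)
print(f"observed {2 / 15:.3f}, interval ({low:.3f}, {high:.3f})")
```

The interval spans roughly a third of the whole zero-to-one range, so systems that differ by a handful of errors on fifteen trials cannot be reliably ranked.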
0:10:12 | we are also interested in ideas on how to improve the trial selection process so again please |
---|
0:10:18 | uh |
---|
0:10:18 | share with us |
---|
0:10:20 | for more information or to provide feedback |
---|
0:10:23 | here are some websites or speak with us |
---|
0:10:26 | our email is on the paper |
---|
0:10:28 | thank you very much |
---|
0:10:35 | so for questions please come to the mike |
---|
0:10:49 | right |
---|
0:11:07 | okay i would like to have more explanation i can |
---|
0:11:11 | and uh the proximity had difficulty and that approximately optimized how you |
---|
0:11:17 | next year it is proximity |
---|
0:11:19 | exactly sure um |
---|
0:11:21 | well |
---|
0:11:29 | so uh we ran a full matrix of |
---|
0:11:33 | uh uh |
---|
0:11:35 | um uh |
---|
0:11:36 | interview train interview test non-target trials of all speaker pairs |
---|
0:11:40 | thirty seven speaker pairs were identified |
---|
0:11:43 | uh using a threshold of |
---|
0:11:45 | six scores where the idea was |
---|
0:11:47 | uh |
---|
0:11:48 | the pair was included if the score was in the top one percent of |
---|
0:11:51 | um |
---|
0:11:53 | scores in the direction |
---|
0:11:55 | so |
---|
0:11:55 | those thirty seven speaker pairs were chosen and then |
---|
0:11:58 | combinations of segments for each speaker pair |
---|
0:12:01 | were listened to |
---|
0:12:03 | to determine which would be used for |
---|
0:12:05 | hasr one |
---|
0:12:06 | for non-target |
---|
0:12:07 | trials for |
---|
0:12:08 | target trials |
---|
0:12:10 | or same speaker trials |
---|
0:12:12 | um |
---|
0:12:13 | we did a full matrix |
---|
0:12:15 | of the actual segments |
---|
0:12:18 | and then listened to the segment |
---|
0:12:19 | pairs |
---|
0:12:20 | that way |
---|
0:12:20 | and that was for hasr one for hasr two that was |
---|
0:12:23 | very careful |
---|
0:12:24 | screening |
---|
0:12:25 | the process was similar just with a larger |
---|
0:12:35 | a quick question what was |
---|
0:12:36 | the percentage of non-native speakers in the hasr data |
---|
0:12:40 | uh uh uh uh just a non non-native |
---|
0:12:43 | have that you what i |
---|
0:12:45 | it |
---|
0:12:46 | percent of non-native speakers in the hasr data as in people who were not native us english speakers |
---|
0:12:52 | um let's see |
---|
0:12:54 | um |
---|
0:12:55 | something in one |
---|
0:12:57 | two |
---|
0:12:59 | um |
---|
0:13:04 | uh i'm thinking of two |
---|
0:13:07 | right |
---|
0:13:12 | oh |
---|
0:13:13 | three |
---|
0:13:14 | oh |
---|
0:13:17 | oh i'm sorry misunderstood |
---|
0:13:19 | or maybe i wasn't sure are you asking for the trials or for the |
---|
0:13:22 | participants |
---|
0:13:24 | oh i'm sorry yeah i |
---|
0:13:26 | yes |
---|
0:13:30 | i do not know that offhand but that's something we can find out and report |
---|
0:13:34 | i will note that everyone who was recorded was recorded in philadelphia |
---|
0:13:39 | but that's a fairly international city so i |
---|
0:13:49 | i believe that's correct but uh sometimes |
---|
0:13:54 | i |
---|
0:13:57 | i |
---|
0:14:02 | yes |
---|
0:14:26 | i believe there's another question |
---|
0:14:32 | what was the gender breakdown did you specifically select for an equal divide or did you |
---|
0:14:38 | choose based upon |
---|
0:14:39 | how challenging the pairs were |
---|
0:14:42 | sure again i don't have the gender breakdown handy but this was |
---|
0:14:47 | um |
---|
0:14:48 | this just fell out we did not try to balance this |
---|
0:14:52 | but of course all trials were |
---|
0:14:54 | same sex |
---|
0:14:56 | trials |
---|
0:14:59 | thank you very much |
---|