Speech Transcript - The 2009 NIST Language Recognition Evaluation

0:00:06	yeah
0:00:08	i
0:00:09	well
0:00:10	describe the
0:00:12	two thousand and nine
0:00:14	nist
0:00:15	language
0:00:16	recognition evaluation
0:00:18	lre O nine
0:00:20	um
0:00:21	nist
0:00:23	uh
0:00:24	coordinated this evaluation
0:00:27	under
0:00:28	U S
0:00:28	government sponsorship and
0:00:30	this work was largely done with
0:00:32	craig greenberg my colleague
0:00:34	in the nist
0:00:35	multimodal
0:00:36	information group
0:00:42	so the two thousand nine
0:00:44	evaluation
0:00:46	was the fifth in the
0:00:48	series of
0:00:50	nist coordinated lres the first was in ninety six
0:00:54	and then there
0:00:56	were evaluations in two thousand
0:00:59	three and two thousand
0:01:00	five
0:01:01	two thousand seven
0:01:03	two thousand nine
0:01:05	uh oh
0:01:05	one might suspect that there
0:01:07	could be another evaluation
0:01:09	in twenty eleven
0:01:10	um
0:01:12	turning to two thousand nine
0:01:15	key changes
0:01:16	uh were in the
0:01:18	nature of the data
0:01:19	say more about that
0:01:21	the treatment of
0:01:22	dialects and
0:01:23	mutually intelligible languages
0:01:26	and in the
0:01:27	set of evaluation test conditions
0:01:29	we will get to those
0:01:30	um the data the
0:01:33	lre O nine
0:01:35	as indicated there there were
0:01:36	eighteen total
0:01:38	participating sites
0:01:42	um
0:01:43	the prior
0:01:44	nist evaluations
0:01:47	used conversational telephone speech
0:01:50	this involved
0:01:51	paying subjects
0:01:52	yeah
0:01:53	they call it
0:01:54	nature
0:01:55	language recognition you just wanna make a single call
0:01:58	in their native language
0:02:00	ah
0:02:01	in the U S preferably under controlled channel conditions
0:02:04	this
0:02:06	paradigm is becoming expensive and impractical
0:02:09	it's hard to pay people to make a single call these days
0:02:12	talk to me
0:02:14	um
0:02:14	um
0:02:15	helpful
0:02:17	um
0:02:18	access
0:02:19	is easy
0:02:21	so lre O nine
0:02:22	attempted to use primarily
0:02:24	found data
0:02:26	in this case
0:02:27	found
0:02:28	uh
0:02:29	from voice of america
0:02:31	right yes
0:02:33	data
0:02:34	um
0:02:35	this was
0:02:36	sampled by the
0:02:37	linguistic
0:02:38	uh data consortium actually
0:02:40	found data from
0:02:42	three different years of
0:02:44	um
0:02:46	voice american to where you started about data
0:02:49	the L D C has
0:02:51	at other conferences separately reported on this
0:02:53	data collection
0:02:54	a feasibility study of using this data for lre
0:02:59	was done before and then
0:03:01	done by
0:03:02	researchers
0:03:04	uh
0:03:04	here at the brno university of
0:03:07	technology
0:03:08	uh that was a key part
0:03:10	a lot in the data for this evaluation
0:03:13	um
0:03:13	the selected
0:03:14	segments that were
0:03:15	actually
0:03:16	used for testing and also those for
0:03:18	development work
0:03:20	uh were segments
0:03:21	uh
0:03:22	determined to involve narrow band
0:03:24	speech
0:03:25	and we wanted to get as many different speakers as possible
0:03:28	um
0:03:29	the evaluation also used
0:03:30	cts data that had been collected previously but for various reasons
0:03:34	had not been used in the uh
0:03:36	in the prior evaluations
0:03:41	so
0:03:43	um
0:03:45	here is our list of target languages for this evaluation an advantage of using found data is that
0:03:51	we can have
0:03:52	um
0:03:54	more
0:03:55	target languages
0:03:56	indeed we had
0:03:59	twenty three
0:04:00	in this case
0:04:01	ah
0:04:02	in some cases we just listed as languages what previously had been treated as
0:04:07	dialects
0:04:08	um
0:04:09	english american english and indian english
0:04:13	also uh
0:04:14	hindi
0:04:16	and urdu so for most purposes we just treated these
0:04:20	as simply target languages we'll talk about the language pair condition
0:04:25	um
0:04:27	indeed
0:04:29	we specified eight
0:04:32	um
0:04:33	language pairs as being of
0:04:34	particular interest
0:04:36	um
0:04:37	they are either languages that are
0:04:39	similar patient
0:04:40	english dialects uh
0:04:43	hindi
0:04:44	urdu may be viewed as a dialect situation
0:04:47	other languages are in
0:04:49	many cases mutually intelligible bosnian and croatian
0:04:53	creole
0:04:54	haitian and french are of interest
0:04:57	uh
0:04:57	we included such pairs
0:04:59	as cantonese mandarin spanish
0:05:01	portuguese
0:05:02	so we specified these
0:05:04	these eight as being
0:05:06	of particular interest for those who
0:05:09	wanted to investigate um
0:05:13	uh
0:05:14	so the
0:05:16	evaluation
0:05:18	ah consists of a long
0:05:20	series of trials for each of the
0:05:23	conditions
0:05:24	and as in the past
0:05:25	we
0:05:26	charlie's
0:05:27	test segments
0:05:29	are approximately thirty or approximately ten or approximately
0:05:31	really
0:05:32	three seconds of speech
0:05:36	uh for each trial
0:05:39	you have
0:05:40	a target language hypothesis
0:05:43	and
0:05:44	an alternative
0:05:45	hypothesis
0:05:48	and for each
0:05:49	ah
0:05:50	trial we require
0:05:52	a hard decision
0:05:54	and a score
0:05:56	yeah we
0:05:57	specified three
0:05:58	different
0:05:59	test conditions this year
0:06:01	the closed set condition this is the
0:06:04	traditional condition it
0:06:06	has been part of all the evaluations and is the required condition
0:06:09	and for each
0:06:11	language segment
0:06:14	you have one of the target languages
0:06:17	as the hypothesis
0:06:18	each segment is running
0:06:19	really
0:06:20	target is a part
0:06:21	the alternative hypothesis is
0:06:23	that it's a different target language
0:06:25	one of the other twenty two
0:06:27	in the open set condition
0:06:29	the alternative hypothesis is
0:06:32	not simply that it's
0:06:33	one of those twenty two languages it could also be some other language an unknown
0:06:38	out of set language
0:06:40	and finally we introduced the language pair condition
0:06:44	which is designed to look at
0:06:46	ah
0:06:47	just distinguishing pairs so that the
0:06:50	alternative is in all cases a single
0:06:53	language you know
0:06:55	the target language is english the alternative
0:06:57	uh is that
0:06:59	it's french
0:07:00	um
0:07:01	ah so
0:07:02	with twenty three target languages there are two hundred fifty three pairs and
0:07:07	a part of language you want to look at this way and
0:07:10	systems were invited to do
0:07:11	all of them
0:07:13	only a couple chose to do so
0:07:14	or selected ones in particular the
0:07:17	eight i
0:07:18	mentioned above
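
The pair count quoted above can be checked with a short sketch. The language labels below are placeholders (the talk does not list the twenty three names at this point), and the variable names are illustrative assumptions rather than anything from the evaluation plan.

```python
from itertools import combinations
from math import comb

# Placeholder labels standing in for the 23 target languages (hypothetical names).
target_languages = [f"lang{i:02d}" for i in range(1, 24)]

# Language pair condition: every unordered pair of target languages.
pairs = list(combinations(target_languages, 2))
num_pairs = len(pairs)

# Closed set condition: for a given target hypothesis, the alternative is
# one of the remaining target languages.
num_alternatives = len(target_languages) - 1

print(num_pairs, comb(len(target_languages), 2), num_alternatives)
```

Both the enumeration and the binomial coefficient give the same pair count, and each closed set hypothesis faces twenty two in-set alternatives, matching the numbers in the talk.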
0:07:23	uh this gives you some
0:07:24	indication of the
0:07:27	um
0:07:27	training and
0:07:28	test segments
0:07:29	that will provide in there
0:07:31	there's a
0:07:33	source so
0:07:34	a green
0:07:35	language it
0:07:37	it indicating the number of segments and between segments of each duration
0:07:41	uh
0:07:43	um they were providing it'd be a weight training or be away
0:07:48	ah yes
0:07:49	where cts that that all the
0:07:51	cts data from previous evaluations
0:07:54	were also available
0:07:55	um
0:07:56	and for the V O A training we provided
0:07:58	you know we provided lots of data and not just limited to these selected segments but
0:08:03	oh
0:08:04	a corporate move around
0:08:06	terabytes
0:08:07	uh
0:08:08	a drive the route
0:08:10	but we're distributed people but
0:08:12	rich language we haven't had about two hundred uh
0:08:15	from the really data
0:08:16	a segment of each
0:08:18	duration separated
0:08:19	but that S yeah
0:08:21	we had open
0:08:22	three or four hundred
0:08:23	alright
0:08:25	quite depending on availability
0:08:26	and
0:08:27	we we had
0:08:28	training in languages which
0:08:32	i i'm a
0:08:34	and that we had with training data
0:08:36	are all the languages for which i was not
0:08:38	ah
0:08:40	previous cts data in many languages that relevant data with cts but the new
0:08:45	you data could be the other way
0:08:47	um
0:08:48	so
0:08:49	that's
0:08:52	there are eighteen sites
0:08:54	they're listed here and many of them are
0:08:56	represented in
0:08:57	this room
0:09:00	the evaluation metric was the
0:09:02	traditional metric yeah
0:09:04	we have used
0:09:05	it is essentially something like yeah
0:09:07	total error rate
0:09:09	we
0:09:10	equally weight
0:09:11	the cost of a miss and the cost of a false alarm
0:09:15	take an average of the miss rate and the false alarm rate but we
0:09:18	average that over all possible
0:09:21	oh
0:09:21	uh target languages and all possible alternative languages
0:09:25	and they ended
0:09:26	uh
0:09:27	computed this way
0:09:28	there's also a weighting indicated for the open set condition
0:09:31	of how we weight
0:09:33	the out of set alternative to the
0:09:35	for the
0:09:36	are actual target languages
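
A minimal sketch of an average cost of the kind described above: equal miss and false alarm costs, averaged over target languages and over alternative languages, with an optional weight for out of set segments. The function name, argument layout, and the default values (`p_target=0.5`, `p_oos`) are assumptions for illustration and are not quoted from the talk.

```python
def average_cost(p_miss, p_fa, p_fa_oos=None, p_target=0.5, p_oos=0.0,
                 c_miss=1.0, c_fa=1.0):
    """Average detection cost over all target languages.

    p_miss:   dict target -> miss rate for that target
    p_fa:     dict (target, alternative) -> false alarm rate
    p_fa_oos: dict target -> false alarm rate on out of set segments
    p_target, p_oos: assumed prior weights (illustrative defaults)
    """
    languages = sorted(p_miss)
    n = len(languages)
    # Remaining prior mass is shared equally among in-set alternatives.
    p_nontarget = (1.0 - p_target - p_oos) / (n - 1)
    total = 0.0
    for target in languages:
        cost = c_miss * p_target * p_miss[target]
        for alt in languages:
            if alt != target:
                cost += c_fa * p_nontarget * p_fa[(target, alt)]
        if p_oos > 0.0:
            cost += c_fa * p_oos * p_fa_oos[target]
        total += cost
    return total / n


# Toy example with three hypothetical languages and made-up error rates.
langs = ["a", "b", "c"]
miss = {t: 0.10 for t in langs}
fa = {(t, a): 0.02 for t in langs for a in langs if a != t}
fa_oos = {t: 0.10 for t in langs}

closed_cost = average_cost(miss, fa)
open_cost = average_cost(miss, fa, p_fa_oos=fa_oos, p_oos=0.2)
print(closed_cost, open_cost)
```

With equal costs and a target weight of one half, the closed set value is simply the mean of the miss rate and the average false alarm rate, which is why it behaves like a total error rate.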
0:09:40	so let's turn to
0:09:42	results so
0:09:44	in terms of the official metric
0:09:46	uh these are the results
0:09:47	for systems
0:09:49	the average scores uh
0:09:51	for the closed and open set
0:09:53	conditions
0:09:54	uh the scores are cumulative so that the three second score
0:09:59	is the total of the green and the yellow and the
0:10:01	red bar
0:10:04	oh
0:10:05	opens
0:10:05	it's close to laugh opens another right we have
0:10:08	labels
0:10:09	oh some systems indicate yeah
0:10:12	the same system close at an open set
0:10:14	traditionally we have not identified
0:10:17	systems with their scores are
0:10:20	ah
0:10:21	in public presentations but you can
0:10:23	uh they're open close
0:10:25	i was in languages and you know it
0:10:27	you know
0:10:27	it's really
0:10:28	three seconds or ten seconds the three seconds that takes the
0:10:32	big performance
0:10:32	yeah
0:10:33	it
0:10:33	close in all three
0:10:35	his clothes and language here is a
0:10:38	oh but
0:10:39	two sides
0:10:40	be
0:10:41	yeah yeah
0:10:43	uh language we wouldn't see
0:10:45	the relatively
0:10:47	uh
0:10:48	good performance as you might expect on
0:10:50	and language pairs
0:10:55	and we traditionally put these on
0:10:57	yeah what
0:10:58	uh
0:11:00	there are that part with the
0:11:02	close to have them alive we have the various uh
0:11:06	that's another
0:11:07	thirty second
0:11:08	uh and of the right for once
0:11:10	we give a flavour that
0:11:11	different or in thirty seconds
0:11:13	and second
0:11:14	three seconds the
0:11:16	linearity of the
0:11:17	most of what
0:11:18	uh
0:11:19	suggest underlying
0:11:20	normal distributions
0:11:22	uh
0:11:24	it was open set and you can
0:11:26	see that
0:11:27	problem you taking going
0:11:29	what was that the
0:11:30	open so that uh
0:11:34	oh there we
0:11:35	on the right but up of the
0:11:37	close that an open set for each of the
0:11:39	a three durations are
0:11:41	uh give you
0:11:43	a sense there
0:11:49	findings an analysis
0:11:53	um
0:11:54	yeah
0:11:55	and i will talk about the effect of
0:11:57	averaging
0:11:58	in that while the other terms
0:12:00	pulling back at work
0:12:02	moving away from the term
0:12:03	cool
0:12:04	we had a long discussion at the workshop is it right to
0:12:09	average
0:12:09	get
0:12:10	across multiple we have the same data multi try out the multiple languages
0:12:15	and we
0:12:16	then resolve that
0:12:17	what with all that but
0:12:18	see that
0:12:20	funny thing that happened in particular for the
0:12:22	and here is is is
0:12:24	two systems were right then
0:12:26	ukrainian
0:12:29	ah
0:12:30	uh
0:12:31	so the regions where they
0:12:33	cranium language type uh
0:12:34	that's in the
0:12:35	lou the russian language that uh this
0:12:38	and these
0:12:39	yeah
0:12:40	inherently a symmetry
0:12:43	uh
0:12:44	uh between these these cars 'cause
0:12:46	this is the page that
0:12:47	i think the only possibility does it
0:12:49	russian or ukrainian
0:12:51	and if you
0:12:52	average those pulling together
0:12:55	what happens for system one the combined curve in black
0:12:58	goes right through the middle
0:12:59	that's what you
0:13:01	expect random one
0:13:03	system too
0:13:06	the
0:13:06	binder
0:13:09	where uh
0:13:11	uh i mean
0:13:12	lester combined performance
0:13:14	one is that um
0:13:18	uh we show the
0:13:20	distributions
0:13:21	uh on the road
0:13:23	um the rhino records for the two languages and then
0:13:26	different shapes
0:13:27	and uh another thing to note
0:13:29	is the
0:13:30	choruses
0:13:31	show
0:13:33	the actual decision points the circles we
0:13:36	a minimum
0:13:38	the average
0:13:38	point
0:13:40	and
0:13:41	the first
0:13:42	system on
0:13:43	they're right on top of one another in the middle of
0:13:46	but
0:13:47	calibration
0:13:50	two
0:13:51	the
0:13:52	right there
0:13:53	way
0:13:54	at the extremes
0:13:56	in the case and uh with the
0:13:57	sort of the middle indicating it indicating for calibration
0:14:01	combine them
0:14:02	but
0:14:03	hello
0:14:04	to it
0:14:04	i is what you see
0:14:07	um
0:14:08	so as i said there are questions
0:14:10	is it the right thing to
0:14:12	average
0:14:13	across languages
0:14:14	um
0:14:15	we have done so
0:14:18	if you look at language pairs
0:14:22	uh this is for one system
0:14:24	one of the system that all the language pairs
0:14:26	are we look
0:14:27	at
0:14:28	george dunning created
0:14:29	the curve i believe
0:14:31	ah
0:14:31	this looked at
0:14:32	although there isn't shows the ones that have the
0:14:34	why
0:14:35	that
0:14:36	um
0:14:37	average error rate
0:14:39	um
0:14:40	so all the others were
0:14:42	below two percent
0:14:43	ah
0:14:45	the most confusable up at the top were
0:14:48	hindi urdu
0:14:49	and bosnian
0:14:51	croatian
0:14:52	um
0:14:53	these were among the black
0:14:55	pairs of interest in
0:14:56	uh you know these are certainly mutually intelligible they may be considered dialect
0:15:01	and indeed
0:15:03	oh yeah
0:15:04	at least arguable that
0:15:06	he
0:15:06	these
0:15:07	language or dialect distinctions are based
0:15:09	but also and political
0:15:11	boundaries
0:15:13	rather than um
0:15:17	on uh more inherent language patterns
0:15:19	in any case those were two of the most confusable
0:15:22	next were russian ukrainian
0:15:24	the
0:15:24	english
0:15:26	dialects
0:15:26	and
0:15:27	uh dari farsi which are
0:15:29	generally considered
0:15:31	mutually
0:15:32	intelligible
0:15:33	you are there is a god in there
0:15:35	creole french and
0:15:36	is
0:15:37	is uh
0:15:39	in the list
0:15:39	um
0:15:42	uh when we
0:15:42	several of them
0:15:44	no
0:15:46	a little
0:15:47	list of leading
0:15:48	one
0:15:49	two that were in our set of
0:15:51	pairs of interest
0:15:52	cantonese and mandarin
0:15:54	portuguese and spanish
0:15:55	maybe in certain
0:15:57	different ways
0:15:58	languages that might be regarded as similar in fact
0:16:00	um
0:16:02	maybe aren't at least
0:16:03	for the
0:16:05	systems involved are not
0:16:07	all that hard
0:16:08	to distinguish
0:16:12	all that we can look at
0:16:13	uh
0:16:14	the terms were in the right
0:16:16	to a particular target languages towards the
0:16:20	if you of everything price
0:16:21	languages here we do so looking at the training corpus
0:16:25	type
0:16:25	the
0:16:26	they show the various
0:16:27	languages for the
0:16:29	that had training on the
0:16:32	V O A data
0:16:33	and then we look at the ones that had training on
0:16:35	cts data
0:16:37	um
0:16:37	you see kind of a movement
0:16:39	how would be either way
0:16:41	ah yes
0:16:42	performance
0:16:44	was on
0:16:45	one two Q is that we're languages
0:16:48	um
0:16:49	but uh
0:16:50	done previously among many cases the training the cts and the
0:16:53	yeah but
0:16:54	realigned unless spanish korean
0:16:57	mandarin
0:16:58	for example were among the best performing languages
0:17:00	worst performing were several
0:17:02	indian languages i mean with the confusions there in the
0:17:05	hindi urdu indian english
0:17:12	oh yeah we look at performance by
0:17:15	but the
0:17:16	what was it
0:17:17	test corpus whether V O A or cts
0:17:20	thirty ten and three
0:17:22	um
0:17:25	and
0:17:25	one thing we were
0:17:26	sorry please with
0:17:27	you know we just introduced
0:17:29	using the V O A data
0:17:30	you know with the
0:17:32	we we recognise well in fact
0:17:34	the overall performance was probably comparable
0:17:38	um
0:17:39	this even though for some of the V O A languages that are
0:17:42	training with cts
0:17:43	four
0:17:44	some reason i don't know we know why
0:17:46	the uh
0:17:47	cts
0:17:48	curves here appear less linear
0:17:53	and some history
0:17:56	so we like to
0:17:58	but back
0:17:59	over the course of several evaluation
0:18:01	how things change
0:18:03	are we seeing better performance there have yet
0:18:05	that that
0:18:06	ah
0:18:08	okay we have occurs over there
0:18:09	evaluation use of the numbers of target languages
0:18:12	go on
0:18:14	up in recent evaluation
0:18:16	number of participants will open up in them too much recent evaluations but we're
0:18:20	yeah slightly into the nineteen thirty seven seven wonderful
0:18:23	hereby try to
0:18:25	uh
0:18:26	you're simply blah
0:18:27	and with an increasing number
0:18:30	of um
0:18:31	out of set languages
0:18:37	as for the basic
0:18:38	one of the major
0:18:40	um
0:18:43	for thirty seconds
0:18:44	with that
0:18:46	nice
0:18:46	uh
0:18:48	right and uh
0:18:50	you know
0:18:50	garcia good
0:18:51	data exchange languages it
0:18:53	type change that but
0:18:54	are we think uh improved results for
0:18:57	three second
0:18:58	four
0:18:59	every second for the past
0:19:01	couple evaluations are
0:19:03	we seem to
0:19:04	yeah
0:19:04	have but a
0:19:06	i'm terms of the
0:19:07	the system
0:19:08	we also noted this year's three second performance was at the level of
0:19:12	thirty second performance
0:19:14	in nineteen ninety six
0:19:19	oh here we
0:19:19	do some history looking at the best system
0:19:22	you know the caveat is differences reflect
0:19:25	well
0:19:25	systems
0:19:26	and
0:19:27	some changes in the task definition and of course
0:19:29	different data and it's
0:19:31	hard to sort those out
0:19:33	a different vol
0:19:34	no less
0:19:35	what can we say about how well
0:19:38	romances
0:19:39	there
0:19:40	ah
0:19:42	um
0:19:43	i think we hinted that before but
0:19:45	three seconds um
0:19:48	we see a
0:19:49	it was lacking
0:19:50	oh nine wounded or seven media
0:19:52	anything ewing performance improvement but
0:19:55	in the
0:19:56	there can second bite out in the
0:19:59	thirty second maybe we
0:20:01	right progress
0:20:02	a bit
0:20:08	oh really
0:20:09	look at a couple of individual languages
0:20:11	uh
0:20:12	that's for sure
0:20:13	tend to do the same language uh
0:20:15	oh nine
0:20:16	O seven in the
0:20:17	of the 'cause O nine minutes of seven and the colours are one of the three durations
0:20:22	and here
0:20:24	to kind of language in which they were we have language
0:20:26	pair since
0:20:27	for korean
0:20:29	oh
0:20:31	we haven't seen improvements
0:20:32	throughout
0:20:33	right
0:20:34	but
0:20:35	the recycling three in two thousand nine is
0:20:38	uh
0:20:38	perfect the results are are
0:20:41	ah
0:20:42	languages
0:20:43	part is one
0:20:44	we see the overall having the
0:20:46	we sing for the evaluation the whole
0:20:48	ah
0:20:49	improvement at three seconds uh
0:20:51	a little change or even ridge regression
0:20:54	thirty five
0:20:55	and of course there are going to do that or
0:20:57	new this year
0:20:59	as well
0:21:04	oh
0:21:04	also here
0:21:06	but dialect kind of has to be done previously to that
0:21:10	american english and
0:21:11	indian english uh
0:21:14	uh
0:21:15	that and we
0:21:16	do see improvement like two thousand nine
0:21:19	which is that the minutes
0:21:21	thirty seconds
0:21:24	and second
0:21:27	and even more
0:21:28	uh
0:21:34	uh
0:21:35	predicament
0:21:37	a big there's three seconds
0:21:38	american indian english
0:21:42	and
0:21:42	going to
0:21:44	in the or do
0:21:46	do you
0:21:47	known to be a challenging language here
0:21:49	but we see improvement thirty seconds
0:21:52	three seconds
0:21:55	yeah
0:21:55	there's ten seconds
0:21:59	oh
0:21:59	and wait
0:22:00	a three seconds
0:22:01	ah
0:22:02	three seconds
0:22:03	well maybe this improvement
0:22:04	yeah
0:22:05	but have it
0:22:06	three seconds in the order was that
0:22:09	or too hard
0:22:10	comparison
0:22:11	performance little better than
0:22:13	and random
0:22:16	a few words in summary
0:22:19	we experimented with a new
0:22:21	data collection paradigm
0:22:23	and we're reasonably satisfied with that producing
0:22:26	and effective evaluation get berkeley
0:22:29	have trouble performance
0:22:31	repeating this trick finding the right data for future evaluations that remains a challenge
0:22:36	uh we saw continued performance improvement
0:22:39	uh of having a son
0:22:41	a real nice based on the
0:22:43	shorter segments
0:22:46	um
0:22:48	for both closed and open set conditions
0:22:51	a language
0:22:52	pair condition was introduced
0:22:54	here in particular for marketers it
0:22:56	relative interesting poses challenges more likely
0:22:59	you part of any
0:23:00	in in
0:23:00	if your evaluation that we do
0:23:03	um
0:23:03	this story
0:23:05	an issue we've argued about about
0:23:07	whether one should average across languages
0:23:10	and i think that yeah
0:23:11	uh includes might
0:23:13	right off
0:23:14	thank you
0:23:21	and
0:23:22	information
0:23:30	this is
0:23:30	just
0:23:31	a common
0:23:32	on the
0:23:33	comparing
0:23:34	uh
0:23:34	she happens
0:23:36	tween
0:23:37	uh
0:23:37	that's done
0:23:38	nist evaluations yes with the
0:23:40	number
0:23:41	target languages
0:23:42	uh
0:23:43	that's
0:23:45	uh
0:23:46	it
0:23:47	uh
0:23:48	they're more languages than the weight vanished that's it
0:23:52	the hypothesis
0:23:55	mostly
0:23:57	so
0:23:58	there are more languages
0:24:01	uh
0:24:01	you know list
0:24:02	about five months
0:24:04	'cause you know
0:24:05	you less sure about which one
0:24:07	to be
0:24:08	so
0:24:09	um
0:24:10	that makes it a little bit on that
0:24:13	it makes it a little bit harder
0:24:14	just one
0:24:16	makes it a little bit
0:24:17	not not a lot
0:24:19	if you were doing just fine
0:24:21	identification
0:24:22	obviously
0:24:23	the number of languages as a strong stick
0:24:26	second autistic
0:24:28	which
0:24:29	which don't have
0:24:31	so
0:24:32	arguably if we just
0:24:33	apparently tread water but it made the problem are doing
0:24:36	introduce language people haven't seen before
0:24:38	i'd argue that
0:24:39	that
0:24:40	it'd be apart
0:24:41	it's also predicate argument for the
0:24:43	language pairs condition which
0:24:45	well
0:24:47	so
0:24:47	tenderly
0:24:48	i think that affect yeah
0:24:54	we should
0:24:57	you plan to
0:24:58	to use it
0:24:59	voice of america
0:25:01	uh it uh for the nist evaluation
0:25:04	oh
0:25:06	there
0:25:07	other than your right
0:25:08	he
0:25:08	it we need to discuss this with i don't
0:25:11	think
0:25:12	we can hope thing
0:25:13	just get more voice of america data we're
0:25:16	exploring
0:25:17	um
0:25:19	other
0:25:20	similar type
0:25:21	or or that may be available that have multiple languages
0:25:25	um are there any
0:25:26	recommendation that people with them
0:25:34	yep
0:26:02	uh i'm honestly wondering why
0:26:04	four
0:26:05	uh
0:26:06	identification
0:26:08	oh
0:26:08	just
0:26:09	sure
0:26:10	so make or break
0:26:11	two
0:26:12	cation
0:26:13	four
0:26:14	i mean
0:26:14	to do that
0:26:15	uh
0:26:16	you should
0:26:16	and i'm i'm
0:26:18	and you are using uh
0:26:19	uh that it
0:26:20	editions
0:26:21	oh
0:26:23	um
0:26:24	i would like to do i need to find a direct
0:26:27	just
0:26:27	hmmm
0:26:28	you know
0:26:29	to
0:26:30	yeah
0:26:31	see
0:26:31	oh
0:26:32	and identification
0:26:34	and i wonder why
0:26:35	um
0:26:36	you you could try your interesting identification with
0:26:40	recognition
0:26:42	because
0:26:42	it's a
0:26:44	whatever
0:26:44	this
0:26:46	if you use
0:26:47	you
0:26:48	correlation
0:26:48	we thank you
0:26:50	right but
0:26:51	there's no
0:26:52	no
0:26:53	you you can
0:26:54	some
0:26:55	oh
0:26:56	accuracy
0:26:57	yeah
0:26:57	yeah
0:26:58	i i always wonder why
0:27:00	i wanna see how well it does
0:27:02	yeah
0:27:04	yeah
0:27:05	you use you understand
0:27:06	well
0:27:08	and i am not
0:27:09	sure you're saying you're interested in
0:27:11	bring in distinguishing particular
0:27:13	mostly related to i
0:27:15	or are you saying i think of the identification problem
0:27:19	yeah the language of their and possibilities which one is it
0:27:23	it yeah i
0:27:24	yeah
0:27:25	you target for that
0:27:26	dialect
0:27:27	i think
0:27:27	yeah i'm interested in education
0:27:30	like
0:27:31	see
0:27:31	a comparison
0:27:33	uh
0:27:34	this
0:27:35	if you use your
0:27:37	oh
0:27:38	but i mean the language here
0:27:41	condition does that computing
0:27:44	yeah
0:27:45	but
0:27:45	but
0:27:46	yeah
0:27:47	what
0:27:48	you have
0:27:49	oh
0:27:50	oh
0:27:52	right
0:27:52	yeah
0:27:53	okay
0:27:54	oh
0:27:55	uh
0:27:56	no
0:27:58	one
0:27:58	like
0:27:59	yes
0:27:59	the
0:28:00	i comparison
0:28:03	oh
0:28:04	not
0:28:06	yeah
0:28:06	and
0:28:07	and
0:28:08	i
0:28:09	okay
0:28:10	yeah
0:28:10	i think
0:28:11	yeah
0:28:11	huh
0:28:12	and
0:28:13	right
0:28:14	yeah
0:28:15	no
0:28:16	yeah
0:28:19	as opposed to
0:28:23	i'm not okay nectar
0:28:24	a couple of that but maybe that's something we can talk about for the wrong one
0:28:35	yes
0:28:35	right
0:28:35	i can like
0:28:36	combine
0:28:37	um
0:28:38	uh
0:28:39	i've
0:28:42	it's one thing
0:28:43	so
0:28:44	yes
0:28:45	um
0:28:46	i've uh
0:28:47	i think that's
0:28:48	discuss
0:28:49	but
0:28:49	yeah
0:28:51	uh
0:28:51	qualitative
0:28:52	uh
0:28:54	this and
0:28:55	and other what this
0:28:57	oh
0:28:58	someone is
0:28:59	so
0:29:00	elements of this
0:29:01	the
0:29:01	the
0:29:02	pulling of the day good
0:29:04	and equal error rate being one point
0:29:06	oh
0:29:07	on the go
0:29:08	it's part of that discussion
0:29:10	um
0:29:11	so
0:29:12	i'm not going to
0:29:14	start that again
0:29:15	no
0:29:16	uh
0:29:17	i think i have something useful to say about
0:29:19	average
0:29:20	which
0:29:21	so
0:29:22	if you're doing
0:29:24	identification
0:29:25	uh
0:29:27	given a speech segment
0:29:28	you're told
0:29:30	there are N languages
0:29:32	the speech segment can be in one of these N languages
0:29:35	then
0:29:36	uh
0:29:36	you also have to assume some prior
0:29:39	so you can assume a flat prior
0:29:41	over those languages that you would
0:29:44	uh
0:29:47	yeah
0:29:47	likely
0:29:48	uh
0:29:49	before you look
0:29:50	the speech
0:29:51	that that would be
0:29:52	uh
0:29:52	the identification problem
0:29:54	so what nist has done
0:29:57	is
0:29:58	that i
0:29:58	uh
0:29:59	so
0:30:01	something this problem
0:30:02	if there are N languages N different priors
0:30:06	so
0:30:08	the N different priors is
0:30:10	uh
0:30:12	target language number one
0:30:13	has a prior
0:30:14	oh
0:30:16	and
0:30:16	all of the other languages
0:30:18	uh
0:30:19	share between them
0:30:20	uh
0:30:22	a probability of a half
0:30:24	so it's
0:30:25	it's just you try
0:30:27	and
0:30:28	then
0:30:28	you go to the next topic
0:30:30	two
0:30:30	you say you know this one has probability of a half
0:30:33	all the others
0:30:34	uh
0:30:35	i have a smaller probability
0:30:37	then
0:30:38	you missus
0:30:39	D
0:30:40	uh
0:30:41	uh
0:30:42	essentially i didn't
0:30:43	cation yeah right
0:30:45	given that probably
0:30:46	in times
0:30:47	and you and all those
0:30:50	it it right
0:30:51	that's the that's the secret
0:30:54	um
0:30:56	so
0:31:00	okay
0:31:02	and he to be
0:31:03	to go on
0:31:04	the next
0:31:04	speaker
0:31:05	again
0:31:07	interesting
0:31:12	yeah

SESSION 7: Speaker and Language recognition - Evaluations and performance testing

Added: 14. 7. 2010 11:08, Author: Alvin Martin, Craig Greenberg (National Institute of Standards and Technology), Length: 0:31:13