0:00:14 | Hello, I'm from the University of Eastern Finland, and, |
0:00:19 | well, it's my pleasure to present at this workshop. I don't know if |
0:00:23 | it's good to be the last, |
0:00:25 | or among the last, of the speakers, but, |
0:00:28 | well, |
0:00:29 | in the following fifteen to twenty minutes I will present |
0:00:34 | an effective and simple out-of-set detection method over the i-vector space in the |
0:00:39 | context of language identification. |
0:00:43 | Well, |
0:00:45 | language identification can be done in two ways. One is closed-set, |
0:00:50 | where the language of a test segment corresponds to one of the in-set, or target, |
0:00:56 | languages, |
0:00:57 | and the other is open-set, |
0:01:00 | where the language of a test segment may not |
0:01:03 | be any of the target languages. |
0:01:07 | The task is to classify |
0:01:09 | the test segment |
0:01:11 | either into one of the in-set languages or |
0:01:15 | an out-of-set model. |
0:01:17 | Well, |
0:01:18 | one way to perform open-set language identification is to train |
0:01:25 | an out-of-set model from additional data, |
0:01:29 | but |
0:01:31 | when that additional data is huge and unlabeled, |
0:01:35 | the practical key question is |
0:01:38 | how to select the most representative out-of-set data |
0:01:42 | to build this out-of-set model. In other words, |
0:01:47 | how to obtain |
0:01:50 | high-quality |
0:01:52 | out-of-set data, or additional data, to train |
0:01:56 | this out-of-set model. |
0:01:59 | Well, |
0:02:01 | in the context of language identification, the good candidates for out-of-set data |
0:02:07 | have some properties. Two of their main properties are: |
0:02:12 | out-of-set candidates should come from different language families. |
0:02:19 | By language families, I mean those languages that have the same kind of |
0:02:24 | common ancestor; for example, Russian, Ukrainian, and Polish are all from the Slavic language |
0:02:33 | family. |
0:02:34 | And the second property |
0:02:37 | is that out-of-set candidates should be spread, some close |
0:02:41 | to the in-set languages while others are farther away, because of having a more |
0:02:48 | general out-of-set model, which better represents the world of out-of- |
0:02:54 | set data, or out-of-set languages. |
0:02:57 | Well, |
0:02:58 | there are some ways to do this. |
0:03:01 | There are some classical approaches. One is the one-class SVM, where the idea is to enclose the |
0:03:07 | data within a hypersphere |
0:03:11 | and classify new data as in-set if it falls within this hypersphere, and |
0:03:18 | as out-of-set otherwise. |
0:03:21 | Two other classical approaches are k-nearest neighbors, where, |
0:03:26 | given each data point, the sum of the distances between this data point and its k |
0:03:33 | nearest neighbors is computed, and |
0:03:37 | the higher this score is, the more confident we are to say that this |
0:03:43 | data point is an outlier, is out-of-set, |
0:03:46 | and another classical approach is distance to class means, where, if we assume that the |
0:03:51 | data is Gaussian, |
0:03:54 | those data points that lie |
0:03:57 | two or three standard deviations below or above the class mean |
0:04:02 | are considered as out-of-set data. |
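A rough sketch of these three baselines in Python with scikit-learn and NumPy; the placeholder data, the polynomial kernel (mentioned later in the Q&A), the values of k and nu, and the three-sigma rule are illustrative assumptions, not the exact configuration used in the talk:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train = rng.standard_normal((500, 400))  # in-set i-vectors (placeholder)
test = rng.standard_normal((10, 400))    # unlabeled i-vectors to score

# One-class SVM: learn a boundary enclosing the in-set data; points with
# a negative decision value fall outside it and are flagged out-of-set.
ocsvm = OneClassSVM(kernel="poly", nu=0.1).fit(train)
svm_is_out_of_set = ocsvm.decision_function(test) < 0

# kNN outlier score: sum of distances to the k nearest in-set neighbors;
# the larger the sum, the more confident we are the point is an outlier.
nn = NearestNeighbors(n_neighbors=5).fit(train)
dists, _ = nn.kneighbors(test)
knn_score = dists.sum(axis=1)

# Distance to the class mean (shown here for a single class): under a
# Gaussian assumption, points beyond roughly three standard deviations
# from the mean are treated as out-of-set.
mu, sigma = train.mean(axis=0), train.std(axis=0)
z = np.abs((test - mu) / sigma)
mean_is_out_of_set = (z > 3).any(axis=1)
```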
0:04:06 | What we consider in this study is the use of a nonparametric statistical test known |
0:04:12 | as the Kolmogorov-Smirnov test. |
0:04:15 | It's a nonparametric |
0:04:17 | test, |
0:04:18 | and the idea is that |
0:04:21 | we have two samples, |
0:04:25 | and we estimate |
0:04:26 | whether these two samples have the same underlying distribution |
0:04:31 | by computing the maximum difference between their |
0:04:34 | empirical cumulative distribution functions. |
0:04:38 | Well, as you can see in this picture, this maximum difference is known as the |
0:04:44 | KS value. If it is greater than a critical value, |
0:04:49 | we can indicate that these two samples are from different distributions or, in |
0:04:56 | our case, from different classes. |
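As a minimal illustration of the two-sample KS statistic itself, using SciPy's ks_2samp (the two Gaussian samples are arbitrary placeholders, not data from the talk):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
same = rng.normal(0.0, 1.0, size=400)      # sample from N(0, 1)
shifted = rng.normal(0.5, 1.0, size=400)   # sample from N(0.5, 1)

# The KS statistic is the maximum absolute difference between the two
# empirical CDFs; a value above the critical value (i.e., a small
# p-value) indicates the samples come from different distributions.
stat, p_value = ks_2samp(same, shifted)
print(f"KS statistic: {stat:.3f}, p-value: {p_value:.3g}")
```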
0:04:58 | Okay, how did we adapt it to our open-set language identification task? |
0:05:04 | Well, given an unlabeled i-vector, w sub i, and all the, |
0:05:09 | all the i-vectors in a class, or language, l, we can compute |
0:05:15 | the KS values between the empirical cumulative distribution functions of this w_i and those i-vectors. |
0:05:22 | Then we will have, |
0:05:24 | if we have n samples in this language, |
0:05:28 | language l, |
0:05:29 | we will come up with n individual KS values, so we take the average over |
0:05:35 | these |
0:05:37 | individual KS values, and then we come up with an average KS value |
0:05:42 | that corresponds to |
0:05:44 | the outlier score of w_i in language l. |
0:05:49 | Well, |
0:05:50 | we repeat this over the other L target languages, |
0:05:54 | and then we come up with L average KS values, and then we take the |
0:05:59 | minimum value |
0:06:00 | as the final outlier score |
0:06:03 | for this w_i, |
0:06:05 | this unlabeled i-vector. |
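A minimal sketch of this scoring rule as described, assuming (my reading of the talk) that each KS statistic is computed between the components of the unlabeled i-vector and the components of one training i-vector; the function name and data layout are hypothetical:

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_outlier_score(w, languages):
    """Outlier score of an unlabeled i-vector w (shape: (dim,)).

    languages maps each target language to an array of its training
    i-vectors with shape (n_l, dim). For each language, average the
    KS statistics between w and each of its i-vectors, then take the
    minimum average over languages: low means w resembles some in-set
    language, high means it is far from all of them (out-of-set).
    """
    avg_ks = []
    for ivectors in languages.values():
        stats = [ks_2samp(w, v).statistic for v in ivectors]
        avg_ks.append(np.mean(stats))
    return min(avg_ks)
```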
0:06:07 | Well, |
0:06:10 | it's interesting that these KS values |
0:06:15 | also have a distribution, |
0:06:18 | shown in this picture. |
0:06:21 | The red bars show the in-set KS values, meaning that, for example, |
0:06:26 | if you are in the Dari class, |
0:06:28 | the red bars show that, |
0:06:33 | for computing the red bars, |
0:06:37 | those data that correspond to the Dari class were used to compute the KS |
0:06:41 | values, and for the blue bars, |
0:06:45 | those data that do not belong to the Dari class, for example, |
0:06:50 | were used to compute the KS values. |
0:06:53 | And interestingly, |
0:06:55 | the in-set KS values |
0:06:58 | tend to values close to zero, and the out-of-set |
0:07:02 | KS values tend to |
0:07:04 | values close to one. |
0:07:06 | So we couldn't see this separation directly by looking at the data at |
0:07:13 | the beginning, but now |
0:07:15 | we have a tool that shows how in-set and out-of-set data are separated. |
0:07:20 | Well, let's |
0:07:22 | apply it in our open-set language identification task. |
0:07:26 | Well, |
0:07:29 | we applied the idea in the NIST language i-vector challenge 2015. |
0:07:35 | The training set consists of fifteen thousand |
0:07:37 | utterances |
0:07:39 | from fifty in-set languages, |
0:07:42 | and the development set has six thousand five hundred unlabeled |
0:07:48 | utterances, and the same amount of data for the test set. |
0:07:52 | Well, the data was balanced between the languages, |
0:07:56 | and the dimension of the i-vectors was four hundred, |
0:08:00 | and we did some post-processing, like within-class covariance normalization and |
0:08:05 | linear discriminant analysis, |
0:08:08 | on the i-vectors. |
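A rough sketch of this post-processing, assuming a standard WCCN formulation (whitening by the average within-class covariance) followed by scikit-learn's LDA; the covariance estimator and the chaining order are assumptions, not details given in the talk:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def wccn_projection(ivectors, labels):
    """Return the WCCN projection matrix B with B @ B.T = W^{-1},
    where W is the average within-class covariance of the i-vectors."""
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    classes = np.unique(labels)
    for c in classes:
        W += np.cov(ivectors[labels == c], rowvar=False)
    W /= len(classes)
    return np.linalg.cholesky(np.linalg.inv(W))

# train: (n, 400) training i-vectors; labels: their language labels
# B = wccn_projection(train, labels)
# lda = LinearDiscriminantAnalysis().fit(train @ B, labels)
# processed = lda.transform(train @ B)  # reduced-dimension i-vectors
```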
0:08:11 | Well, |
0:08:12 | to perform an |
0:08:15 | evaluation of the out-of-set detection methods, we need labeled data. Because the development |
0:08:21 | set didn't have labels, was not labeled, we used the training set: |
0:08:28 | we segmented the training set into three different portions, training, development, and test portions, |
0:08:34 | so that we assigned thirty in-set languages and twenty out-of-set |
0:08:39 | languages, |
0:08:41 | and the test portion has all the languages of the in-set |
0:08:45 | and the twenty out-of-set, |
0:08:47 | and the data, |
0:08:49 | there wasn't any overlap between these three portions. |
0:08:53 | Well, |
0:08:57 | here is an example of the labeling of the out-of-set, for the |
0:09:01 | out-of-set evaluation. For example, for those data whose true language |
0:09:08 | was one of the in-set languages, for example data ID one, |
0:09:13 | we labeled it as in-set, |
0:09:14 | and for those data whose |
0:09:17 | true language was different from, |
0:09:19 | was not one of, the in-set languages, |
0:09:23 | we label, |
0:09:24 | we labeled them as out-of-set. |
0:09:29 | Here are the results of |
0:09:31 | the out-of-set detection methods and our proposed |
0:09:35 | method. Well, the KS method outperforms the other classical approaches. |
0:09:41 | For example, in the case of SVM and kNN, we have fourteen and sixteen percent relative |
0:09:47 | equal error rate reductions in out-of-set detection. |
0:09:52 | Well, |
0:09:54 | we further fused these baseline systems with KS, and we have improvement: we |
0:10:00 | have improved all the individual systems |
0:10:02 | by fusing KS with them, |
0:10:05 | and the best performance is fusing KS with the one-class SVM; |
0:10:09 | that resulted in a twenty percent |
0:10:12 | equal error rate. From around twenty-eight percent with the |
0:10:14 | individual KS, we dropped |
0:10:17 | the equal error rate to twenty percent. |
0:10:20 | Well, |
0:10:23 | let us look at the open-set language identification results. |
0:10:28 | Here, |
0:10:28 | the different rows in the table |
0:10:34 | differ |
0:10:35 | based on the data selected for out-of-set modeling. |
0:10:40 | For example, we have random selection, |
0:10:42 | using all the training set, |
0:10:44 | all the development set, a combination of the training and development sets, |
0:10:48 | and the last row is the proposed selection method. |
0:10:52 | For reference purposes, we included the closed-set result. |
0:10:56 | These results are based on the SVM classifier and are directly reported from the NIST |
0:11:02 | evaluation website. |
0:11:04 | Well, |
0:11:05 | the proposed selection method, |
0:11:09 | based on the identification results (sorry, I didn't mention that |
0:11:12 | the |
0:11:14 | bold-faced line is the identification cost, twenty-six, around twenty-six), |
0:11:18 | outperforms the NIST baseline |
0:11:21 | by thirty-three percent relative |
0:11:23 | improvement. The best relative improvement was |
0:11:27 | fifty-five percent. |
0:11:31 | Well, |
0:11:33 | looking at the first rows, |
0:11:35 | I think the additional data helped to reduce the identification cost, but |
0:11:43 | it was not better than selecting, |
0:11:47 | than selecting the out-of-set data in a |
0:11:51 | supervised way. |
0:11:53 | Well, |
0:11:56 | here we compare |
0:12:02 | KS with the other out-of-set detection methods in the open-set language identification. |
0:12:07 | Well, all of them help; |
0:12:10 | all of them outperform the closed-set results, |
0:12:14 | but the KS is the winning system, with twenty-six |
0:12:19 | identification cost. |
0:12:21 | Well, |
0:12:23 | we had one thousand five hundred out-of-set data points |
0:12:27 | in the test set, from fifteen |
0:12:31 | out-of-set languages, and we were able to detect around one thousand of |
0:12:35 | them |
0:12:36 | with this method |
0:12:38 | and use them as out-of-set data. |
0:12:40 | So the important thing in this challenge was |
0:12:44 | to better detect the out-of-set data; it changes your level when you correctly detect out- |
0:12:50 | of-set data. |
0:12:51 | Well, in conclusion, |
0:12:55 | in this study |
0:12:57 | we proposed a simple and effective method to detect out-of-set data |
0:13:03 | over the i-vector space. We showed that |
0:13:06 | the |
0:13:08 | KS values of the proposed method |
0:13:12 | have a nice distribution, |
0:13:15 | and, when integrated into the open-set language identification, we achieved a |
0:13:20 | thirty-three percent relative reduction in identification cost |
0:13:24 | over a closed-set |
0:13:26 | system. |
0:13:27 | Okay, thank you for your attention. |
0:13:49 | So, if you go back to slide fifteen, |
0:13:56 | my question is, |
0:13:59 | did you try different partitions of in-set and out-of-set, and did this |
0:14:06 | make much of a difference for your |
0:14:09 | Well, no, we selected, let's say, twenty percent |
0:14:12 | as the out-of-set languages, or |
0:14:15 | So this was on the next slide, but you had the thirty and twenty; you didn't |
0:14:18 | try different portions. Now, do you think this would have made a difference |
0:14:25 | in your out-of-set detection? Yes? |
0:14:30 | Yes, it, |
0:14:33 | I dunno what you mean by making a difference, but |
0:14:37 | the results may be different, but the outcome |
0:14:40 | will be the same; this KS system |
0:14:43 | is somewhere |
0:14:44 | among the other systems. |
0:14:46 | I see, but maybe the amount by which one |
0:14:49 | whatever the |
0:14:51 | difference, had you selected |
0:14:54 | We ran it at random; it's not supervised, we randomly selected the target languages, |
0:14:59 | thirty in-set and twenty as out-of-set. |
0:15:02 | Are there other questions? |
0:15:02 | and the other are there other questions |
---|
0:15:17 | one classes them what the couldn't that used |
---|
0:15:22 | investment coding what was the current that linear yes polynomial kernel |
---|
0:15:29 | and |
---|
0:15:30 | between the two images that used |
---|
0:15:33 | that you can that he scanned and one and the ones |
---|
0:15:37 | which one is more efficient |
---|
0:15:40 | which was the first one |
---|
0:15:42 | fast this one |
---|
0:15:43 | well |
---|
0:15:47 | my method was fast |
---|
0:15:50 | and knn was also first not a |
---|
0:15:54 | i didn't look carefully at that well the speed but |
---|
0:16:00 | i think goes and this one class svm this the this nonstick plastered to cluster |
---|
0:16:07 | mean and |
---|
0:16:08 | gaussian and canyon unless it |
---|
0:16:12 | the speech or more or less the same |
---|
0:16:16 | but i didn't look at the speaker now step by step |
---|
0:16:20 | evaluation |
---|
0:16:30 | if there are no the questions let's take the speaker again please |
---|