0:00:16 | Changhuai You, Haizhou Li, Ambikairajah, Kong Aik Lee |
0:00:27 | presented by |
0:00:48 | Good afternoon, everyone. |
0:00:51 | The paper I would like to present is entitled "Bhattacharyya-based GMM-SVM system with adaptive relevance factor for pair language recognition". |
0:01:06 | The outline of this presentation is shown here. |
0:01:11 | In this pair language recognition system we focus on three techniques: the Bhattacharyya-based GMM-SVM, the adaptive relevance factor, and strategies for pair language recognition. |
0:01:34 | Given a specified language pair, the task of pair language recognition is to decide which of the two languages is in fact spoken in a given segment. |
0:01:50 | We develop pair language recognition systems by studying the Bhattacharyya-based GMM-SVM: we introduce a mean supervector and a covariance supervector, and merge these two kinds of sub-kernels to obtain better performance in a hybrid system. |
0:02:17 | In order to compensate for duration effects, we also introduce an adaptive relevance factor for MAP in GMM-SVM systems. |
0:02:31 | For the purpose of pair language recognition, we introduce two sets of strategies. |
0:02:42 | We also report our system design process for the LRE 2011 submission. |
0:02:56 | In speaker and language recognition systems there are two typical kernels for the GMM-SVM: the Kullback-Leibler (KL) kernel and the Bhattacharyya kernel. |
0:03:12 | The conventional KL kernel includes only the mean information for recognition modeling; however, a symmetrized version of the KL divergence extends it to include a covariance term. |
0:03:38 | So why do we choose the Bhattacharyya-based kernel for language pair recognition? Across many experiments with speaker and language recognition systems, we have observed that the Bhattacharyya-based kernel performs better than the KL kernel. |
0:04:01 | The Bhattacharyya kernel can be split into three terms: the first term is contributed by both the mean and the covariance of the GMM, the second term involves the covariance only, and the third term involves only the weight parameters of the GMM. These three terms can be used independently to produce a recognition decision score, each with a different degree of information contribution. |
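For reference, the Bhattacharyya distance between two single Gaussians already exhibits this split; the following is a sketch of the standard form (the paper's mixture-level kernel, where summing over components brings in the weight-only third term, may be normalized differently):

```latex
% Bhattacharyya distance between N(\mu_a, \Sigma_a) and N(\mu_b, \Sigma_b),
% with \bar{\Sigma} = (\Sigma_a + \Sigma_b)/2:
\[
D_B = \underbrace{\tfrac{1}{8}\,(\mu_a - \mu_b)^{\top} \bar{\Sigma}^{-1} (\mu_a - \mu_b)}_{\text{mean and covariance}}
    + \underbrace{\tfrac{1}{2}\,\ln \frac{|\bar{\Sigma}|}{\sqrt{|\Sigma_a|\,|\Sigma_b|}}}_{\text{covariance only}}
\]
```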
0:04:46 | By using the first term of the Bhattacharyya kernel, with the covariance kept fixed rather than updated, we obtain the mean supervector. This kind of kernel can be used independently as a sub-model. |
0:05:10 | The second term includes only the covariance, so from it we obtain the covariance supervectors. We use only the first two terms of the Bhattacharyya kernel in our pair language recognition system design. |
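As a rough illustration of how the two supervectors could be assembled from a MAP-adapted, diagonal-covariance GMM, here is a minimal numpy sketch; the function name, the sqrt-weight scaling, and the log-variance choice are illustrative assumptions, not the paper's exact normalization:

```python
import numpy as np

def bhattacharyya_supervectors(weights, means, variances, ubm_variances):
    """Illustrative sketch (not the paper's exact scaling).

    weights:       (C,)   mixture weights
    means:         (C, D) MAP-adapted component means
    variances:     (C, D) MAP-adapted diagonal covariances
    ubm_variances: (C, D) UBM diagonal covariances, kept fixed for the
                   mean supervector (first kernel term)
    """
    # Mean supervector: means normalized by the fixed UBM covariance,
    # scaled by sqrt of the mixture weight (a common supervector scaling).
    mean_sv = np.concatenate(
        [np.sqrt(w) * m / np.sqrt(v0)
         for w, m, v0 in zip(weights, means, ubm_variances)])
    # Covariance supervector: built from the adapted variances only
    # (second kernel term); the log domain is one reasonable choice.
    cov_sv = np.concatenate(
        [np.sqrt(w) * np.log(v) for w, v in zip(weights, variances)])
    return mean_sv, cov_sv
```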
0:05:35 | The NAP projections for both the Bhattacharyya mean supervector and the covariance supervector are trained on different databases with a certain amount of overlap; the purpose is to strengthen the compensation. |
0:06:03 | For UBM training and relevance factor training, we use a database common to both the mean and covariance supervectors. |
0:06:21 | In order to compensate for duration variability, we introduce an adaptive relevance factor for MAP in the GMM-SVM; here we show where MAP sits in the GMM-SVM system. |
0:06:41 | This equation is the MAP mean update. Here x_i is the first-order sufficient statistic, and you can see that the relevance factor gamma_i indirectly affects the degree of update of the GMM mean vectors. |
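In standard notation, and consistent with the talk (the slide's exact symbols may differ), the MAP mean update and the role of the relevance factor look like this:

```latex
% Occupation count and first-order sufficient statistic for component i:
\[
N_i = \sum_{t} P(i \mid x_t), \qquad
\bar{x}_i = \frac{1}{N_i} \sum_{t} P(i \mid x_t)\, x_t
\]
% MAP mean update: the relevance factor \gamma_i controls how far the
% adaptation coefficient \alpha_i moves the mean from the UBM toward the data.
\[
\hat{\mu}_i = \alpha_i\, \bar{x}_i + (1 - \alpha_i)\, \mu_i,
\qquad \alpha_i = \frac{N_i}{N_i + \gamma_i}
\]
```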
0:07:06 | If we let this relevance factor be a function of duration, it becomes possible to do some compensation work within the mean update itself. |
0:07:27 | So far there are two types of relevance factors. In classical MAP we usually use a fixed value. The relevance factor can also be made data-dependent by the equation shown here, which is derived from factor analysis research; here phi is a diagonal matrix that can be trained on a development database. |
0:08:01 | Now assume the relevance factor is a function of the number of feature frames, which is tied to duration. Taking the expectation of the occupation count N_i, we see that it is directly proportional to the duration. |
0:08:34 | If we choose this duration function for the relevance factor, the expectation of the MAP mean adaptation coefficient tends to a constant vector, and we obtain the adaptive relevance factor from this equation, which makes the adapted GMM independent of duration. |
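Filling in the expectation step as a sketch, with w_i the UBM mixture weight, T the number of frames, and c an illustrative proportionality constant:

```latex
% The expected occupation count grows linearly with duration:
\[
\mathbb{E}[N_i] = \sum_{t=1}^{T} \mathbb{E}\big[ P(i \mid x_t) \big] \approx T\, w_i
\]
% A duration-proportional relevance factor \gamma_i = c\,T therefore makes
% the expected adaptation coefficient independent of duration:
\[
\mathbb{E}[\alpha_i] \approx \frac{T\, w_i}{T\, w_i + c\,T} = \frac{w_i}{w_i + c}
\]
```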
0:09:13 | Now we come to the third point of the presentation. We propose two strategies for pair language recognition. The first is a one-to-all strategy, also called core-to-pair modeling: we train a GMM-SVM model for each target language against all the other target languages, which gives us the score vectors shown here. |
0:09:45 | With these score vectors, and using our development database for all the target languages, we train Gaussian backend models for the N languages. Finally, the language pair scores are obtained through the log-likelihood ratios shown here. |
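A minimal sketch of core-to-pair scoring, assuming one Gaussian backend per language over the N-dimensional score vector; the names and the per-language full-covariance choice are illustrative assumptions:

```python
import numpy as np
from scipy.stats import multivariate_normal

def pair_llr(score_vec, backends, lang_a, lang_b):
    """Core-to-pair score for the pair (lang_a, lang_b).

    score_vec: (N,) one-vs-all GMM-SVM scores for one test utterance
    backends:  dict mapping language -> (mean, cov) of a Gaussian backend
               trained on development score vectors of that language
    Returns the log-likelihood ratio between the two pair languages.
    """
    mu_a, cov_a = backends[lang_a]
    mu_b, cov_b = backends[lang_b]
    log_a = multivariate_normal.logpdf(score_vec, mean=mu_a, cov=cov_a)
    log_b = multivariate_normal.logpdf(score_vec, mean=mu_b, cov=cov_b)
    return log_a - log_b  # positive favors lang_a, negative favors lang_b
```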
0:10:16 | The second strategy is a pairwise strategy, also called pair modeling. This modeling is very simple: we take the data of the two languages in the pair, directly train a GMM-SVM model on them, and obtain the scores from that model. |
0:10:44 | For the fusion of the two strategies we apply equal weights; that is, we assume the two strategies are equally important, and we obtain the final score by fusing them. |
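The equal-weight fusion itself is then just an average of the two scores; a one-line sketch, assuming both scores are already on comparable scales:

```python
def fuse_scores(core_to_pair_score, pair_score):
    # Equal weights: the two strategies are assumed equally important.
    return 0.5 * core_to_pair_score + 0.5 * pair_score
```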
0:11:03 | Here we show the hybrid pair language recognition system. From a test utterance we extract the Bhattacharyya mean supervector and covariance supervector and feed both into the two strategies; the two supervectors are merged within each strategy, and finally the two strategies are fused to give the final score. |
0:11:42 | We evaluate our pair language recognition design on the NIST LRE 2011 platform. There are twenty-four target languages, so in total there are 24 × 23 / 2 = 276 language pairs. We choose 512 Gaussian components for the GMM and UBM. In this paper we show results based on the thirty-second task, although our experiments also covered the other duration conditions. |
0:12:29 | We use 80-dimensional MFCC-SDC features, extracted with energy-based voice activity detection (VAD). |
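Shifted delta cepstra (SDC) stack time-shifted delta blocks onto each frame. The talk does not give the exact configuration behind the 80 dimensions (the static MFCCs are often appended as well), so the (d, P, k) values below are illustrative only; a minimal numpy sketch:

```python
import numpy as np

def sdc(cepstra, d=1, P=3, k=7):
    """Shifted delta cepstra with an assumed (d, P, k) configuration.

    cepstra: (T, N) frame-level MFCCs
    d: spread of each delta, P: shift between blocks, k: number of blocks
    Returns (T, N * k) SDC features; boundaries are edge-padded.
    """
    T, _ = cepstra.shape
    pad = d + (k - 1) * P
    c = np.pad(cepstra, ((pad, pad), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        lo = pad + i * P - d  # index of c(t + i*P - d) for t = 0
        hi = pad + i * P + d  # index of c(t + i*P + d) for t = 0
        blocks.append(c[hi:hi + T] - c[lo:lo + T])
    return np.concatenate(blocks, axis=1)
```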
0:12:42 | Performance is computed as the average cost over the N worst language pairs. |
0:12:51 | Here we list the training databases, covering both the CTS and BNBS sets, used for our language pair recognition training. |
0:13:06 | Now we show the experimental results. First we compare the effect of the fixed relevance factor against the adaptive relevance factor (ARF). Table 1 shows, under the core-to-pair strategy, the fixed relevance factor set to three different values (0.25, 8, and 32) together with the resulting EER and minimum cost, compared against the ARF. Comparing these data, we can say the adaptive relevance factor performs better than any of the fixed relevance factor settings. |
0:14:02 | Similar observations are found under the pair strategy: the ARF gives 12.75% in terms of EER, while the fixed relevance factor settings all give higher values. |
0:14:28 | The second experiment examines the effect of merging the two sets of supervectors, the mean supervector and the covariance supervector. The blue curve represents the mean supervector and the green curve the Bhattacharyya covariance supervector, both with 80-dimensional MFCC-SDC features and the adaptive relevance factor (ARF). We run this experiment under the core-to-pair strategy and show the merging effect in red: the merged system clearly outperforms either the mean or the covariance supervector alone. |
0:15:26 | This figure is plotted over the N worst language pairs in terms of EER, drawn from the N(N-1)/2 possible language pairs. |
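As a sketch, each point on such a curve is just the average of the metric over the worst pairs; assuming per-pair EERs (or costs) are already computed:

```python
import numpy as np

def avg_worst_pairs(pair_metric, n_worst):
    """Average EER/cost over the n_worst worst language pairs.

    pair_metric: (num_pairs,) values, e.g. 276 = 24*23/2 pair EERs
    """
    worst = np.sort(pair_metric)[::-1][:n_worst]  # worst (largest) first
    return worst.mean()
```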
0:15:45 | Similar results can be found under the pair strategy: for most of the language pairs, the red (merged) curve gives a lower minimum detection cost. |
0:16:10 | Finally, we show the fusion effect of the two strategies. The blue curve is the core-to-pair strategy and the green one the pair strategy; after fusing the two strategies, we obtain the final results, with an EER of ten point something percent and a minimum cost of 0.09. |
0:16:46 | Now we come to the conclusions. We have developed a hybrid Bhattacharyya-based GMM-SVM system for pair language recognition for the LRE 2011 submission. The performance gain from merging the mean supervector and the covariance supervector is clear. Compared with the fixed relevance factor, we observed that the adaptive relevance factor is effective for pair language recognition. Finally, we can say the fusion of the core-to-pair and pair strategies is useful. |
0:17:32 | Here we show some reference papers; in particular the first one, from Patrick Kenny, proposed the data-dependent relevance factor. |
0:17:44 | Thank you. |
0:18:11 | Okay. Firstly, we chose separate mean and covariance supervectors; that means we do not want to merge the mean and covariance information into one kernel. We keep them separate because we found that doing so gives better performance once the two supervectors are merged afterwards. |
0:18:44 | We did compare the two options: building a single kernel with the first and second terms merged together, versus separate kernels, that is, a mean kernel and a covariance kernel fused afterwards. The latter works better. |
0:19:17 | Okay. |
0:19:28 | I think it is at least because the training and testing environments and databases are different; overall, the effect is clear. |