Speech Transcript - BINAURAL EXTENSION AND PERFORMANCE OF SINGLE-CHANNEL SPECTRAL SUBTRACTION DEREVERBERATION ALGORITHMS

0:00:15	i
0:00:15	but
0:00:18	thank you
0:00:19	and
0:00:20	good morning, the work has been carried out in the
0:00:25	department of electrical and computer engineering at the university of patras in greece
0:00:29	by eleftheria georganti, professor john mourjopoulos and myself
0:00:33	and the work is on the binaural extension of
0:00:37	single-channel spectral subtraction
0:00:39	dereverberation algorithms
0:00:42	dereverberation has been a challenging research issue for at least four decades
0:00:47	and
0:00:48	now dereverberation techniques are applied either as standalone processes
0:00:53	in order to enhance the reverberant signal's quality
0:00:56	or even increase the reverberant speech intelligibility
0:01:01	or as preprocessing steps before several other signal processing algorithms and applications in order to increase their performance
0:01:10	and when one is developing binaural dereverberation algorithms
0:01:15	one should also take into account some constraints
0:01:18	that are imposed by the binaural aspect of the auditory system
0:01:23	so as we all know, when the sound arrives at the left and the right
0:01:29	ear
0:01:29	of the listener
0:01:30	it does so with a relative delay and a relative level difference
0:01:36	and these so-called binaural cues are important for the localization of
0:01:40	sounds in space
0:01:42	and these should definitely be preserved by binaural signal processing in general
0:01:48	and more specifically by binaural dereverberation algorithms
0:01:53	on the other hand binaural dereverberation
0:01:55	has very appealing applications
0:01:58	it can be applied in hearing aids
0:02:00	in binaural telephony in hands-free devices
0:02:03	in immersive audio
0:02:04	telecommunications
0:02:07	so recently we have proposed in our lab some single-channel dereverberation algorithms
0:02:14	we have proposed a framework for improving existing single-channel spectral subtraction dereverberation algorithms
0:02:21	we have also presented a novel method of high computational complexity that gives
0:02:27	perceptually significant,
0:02:29	perceptually good results
0:02:31	which is based on perceptual reverberation modeling
0:02:34	and also a fast semi-blind dereverberation method
0:02:38	which is based on a handclap recording
0:02:40	which targets
0:02:41	speech applications
0:02:43	so
0:02:43	the straightforward step for us was to extend
0:02:47	such techniques to the binaural context
0:02:51	and
0:02:52	and
0:02:53	the most obvious thing to do was to extend the spectral subtraction dereverberation techniques, which are
0:02:59	techniques of low computational complexity when compared to sophisticated
0:03:04	and other dereverberation approaches
0:03:07	so the specific aims of this work
0:03:10	are to propose a single framework for the extension of single-channel spectral subtraction dereverberation algorithms
0:03:17	to
0:03:18	introduce an efficient way
0:03:20	to prevent overestimation errors
0:03:23	and also to evaluate the proposed framework on several state-of-the-art spectral subtraction dereverberation techniques
0:03:32	um
0:03:32	spectral subtraction was originally proposed for denoising applications
0:03:37	but recently it has been applied for the suppression of late reverberation
0:03:42	we all know in room acoustics that after the direct sound
0:03:46	the early reflections arrive; these are discrete echoes that come from nearby surfaces and produce spectral
0:03:52	degradation that is perceived as colouration
0:03:55	in the diffuse field the late reverberation arrives
0:03:58	which has decaying noise-like characteristics
0:04:02	and is perceived as the well-known reverberant tails
0:04:07	so in the late reverberation suppression context spectral subtraction
0:04:11	gives the anechoic estimation by simply subtracting from the reverberant signal an
0:04:17	estimation of the late reverberation
0:04:20	and for most late reverberation suppression methods that work in this way the question is
0:04:24	how to estimate exactly this late reverberation spectrum or power spectrum, depending on the method
0:04:31	and let's look at some state-of-the-art methods
0:04:34	yeah
0:04:35	the methods proposed by wu and wang and by furuya and kataoka, from now on we will refer to them
0:04:40	as W W and F K
0:04:42	are making some assumptions on the reverberant signal statistics
0:04:46	while the well-known dereverberation technique from lebart et al.
0:04:50	from now on we refer to this as L B
0:04:53	is making certain assumptions on the reverberation characteristics
0:04:57	keep in mind that
0:04:59	we can easily express the subtraction
0:05:03	principle as a gain multiplication
0:05:06	in the frequency domain by deriving the appropriate gain
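
The remark above, that subtraction can be rewritten as a gain multiplication, can be sketched as follows. This is a minimal illustration and not the speaker's implementation: it assumes power-domain subtraction, a hypothetical function name `spectral_subtraction_gain`, a hypothetical `floor` parameter that clips negative results, and made-up per-bin power values.

```python
def spectral_subtraction_gain(reverberant_power, late_reverb_power, floor=0.0):
    """Return a per-bin gain G so that G * |Y|^2 equals |Y|^2 - |R|^2, floored.

    reverberant_power: per-bin power of the reverberant frame (|Y|^2).
    late_reverb_power: per-bin estimate of the late reverberation power (|R|^2).
    floor: assumed lower bound for the gain (hypothetical parameter).
    """
    gains = []
    for y_pow, r_pow in zip(reverberant_power, late_reverb_power):
        if y_pow <= 0.0:
            # Nothing to scale in an empty bin, so fall back to the floor.
            gains.append(floor)
            continue
        # Subtraction expressed as a multiplicative factor on |Y|^2.
        gains.append(max(1.0 - r_pow / y_pow, floor))
    return gains


# Hypothetical power values for four frequency bins of one frame.
reverberant_power = [4.0, 2.0, 1.0, 0.5]
late_reverb_power = [1.0, 1.0, 1.5, 0.0]

gains = spectral_subtraction_gain(reverberant_power, late_reverb_power)
clean_estimate = [g * y for g, y in zip(gains, reverberant_power)]
```

In the third bin the late reverberation estimate exceeds the observed power, so the gain is clipped at the floor instead of going negative. This is the kind of overestimation that the talk later addresses.
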
0:05:11	so the
0:05:11	straightforward approach would be to implement these late reverberation suppression techniques separately and independently in the binaural context
0:05:20	for the left and the right channel
0:05:23	but it has been shown that bilateral signal processing will destroy the binaural cues and
0:05:29	it will distort the localization in the produced signal
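
A small numeric sketch of why independent left and right processing can harm the level cue. All numbers are hypothetical, the helper name `ild_db` is an assumption, and only the interaural level difference in a single frequency bin is considered. A shared real-valued gain would also leave the phase, and therefore the relative delay, untouched, which is not shown here.

```python
import math


def ild_db(left_mag, right_mag):
    """Interaural level difference in dB between two spectral magnitudes."""
    return 20.0 * math.log10(left_mag / right_mag)


# Hypothetical magnitudes of one frequency bin at the two ears.
left_mag, right_mag = 2.0, 1.0

# Hypothetical gains estimated independently for each ear.
gain_left, gain_right = 0.8, 0.4

# Hypothetical single gain shared by both ears.
common_gain = 0.6

ild_in = ild_db(left_mag, right_mag)
ild_independent = ild_db(gain_left * left_mag, gain_right * right_mag)
ild_common = ild_db(common_gain * left_mag, common_gain * right_mag)
```

With independent gains the level difference roughly doubles in this example, while a common gain cancels out of the ratio and leaves it unchanged.
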
0:05:34	so in the bibliography
0:05:37	the aachen team has proposed
0:05:40	a spectral subtraction extension which is based on the delay and sum beamformer
0:05:45	uh
0:05:46	by beamforming, by actually adding the left and the right channels and synchronizing them
0:05:51	um
0:05:52	it produces a reference signal, it then makes the late reverberation estimation on this signal and then it applies spectral
0:05:59	subtraction independently
0:06:01	in the left and the right ear
0:06:03	and so the binaural cues are preserved
0:06:08	in this work
0:06:09	we make an extra assumption that the relative delay between the two ears actually
0:06:16	depends on the width of the human head and
0:06:18	it can be assumed that it would be smaller than the typical analysis windows
0:06:23	so for this work we omit the delay and sum beamformer stage
0:06:28	and we propose a binaural extension of single-channel spectral subtraction dereverberation
0:06:35	based on bilateral gain adaptation
0:06:38	here you see the signal flow of the proposed approach
0:06:41	uh
0:06:42	separately from the left and the right reverberant frames we make two different estimations
0:06:49	in order to derive the bilateral gains
0:06:52	then these gains are combined
0:06:54	with a chosen gain adaptation scheme
0:06:57	in order to give us the binaural gain
0:06:59	then
0:07:00	a gain magnitude regularization scheme that prevents overestimation errors, which we introduce here, is applied
0:07:06	in order to give us a constrained binaural gain
0:07:09	which is separately, independently
0:07:12	applied on the left and the right frame
0:07:16	for the gain adaptation in this work we chose to use three strategies
0:07:21	by taking the max gain in each frequency bin we achieve moderate suppression and fewer processing artifacts
0:07:28	taking the average gain would be the compromise between the reverberation reduction and the processing artifacts
0:07:34	while the minimum gain gives significant reverberation suppression but
0:07:38	it can easily introduce artifacts
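
The three gain adaptation strategies described above can be sketched per frequency bin as below. This is an illustrative sketch under stated assumptions: the function name `combine_gains`, the strategy labels and all numeric values are hypothetical, and the gain magnitude regularization step is left out because its formula is not given in the talk.

```python
def combine_gains(gain_left, gain_right, strategy="avg"):
    """Combine per-bin left and right gains into one binaural gain per bin."""
    if len(gain_left) != len(gain_right):
        raise ValueError("left and right gains must cover the same bins")
    if strategy == "max":
        # Least suppression: keep the larger gain in each bin.
        return [max(gl, gr) for gl, gr in zip(gain_left, gain_right)]
    if strategy == "min":
        # Strongest suppression: keep the smaller gain in each bin.
        return [min(gl, gr) for gl, gr in zip(gain_left, gain_right)]
    if strategy == "avg":
        # Compromise between the two.
        return [(gl + gr) / 2.0 for gl, gr in zip(gain_left, gain_right)]
    raise ValueError("strategy must be 'max', 'avg' or 'min'")


# Hypothetical bilateral gains for three frequency bins.
gain_left = [0.75, 0.5, 0.25]
gain_right = [0.25, 0.5, 1.0]

max_gain = combine_gains(gain_left, gain_right, "max")
avg_gain = combine_gains(gain_left, gain_right, "avg")
min_gain = combine_gains(gain_left, gain_right, "min")

# The same binaural gain is then applied to both frames (hypothetical magnitudes).
left_frame = [2.0, 4.0, 8.0]
right_frame = [1.0, 2.0, 4.0]
left_out = [g * x for g, x in zip(avg_gain, left_frame)]
right_out = [g * x for g, x in zip(avg_gain, right_frame)]
```

Because both frames are scaled by the same value in every bin, the left to right magnitude ratio of each bin is the same before and after processing.
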
0:07:41	so the selection of the gain adaptation scheme can be made according to the application scenario
0:07:47	however
0:07:48	these blind methods usually introduce signal artifacts and in order to prevent such
0:07:55	overestimation artifacts
0:07:57	um
0:07:58	we have
0:07:59	proposed, we introduce here, a gain magnitude regularization step
0:08:04	which is implemented
0:08:06	as a low signal to reverberation ratio detector
0:08:09	the assumption here is that
0:08:13	musical noise or other overestimation artifacts
0:08:16	will
0:08:18	more probably be present in low signal to reverberation ratio frames
0:08:24	and this gain magnitude regularization scheme
0:08:28	depends on a regularization application threshold theta
0:08:32	and
0:08:33	on a regularization ratio r
0:08:35	these are user-defined parameters that can be adjusted
0:08:39	in order to control the suppression rate
0:08:42	so
0:08:44	properly adjusting these parameters can compensate for overestimation errors
0:08:49	and prevent musical noise
0:08:52	to further explain the use of these parameters
0:08:56	these are typical spectral gain functions
0:09:00	and now by keeping theta at zero point two and r equal to
0:09:05	r equal four and r equal eight we can see how the gain functions
0:09:10	change
0:09:12	and
0:09:12	by keeping r constant we can change theta
0:09:17	to zero point four and zero point six
0:09:20	so
0:09:21	from here we can see that theta can be used for the broad
0:09:26	control of the suppression rate
0:09:28	while the parameter r can be used for fine tuning the method
0:09:34	let's present some results
0:09:36	these results
0:09:37	um
0:09:39	are made with measured
0:09:44	impulse responses
0:09:45	these specific ones are taken from the aachen
0:09:50	database
0:09:50	in a stairway with a reverberation time of
0:09:54	zero point seven approximately
0:09:56	in order to evaluate the results
0:09:58	we used two metrics, the signal to reverberation
0:10:01	ratio difference when compared
0:10:04	to the reverberant signal
0:10:06	so positive differences denote
0:10:09	more significant reverberation reduction
0:10:11	and also the pesq difference when comparing to the reverberant signal
0:10:17	which relates more to the perceptual
0:10:19	quality of the final result
0:10:22	we implemented
0:10:24	these
0:10:25	three binaural gain adaptation strategies
0:10:28	as well as the delay and sum beamformer in three state-of-the-art spectral subtraction dereverberation algorithms
0:10:34	the L B, W W and F K
0:10:37	and as we can see
0:10:38	all of the techniques significantly reduce reverberation
0:10:43	as we expected the min gain adaptation scheme reduces more reverberation while the max gain less
0:10:49	and when seeing the
0:10:51	pesq difference, which makes more sense from a perceptual point of view, we can see that
0:10:56	the W W method with the min gain technique
0:10:59	gives slightly better results
0:11:03	these results are taken in a cafeteria from the oldenburg database
0:11:07	and
0:11:09	this cafeteria has a
0:11:11	high reverberation time of one point three seconds
0:11:15	and um
0:11:17	ooh
0:11:17	as we can see the reverberation reduction here is
0:11:23	smaller
0:11:24	and it seems that
0:11:26	such techniques in such reverberant conditions
0:11:30	enhance the final signals
0:11:33	but on the other hand the enhancement is less than in the previous case
0:11:39	again the W W technique achieved better results
0:11:45	in terms of
0:11:46	S R R and pesq
0:11:49	and the best results were observed for the average gain adaptation scheme
0:11:57	so in order to present some further evaluation we conducted
0:12:01	um
0:12:02	a subjective evaluation test
0:12:04	this test was based on the I T U P
0:12:08	eight thirty five recommendation
0:12:12	and
0:12:13	seventeen test subjects participated in the test
0:12:16	we made pilot tests in order to
0:12:20	choose the best adaptation
0:12:22	scheme for each technique, so for the L B and W W techniques
0:12:28	the average gain adaptation was chosen while for the F K the min gain technique
0:12:33	was chosen
0:12:35	and the test subjects were asked
0:12:37	to rate the speech naturalness
0:12:40	the reverberation intrusiveness and the overall quality of
0:12:44	the speech signals
0:12:46	um
0:12:47	on a M O S scale from zero to five
0:12:51	so from these results
0:12:54	we can see that the test subjects
0:12:57	rated the dereverberated signals
0:13:00	as less natural in all cases
0:13:03	however
0:13:04	we notice a significant reverberation reduction
0:13:09	and also
0:13:10	at least the L B and W W techniques preserve the overall signal quality
0:13:18	and unfortunately we need
0:13:20	um
0:13:21	headphones in order to give some demos
0:13:23	but if anyone is interested
0:13:26	the demos are available on the website of our group
0:13:30	the website is also written in the paper
0:13:36	so to sum up
0:13:38	and
0:13:39	we have introduced a framework for binaural spectral subtraction dereverberation
0:13:43	which is based on bilateral gain adaptation
0:13:47	the gain magnitude regularization scheme that we introduced can reduce the overestimation errors
0:13:52	and
0:13:54	um
0:13:56	protect
0:13:56	from some
0:13:58	processing degradations
0:14:02	the selection of the adaptation scheme and the G M R parameters
0:14:05	can be made according to the application scenario
0:14:09	and there is also significant reverberation reduction
0:14:12	while the overall speech quality and the binaural cues
0:14:17	can be preserved
0:14:19	however
0:14:20	we noticed some loss of speech naturalness
0:14:24	so for us this indicates the need for native binaural
0:14:28	models that take into account the binaural properties of the auditory system
0:14:33	and
0:14:33	this is what we are working on right now
0:14:37	thank you very much
0:14:42	okay um we have time for a few questions
0:14:46	if you have questions can you just use the microphone over there
0:14:52	any questions from the audience
0:14:59	any questions
0:15:01	okay maybe i'll just start
0:15:03	okay, how do you
0:15:05	comment on the accuracy of this binaural
0:15:09	cues preservation
0:15:12	this is a big problem because actually
0:15:15	the perceptual tests
0:15:17	that can exactly
0:15:22	report on this
0:15:24	need a really controlled
0:15:27	um
0:15:29	environment
0:15:30	and so it was really difficult to do so, that's actually
0:15:34	i think that i'm not aware of any test for dereverberation algorithms that
0:15:40	can
0:15:42	exactly predict this
0:15:44	binaural cues preservation
0:15:47	this is for further investigation
0:15:51	so you have not done any subjective test on this
0:15:53	on this, no
0:15:59	other questions
0:16:03	you know we best you really
0:16:08	uh
0:16:08	another question is how do you determine these parameters
0:16:12	you know, the G M R
0:16:14	or
0:16:14	uh
0:16:15	these parameters
0:16:16	actually depend on the frame length
0:16:19	and on the reverberation time, on how distorted the signal is
0:16:24	and we give some range for the parameters in the paper
0:16:29	so
0:16:30	uh
0:16:31	actually there are
0:16:33	different ranges
0:16:35	for each sampling frequency and each frame length
0:16:38	that the user can set in order to get the optimal results
0:16:41	so for your experiments or for the simulations, arbitrary
0:16:45	no, we made pilot tests
0:16:47	to tune the parameters
0:16:49	for these
0:16:50	for the different environments, yes
0:16:58	any questions
0:16:59	so if not, let's thank the speaker again

Industrial Technology for Speech Processing Applications

Presented by: Alexandros Tsilfidis, Author(s): Alexandros Tsilfidis, Eleftheria Georganti, John Mourjopoulos, University of Patras, Greece