Speech Transcript - Speech Technology Research at ICSI

0:00:11	hi my name is are free a i'm a computer science phd student graduating then
0:00:15	this year
0:00:15	and i work at the international computer science institute
0:00:19	i'll give you guys a broad overview of icsi as we call
0:00:22	it it's off campus down by the bart station
0:00:25	we have multiple groups down there we'll have an open house
0:00:29	where we'll have a shuttle leaving here also just look for me in an orange jacket
0:00:32	or just walk down yourself it's at nineteen forty seven center street
0:00:37	so my group in particular at icsi is the speech group we're led by
0:00:41	professor nelson morgan he's an eecs professor
0:00:44	we've recently transitioned to steven wegmann as a leader he's
0:00:48	got an industry background he's been at nuance voicesignal
0:00:52	dragon these companies are well known in the speech recognition field
0:00:56	about a half dozen phd students including myself
0:01:01	so our main research areas we have speech recognition
0:01:04	this is speech-to-text taking an audio signal and converting it to words it's a field
0:01:09	with a long history there's a system from the early nineteen sixties at i b
0:01:13	m that recognized just the digits zero to ten
0:01:16	later in the nineties and now it's used for dictation in medical and legal professions
0:01:21	and most recently you know it's been adopted in smart phones and digital assistants like
0:01:25	siri
0:01:27	we also do work in speaker recognition this is sort of voice biometrics
0:01:31	and we have a multimedia processing group which works in collaboration with computer vision
0:01:35	researchers
0:01:36	both on campus and at icsi
0:01:40	so speech recognition this is the main thrust of our group's work
0:01:44	one of the projects we have is looking at hidden markov models which underlie every
0:01:48	single speech recognition system
0:01:51	but it's based on a few flawed assumptions so here's a graphical model representation of
0:01:55	a hidden markov model
0:01:57	we have observations in the darker shaded circles above and
0:02:00	hidden variables states on the bottom
0:02:03	and if you are familiar with the graphical model parlance we're saying that all
0:02:07	the observations are conditionally independent
0:02:10	given the current state
0:02:11	now that's a simplifying assumption that makes algorithms tractable
0:02:15	but has consequences because reality is that these observations do have a lot of
0:02:19	contextual
0:02:20	correlation
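As a rough illustration of why the conditional independence assumption makes algorithms tractable, here is a minimal sketch of the forward algorithm for a discrete hidden markov model. The function name `forward_likelihood`, the two-state model, and every probability below are made up for illustration and are not from the talk. Because each observation depends only on the current state, the likelihood of a whole sequence can be accumulated one time step at a time instead of summing over every possible state path.

```python
# Toy discrete HMM. All names and numbers here are illustrative assumptions.
def forward_likelihood(init, trans, emit, observations):
    """Return P(observations) under a discrete HMM via the forward algorithm.

    init[i]     : P(state at time 0 is i)
    trans[i][j] : P(next state is j | current state is i)
    emit[i][o]  : P(observation is o | current state is i)
    """
    n_states = len(init)
    # alpha[i] = P(observations so far, current state is i)
    alpha = [init[i] * emit[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        # The emission term factors out of the sum because the observation
        # is assumed independent of everything else given the current state.
        alpha = [
            emit[j][obs] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood(init, trans, emit, [0, 1, 0])
```

The cost grows linearly with the sequence length and quadratically with the number of states. The price of that efficiency is the one described above: correlations between neighbouring observations are ignored.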
0:02:22	our work in acoustic features we look at basically dealing
0:02:25	with these shaded circles on top when they're very noisy
0:02:28	you know speech recognition works great in clean conditions but in real conditions the data is different
0:02:32	there's been a resurgence of interest in neural networks this is sort of a throwback
0:02:36	to the nineteen eighties that's been
0:02:39	kind of seeing some excitement because of restricted boltzmann machines and more computational power allows
0:02:46	us to build much larger
0:02:48	before networks and the research that i do specifically is related to
0:02:52	porting systems that are very complex and somewhat targeted to major european languages
0:02:57	to languages that have fewer resources
0:03:00	so on this map of language families of the world
0:03:03	the parts in blue and red are the major european languages and the rest of the
0:03:07	world most of the stuff on the right
0:03:10	have languages with very different characteristics
0:03:14	in speaker recognition our work is really on robustness
0:03:18	because this work is related to
0:03:20	to security this work is funded by the army for the air force research labs
0:03:25	and darpa
0:03:25	and we're dealing with very difficult signals collected like
0:03:28	in
0:03:30	jet fighter cockpits and spy planes
0:03:33	but there's also another application which is speaker diarization
0:03:36	and we've done this in collaboration so far left building a real time online system
0:03:40	which given a
0:03:42	recording of a meeting say with multiple speakers
0:03:45	can
0:03:46	segment and label the regions which correspond to different speakers this helps you if you
0:03:51	want to build a transcript afterwards and have it labeled with different speakers
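To make the labeled-transcript idea concrete, here is a small sketch of how diarization output might be combined with recognized words. It assumes the diarization step has already produced `(start, end, speaker)` segments and that the recognizer gives a time for each word. The function name `label_words`, the data layout, and the sample values are hypothetical. This is not the real time system described in the talk.

```python
# Hypothetical data layout; not the format of any particular diarization tool.
def label_words(segments, words):
    """Group timed words into speaker turns.

    segments: list of (start_sec, end_sec, speaker) from a diarization step
    words:    list of (time_sec, word) from a speech recognizer
    Returns a list of (speaker, text) turns in time order.
    """
    turns = []
    for time_sec, word in sorted(words):
        # Find the segment that covers this word, if there is one.
        speaker = "unknown"
        for start, end, name in segments:
            if start <= time_sec < end:
                speaker = name
                break
        # Extend the current turn when the speaker has not changed.
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + word)
        else:
            turns.append((speaker, word))
    return turns


segments = [(0.0, 2.0, "spk1"), (2.0, 5.0, "spk2")]
words = [(0.5, "hello"), (1.2, "everyone"), (2.4, "hi"), (6.0, "thanks")]
turns = label_words(segments, words)
```

Words that fall outside every segment are kept under an "unknown" label rather than dropped, so gaps in the diarization stay visible in the transcript.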
0:03:57	like the other work we do is related to this in
0:03:59	in terms of speech activity detection language identification and in these dialogue systems we have speakers
0:04:06	who
0:04:07	speak over each other they speak with each other and so we have to deal
0:04:11	with this differently
0:04:12	a lot of it has to do with sort of speech understanding also rather than just
0:04:16	getting the words we try to understand what the dynamics are what the intent is behind
0:04:20	the language and some of the more exciting work we're doing also involves looking at
0:04:24	actual
0:04:25	i data collected by scientists who
0:04:27	have access to
0:04:29	to brain measurements from small mammals or from patients
0:04:33	for example they do surgery on epilepsy patients and they stick electrodes while doing the
0:04:38	surgery
0:04:39	in the brain
0:04:40	so our goals at icsi are we wanna have more robust signal processing for realistic conditions
0:04:46	we want to understand the scientific principles behind these engineering systems
0:04:52	and in general we support open collaborative research we work with
0:04:55	a lot of international universities and also domestically and i encourage you guys to check
0:05:00	out our website and come to the open house today
0:05:07	policy of harmony
