Speech Transcript - Speech Technology Research at ICSI

hi my name is are free a i'm a computer science phd student graduating

this year

and i work at the international computer science institute

i'll give you guys a broad overview of icsi as we call it

it's off campus down by the bart station

we have multiple groups down there and we have an open house

where we'll have a shuttle leaving here so just look for me in an orange jacket

or just walk down to nineteen forty seven center street

so my group in particular at icsi is the speech group we were led by

professor nelson morgan he's the ee professor

but we recently transitioned to have steven wegmann as a leader he's

got an industry background he's been at nuance voice signal

dragon these companies are well-known in the speech recognition field

about a half dozen phd students including myself

so our main research areas we have speech recognition

this is speech-to-text taking an audio signal and converting it to words it's a field

with a long history there's a system from the early nineteen sixties at ibm

that recognized just the digits zero to ten

later in the nineties and now it's used for dictation in medical and legal professions

and most recently you know it's been adopted in smart phones and digital assistants like

siri

we also do work in speaker recognition this is sort of voice biometrics

and we have a multimedia processing group which works in collaboration with computer vision

researchers

both on campus and at icsi

so speech recognition this is the main thrust of our group's work

one of the projects we have is looking at hidden markov models which underlie every

single speech recognition system

but it's based on a few flawed assumptions so here's a graphical model representation of

a hidden markov model

we have observations in the darker shaded circles above and

hidden variables states on the bottom

and if you are familiar with the graphical model parlance we're saying that all

the observations are conditionally independent

given the current state

you know that's a simplifying assumption that makes algorithms tractable

but has consequences because reality is that these observations do have a lot of

contextual

correlation
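
The assumption described above can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the two-state model, the probability tables, and every function name below are hypothetical. Because each observation depends only on the current state, the joint probability factorizes into per-step terms, and the likelihood of a sequence can be computed with the forward recursion instead of summing over every possible state path.

```python
from itertools import product

# Toy discrete HMM. Every number here is made up for illustration.
INIT = [0.6, 0.4]                    # P(first state)
TRANS = [[0.7, 0.3], [0.4, 0.6]]     # TRANS[r][s] = P(next state s | state r)
EMIT = [[0.9, 0.1], [0.2, 0.8]]      # EMIT[s][o] = P(observation o | state s)


def joint_probability(states, observations):
    """P(states, observations) under the HMM factorization.

    Each observation contributes one factor that depends only on the
    state at the same time step (the conditional independence assumption).
    """
    prob = INIT[states[0]] * EMIT[states[0]][observations[0]]
    for t in range(1, len(states)):
        prob *= TRANS[states[t - 1]][states[t]] * EMIT[states[t]][observations[t]]
    return prob


def brute_force_likelihood(observations):
    """Sum the joint over every state path: exponential in sequence length."""
    n_states = len(INIT)
    return sum(
        joint_probability(path, observations)
        for path in product(range(n_states), repeat=len(observations))
    )


def forward_likelihood(observations):
    """Same quantity via the forward recursion: linear in sequence length."""
    n_states = len(INIT)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [INIT[s] * EMIT[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            EMIT[s][obs] * sum(alpha[r] * TRANS[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)
```

Both likelihood functions return the same value for any observation sequence. The recursion is only valid because of the independence assumption, which is the trade-off the talk points to: the model stays tractable, but correlation between neighboring observations is not represented.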

in our work on acoustic features we look at basically dealing

with these shaded circles on top when they're very noisy

you know speech recognition works great in clean conditions but in real conditions the data is different

there's been a resurgence of interest in neural networks this is sort of a throwback

to the nineteen eighties there's been

kind of some excitement because restricted boltzmann machines and more computational power allow

us to build much larger

networks than before and the research that i do specifically is related to

porting these systems that are very complex and somewhat targeted to major european languages

to languages that have fewer resources

so on this map of language families of the world

the parts in blue and red are the major european languages and the rest of the

world most of the stuff on the right

has languages with very different characteristics

in speaker recognition our work is really on robustness

because this work is related to

security this work is funded by the army and the air force research labs

and darpa

and we're dealing with very difficult signals collected like

in

jet fighter cockpits and spy planes

but there's also another application which is speaker diarization

and we've done this in collaboration so far left building a real time online system

which given a

recording of a meeting say with multiple speakers

can

segment and label the regions which correspond to different speakers this helps you if you

want to build a transcript afterwards and have it labeled with different speakers
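
To make the "segment and label" output concrete, here is a minimal Python sketch. It is not the system mentioned in the talk. It assumes some earlier stage has already assigned a speaker label to each fixed-length audio frame, and it only shows the last step of merging consecutive frames with the same label into time-stamped segments. The function name, the label strings, and the frame length are all hypothetical.

```python
def frames_to_segments(frame_labels, frame_seconds=0.01):
    """Merge runs of identical per-frame speaker labels into segments.

    Returns a list of (start_time, end_time, label) tuples, in seconds.
    The per-frame labels are assumed to come from an earlier stage that
    is not shown here.
    """
    segments = []
    start = 0  # index of the first frame in the current run
    for i in range(1, len(frame_labels) + 1):
        # Close the run at the end of the input or when the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            segments.append(
                (start * frame_seconds, i * frame_seconds, frame_labels[start])
            )
            start = i
    return segments


# Hypothetical labels for six one-second frames.
labels = ["spk1", "spk1", "spk2", "spk2", "spk2", "spk1"]
segments = frames_to_segments(labels, frame_seconds=1.0)
```

Segments in this form can be lined up against the word timings of a transcript, so that each stretch of words can be attributed to a speaker.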

like the other work we do is related to this

in terms of speech activity detection language identification and in these dialogue systems we have speakers

who

speak over each other they speak with each other and so we have to deal with

this differently

a lot of it has to do with sort of speech understanding also rather than just

getting the words we try to understand what the dynamics are what the intent is behind

the language and some of the more exciting work we're doing also involves looking at

actual

data collected by scientists who

have access to

brain measurements from small mammals or from patients

for example they do surgery on epilepsy patients and they stick electrodes in during the

surgery

in the brain

so our goals at icsi are we wanna have more robust signal processing for realistic conditions

we want to understand the scientific principles behind these engineering systems

and in general we support open collaborative research we work with

a lot of international universities and also domestically and i encourage you guys to check

out our website and come to the open house today

policy of harmony
