So it is my privilege this morning to introduce our keynote speaker, Frank Guenther. He is a computational and cognitive neuroscientist specialising in speech and sensorimotor control.
He is from the Department of Speech, Language and Hearing Sciences and Biomedical Engineering at Boston University, where he also obtained his PhD.
His research combines theoretical modelling with behavioural and neuroimaging experiments to characterise the neural computations underlying speech and language. This is a fascinating research field, which we thought would be of interest to everyone in speech research.
And so, without further ado, I'd like you to help me welcome Professor Frank Guenther.
Good morning. Thanks for showing up this early in the morning. I'd like to start by thanking the organisers for inviting me to this conference in such a beautiful location.
I'd also like to acknowledge my collaborators before I get started. The main collaborators on the work I'll talk about today include people from my lab at Boston University, including Jason Tourville, Jonathan Brumberg, Satrajit Ghosh, Alfonso Nieto-Castañón, Maya Peeva, Elisa Golfinopoulos, and Oren Civier.
But in addition, we collaborate a lot with outside labs, and I'll be talking about a number of projects that involve collaborations with people at MIT, including Joseph Perkell, Melanie Matthies, and Harlan Lane. We've worked with Shinji Maeda to create the speech synthesizer we use for much of our modelling work, and Philip Kennedy and his colleagues at Neural Signals work with us on our neural prosthesis project, which I'll talk about at the end of the lecture.
The research program in our laboratory has the following goals. We are interested in understanding the brain first and foremost, and we're in particular interested in elucidating the neural processes that underlie normal speech learning and production. But we are also interested in looking at disorders, and our goal is to provide a mechanistic, model-based account. By model here I mean a neural network model that mimics the brain processes underlying speech, and we use this model to understand communication disorders: problems that happen when part of the circuit is broken.
I'll talk a bit about communication disorders today, but I will focus on the last part of our work, which is developing technologies that aid individuals with severe communication disorders, and I'll talk a bit about a project involving a patient with locked-in syndrome who was given a brain implant in order to try to restore some speech output.
The methods we use include neural network modelling. We use very simple neural networks: the neurons in our models are simply units whose activities are passed through a nonlinear thresholding at the output. We have other equations that define the synaptic weights between the neurons, and we adjust these weights in a learning process that I'll describe in a bit.
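As a rough illustration of the kind of unit just described, here is a minimal sketch in Python. The function names, the rectified threshold, and the Hebbian-style weight update are illustrative assumptions, not the actual DIVA equations.

```python
import numpy as np

def unit_activity(inputs, weights, threshold=0.0):
    """Weighted sum of inputs passed through a simple nonlinear threshold.
    Illustrative only; the real model's activation equations differ."""
    net = np.dot(weights, inputs)
    return max(0.0, net - threshold)  # rectified output

def weight_update(weights, pre, post, rate=0.01):
    """Toy learning rule: strengthen a weight when pre- and postsynaptic
    activities co-occur, one simple way such synaptic weights could be tuned."""
    return weights + rate * post * np.asarray(pre)
```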
We test the model using a number of different types of experiments. We use motor and auditory psychophysics experiments to look at speech, looking at the formant frequencies, for example, during different speech tasks. And we also use functional brain imaging, including fMRI but also MEG and EEG, to try to verify the model, or to help us improve the model by pointing out weaknesses in it.
And the final set of things we do, given that we're a computational neuroscience department, is that we're also interested in producing technologies that are capable of helping people with communication disorders, and I'll talk about one project that involves the development of a neural prosthesis allowing people who have problems with their speech output to speak.
The studies we carry out are largely organised around one particular model, which we call the DIVA model. This is a neural network model of speech acquisition and production that we've developed over the past twenty years in our lab.
So in today's talk I'll first give you an overview of the DIVA model, including a description of the learning process that allows the model to tune itself up so that it can produce speech sounds.
I'll talk a bit about how we extract simulated fMRI activity from the model. fMRI is functional magnetic resonance imaging; this is a technique for measuring blood flow in the brain, and areas of the brain that are active during a task have increased blood flow, so from fMRI we can identify which parts of the brain are most active for a task, and differences in activity across different task conditions.
This allows us to test the model, and I'll show an example of this where we use auditory perturbation of speech in real time, so that a speaker is saying a word but they hear something slightly different. We use this to test a particular aspect of the model, which involves auditory feedback control of speech.
And then I'll end the talk with a presentation of a project that involved communication disorders, in this case an extreme communication disorder in a patient with locked-in syndrome who was completely paralysed and unable to move. We are working on prostheses for people in this condition to help restore their ability to speak, so that they can communicate with the people around them.
This slide shows a schematic of the DIVA model. I will not be talking about the full model much; I will use a simplified schematic in a minute. What I want to point out is that the different blocks in this diagram correspond to different brain regions that include different neural maps. A neural map, in our terminology, is simply a set of neurons that represent a particular type of information. In motor cortex, for example, down here in the ventral motor cortex part of the model, we have articulator velocity and position maps. These are neurons, basically, that command the positions of the speech articulators in an articulatory synthesizer,
which I have just schematised here. So the output of our model is a set of commands to an articulatory synthesizer. This is just a piece of software to which you provide a set of articulator positions as input. The synthesiser we use the most was created with Shinji Maeda and involves seven articulatory degrees of freedom: there's a jaw degree of freedom, three tongue degrees of freedom, two lip degrees of freedom for opening and protrusion, and a larynx height degree of freedom. Once you specify the positions of these articulators, you can create a vocal tract area function, and you can use that area function to synthesise the acoustic signal that would be produced by a vocal tract of that shape.
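To make the interface concrete, here is a minimal sketch of the kind of function such a synthesizer exposes. This is not the Maeda implementation; the parameter names are taken from the description above, and the area-function and acoustic steps are stubbed out.

```python
import numpy as np

def synthesize(articulators, sample_rate=16000, duration_s=0.01):
    """Conceptual interface only: seven articulator positions in, audio out.
    A real articulatory synthesizer maps the parameters to a vocal tract area
    function and then runs an acoustic tube model; both steps are stubbed here."""
    jaw, tongue1, tongue2, tongue3, lip_open, lip_protrude, larynx = articulators
    area_function = np.full(17, 2.0)                      # stand-in tube areas (cm^2)
    waveform = np.zeros(int(sample_rate * duration_s))    # stand-in acoustic output
    return waveform
```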
The model's productions are fed back to the model in the form of auditory and somatosensory information that go to maps for auditory state and somatosensory state, located in auditory cortical areas in Heschl's gyrus and the posterior superior temporal gyrus, and in somatosensory cortical areas in the ventral somatosensory cortex and supramarginal gyrus. Each of the large boxes here represents a map in the cerebral cortex, and the smaller boxes represent subcortical components of the model, most notably a basal ganglia loop for initiating speech output and a cerebellar loop, which contribute to several aspects of production. I'm going to focus on the cortical components of the model today for clarity,
and so I'll use this simplified version of the model, which doesn't have all the components but has all the main processing levels that we'll need for today's talk. The highest level of processing in the model is what we call the speech sound map, and this corresponds to cells in the left ventral premotor cortex and inferior frontal gyrus, in what is commonly called Broca's area, and the premotor cortex immediately behind Broca's area. In the model, each one of these cells comes to represent a different speech sound, and a speech sound in the model can be either a phoneme, a syllable, or even a multisyllabic phrase. The key thing is that it's something that is produced very frequently, so that there's a stored motor program for that speech sound, and the canonical sort of speech sound that we use is the syllable. So for the remainder of the talk I'll talk mostly about syllable production when referring to the speech sound map.
Cells in the speech sound map project to the primary motor cortex through what we call a feedforward pathway, which is a set of learned commands for producing these speech sounds; they activate associated cells in the motor cortex that command the right articulator movements.
But the speech sound map cells also project to sensory areas, and what they do is send targets to those sensory areas. So if I want to produce a particular syllable such as "ba", when I say "ba" I expect to hear certain things; I expect certain formant frequencies as a function of time, and that information is represented by synaptic projections from the speech sound map over to what we call an auditory error map, where this target is compared to incoming auditory information.
Similarly, when we produce a syllable, we expect it to feel a particular way. When I say "ba", for example, I expect my lips to touch for the /b/ and then to release for the vowel. This sort of information is represented in a somatosensory target that projects over to the somatosensory cortical areas, where it is compared to incoming somatosensory information. These targets are learned, as is the feedforward command, during a learning process that I'll describe briefly in just a minute.
The arrows in the diagram represent synaptic projections from one type of representation to another. You can think of these synaptic projections as basically transforming information from one representational frame into another, and the main representations we focus on here are phonetic representations in the speech sound map, motor representations in the articulator velocity and position maps, auditory representations in the auditory maps, and finally somatosensory representations in the somatosensory maps. The auditory dimensions we use in the model typically correspond to formant frequencies, and I'll talk about that quite a bit as I go on in the talk,
whereas the somatosensory targets correspond to things like pressure and tactile information from the lips and the tongue while you're speaking, as well as muscle information about the lengths of muscles that gives you a reading of where your articulators are in the vocal tract.
Okay, so just to give you a feel for what the model does, I'm going to show the articulatory synthesizer with purely random movements now. This is what we do in the very early stages of learning in the model: we randomly move the speech articulators. That creates auditory information and somatosensory information from the speech, and we can associate the auditory information and the somatosensory information with each other and with the motor information that was used to produce the movements. These movements don't sound anything like speech, as you'll hear; this is just randomly activating the seven dimensions of movement.
This is what the model does for the first forty-five minutes. We call this the babbling cycle; it takes about forty-five minutes of real time to go through it. What the model does is tune up many of the projections between the different areas, so here, for example, in red are the projections that are tuned during this random babbling cycle.
The key things being learned here are the relationships between motor commands, somatosensory feedback, and auditory feedback. In particular, what the model needs to learn for producing sounds later is how to correct for sensory errors. So what the model is largely learning is: if I need to change my first formant frequency in an upward direction, for example, because I'm too low, then I need to activate a particular set of motor commands. This will flow through a feedback control map to the motor cortex, which will translate the auditory error into a corrective motor command.
Similarly, if I feel that my lips are not closing enough for a /b/, there will be a somatosensory error representing that, and that somatosensory error will then be mapped into a corrective motor command in the motor cortex. These arrows in red here are the transformations, basically, or the synaptic weights that encode these transformations, and they're tuned up during this babbling cycle.
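Here is a minimal sketch, assuming a hypothetical synthesize_formants function, of how random babbling data could be used to fit a mapping from auditory change back to motor change, which is the kind of relationship being learned here. The least-squares fit is a stand-in for the model's actual learning rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def babble_and_fit(synthesize_formants, n_samples=2000, n_artic=7):
    """Collect random articulator perturbations and the formant changes they
    cause, then fit a linear map from auditory change back to motor change.
    `synthesize_formants` is a hypothetical stand-in: articulator vector -> formants."""
    base = rng.uniform(-1, 1, n_artic)
    dM = 0.05 * rng.standard_normal((n_samples, n_artic))   # random motor perturbations
    dA = np.array([synthesize_formants(base + m) - synthesize_formants(base) for m in dM])
    # Least-squares fit so that: corrective motor command ≈ W @ auditory error
    X, *_ = np.linalg.lstsq(dA, dM, rcond=None)
    return X.T   # shape (n_artic, n_formants): maps an auditory error to a motor correction
```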
Well, after the babbling cycle, the model still has no notion of speech sounds. This corresponds to the very early babbling of an infant, up to about six months of age, before they start really learning and producing sounds from a particular language. The next stage of the model handles the learning of speech sounds from a particular language, and this is the imitation process in the model.
What happens in the imitation process is that we provide the model with an auditory target: we give it a sound file of somebody producing a word or phrase. The formant frequencies are extracted and are used as the auditory target for the model, and the model then attempts to produce the sound by reading out whatever feedforward commands it might have. If it has just heard the sound for the first time, it will not have any feedforward commands, because it hasn't yet produced the sound and doesn't know what commands are necessary to produce it. So in this case it's going to rely largely on auditory feedback control in order to produce the sound, because all it has is an auditory target.
The model attempts to produce the sound; it makes some errors, but it does some things correctly due to the feedback control, and it takes whatever commands were generated on the first attempt and uses them as the feedforward command for the next attempt. The next attempt now has a better feedforward command, so there will be fewer errors and less of a correction, but again the feedforward command and the correction added together form the total output, and that is then turned into the feedforward command for the next iteration. With each iteration the error gets smaller and smaller, due to the incorporation of these corrective motor commands into the feedforward command.
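A minimal sketch of that imitation loop, with hypothetical stand-in callables for producing a sound, measuring the auditory error against the stored target, and mapping that error to a motor correction:

```python
import numpy as np

def practice(produce, auditory_error, error_to_correction, n_attempts=6, n_motor=7):
    """Each attempt's total command (feedforward plus feedback correction)
    becomes the feedforward command for the next attempt, so the error shrinks."""
    feedforward = np.zeros(n_motor)                  # no stored command before attempt 1
    for attempt in range(n_attempts):
        formants = produce(feedforward)              # attempt the sound
        correction = error_to_correction(auditory_error(formants))
        feedforward = feedforward + correction       # fold the correction into the next attempt
    return feedforward
```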
Just to give you an example of what that sounds like, here is an example that was presented to the model for learning. [audio playback: "Good doggy"] This is a speaker saying "good doggy"; here it is once more. [audio playback: "Good doggy"] What the model is now going to try to do is mimic this, initially with no feedforward command, just using auditory feedback control; the auditory feedback control system was tuned up during the earlier babbling stage. It does a reasonable rendition, but it's kind of sloppy.
[audio playback: the model's first attempt]
This is the second attempt. It will be significantly improved, because the feedback commands from the first attempt have now been moved into the feedforward command.
[audio playback: the model's second attempt]
And then by the sixth attempt the model has fully learned the sound, meaning that there are no errors in its formant frequencies, which is pretty much all you can hear from the sound, and so it sounds like this. [audio playback: the model's sixth attempt] This was the original. [audio playback: "Good doggy"]
What you can hear is that the formant frequencies pretty much track the original formant frequencies; we looked at just the first three formant frequencies of the speech sound when doing this. So in this case we would say the model has learned to produce this phrase. It now has a speech sound map cell devoted to that phrase, and if we activate that cell, it reads the phrase out with no errors.
Well, an important aspect of this model is that it's a neural network, and the reason we chose the neural network construction is so that we could investigate brain function in more detail. What we've done is take each of the neurons in the model and localise them in a standard brain space, a stereotactic space that is commonly used for analysing neuroimaging results from experiments such as fMRI experiments. Here, these orange dots represent the different components of the model.
Here, for example, is the central sulcus in the brain; the motor cortex is in front of the central sulcus and the somatosensory cortex is behind it, and we have representations of the speech articulators in this region in both hemispheres. The auditory cortical areas include state cells and auditory error cells, which was a novel prediction we made from the model, that these cells would reside somewhere in the higher-level auditory cortical areas, and I'll talk about testing that prediction in a minute. We have somatosensory cells in the somatosensory cortical areas of the supramarginal gyrus here, and these include somatosensory error cells, also crucial to feedback control,
and so forth. In general, the representations in the model are bilateral, meaning that the neurons representing the lips, for example, are located in both hemispheres, but the highest level of the model, the speech sound map, is left-lateralized. The reason it's left-lateralized is that a large amount of data from the neurology literature suggests that the left hemisphere is where we store our speech motor programs. In particular, if there is damage to the left ventral premotor cortex or the adjoining Broca's area here in the inferior frontal gyrus,
speakers have what's referred to as apraxia of speech, and this is an inability to read out the motor programs for speech sounds. They hear the sound, they understand what the word is, and they try to say it, but they just can't get the syllables to come out. In our view this is because their motor programs, represented by the speech sound map cells, are damaged due to the stroke. If you have a stroke in the right hemisphere in the corresponding location, there is no apraxia; speech is largely spared. In our view this is because the right hemisphere, as I'll describe in a bit, is more involved in feedback control than feedforward control.
An important insight is that once adult speakers have learned to produce the speech sounds of their language, and their speech articulators have largely stopped growing, they don't need feedback control very often, because their feedforward commands are already accurate. If you listen, for example, to the speech of somebody who became deaf as an adult, for many years their speech remains largely intelligible, presumably because these motor programs are intact, and they by themselves are enough to produce the speech properly.
In an adult, however, if we do something novel to the person, such as blocking their jaw while they try to speak, or perturbing the auditory feedback of their speech, then we should reactivate the feedback control system: first by activating sensory error cells that detect that the sensory feedback isn't what it should be, and then motor correction takes place through the feedback control pathways of the model.
Okay, so just to highlight the use of these locations, what I'll show you now is a typical simulation where we have the model produce a short utterance. You'll hear the production first. In our model, the activities of the neurons correspond to electrical activity in the brain. fMRI actually measures blood flow in the brain, and blood flow is a function of the electrical activity, but it's quite slowed down relative to the activity; it peaks four or five seconds after the speech has started. So what you'll see is the brain activity starting to build up, in terms of blood flow, over time after the utterance is produced.
Here the utterance was at the beginning, but only later do you see the hemodynamic response, and this is actually quite useful for us, because we can do neuroimaging experiments where people speak in silence and we then collect the data after they're done speaking, at the peak of this blood flow. What we do is basically have them speak in silence, and at that point we take scans with the fMRI scanner, which is very loud and would interfere with the speech if it were going on during the speech; but in this case we're able to scan after the speech is completed and get a measure of which brain regions were active, and how active they were, during speech production.
Okay, so that's an overview of the model. Next, I'll go into a little more detail about the functioning of the feedback control system, and my main goal here is simply to give you a feel for the type of experiment we do; we've done many experiments of this sort to test and refine the model over the years. The experiment I'll talk about in this case involves auditory perturbation of the speech signal while the subject is speaking in an MRI scanner.
So just to review: the model has the feedforward control system, shown on the left here, and the feedback control system, shown on the right, and feedback control has both an auditory and a somatosensory component. During production of speech, when we activate a speech sound map cell to produce the speech sound, in the feedback control system we read out targets to the somatosensory system and to the auditory system, and those targets are compared to the incoming auditory and somatosensory information. The targets take the form of regions, so there's an acceptable region of F1 that the production can be in; if it's anywhere within this region, it's okay, but if it goes outside of the region, an error cell is activated, and that will drive corrective articulator movements that move it back into the appropriate target region.
So if we have an error arising in one of these maps, and in particular we're going to be focusing on the auditory error map, what happens next in the model is that the error gets transformed through a feedback control map in the right ventral premotor cortex and then projected to the motor cortex in the form of a corrective motor command. What the model has essentially learned is how to take auditory errors and correct them with motor movements. In terms of mathematics, this corresponds to a pseudoinverse of the Jacobian matrix that relates the articulatory and auditory spaces, and this can be learned during babbling simply by moving the articulators around and seeing what changes in somatosensory and auditory state take place.
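Written out, under the assumption of a locally linear relation between small motor changes and the auditory changes they produce, the idea is:

$$\Delta \mathbf{a} \approx J(\mathbf{m})\,\Delta \mathbf{m}, \qquad J_{ij} = \frac{\partial a_i}{\partial m_j},$$

$$\Delta \mathbf{m}_{\text{corr}} = J^{+}\,\Delta \mathbf{a}_{\text{err}}, \qquad J^{+} = J^{\top}\left(J J^{\top}\right)^{-1},$$

where $\mathbf{m}$ is the articulator (motor) state, $\mathbf{a}$ is the auditory state (for example, formant frequencies), $\Delta \mathbf{a}_{\text{err}}$ is the auditory error, and $J^{+}$ is the pseudoinverse that the learned feedback control mapping approximates.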
The fact that we now have this feedback control map in the right ventral premotor cortex in the model was partially the result of the experiment I'll be talking about. This was not originally in the model; originally these projections went to the primary motor cortex. I'll show the experimental result that caused us to change that component of the model.
okay
So based on this feedback control system, we can make some explicit predictions about brain activity during speech, and in particular we made some predictions about what would happen if we shifted your first formant frequency during speech, so that when we send it back to you over earphones within fifty milliseconds, you hear something slightly different from what you're actually producing. According to our model, this should cause activity of cells in the auditory error map, which we have localised to the posterior superior temporal gyrus and the adjoining planum temporale, regions in the Sylvian fissure and on the temporal lobe. So we should see increased activity there if we perturb the speech.
We should also see some motor corrective activity, because according to our model the feedback control system will kick in when it hears this error, even during the perturbation, and if the utterance is long enough it will try to correct the error that it hears. Now, keep in mind that auditory feedback takes time to get back up to the brain. The time from motor cortical activity, to movement and sound output, to hearing that sound output and projecting it back up to your auditory cortex is somewhere in the neighbourhood of a hundred to a hundred and fifty milliseconds. So we should see a corrective command kicking in not at the instant that the perturbation starts, but about a hundred or a hundred and twenty-five milliseconds later, because that's how long it takes to process this auditory feedback.
So what we did was develop a digital signal processing system that allowed us to shift the first formant frequency in near real time, meaning that the subject hears the sound with a sixty-millisecond delay, which is pretty much unnoticeable to the subject. Even unperturbed speech has that same sixty-millisecond delay, so they're always hearing a slightly delayed version of their speech over headphones. We play it rather loud over the headphones and they speak quietly as a result, and the reason we do that is that we want to minimize things like bone conduction of the actual speech and make them focus on the auditory feedback that we're providing them, which is the perturbed auditory feedback.
What we do in particular is take the first formant frequency and, in one fourth of the utterances, perturb it either up or down. So three out of every four utterances are unperturbed, one in eight is perturbed up, and one in eight is perturbed down. They get these perturbations randomly distributed, and they can't predict them, first of all because the direction changes all the time, and secondly because most of the productions are not perturbed.
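A minimal sketch of a randomized trial schedule matching the proportions just described (the function name and trial count are illustrative, not from the original study):

```python
import random

def make_trial_schedule(n_trials=80, seed=1):
    """3/4 of trials unperturbed, 1/8 with F1 shifted up, 1/8 with F1 shifted down,
    shuffled so the subject cannot predict the direction or timing."""
    conditions = (["none"] * 6 + ["up"] + ["down"]) * (n_trials // 8)
    random.Random(seed).shuffle(conditions)
    return conditions
```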
Here's what this sounds like. The people were producing vowels, in this case the vowel of "pet", in short words containing that vowel. Here's an example of unshifted speech, before the perturbation.
[audio playback: unshifted production]
And here is a case where we've shifted F1 upward. An upward shift of F1 corresponds to a more open mouth, and that should make the "pet" vowel sound a little bit more like the vowel in "pat". So here is the perturbed version of that production.
[audio playback: F1-shifted production]
It sounds more like "pat" than "pet" in this case. Here is the original again, followed by the shifted version. [audio playback]
So it's consciously noticeable to you now, when I play it to you like this, but most subjects don't notice what's going on during the experiment. We asked them afterwards whether they noticed anything; sometimes they would say "occasionally my speech sounded a little odd", but usually they didn't really notice much of anything going on with their speech. And yet their brains are definitely picking up this difference, and we found that with the fMRI.
We also look at their formant frequencies. What I'm showing here is a normalized F1, and what normalized means in this case is that the F1 in a baseline unperturbed utterance is what we expect to see; we take the F1 in a given utterance and compare it to that baseline. If it's exactly the same, we'll have a value of one, so if they're producing exactly the same thing as they do in the baseline, they would stay flat at this value of one. On the other hand, if they're increasing their F1, then we'll see the normalized F1 go above one, and if they're decreasing F1, we'll see it go below one.
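As a sketch of that normalization (array shapes and names are illustrative):

```python
import numpy as np

def normalized_f1(f1_trace, baseline_trace):
    """Divide each time point of an utterance's F1 track by the baseline
    (mean unperturbed) F1 at that time point: 1.0 means no change,
    above 1 means the speaker raised F1, below 1 means they lowered it."""
    return np.asarray(f1_trace, dtype=float) / np.asarray(baseline_trace, dtype=float)
```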
The gray shaded areas here are the ninety-five percent confidence intervals of the subjects' productions in the experiment. What we see for the down shift is that over time the subjects increase their F1 to try to correct for the decrease in F1 that we've given them with the perturbation, and in the case where we up-shift their speech, they decrease F1, as shown by this confidence interval here. The split between the two occurs right about where we expect, which is somewhere around a hundred to a hundred and fifty milliseconds after the first perturbed sound reaches their ears.
The solid lines here are the results of simulations of the DIVA model producing the same speech sounds under perturbed conditions. The black dashed line here shows the model's productions in the up-shift condition; in this case it actually waits only about eighty milliseconds, because our delay loop was a bit short here, and then it starts to compensate for the perturbation. Similarly, in the down-shift case it goes for about eighty milliseconds until it starts to hear the error, and then it compensates in an upward direction. We can see that the model's productions fall within the confidence intervals of the subjects' productions, so the model produces a good fit to the behavioural data.
But we also took a look at the neuroimaging data. On the bottom, what I'm showing are the results of a simulation that we ran before the study, where we generated predictions of fMRI activity when comparing shifted speech to non-shifted speech. As I mentioned, when we shift the speech, that should turn these auditory error cells on, and we've localised them to these posterior areas of the superior temporal gyrus here. When those error cells become active, they should lead to a motor correction, and these are shown by activities in the motor cortex here in the model simulation. Now, we also see a little bit of cerebellar activity here in the model, but I'll skip that for today's talk.
Here on the top is what we actually got from our experimental results, for the shift minus no-shift contrast. The auditory error cells were pretty much where we expected them. So first of all, there are auditory error cells: there are cells in your brain that detect the difference between what you're saying and what you expected it to sound like, even as an adult. These auditory error cells became active, but what we noticed is that the motor corrective activity we saw was actually right-lateralized and premotor. It wasn't bilateral and primary motor as we predicted; it's farther forward in the brain, in a more premotor cortical area, and it's right-lateralized. So one of the things we learned from this experiment was that auditory feedback control appears to be right-lateralized in the frontal cortex, and so we modified the model to have a feedback control map in the right ventral premotor cortex area, corresponding with this region here.
We actually ran a parallel experiment where we perturbed speech with a balloon in the mouth. We built a machine that perturbed your jaw while you were speaking, so you would be producing an utterance, and during the vowel this balloon would blow up very rapidly. It was actually the finger of a latex glove that would inflate to about a centimetre and a half and would block your jaw from closing, so that when you were done with the vowel and getting ready to say the following consonant and the final vowel, the jaw was blocked and couldn't move as much. Subjects compensated again, and in that experiment we saw activity in their somatosensory cortical areas, corresponding to the somatosensory error map, but we also saw right-lateralized motor cortical activity. So based on these two experiments, we modified the model to include a right-lateralized feedback control map that we did not have in the original model.
Okay, so the other thing we can do is look at connectivity in brain activity using techniques such as structural equation modelling. Very briefly, in a structural equation modelling analysis, we use a predefined model of connectivity in the brain, and then we go and look at the fMRI data and see how much of the covariance matrix of the fMRI data can be captured by this model if we optimize the connections. So what SEM does is produce connection strengths for that model and give you goodness-of-fit data. In addition to being able to fit the data very well, meaning that the connections in the model are in the right place,
we also noted an increase in what's called effective connectivity, that is, an increase in the strength of the effect of these auditory areas on the motor areas in the right hemisphere when the speech was perturbed. The interpretation of that is: when I perturb your speech with an auditory perturbation like this, the error cells are active, and that drives activity in the right ventral premotor cortex, so we have an increased effect on the motor cortex from the auditory areas in this case. This is further support for the structure of the model and the feedback control system that we just discussed.
Okay, so that's one example of an experimental test; we've done a very large number of tests of this sort. We've tested predictions about kinematics in the model. We work with people who measure articulator movements using electromagnetic articulometry. This is a technique where you basically glue receiver coils onto the tongue, the lips, and the jaw, and you can measure very accurately the positions of these points on the articulators in the midsagittal plane. From this you can estimate quite accurately, over time, the positions of the speech articulators and compare them to the productions made by the model.

We've done a lot of work looking at, for example, phonetic context effects in /r/ production, which I'll come back to later. /r/ is a phoneme in English that is produced with a very wide range of articulatory variability, while the acoustic cues for /r/ are very stable; this has been shown by people such as Boyce and Espy-Wilson. What you see if you produce movements with the model is that the model will also produce very different articulations for /r/ in different phonetic contexts, and this has to do with the fact that it's starting from different initial positions and it's simply going to the closest point to the acoustic target that it can get to, and that point will be in different parts of the articulator space depending on where you start.
We've looked at a large number of experiments on other types of articulatory movements, in both normal-hearing and hearing-impaired individuals. We look at what happens when you put a bite block in, we look at what happens when you noise-mask the speakers, and we've also looked at what happens over time in the speech of people with cochlear implants, for example. In the case of a cochlear implant recipient who is an adult and had already learned to speak, when they first receive the cochlear implant they hear sounds that are not the same as the sounds they used to hear, so their auditory targets don't match what's coming in from the cochlear implant, and it actually impairs their speech for a little while, for about a month or so, before they start to improve their speech; by a year they show very strong improvements in their speech. According to the model, this occurs because they have to retune their auditory feedback control system to deal with the new feedback, and only when that auditory feedback control system is tuned can they start to retune their movements to produce more distinct speech sounds.
We've also done a number of neuroimaging experiments. For example, we predicted that the left ventral premotor cortex houses syllabic motor programs, and we used a technique called repetition suppression in fMRI, where you present stimuli that change in some dimensions but don't change in other dimensions. With this technique you can find out what it is about the stimuli that a particular brain region cares about, and using it we were able to show that in fact the only region in the brain that we found to have a syllabic sort of representation was the left ventral premotor cortex, where we believe these syllabic motor programs are located, highlighting the fact that the syllable is a particularly important entity for motor control. This, we believe, is because our syllables are very highly practised and welded into motor programs that we can read out: we don't have to produce the individual phonemes, we read out the whole syllable as a motor program that we've stored in memory.
Finally, we've fortunately been able to even test the model's predictions electrophysiologically. This was in the case of a patient with locked-in syndrome that I'll speak about in a bit, and I'll talk about exactly what we were able to verify using electrophysiology, in this case actual recordings from neurons in the cortex.
Okay, so the last part of my talk will now start to focus on using the model to investigate communication disorders. We've done a number of studies of this sort; as I mentioned, we've looked at speech in normal-hearing and hearing-impaired populations.
We are now doing quite a bit of work on stuttering, which is a very common speech disorder that affects about one percent of the population. Stuttering is a very complicated disorder. It has been known since the beginning of recorded history; basically every culture seems to have people who stutter within it. People have been trying to cure stuttering forever, and we've been unable to do so. The brains of people who stutter are actually really similar to the brains of people who don't stutter unless you look very closely, and if you start looking very closely you begin to see things like white matter differences and grey matter thickness differences in the brain, and these tend to be localised around the basal ganglia-thalamo-cortical loop. So our view of stuttering is that several different problems can occur in this loop; different people who stutter can have different locations of damage, or of an anomaly, in their basal ganglia-thalamo-cortical loop, and all of these can lead to stuttering. The complexity of this disorder is partly because it's a system-level disorder, where different parts of the system can cause problems; it's not always the same part of the system that is problematic in different people who stutter. So one of the important areas of research for stuttering is computational modelling of this loop, to get a much better understanding of what's going on and how these different problems can lead to similar sorts of behaviour.
We're also looking at spasmodic dysphonia, which is a vocal fold problem similar to a dystonia; it's a problem where typically the vocal folds are too tense during speech, and again it appears to be basal ganglia loop related. There's apraxia of speech, which involves left hemisphere frontal damage, and childhood apraxia of speech, which is actually a different disorder from acquired apraxia of speech; it tends to involve kind of lesser damage, but over a more widespread portion of the brain. And so forth. The project I'll talk most about here is a project involving a neural prosthesis for locked-in syndrome, and this is a project that we've done with Phil Kennedy from Neural Signals, who developed technology for implanting the brains of people with locked-in syndrome; we helped them build a prosthesis from that technology.
Typically, our studies looking at disorders involve some sort of damaged version of the model. It's a neural network, so we can go in and mess up white matter projections, which are these synaptic projections; we can mess up neurons in a particular area; we can even adjust things like levels of neurotransmitters. Some studies suggest that there may be an excess of dopamine in some people who stutter. Well, we have added dopamine receptors to our basal ganglia loop, so we can go in and start changing dopamine levels and see how that changes the behaviour of the model, and also the brain activities of the model.
What we're doing now is running a number of imaging studies involving people who stutter. We've made predictions based on several possible loci of damage in the brain that may result in stuttering, and we're testing those predictions both by seeing if the model is capable of producing stuttering behaviour and by seeing if the brain activities match up with what we see in people who stutter. There are many different ways to invoke stuttering in the model, but each way causes a different pattern of brain activity to occur, so by having both the behavioural results and the neuroimaging results we can do a much more detailed treatment of what exactly is going on in this population.
The example I'm going to spend the rest of the talk describing is a bit different. In this case the speech motor system of the patient was intact, but the patient was suffering from locked-in syndrome due to a brain stem stroke. Locked-in syndrome is a syndrome where patients have intact cognition and sensation, but they're completely unable to perform voluntary movement, so it's a case of being almost buried alive in your own body. The patients sometimes have eye movements; the patient we worked with could very slowly move his eyes up and down, his eyelids actually, to answer yes/no questions. This was his only form of communication.
Prior to our involvement in the project, he was implanted as part of a project developing technologies for locked-in patients to control computers or external devices. These technologies are referred to by several different names: brain-computer interface, brain-machine interface, or neural prosthesis. In this case we were focusing on a neural prosthesis for speech restoration.
Locked-in syndrome is typically caused either by a brain stem stroke in the ventral pons or, more commonly, by neurodegenerative diseases such as ALS, which attack the motor system. People who suffer from ALS go through a stage, in the later stages of the disease, where they are basically locked in: they're unable to move or speak but are still fully conscious and with sensation.
Well, the electrode that was developed by our colleague Phil Kennedy is schematised here, and here's a photograph of it. It's a tiny glass cone that is open on both ends; the cone is about a millimetre long. There are three gold wires inside the cone, coated with an insulator except at the very end, where the wire is cut off and acts as a recording site. So there are three recording sites within the cone: one is used as a reference and the other two are used as recording channels.
This electrode is inserted into the cerebral cortex. Here I've got a schematic of the cortex, which consists of six layers of cell types. The goal is to get it near layer five of the cortex, where the output neurons are; in the motor cortex, these are the neurons that project to the periphery to cause movement. But it doesn't matter too much exactly where you go, because the cone is filled with a nerve growth factor, and what happens is that over a month or two, axons actually grow into this cone and lock it into place. That's very important because it stops movement: if you have movement of an electrode in the brain, you get problems such as gliosis, which is scar tissue building up around the electrode and stopping the electrode from picking up signals. In this case the wires are actually inside a protective glass cone, and no scar tissue builds up inside the cone, so it's a permanent electrode; you can implant it and record from it for many years. When we did the project I'll talk about, the electrode had been in the subject's brain for over three and a half years.
The electrode location was chosen in this case by having the subject attempt to produce speech while in an fMRI scanner, and what we noticed was that the brain activity is relatively normal: it looks like the brain activity of a neurologically normal person trying to produce speech, and in particular there's a blob of activity on the precentral gyrus, which is the location of the motor cortex, in the region we expect for speech. So I'm going to refer to this as the speech motor cortex; this is where the electrode was implanted. This is an fMRI scan performed before implantation; here is a CT scan afterwards, where you can see, in the same brain area, the wires of the electrode coming out. This bottom picture is a 3D CT scan showing the skull, where you can see the craniotomy where the electrode was inserted; you can see the wires coming out, and the wires go into a package of electronics that is located under the skin.
These electronics amplify the signals and then send them as radio signals across the scalp. We attach antennas, basically just antenna coils, to the scalp, so the subject has a normal-looking head: he has hair on his head, and there's nothing sticking out of it. When he comes into the lab, we attach these antennas to the scalp, we tune them to just the right frequencies, and they pick up the two signals that are generated from our electrode. The signals are then routed to a recording system and then to a computer, where we can operate on those signals in real time.
Well, Kennedy had implanted the patient several years before we got involved in the project,
but they were having trouble decoding the signals, and part of the problem is that if you look in motor cortex, there's nothing obvious that corresponds to a word, or for that matter a syllable or a phoneme. You don't see neurons turn on when the subject produces a particular syllable and then shut off when the subject is done. Instead you see that all the neurons are just subtly changing their activity over time, so it appears that there's some sort of continuous representation here in the motor cortex; there's not a representation of just words and phonemes, at least at the motor level. Kennedy's group contacted us because we had a model of what these brain areas are doing, and so we collaborated on decoding these signals and routing them to a speech synthesizer so that the subject could actually control some speech output.
Well, the tricky question here is: what is the neural code for speech in the motor cortex? The problem, of course, is that there are no prior studies; people don't normally go into a human motor cortex and record, and monkeys don't speak, nor do other animals, so we don't have any single-cell data about what's going on in the motor cortex during speech. We have data from arm movements, and we use insights from those data, but we also used insights from what we saw in human speech movements to determine what the variables were that these people were controlling, what the motor system cares about: does it mostly care about muscle positions, or does it care about the sound signal?
There is some available data from stimulation studies of the motor cortex. These come from the work of Penfield, who worked with epilepsy patients who were having surgery to remove portions of the cortex that were causing epileptic fits. Before the removal, they would actually stimulate in the cortex to see what parts of the brain were doing what; in particular, they wanted to avoid parts of the brain involved in speech. They mapped out, along the motor cortex, areas that cause movements of the speech articulators, for example, and other areas that caused interruptions of speech, and so forth. These studies were informative, and we used them to help us determine where to localise some of the neurons in the model, but they don't really tell you what kind of representation is being used by the neurons. When you stimulate a portion of cortex, you are stimulating hundreds of neurons at a minimum; they were using something like two volts for stimulation, while the maximum activity of a neuron is around fifty-five millivolts, so the stimulation signal was dramatically bigger than any natural signal, and it activates a large area of cortex. So you see a gross, poorly formed movement coming out, and speech-related responses tended to be things like a vocalisation cry: the subject might produce a crude vocal sound, just a gross movement, not really a speech sound; they don't produce any words or anything like that. From these sorts of studies it's next to impossible to determine what sort of representation is being used in the motor cortex.
However, we do have our model, which provides the first explicit characterisation of what the response properties of speech motor cortical cells should be. We have actual speech motor cortical cells in the model; they are tuned to particular things. So what we did was use the model to guide our search for information in this part of the brain. I want to point out that the characterisation provided by the model was something we spent twenty years refining; we ran a large number of experiments testing different possibilities about how speech is controlled, and we ended up with a particular representational format in the model, and that's no coincidence: it's because we spent a lot of time looking at it. Here is the result of one such study, which highlights the fact that in motor planning, sound appears to be more important than where your tongue is actually located.

This is the study of the phoneme /r/ that I mentioned before. Just to describe what you're going to see here: each of the lines you see represents a tongue shape, and there are two tongue shapes in each panel. There's a dashed line; this is the tip of the tongue, this is the centre of the tongue, and this is the back of the tongue. We're actually measuring the positions of transducers located on the tongue using electromagnetic articulometry. The dashed lines show the tongue shape that occurs seventy-five milliseconds before the centre of the /r/, which happens to be the minimum of the F3 trajectory, and the dark bold lines show the tongue shape at the centre of the /r/, at that F3 minimum. So in this panel you can see the speaker used an upward movement of the tongue tip to produce the /r/.
What we have over here are two separate subjects, with measurements from the subject in the top row and productions of the model in the bottom row, and the model was actually using a speaker-specific vocal tract in this case. What we did was take the subject and collect a number of MRI scans while they were producing different phonemes. We did principal components analysis to pull out their main movement degrees of freedom; we had their acoustic signals, and so we built a synthesiser that had their vocal tract shape and produced their formant frequencies. Then we had the DIVA model learn to control their vocal tract: we put this vocal tract synthesiser in place of the Maeda synthesizer, babbled the vocal tract around, had it learn to produce /r/s, and then went back and had it produce the stimuli in the study. In this case the people were producing utterances in which the /r/ was preceded either by a vowel, by a /d/, or by a /g/.
What we see is that the subject produces very different movements in these three cases. In the vowel context the subject uses an upward movement of the tongue tip, like we see over here, but in the /d/ context the subject actually moves their tongue backwards to produce the /r/, and in the /g/ context they move their tongue downward to produce the /r/. So they're using three completely different gestures, or articulatory movements, to produce the /r/, and yet they're producing pretty much the same F3 trace; the F3 traces are very similar in these cases. If we take the model and have it produce /r/s with the speaker-specific vocal tract, we see that the model, because it cares primarily about the acoustic signal and is trying to hit this F3 target, also uses different movements in the different contexts, and in fact the movements reflect the movements of the speaker: here the model uses an upward movement of the tongue tip, here the model uses a backward movement of the tongue, and here the model uses a downward movement of the tongue to produce the /r/. So what we see is that with a very simple model that's just going to the appropriate position in formant frequency space, we can capture this complicated variability in the articulator movements of the actual speaker.
Another thing to note here: this is the second speaker. Again the model replicates the movements, and the model also captures speaker-specific differences. The first speaker used a small upward tongue tip movement to produce the /r/, but this speaker, for reasons having to do with the morphology of their vocal tract, had to make a much bigger movement of the tongue tip to produce the /r/ in the vowel context, and again the model produces a bigger movement in this speaker's case than in the other speaker's case. So this provides pretty solid evidence that speakers are really concentrating on the formant frequency trajectories of their speech output, more so than on where the individual articulators are located. And so we made a prediction that we should see formant frequency representations in the speech motor cortical area, if we're able to look at what's going on during speech.
This slide, I'm sure, everybody here follows: these are actually the formant frequency traces for "good doggy"; this is what I used as the target for the simulations I showed you earlier. Down here I show the first two formant frequencies, in what's called the formant plane, and the important point is that if we can just change F1 and F2, we can produce pretty much all of the vowels of the language, because they are differentiated by their first two formant frequencies. So formant frequency space provides a very low-dimensional continuous space for the planning of movements, and that's crucial for the development of the brain-computer interface.
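To give a sense of how well this two-dimensional plane separates vowels, here are approximate F1/F2 values for a few American English vowels, in the spirit of classic published formant tables; these numbers are illustrative and are not values from this talk.

```python
# Approximate (F1, F2) targets in Hz, adult male averages; illustrative only.
VOWEL_TARGETS = {
    "iy": (270, 2290),   # as in "heed"
    "uw": (300, 870),    # as in "who'd"
    "aa": (730, 1090),   # as in "hod"
    "ae": (660, 1720),   # as in "had"
}
```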
Okay, and why is that crucial? Well, there have been a number of brain-computer interfaces that involve implants in the hand area of the motor cortex, and what they usually do is decode cursor position on the screen from neural activities in the hand area, and people learn to control the movement of a cursor by activating the neurons in their hand motor cortex. Now, when they build these interfaces, they don't try to decode all of the joint angles of the arm and then determine where the cursor would be based on where the mouse would be; instead they go directly to the output space, in this case the two-dimensional cursor space. The reason they do that is that we're dealing with a very small number of neurons in these sorts of studies relative to the entire motor system; there are hundreds of millions of neurons involved in your motor system, and in the best case you might get a hundred neurons in a brain-computer interface. We were actually getting far fewer than that; we had a very old implant that had only two electrode wires, so we had fewer than ten neurons, maybe as few as two or three. We could pull out more signals than that, but they weren't single-neuron activities.
If you tried to pull out a high-dimensional representation of the arm configuration from a small number of neurons, you would have a tremendous amount of error, and this is why they don't do that; instead they pull out a very low-dimensional thing, this 2D cursor position. We're doing the analogous thing here: instead of trying to pull out all of the articulator positions that determine the shape of the vocal tract, we're simply going to the output space, which is the formant frequency space, and for vowel production that can be as simple as a two-dimensional signal.
Okay, so what we're doing is basically decoding an intended sound position in this 2D formant frequency space, generated from motor cortical cells, which is a much lower-dimensional thing than the entire vocal tract shape.
Well, the first thing we needed to do was verify that this formant frequency information was actually in this part of the brain, and the way we did this was to have the subject try to imitate a minute-long vowel sequence, something like "ah... ee... oo...". This lasted a minute, and the subject was told to do it in synchrony with the stimulus. This is crucial because we don't otherwise know when he's trying to speak, since no speech comes out. So we record the neural activities during this minute-long attempted utterance, and then we try to map them onto the formant frequencies that the subject was trying to imitate. The square-wave-like trace here is the actual F2 going up and down, and here's the actual F1 going up and down for the different vowels, and the solid, non-bold squiggly line here is the decoded signal. It's not great, but it's actually highly statistically significant; we did cross-validated training and testing and we had a very highly significant representation of the formant frequencies, with R values of 0.69 for F1 and 0.68 for F2. So this verifies that there is indeed formant frequency information in your primary motor cortex.
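Here is a minimal sketch of that kind of cross-validated decoding analysis. The data shapes and the ridge-regression choice are assumptions for illustration; the actual decoder used in the project may well have differed (for example, a filter-based decoder).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

def fit_formant_decoder(firing_rates, formants):
    """Cross-validated linear decode from neural features to (F1, F2).
    `firing_rates`: array of shape (n_frames, n_units); `formants`: (n_frames, 2).
    Returns the fitted model and the per-formant correlation between decoded
    and actual tracks, analogous to the R values reported above."""
    model = Ridge(alpha=1.0)
    predicted = cross_val_predict(model, firing_rates, formants, cv=5)
    r = [np.corrcoef(predicted[:, k], formants[:, k])[0, 1] for k in range(2)]
    model.fit(firing_rates, formants)
    return model, r
```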
So the next step was simply to use this information to try to produce speech output. Just as a review for most of you: formant synthesis of speech has been around for a long time. Gunnar Fant, for example, in nineteen fifty-three used this very large piece of electronic equipment here, with a stylus on a two-dimensional pad. He would move the stylus around on the pad, and the location of the stylus was a location in the F1-F2 space, so he was basically moving around in the formant plane, and just by moving this cursor around in this two-dimensional space he was able to produce intelligible speech. So here's an example.
i
So the good news here is that with just two dimensions, some degree of speech output can be produced. Consonants are very difficult, and I'll get back to that at the end, but vowels are certainly possible with this sort of synthesis.
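For anyone curious how little machinery two-formant synthesis needs, here is a minimal sketch: a pulse-train source passed through two second-order resonators tuned to F1 and F2. This is a generic textbook formant synthesizer, not Fant's hardware or the synthesizer used in this project; the formant values and bandwidth are assumed.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.io import wavfile

fs = 16000           # sample rate (Hz)
f0 = 120             # pitch of the pulse-train source (Hz)
dur = 0.5            # seconds
f1, f2 = 700, 1200   # rough formant values for an /a/-like vowel (assumed)
bw = 80              # formant bandwidth in Hz (assumed)

# Impulse-train source: one pulse per pitch period.
n = int(fs * dur)
source = np.zeros(n)
source[::fs // f0] = 1.0

def resonator(signal, freq, bandwidth, fs):
    """Second-order all-pole resonator centred on `freq`."""
    r = np.exp(-np.pi * bandwidth / fs)
    theta = 2 * np.pi * freq / fs
    a = [1.0, -2 * r * np.cos(theta), r * r]
    b = [1.0 - r]                     # crude gain normalisation
    return lfilter(b, a, signal)

# Cascade the two formant resonators, then normalise and save.
out = resonator(resonator(source, f1, bw, fs), f2, bw, fs)
out /= np.max(np.abs(out))
wavfile.write("vowel_demo.wav", fs, (out * 32767).astype(np.int16))
```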
So what we did was take that approach. Here is a schematic: our electrode sits in the speech motor cortex, and the signals it records are picked up, amplified, and then sent wirelessly across the scalp. We record the signals and run them through a neural decoder, and what the neural decoder does is predict what formant frequencies are being attempted based on the activities. It's trained up on one of these one-minute-long sequences, and once you train it up it can take a set of neural activities and translate that into a predicted first and second formant frequency, which we then send through a speech synthesizer back to the subject. The delay from brain activity to sound output was fifty milliseconds in our system, and that's approximately the same delay as from your motor cortical activity to your own sound output. This is crucial because, if the subject is going to be able to learn to use this synthesizer, you need to have a natural feedback delay. If you delay speech feedback by a hundred milliseconds in a normal speaker, they start to become highly disfluent: they go through stuttering-like behaviour when they talk, and it's very disruptive. So it's important that this thing operates very quickly and produces the feedback in a natural time frame.
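The control loop itself is conceptually simple. Below is a hedged sketch of a 50 ms frame loop: read the latest neural features, decode F1 and F2, and hand them to a synthesizer. The functions read_neural_frame and synthesize_formants, and the zero-initialised weights, are placeholders invented for illustration; the real system's wireless acquisition, decoder, and synthesizer are far more involved.

```python
import time
import numpy as np

FRAME_S = 0.050   # 50 ms from brain activity to sound output, as described in the talk

# Stand-in decoder weights (10 units + bias -> F1, F2); a real system would load
# the weights fitted from the one-minute training sequence.
W = np.zeros((11, 2))

def read_neural_frame():
    """Placeholder: return the latest smoothed firing rates for 10 units plus a bias term."""
    return np.append(np.random.rand(10), 1.0)

def synthesize_formants(f1, f2):
    """Placeholder: hand (F1, F2) to the formant synthesizer / audio output."""
    print(f"F1={f1:6.1f} Hz  F2={f2:6.1f} Hz")

next_tick = time.monotonic()
for _ in range(20):                                   # a one-second demo run
    rates = read_neural_frame()
    f1, f2 = rates @ W + np.array([500.0, 1500.0])    # decode into formant space
    synthesize_formants(f1, f2)
    next_tick += FRAME_S                              # hold a steady 50 ms frame rate
    time.sleep(max(0.0, next_tick - time.monotonic()))
```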
Now what I'm going to show is the subject's performance with the speech BCI. We had him do a vowel task: the subject would start out at a central vowel, and his task on each trial was to go to the vowel we told him to go to. So in the video I'll play, you'll hear the computer say "listen", then it'll say something like "ee", then it'll say "speak", and then he's supposed to say "ee" with the synthesizer. You'll hear his sound output as produced by the synthesizer as he attempts to produce the vowel that was presented, and you'll see the target vowels in green here. The cursor you'll see is the subject's location in the formant frequency space. On most of the trials we did not provide visual feedback; the subject didn't need it, and we saw no increase in performance from visual feedback. He instead used the auditory feedback produced by the synthesizer to produce better and better speech, or vowel sounds at least. So here are five examples, five consecutive productions in a block.
"Speak." So that's a direct hit; he goes very quickly to the target. Here he goes off a little; there's the error, and he kind of steers it back into the target. Another direct hit on the next trial. On this one it looks like he gets there just before the timeout.
So, straight to the target. What we saw were two sorts of behaviours: often it was straight to the target, but other times he would go off a little bit, and then you would see him, once he heard the feedback going off, presumably trying in his head to change the shape of his tongue, to try to actually say the sound. So he's trying to reshape where that sound is going, and you'll see him kind of steer toward the target in those cases.
What's happening in these panels is that I'm showing the hit rate here as a function of block. In any given session we would have four blocks of trials, with about five to ten productions per block, so over the course of a session he would produce somewhere between about ten and twenty repetitions, roughly five to ten repetitions of each vowel. When he first starts, his hit rate is just below fifty percent; that's above chance, but it's not great. With practice it gets better with each block, and by the end he has improved his hit rate to over seventy percent on average; in fact, in the later sessions he was able to get up to about a ninety percent hit rate. If we look at the endpoint error as a function of block, which is how far away he was from the target in formant space when the trial ended, so zero on a success and nonzero on a miss, we see that it drops off pretty much linearly over the course of a forty-five-minute session. And his movement time also improves a little bit.
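For reference, the two performance measures just described, hit rate and endpoint error, are straightforward to compute from trial logs. Here is a small sketch under an assumed data layout, where each trial records the final decoded position plus a target centre and radius in formant space; the field names and numbers are hypothetical, not the study's.

```python
import numpy as np

# Hypothetical trial records: final decoded (F1, F2), target centre, target radius (Hz).
trials = [
    {"end": (310, 2300), "target": (300, 2250), "radius": 150},   # hit
    {"end": (700, 1400), "target": (500, 1600), "radius": 150},   # miss
    {"end": (520, 1580), "target": (500, 1600), "radius": 150},   # hit
]

def endpoint_error(trial):
    """Distance from the trial's final position to the target region (0 if inside)."""
    d = np.linalg.norm(np.subtract(trial["end"], trial["target"]))
    return max(0.0, d - trial["radius"])

hits = [endpoint_error(t) == 0.0 for t in trials]
print("hit rate:", np.mean(hits))
print("mean endpoint error (Hz):", np.mean([endpoint_error(t) for t in trials]))
```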
This slide shows what happens over many sessions; these are twenty-five sessions, and it's the endpoint error we're looking at. One thing to note is that there's a lot of variability from day to day, and I'll be happy to talk about that: we had to train up a new decoder every day because we weren't sure we had the same neurons every day. So some days the decoder worked very well, like here, and other days it didn't work so well. What we saw on average over the sessions is that the subject got better and better at learning to use the synthesizers, meaning that even though he was given a brand new synthesizer on the twenty-fifth session, it didn't take him nearly as long to get good at using it.
To summarize, then, for the speech brain-computer interface: there are several novel aspects of this interface. It was the first real-time speech brain-computer interface, so this was the first attempt to actually decode ongoing speech, as opposed to pulling out words or moving a cursor to choose words on a screen. It was the first real-time control using a wireless system, and wireless is very important here, because if you have a connector coming out of your head, which is the case for some patients who get this sort of surgery, an infection can build up around that connector, and that's a constant problem for people with this sort of system. Wireless systems are the way of the future. We were able to use a wireless system because we only had two channels of information; current systems usually have a hundred channels or more, and the wireless technology is still catching up, so these hundred-channel systems typically still have connectors coming out of the head. And finally, our project was the first real-time control with an electrode that had been implanted for this long: the electrode had been in for over three years, which highlights the utility of this sort of electrode for permanent implantation. The speech that came out was extremely rudimentary, as you saw, but keep in mind that we had two tiny wires of information coming out of the brain, pulling out information from at most ten neurons out of the hundreds of millions of neurons involved in the system, and yet the subject was still able to learn to use the system and improve his speech over time. There are a number of things we're working on now to improve this.
Most notably, we're working on improving the synthesis: we are developing two-dimensional synthesizers that can produce both vowels and consonants and that sound much more natural than a straight formant synthesizer. A number of groups are working on smaller electronics and more electrodes; the state of the art now, as I mentioned, is probably ten times the information we were able to get out of this brain-computer interface, so we would expect a dramatic improvement in performance with a modern system. And we're spending a lot of time working on improved decoding techniques as well. The initial decoder you give these subjects is very rough; it just gets them in the ballpark, because there's not nearly enough information to tune up a decoder properly from one training sample. So what people are working on, including people in our lab, are decoders that tune themselves while the subject is trying to use the prosthesis, so that not only is the subject's motor system adapting to use the prosthesis, but the prosthesis itself is helping that adaptation by cutting the error down a little on each production, very slowly over time, to help the system adapt.
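As a sketch of what a decoder that tunes itself during use might look like, here is a minimal online update rule: after each production, nudge the decoder weights a small step in the direction that would have reduced the formant error on that trial. The learning rate and the simple gradient step are my own illustrative choices, not the lab's actual adaptation algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units = 10
W = rng.normal(scale=0.1, size=(n_units + 1, 2))   # rough initial decoder (with bias row)
eta = 1e-4                                          # deliberately small: adapt slowly

def decode(W, rates):
    return np.append(rates, 1.0) @ W                # predicted (F1, F2)

def online_update(W, rates, target_formants, eta):
    """One gradient step on the squared formant error for a single production."""
    x = np.append(rates, 1.0)
    err = decode(W, rates) - target_formants        # formant error in Hz
    return W - eta * np.outer(x, err)

# Toy session: repeated attempts at one vowel target; the decoder error shrinks slowly.
target = np.array([300.0, 2200.0])
rates = rng.normal(loc=1.0, size=n_units)
for trial in range(5):
    err = np.linalg.norm(decode(W, rates) - target)
    print(f"trial {trial}: formant error {err:8.1f} Hz")
    W = online_update(W, rates, target, eta)
```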
And with that I'd like to again thank my collaborators, and also thank the NIDCD and the NSF for the funds that supported this research.
Okay, so we have time for two questions. Morgan?
Yeah.
Really interesting talk. There's a pretty strong emphasis on formants in this model, and there's more to speech than that. Is there other work you're doing with stop consonants, or figuring out a way to put things like that in?
Right, so I largely focused on formants for simplicity during the talk.
The somatosensory feedback control system in the model actually does a lot of the work for stop consonants. For example, for a /b/ we have a target for the closure itself, so in addition to the formant representation we have tactile dimensions that supplement the targets. Somatosensory feedback is, in our model, secondary to auditory feedback, largely because during development we get auditory targets in their entirety from the people around us, but we can't tell what's going on inside their mouths. So early development, we believe, is largely driven by auditory dimensions; the somatosensory system learns what goes on when you properly produce the sound, and then it contributes to production later, once you've built up that somatosensory target.
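To give a flavour of what supplementing the formant representation with tactile dimensions could look like, here is a toy sketch of a sound target that combines auditory (formant) ranges with a tactile lip-closure dimension. This is a simplified rendering for illustration, not the model's actual target parameterisation.

```python
from dataclasses import dataclass

@dataclass
class SpeechTarget:
    """A target region over auditory and somatosensory dimensions (toy example)."""
    f1_range: tuple           # acceptable F1 range in Hz
    f2_range: tuple           # acceptable F2 range in Hz
    lip_closure_range: tuple  # acceptable tactile lip-closure degree, 0 (open) to 1 (closed)

    def satisfied(self, f1, f2, lip_closure):
        return (self.f1_range[0] <= f1 <= self.f1_range[1]
                and self.f2_range[0] <= f2 <= self.f2_range[1]
                and self.lip_closure_range[0] <= lip_closure <= self.lip_closure_range[1])

# A /b/-like closure target leans on the tactile dimension: the formants are barely
# constrained during the closure, but the lips must be fully shut.
b_closure = SpeechTarget(f1_range=(0, 400), f2_range=(0, 2500), lip_closure_range=(0.9, 1.0))
print(b_closure.satisfied(f1=250, f2=1000, lip_closure=0.95))   # True
print(b_closure.satisfied(f1=250, f2=1000, lip_closure=0.2))    # False
```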
One other quick note: another simplification here is that formant frequencies, strictly speaking, are very different for women, children, and men, so when we are using different voices we use a normalized formant frequency space, where we actually use ratios of the formant frequencies, to help accommodate that.
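As a small illustration of that kind of normalisation, here is a sketch that maps raw formants into a log-ratio space; the particular ratios chosen are an assumption for illustration, not necessarily the exact normalised space used in the model.

```python
import numpy as np

def formant_ratios(f1, f2, f3):
    """Map raw formants (Hz) into a scale-invariant log-ratio space."""
    return np.log(f2 / f1), np.log(f3 / f2)

# The "same" vowel spoken by an adult male and a child: very different raw
# formants, but much closer together once expressed as ratios.
adult = (700.0, 1200.0, 2600.0)    # illustrative /a/-like values
child = (1000.0, 1700.0, 3700.0)

print("adult ratios:", formant_ratios(*adult))
print("child ratios:", formant_ratios(*child))
```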
In the debate between auditory and articulatory representations, I just want to understand where you're coming from, because we've been working with people who have data that look very similar to yours, and there, if you look at the articulatory information, you can see that a gesture was actually made, in "perfect memory" for example.
Yeah, okay, so in my view the gestural score is more or less equivalent to a feedforward motor command, and that feedforward command is tuned up to hit auditory targets. So we do have an analogue of the gestural score in the form of a feedforward motor command, and if you produce speech very rapidly that whole feedforward motor command will get read out, but it won't necessarily make the right sounds if you push it to the limit. So for example, in the "perfect memory" case, the model would still make the gesture for the /t/ if it's producing it very rapidly; the /t/ may not come out acoustically, but the model would presumably hear a slight error and try to correct for it a little bit in later productions. To make a long story short, my view is that the gestural score, which I think does exist, is something equivalent to a feedforward motor command, and the model does show how you tune up that gestural score, how you keep it tuned over time, and things like that. Okay.
Thanks, a really amusing talk. It seems to me that auditory and somatosensory feedback don't really tell you about what the words mean. Does meaning, or that sort of feedback, play any role in speech production?
It absolutely does, but we do not have anything like that in the model. We purposely focused on motor control; speech is a motor control problem, and the words are meaningless to the model. That is of course a simplification, for tractability, so that we could actually characterize the system computationally. We are working at a higher level on connecting this model, which is a fairly low-level motor control model if you will, with higher-level models of the sequencing of syllables, and we're starting to think about how these sequencing areas of the brain interact with areas that represent meaning. The middle frontal gyrus, for example, is very commonly associated with word meaning, along with temporal lobe areas, and these feed the sequencing system, but we have not yet modelled that. So in our view we're working our way up from the bottom, where the bottom is motor control and the top is language, and we're not that far up there yet.
So, that was a really inspiring talk. I'm kind of wondering, thinking about the beginning of your talk and the babbling and imitation phase: one thing that's pretty apparent there is that you're effectively starting out with your model having an adult vocal tract, listening to external stimuli that are also matched to it. I've worked a lot on thinking about things like normalisation, so I'm curious what your take is on how things change when, say, you have a six-month-old and their vocal tract grows and so on. How do you see that fitting into the model?
Well, I think that highlights the fact that formants, strictly speaking, are not the representation that's used for this transformation. When the child hears an adult sample, they're hearing a normalized version of it that their own vocal tract can imitate, because the raw frequencies themselves they can't imitate. So we've looked at a number of representations that involve things like logs of the ratios of the formants and so forth, and those improve the model's abilities and work well in some cases, but we haven't found what exactly that representation is. Where I think it lives in the brain is the planum temporale and the higher-order auditory areas; that's probably where you're representing speech in this speaker-independent manner. But what exactly those dimensions are I can't say for sure. It's some normalized formant representation, but the ones we've tried, for example Miller's space from his nineteen eighty-nine paper, are not fully satisfactory: they do a lot of the normalisation, but they don't work that well for controlling movements.
I mean, one of the things I was thinking about is that Keith Johnson, for example, really feels that this normalisation is a learned phenomenon. So it feels like you have some of the machinery there: instead of positing that it's some fixed operation, you could imagine having an adaptive system that actually learns that normalisation.
It's possible. There are examples like parrots being able to imitate speech and so forth, so I think there's something about the mammalian auditory system such that the dimensions it pulls out naturally are largely speaker-independent already. I mean, it pulls out all kinds of information, but for the speech system I think that's what it's using. I wish I could give you a more satisfactory answer.
A question about the formants you're using: is it just the first three?
We use the first three or the first two, depending. For the prosthesis project we just used the first two; for the simulations I showed in the rest of the talk we use the first three.
Okay, because in recent work, for example, you can tell something about which particular tongue shape was used if you look at the higher formants, and it would be great if you could include something like that.
I was just going to say, we can look at that: by controlling F1 through F3 we can see what F4 and F5 would be for different articulator configurations. We haven't looked at that yet, but my view is that they're perceptually not very important, or even salient. Of course, the physics will make the higher formants slightly different if your tongue shapes are different, but I think that what speakers perceive
is largely limited to the lower formants.
I think I've heard the argument, and Brad Story and some others have done work showing that the higher formants actually give you colouring: you can add speaker-specific information there and make it sound like a different person, depending on what the values are.
I see. So we just fix those formants in our model
at fixed values for all sounds, and you can hear the sounds properly, but you're right, the voice quality may well change if we allowed them to vary.
Just to continue for a moment: the more you could add, and the more you could determine what the acoustic features are in these various cases, the better, because you get the right targets for the vowels but also what happens in between. That would be great information for speaker identification and speaker characteristics, and for speaker recognition systems, as well as for speech therapy and pronunciation tools. So that's just something to think about.
We'll revisit that.
Okay, so we're going to close the session there, because I don't want to take too much out of the break, but let's thank our speaker once again.