So it is my privilege this morning to introduce our keynote speaker, Frank Guenther. He is a computational and cognitive neuroscientist specialising in speech and sensorimotor control.
He is from the Department of Speech, Language and Hearing Sciences and Biomedical Engineering at Boston University, where he also obtained his PhD.
His research combines theoretical modelling with behavioural and neuroimaging experiments to characterise the neural computations underlying speech and language. This is a fascinating research field, which we thought would be of interest to everyone in speech research.
And so, without further ado, I'd like you to help me welcome Professor Frank Guenther.
Good morning. Thanks for showing up this early in the morning. I'd like to start by thanking the organisers for inviting me to this conference in such a beautiful location.
I'd also like to acknowledge my collaborators before I get started. The main collaborators on the work I'll talk about today include people from my lab at Boston University, including Jason Tourville, Jonathan Brumberg, Satrajit Ghosh, Alfonso Nieto-Castañón, Maya Peeva, Elisa Golfinopoulos, and Oren Civier.
But in addition, we collaborate a lot with outside labs, and I'll be talking about a number of projects that involve collaborations with people at MIT, including Joseph Perkell, Melanie Matthies, and Harlan Lane. We've worked with Shinji Maeda to create the speech synthesizer we use for much of our modelling work, and Philip Kennedy and his colleagues at Neural Signals work with us on our neural prosthesis project, which I'll talk about at the end of the lecture.
The research program in our laboratory has the following goals. We are interested in understanding the brain first and foremost, and we're in particular interested in elucidating the neural processes that underlie normal speech learning and production. But we are also interested in looking at disorders, and our goal is to provide a mechanistic, model-based account. By model here I mean a neural network model that mimics the brain processes underlying speech, and we use this model to understand communication disorders: problems that happen when part of the circuit is broken.
I'll talk a bit about communication disorders today, but I will focus on the last part of our work, which is developing technologies that aid individuals with severe communication disorders, and I'll talk a bit about a project involving a patient with locked-in syndrome who was given a brain implant in order to try to restore some speech output.
The methods we use include neural network modelling. We use very simple neural networks: the neurons in our models are simply units whose activities are passed through a nonlinear thresholding at the output. We have other equations that define the synaptic weights between the neurons, and we adjust these weights in a learning process that I'll describe in a bit.
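As a rough illustration of the kind of unit just described, here is a minimal sketch in Python. The function names, the rectified threshold, and the Hebbian-style weight update are illustrative assumptions, not the actual DIVA equations.

```python
import numpy as np

def unit_activity(inputs, weights, threshold=0.0):
    """Weighted sum of inputs passed through a simple nonlinear threshold.
    Illustrative only; the real model's activation equations differ."""
    net = np.dot(weights, inputs)
    return max(0.0, net - threshold)  # rectified output

def weight_update(weights, pre, post, rate=0.01):
    """Toy learning rule: strengthen a weight when pre- and postsynaptic
    activities co-occur, one simple way such synaptic weights could be tuned."""
    return weights + rate * post * np.asarray(pre)
```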
We test the model using a number of different types of experiments. We use motor and auditory psychophysics experiments to look at speech, looking at the formant frequencies, for example, during different speech tasks. And we also use functional brain imaging, including fMRI but also MEG and EEG, to try to verify the model, or to help us improve the model by pointing out weaknesses in it.
And the final set of things we do, given that we're a computational neuroscience department, is that we're also interested in producing technologies that are capable of helping people with communication disorders, and I'll talk about one project that involves the development of a neural prosthesis allowing people who have problems with their speech output to speak.
The studies we carry out are largely organised around one particular model, which we call the DIVA model. This is a neural network model of speech acquisition and production that we've developed over the past twenty years in our lab.
So in today's talk I'll first give you an overview of the DIVA model, including a description of the learning process that allows the model to tune itself up so that it can produce speech sounds.
I'll talk a bit about how we extract simulated fMRI activity from the model. fMRI is functional magnetic resonance imaging; this is a technique for measuring blood flow in the brain, and areas of the brain that are active during a task have increased blood flow, so from fMRI we can identify which parts of the brain are most active for a task, and differences in activity across different task conditions.
This allows us to test the model, and I'll show an example of this where we use auditory perturbation of speech in real time, so that a speaker is saying a word but they hear something slightly different. We use this to test a particular aspect of the model, which involves auditory feedback control of speech.
And then I'll end the talk with a presentation of a project that involved communication disorders, in this case an extreme communication disorder in a patient with locked-in syndrome who was completely paralysed and unable to move. We are working on prostheses for people in this condition to help restore their ability to speak, so that they can communicate with the people around them.
This slide shows a schematic of the DIVA model. I will not be talking about the full model much; I will use a simplified schematic in a minute. What I want to point out is that the different blocks in this diagram correspond to different brain regions that include different neural maps. A neural map, in our terminology, is simply a set of neurons that represent a particular type of information. In motor cortex, for example, down here in the ventral motor cortex part of the model, we have articulator velocity and position maps. These are neurons, basically, that command the positions of the speech articulators in an articulatory synthesizer,
which I have just schematised here. So the output of our model is a set of commands to an articulatory synthesizer. This is just a piece of software to which you provide a set of articulator positions as input. The synthesiser we use the most was created with Shinji Maeda and involves seven articulatory degrees of freedom: there's a jaw degree of freedom, three tongue degrees of freedom, two lip degrees of freedom for opening and protrusion, and a larynx height degree of freedom. Once you specify the positions of these articulators, you can create a vocal tract area function, and you can use that area function to synthesise the acoustic signal that would be produced by a vocal tract of that shape.
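To make the interface concrete, here is a minimal sketch of the kind of function such a synthesizer exposes. This is not the Maeda implementation; the parameter names are taken from the description above, and the area-function and acoustic steps are stubbed out.

```python
import numpy as np

def synthesize(articulators, sample_rate=16000, duration_s=0.01):
    """Conceptual interface only: seven articulator positions in, audio out.
    A real articulatory synthesizer maps the parameters to a vocal tract area
    function and then runs an acoustic tube model; both steps are stubbed here."""
    jaw, tongue1, tongue2, tongue3, lip_open, lip_protrude, larynx = articulators
    area_function = np.full(17, 2.0)                      # stand-in tube areas (cm^2)
    waveform = np.zeros(int(sample_rate * duration_s))    # stand-in acoustic output
    return waveform
```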
The model's productions are fed back to the model in the form of auditory and somatosensory information that go to maps for auditory state and somatosensory state, located in auditory cortical areas in Heschl's gyrus and the posterior superior temporal gyrus, and in somatosensory cortical areas in the ventral somatosensory cortex and supramarginal gyrus. Each of the large boxes here represents a map in the cerebral cortex, and the smaller boxes represent subcortical components of the model, most notably a basal ganglia loop for initiating speech output and a cerebellar loop, which contribute to several aspects of production. I'm going to focus on the cortical components of the model today for clarity,
and so I'll use this simplified version of the model, which doesn't have all the components but has all the main processing levels that we'll need for today's talk. The highest level of processing in the model is what we call the speech sound map, and this corresponds to cells in the left ventral premotor cortex and inferior frontal gyrus, in what is commonly called Broca's area, and the premotor cortex immediately behind Broca's area. In the model, each one of these cells comes to represent a different speech sound, and a speech sound in the model can be either a phoneme, a syllable, or even a multisyllabic phrase. The key thing is that it's something that is produced very frequently, so that there's a stored motor program for that speech sound, and the canonical sort of speech sound that we use is the syllable. So for the remainder of the talk I'll talk mostly about syllable production when referring to the speech sound map.
Cells in the speech sound map project to the primary motor cortex through what we call a feedforward pathway, which is a set of learned commands for producing these speech sounds; they activate associated cells in the motor cortex that command the right articulator movements.
But the speech sound map cells also project to sensory areas, and what they do is send targets to those sensory areas. So if I want to produce a particular syllable such as "ba", when I say "ba" I expect to hear certain things; I expect certain formant frequencies as a function of time, and that information is represented by synaptic projections from the speech sound map over to what we call an auditory error map, where this target is compared to incoming auditory information.
Similarly, when we produce a syllable, we expect it to feel a particular way. When I say "ba", for example, I expect my lips to touch for the /b/ and then to release for the vowel. This sort of information is represented in a somatosensory target that projects over to the somatosensory cortical areas, where it is compared to incoming somatosensory information. These targets are learned, as is the feedforward command, during a learning process that I'll describe briefly in just a minute.
The arrows in the diagram represent synaptic projections from one type of representation to another. You can think of these synaptic projections as basically transforming information from one representational frame into another, and the main representations we focus on here are phonetic representations in the speech sound map, motor representations in the articulator velocity and position maps, auditory representations in the auditory maps, and finally somatosensory representations in the somatosensory maps. The auditory dimensions we use in the model typically correspond to formant frequencies, and I'll talk about that quite a bit as I go on in the talk,
whereas the somatosensory targets correspond to things like pressure and tactile information from the lips and the tongue while you're speaking, as well as muscle information about the lengths of muscles that gives you a reading of where your articulators are in the vocal tract.
Okay, so just to give you a feel for what the model does, I'm going to show the articulatory synthesizer with purely random movements now. This is what we do in the very early stages of learning in the model: we randomly move the speech articulators. That creates auditory information and somatosensory information from the speech, and we can associate the auditory information and the somatosensory information with each other and with the motor information that was used to produce the movements. These movements don't sound anything like speech, as you'll hear; this is just randomly activating the seven dimensions of movement.
This is what the model does for the first forty-five minutes. We call this the babbling cycle; it takes about forty-five minutes of real time to go through it. What the model does is tune up many of the projections between the different areas, so here, for example, in red are the projections that are tuned during this random babbling cycle.
The key things being learned here are the relationships between motor commands, somatosensory feedback, and auditory feedback. In particular, what the model needs to learn for producing sounds later is how to correct for sensory errors. So what the model is largely learning is: if I need to change my first formant frequency in an upward direction, for example, because I'm too low, then I need to activate a particular set of motor commands. This will flow through a feedback control map to the motor cortex, which will translate the auditory error into a corrective motor command.
Similarly, if I feel that my lips are not closing enough for a /b/, there will be a somatosensory error representing that, and that somatosensory error will then be mapped into a corrective motor command in the motor cortex. These arrows in red here are the transformations, basically, or the synaptic weights that encode these transformations, and they're tuned up during this babbling cycle.
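Here is a minimal sketch, assuming a hypothetical synthesize_formants function, of how random babbling data could be used to fit a mapping from auditory change back to motor change, which is the kind of relationship being learned here. The least-squares fit is a stand-in for the model's actual learning rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def babble_and_fit(synthesize_formants, n_samples=2000, n_artic=7):
    """Collect random articulator perturbations and the formant changes they
    cause, then fit a linear map from auditory change back to motor change.
    `synthesize_formants` is a hypothetical stand-in: articulator vector -> formants."""
    base = rng.uniform(-1, 1, n_artic)
    dM = 0.05 * rng.standard_normal((n_samples, n_artic))   # random motor perturbations
    dA = np.array([synthesize_formants(base + m) - synthesize_formants(base) for m in dM])
    # Least-squares fit so that: corrective motor command ≈ W @ auditory error
    X, *_ = np.linalg.lstsq(dA, dM, rcond=None)
    return X.T   # shape (n_artic, n_formants): maps an auditory error to a motor correction
```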
Well, after the babbling cycle, the model still has no notion of speech sounds. This corresponds to the very early babbling of an infant, up to about six months of age, before they start really learning and producing sounds from a particular language. The next stage of the model handles the learning of speech sounds from a particular language, and this is the imitation process in the model.
What happens in the imitation process is that we provide the model with an auditory target: we give it a sound file of somebody producing a word or phrase. The formant frequencies are extracted and are used as the auditory target for the model, and the model then attempts to produce the sound by reading out whatever feedforward commands it might have. If it has just heard the sound for the first time, it will not have any feedforward commands, because it hasn't yet produced the sound and doesn't know what commands are necessary to produce it. So in this case it's going to rely largely on auditory feedback control in order to produce the sound, because all it has is an auditory target.
The model attempts to produce the sound; it makes some errors, but it does some things correctly due to the feedback control, and it takes whatever commands were generated on the first attempt and uses them as the feedforward command for the next attempt. The next attempt now has a better feedforward command, so there will be fewer errors and less of a correction, but again the feedforward command and the correction added together form the total output, and that is then turned into the feedforward command for the next iteration. With each iteration the error gets smaller and smaller, due to the incorporation of these corrective motor commands into the feedforward command.
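A minimal sketch of that imitation loop, with hypothetical stand-in callables for producing a sound, measuring the auditory error against the stored target, and mapping that error to a motor correction:

```python
import numpy as np

def practice(produce, auditory_error, error_to_correction, n_attempts=6, n_motor=7):
    """Each attempt's total command (feedforward plus feedback correction)
    becomes the feedforward command for the next attempt, so the error shrinks."""
    feedforward = np.zeros(n_motor)                  # no stored command before attempt 1
    for attempt in range(n_attempts):
        formants = produce(feedforward)              # attempt the sound
        correction = error_to_correction(auditory_error(formants))
        feedforward = feedforward + correction       # fold the correction into the next attempt
    return feedforward
```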
Just to give you an example of what that sounds like, here is an example that was presented to the model for learning. [audio playback: "Good doggy"] This is a speaker saying "good doggy"; here it is once more. [audio playback: "Good doggy"] What the model is now going to try to do is mimic this, initially with no feedforward command, just using auditory feedback control; the auditory feedback control system was tuned up during the earlier babbling stage. It does a reasonable rendition, but it's kind of sloppy.
[audio playback: the model's first attempt]
This is the second attempt. It will be significantly improved, because the feedback commands from the first attempt have now been moved into the feedforward command.
[audio playback: the model's second attempt]
And then by the sixth attempt the model has fully learned the sound, meaning that there are no errors in its formant frequencies, which is pretty much all you can hear from the sound, and so it sounds like this. [audio playback: the model's sixth attempt] This was the original. [audio playback: "Good doggy"]
What you can hear is that the formant frequencies pretty much track the original formant frequencies; we looked at just the first three formant frequencies of the speech sound when doing this. So in this case we would say the model has learned to produce this phrase. It now has a speech sound map cell devoted to that phrase, and if we activate that cell, it reads the phrase out with no errors.
Well, an important aspect of this model is that it's a neural network, and the reason we chose the neural network construction is so that we could investigate brain function in more detail. What we've done is take each of the neurons in the model and localise them in a standard brain space, a stereotactic space that is commonly used for analysing neuroimaging results from experiments such as fMRI experiments. Here, these orange dots represent the different components of the model.
Here, for example, is the central sulcus in the brain; the motor cortex is in front of the central sulcus and the somatosensory cortex is behind it, and we have representations of the speech articulators in this region in both hemispheres. The auditory cortical areas include state cells and auditory error cells, which was a novel prediction we made from the model, that these cells would reside somewhere in the higher-level auditory cortical areas, and I'll talk about testing that prediction in a minute. We have somatosensory cells in the somatosensory cortical areas of the supramarginal gyrus here, and these include somatosensory error cells, also crucial to feedback control,
and so forth. In general, the representations in the model are bilateral, meaning that the neurons representing the lips, for example, are located in both hemispheres, but the highest level of the model, the speech sound map, is left-lateralized. The reason it's left-lateralized is that a large amount of data from the neurology literature suggests that the left hemisphere is where we store our speech motor programs. In particular, if there is damage to the left ventral premotor cortex or the adjoining Broca's area here in the inferior frontal gyrus,
speakers have what's referred to as apraxia of speech, and this is an inability to read out the motor programs for speech sounds. They hear the sound, they understand what the word is, and they try to say it, but they just can't get the syllables to come out. In our view this is because their motor programs, represented by the speech sound map cells, are damaged due to the stroke. If you have a stroke in the right hemisphere in the corresponding location, there is no apraxia; speech is largely spared. In our view this is because the right hemisphere, as I'll describe in a bit, is more involved in feedback control than feedforward control.
An important insight is that once adult speakers have learned to produce the speech sounds of their language, and their speech articulators have largely stopped growing, they don't need feedback control very often, because their feedforward commands are already accurate. If you listen, for example, to the speech of somebody who became deaf as an adult, for many years their speech remains largely intelligible, presumably because these motor programs are intact, and they by themselves are enough to produce the speech properly.
In an adult, however, if we do something novel to the person, such as blocking their jaw while they try to speak, or perturbing the auditory feedback of their speech, then we should reactivate the feedback control system: first by activating sensory error cells that detect that the sensory feedback isn't what it should be, and then motor correction takes place through the feedback control pathways of the model.
Okay, so just to highlight the use of these locations, what I'll show you now is a typical simulation where we have the model produce a short utterance. You'll hear the production first. In our model, the activities of the neurons correspond to electrical activity in the brain. fMRI actually measures blood flow in the brain, and blood flow is a function of the electrical activity, but it's quite slowed down relative to the activity; it peaks four or five seconds after the speech has started. So what you'll see is the brain activity starting to build up, in terms of blood flow, over time after the utterance is produced.
Here the utterance was at the beginning, but only later do you see the hemodynamic response, and this is actually quite useful for us, because we can do neuroimaging experiments where people speak in silence and we then collect the data after they're done speaking, at the peak of this blood flow. What we do is basically have them speak in silence, and at that point we take scans with the fMRI scanner, which is very loud and would interfere with the speech if it were going on during the speech; but in this case we're able to scan after the speech is completed and get a measure of which brain regions were active, and how active they were, during speech production.
Okay, so that's an overview of the model. Next, I'll go into a little more detail about the functioning of the feedback control system, and my main goal here is simply to give you a feel for the type of experiment we do; we've done many experiments of this sort to test and refine the model over the years. The experiment I'll talk about in this case involves auditory perturbation of the speech signal while the subject is speaking in an MRI scanner.
So just to review: the model has the feedforward control system, shown on the left here, and the feedback control system, shown on the right, and feedback control has both an auditory and a somatosensory component. During production of speech, when we activate a speech sound map cell to produce the speech sound, in the feedback control system we read out targets to the somatosensory system and to the auditory system, and those targets are compared to the incoming auditory and somatosensory information. The targets take the form of regions, so there's an acceptable region of F1 that the production can be in; if it's anywhere within this region, it's okay, but if it goes outside of the region, an error cell is activated, and that will drive corrective articulator movements that move it back into the appropriate target region.
So if we have an error arising in one of these maps, and in particular we're going to be focusing on the auditory error map, what happens next in the model is that the error gets transformed through a feedback control map in the right ventral premotor cortex and then projected to the motor cortex in the form of a corrective motor command. What the model has essentially learned is how to take auditory errors and correct them with motor movements. In terms of mathematics, this corresponds to a pseudoinverse of the Jacobian matrix that relates the articulatory and auditory spaces, and this can be learned during babbling simply by moving the articulators around and seeing what changes in somatosensory and auditory state take place.
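Written out, under the assumption of a locally linear relation between small motor changes and the auditory changes they produce, the idea is:

$$\Delta \mathbf{a} \approx J(\mathbf{m})\,\Delta \mathbf{m}, \qquad J_{ij} = \frac{\partial a_i}{\partial m_j},$$

$$\Delta \mathbf{m}_{\text{corr}} = J^{+}\,\Delta \mathbf{a}_{\text{err}}, \qquad J^{+} = J^{\top}\left(J J^{\top}\right)^{-1},$$

where $\mathbf{m}$ is the articulator (motor) state, $\mathbf{a}$ is the auditory state (for example, formant frequencies), $\Delta \mathbf{a}_{\text{err}}$ is the auditory error, and $J^{+}$ is the pseudoinverse that the learned feedback control mapping approximates.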
The fact that we now have this feedback control map in the right ventral premotor cortex in the model was partially the result of the experiment I'll be talking about. This was not originally in the model; originally these projections went to the primary motor cortex. I'll show the experimental result that caused us to change that component of the model.
okay
So based on this feedback control system, we can make some explicit predictions about brain activity during speech, and in particular we made some predictions about what would happen if we shifted your first formant frequency during speech, so that when we send it back to you over earphones within fifty milliseconds, you hear something slightly different from what you're actually producing. According to our model, this should cause activity of cells in the auditory error map, which we have localised to the posterior superior temporal gyrus and the adjoining planum temporale, regions in the Sylvian fissure and on the temporal lobe. So we should see increased activity there if we perturb the speech.
We should also see some motor corrective activity, because according to our model the feedback control system will kick in when it hears this error, even during the perturbation, and if the utterance is long enough it will try to correct the error that it hears. Now, keep in mind that auditory feedback takes time to get back up to the brain. The time from motor cortical activity, to movement and sound output, to hearing that sound output and projecting it back up to your auditory cortex is somewhere in the neighbourhood of a hundred to a hundred and fifty milliseconds. So we should see a corrective command kicking in not at the instant that the perturbation starts, but about a hundred or a hundred and twenty-five milliseconds later, because that's how long it takes to process this auditory feedback.
So what we did was develop a digital signal processing system that allowed us to shift the first formant frequency in near real time, meaning that the subject hears the sound with a sixty-millisecond delay, which is pretty much unnoticeable to the subject. Even unperturbed speech has that same sixty-millisecond delay, so they're always hearing a slightly delayed version of their speech over headphones. We play it rather loud over the headphones and they speak quietly as a result, and the reason we do that is that we want to minimize things like bone conduction of the actual speech and make them focus on the auditory feedback that we're providing them, which is the perturbed auditory feedback.
What we do in particular is take the first formant frequency and, in one fourth of the utterances, perturb it either up or down. So three out of every four utterances are unperturbed, one in eight is perturbed up, and one in eight is perturbed down. They get these perturbations randomly distributed, and they can't predict them, first of all because the direction changes all the time, and secondly because most of the productions are not perturbed.
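A minimal sketch of a randomized trial schedule matching the proportions just described (the function name and trial count are illustrative, not from the original study):

```python
import random

def make_trial_schedule(n_trials=80, seed=1):
    """3/4 of trials unperturbed, 1/8 with F1 shifted up, 1/8 with F1 shifted down,
    shuffled so the subject cannot predict the direction or timing."""
    conditions = (["none"] * 6 + ["up"] + ["down"]) * (n_trials // 8)
    random.Random(seed).shuffle(conditions)
    return conditions
```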
Here's what this sounds like. The people were producing vowels, in this case the vowel of "pet", in short words containing that vowel. Here's an example of unshifted speech, before the perturbation.
[audio playback: unshifted production]
And here is a case where we've shifted F1 upward. An upward shift of F1 corresponds to a more open mouth, and that should make the "pet" vowel sound a little bit more like the vowel in "pat". So here is the perturbed version of that production.
[audio playback: F1-shifted production]
It sounds more like "pat" than "pet" in this case. Here is the original again, followed by the shifted version. [audio playback]
So it's consciously noticeable to you now, when I play it to you like this, but most subjects don't notice what's going on during the experiment. We asked them afterwards whether they noticed anything; sometimes they would say "occasionally my speech sounded a little odd", but usually they didn't really notice much of anything going on with their speech. And yet their brains are definitely picking up this difference, and we found that with the fMRI.
We also look at their formant frequencies. What I'm showing here is a normalized F1, and what normalized means in this case is that the F1 in a baseline unperturbed utterance is what we expect to see; we take the F1 in a given utterance and compare it to that baseline. If it's exactly the same, we'll have a value of one, so if they're producing exactly the same thing as they do in the baseline, they would stay flat at this value of one. On the other hand, if they're increasing their F1, then we'll see the normalized F1 go above one, and if they're decreasing F1, we'll see it go below one.
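As a sketch of that normalization (array shapes and names are illustrative):

```python
import numpy as np

def normalized_f1(f1_trace, baseline_trace):
    """Divide each time point of an utterance's F1 track by the baseline
    (mean unperturbed) F1 at that time point: 1.0 means no change,
    above 1 means the speaker raised F1, below 1 means they lowered it."""
    return np.asarray(f1_trace, dtype=float) / np.asarray(baseline_trace, dtype=float)
```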
The gray shaded areas here are the ninety-five percent confidence intervals of the subjects' productions in the experiment. What we see for the down shift is that over time the subjects increase their F1 to try to correct for the decrease in F1 that we've given them with the perturbation, and in the case where we up-shift their speech, they decrease F1, as shown by this confidence interval here. The split between the two occurs right about where we expect, which is somewhere around a hundred to a hundred and fifty milliseconds after the first perturbed sound reaches their ears.
The solid lines here are the results of simulations of the DIVA model producing the same speech sounds under perturbed conditions. The black dashed line here shows the model's productions in the up-shift condition; in this case it actually waits only about eighty milliseconds, because our delay loop was a bit short here, and then it starts to compensate for the perturbation. Similarly, in the down-shift case it goes for about eighty milliseconds until it starts to hear the error, and then it compensates in an upward direction. We can see that the model's productions fall within the confidence intervals of the subjects' productions, so the model produces a good fit to the behavioural data.
But we also took a look at the neuroimaging data. On the bottom, what I'm showing are the results of a simulation that we ran before the study, where we generated predictions of fMRI activity when comparing shifted speech to non-shifted speech. As I mentioned, when we shift the speech, that should turn these auditory error cells on, and we've localised them to these posterior areas of the superior temporal gyrus here. When those error cells become active, they should lead to a motor correction, and these are shown by activities in the motor cortex here in the model simulation. Now, we also see a little bit of cerebellar activity here in the model, but I'll skip that for today's talk.
Here on the top is what we actually got from our experimental results, for the shift minus no-shift contrast. The auditory error cells were pretty much where we expected them. So first of all, there are auditory error cells: there are cells in your brain that detect the difference between what you're saying and what you expected it to sound like, even as an adult. These auditory error cells became active, but what we noticed is that the motor corrective activity we saw was actually right-lateralized and premotor. It wasn't bilateral and primary motor as we predicted; it's farther forward in the brain, in a more premotor cortical area, and it's right-lateralized. So one of the things we learned from this experiment was that auditory feedback control appears to be right-lateralized in the frontal cortex, and so we modified the model to have a feedback control map in the right ventral premotor cortex area, corresponding with this region here.
We actually ran a parallel experiment where we perturbed speech with a balloon in the mouth. We built a machine that perturbed your jaw while you were speaking, so you would be producing an utterance, and during the vowel this balloon would blow up very rapidly. It was actually the finger of a latex glove that would inflate to about a centimetre and a half and would block your jaw from closing, so that when you were done with the vowel and getting ready to say the following consonant and the final vowel, the jaw was blocked and couldn't move as much. Subjects compensated again, and in that experiment we saw activity in their somatosensory cortical areas, corresponding to the somatosensory error map, but we also saw right-lateralized motor cortical activity. So based on these two experiments, we modified the model to include a right-lateralized feedback control map that we did not have in the original model.
Okay, so the other thing we can do is look at connectivity in brain activity using techniques such as structural equation modelling. Very briefly, in a structural equation modelling analysis, we use a predefined model of connectivity in the brain, and then we go and look at the fMRI data and see how much of the covariance matrix of the fMRI data can be captured by this model if we optimize the connections. So what SEM does is produce connection strengths for that model and give you goodness-of-fit data. In addition to being able to fit the data very well, meaning that the connections in the model are in the right place,
we also noted an increase in what's called effective connectivity, that is, an increase in the strength of the effect of these auditory areas on the motor areas in the right hemisphere when the speech was perturbed. The interpretation of that is: when I perturb your speech with an auditory perturbation like this, the error cells are active, and that drives activity in the right ventral premotor cortex, so we have an increased effect on the motor cortex from the auditory areas in this case. This is further support for the structure of the model and the feedback control system that we just discussed.
Okay, so that's one example of an experimental test; we've done a very large number of tests of this sort. We've tested predictions about kinematics in the model. We work with people who measure articulator movements using electromagnetic articulometry. This is a technique where you basically glue receiver coils onto the tongue, the lips, and the jaw, and you can measure very accurately the positions of these points on the articulators in the midsagittal plane. From this you can estimate quite accurately, over time, the positions of the speech articulators and compare them to the productions made by the model.

We've done a lot of work looking at, for example, phonetic context effects in /r/ production, which I'll come back to later. /r/ is a phoneme in English that is produced with a very wide range of articulatory variability, while the acoustic cues for /r/ are very stable; this has been shown by people such as Boyce and Espy-Wilson. What you see if you produce movements with the model is that the model will also produce very different articulations for /r/ in different phonetic contexts, and this has to do with the fact that it's starting from different initial positions and it's simply going to the closest point to the acoustic target that it can get to, and that point will be in different parts of the articulator space depending on where you start.
We've looked at a large number of experiments on other types of articulatory movements, in both normal-hearing and hearing-impaired individuals. We look at what happens when you put a bite block in, we look at what happens when you noise-mask the speakers, and we've also looked at what happens over time in the speech of people with cochlear implants, for example. In the case of a cochlear implant recipient who is an adult and had already learned to speak, when they first receive the cochlear implant they hear sounds that are not the same as the sounds they used to hear, so their auditory targets don't match what's coming in from the cochlear implant, and it actually impairs their speech for a little while, for about a month or so, before they start to improve their speech; by a year they show very strong improvements in their speech. According to the model, this occurs because they have to retune their auditory feedback control system to deal with the new feedback, and only when that auditory feedback control system is tuned can they start to retune their movements to produce more distinct speech sounds.
We've also done a number of neuroimaging experiments. For example, we predicted that the left ventral premotor cortex houses syllabic motor programs, and we used a technique called repetition suppression in fMRI, where you present stimuli that change in some dimensions but don't change in other dimensions. With this technique you can find out what it is about the stimuli that a particular brain region cares about, and using it we were able to show that in fact the only region in the brain that we found to have a syllabic sort of representation was the left ventral premotor cortex, where we believe these syllabic motor programs are located, highlighting the fact that the syllable is a particularly important entity for motor control. This, we believe, is because our syllables are very highly practised and welded into motor programs that we can read out: we don't have to produce the individual phonemes, we read out the whole syllable as a motor program that we've stored in memory.
Finally, we've fortunately been able to even test the model's predictions electrophysiologically. This was in the case of a patient with locked-in syndrome that I'll speak about in a bit, and I'll talk about exactly what we were able to verify using electrophysiology, in this case actual recordings from neurons in the cortex.
Okay, so the last part of my talk will now start to focus on using the model to investigate communication disorders. We've done a number of studies of this sort; as I mentioned, we've looked at speech in normal-hearing and hearing-impaired populations.
We are now doing quite a bit of work on stuttering, which is a very common speech disorder that affects about one percent of the population. Stuttering is a very complicated disorder. It has been known since the beginning of recorded history; basically every culture seems to have people who stutter within it. People have been trying to cure stuttering forever, and we've been unable to do so. The brains of people who stutter are actually really similar to the brains of people who don't stutter unless you look very closely, and if you start looking very closely you begin to see things like white matter differences and grey matter thickness differences in the brain, and these tend to be localised around the basal ganglia-thalamo-cortical loop. So our view of stuttering is that several different problems can occur in this loop; different people who stutter can have different locations of damage, or of an anomaly, in their basal ganglia-thalamo-cortical loop, and all of these can lead to stuttering. The complexity of this disorder is partly because it's a system-level disorder, where different parts of the system can cause problems; it's not always the same part of the system that is problematic in different people who stutter. So one of the important areas of research for stuttering is computational modelling of this loop, to get a much better understanding of what's going on and how these different problems can lead to similar sorts of behaviour.
We're also looking at spasmodic dysphonia, which is a vocal fold problem similar to a dystonia; it's a problem where typically the vocal folds are too tense during speech, and again it appears to be basal ganglia loop related. There's apraxia of speech, which involves left hemisphere frontal damage, and childhood apraxia of speech, which is actually a different disorder from acquired apraxia of speech; it tends to involve kind of lesser damage, but over a more widespread portion of the brain. And so forth. The project I'll talk most about here is a project involving a neural prosthesis for locked-in syndrome, and this is a project that we've done with Phil Kennedy from Neural Signals, who developed technology for implanting the brains of people with locked-in syndrome; we helped them build a prosthesis from that technology.
Typically, our studies looking at disorders involve some sort of damaged version of the model. It's a neural network, so we can go in and mess up white matter projections, which are these synaptic projections; we can mess up neurons in a particular area; we can even adjust things like levels of neurotransmitters. Some studies suggest that there may be an excess of dopamine in some people who stutter. Well, we have added dopamine receptors to our basal ganglia loop, so we can go in and start changing dopamine levels and see how that changes the behaviour of the model, and also the brain activities of the model.
What we're doing now is running a number of imaging studies involving people who stutter. We've made predictions based on several possible loci of damage in the brain that may result in stuttering, and we're testing those predictions both by seeing if the model is capable of producing stuttering behaviour and by seeing if the brain activities match up with what we see in people who stutter. There are many different ways to invoke stuttering in the model, but each way causes a different pattern of brain activity to occur, so by having both the behavioural results and the neuroimaging results we can do a much more detailed treatment of what exactly is going on in this population.
The example I'm going to spend the rest of the talk describing is a bit different. In this case the speech motor system of the patient was intact, but the patient was suffering from locked-in syndrome due to a brain stem stroke. Locked-in syndrome is a syndrome where patients have intact cognition and sensation, but they're completely unable to perform voluntary movement, so it's a case of being almost buried alive in your own body. The patients sometimes have eye movements; the patient we worked with could very slowly move his eyes up and down, his eyelids actually, to answer yes/no questions. This was his only form of communication.
Prior to our involvement in the project, he was implanted as part of a project developing technologies for locked-in patients to control computers or external devices. These technologies are referred to by several different names: brain-computer interface, brain-machine interface, or neural prosthesis. In this case we were focusing on a neural prosthesis for speech restoration.
Locked-in syndrome is typically caused either by a brain stem stroke in the ventral pons or, more commonly, by neurodegenerative diseases such as ALS, which attack the motor system. People who suffer from ALS go through a stage, in the later stages of the disease, where they are basically locked in: they're unable to move or speak but are still fully conscious and with sensation.
Well, the electrode that was developed by our colleague Phil Kennedy is schematised here, and here's a photograph of it. It's a tiny glass cone that is open on both ends; the cone is about a millimetre long. There are three gold wires inside the cone, coated with an insulator except at the very end, where the wire is cut off and acts as a recording site. So there are three recording sites within the cone: one is used as a reference and the other two are used as recording channels.
This electrode is inserted into the cerebral cortex. Here I've got a schematic of the cortex, which consists of six layers of cell types. The goal is to get it near layer five of the cortex, where the output neurons are; in the motor cortex, these are the neurons that project to the periphery to cause movement. But it doesn't matter too much exactly where you go, because the cone is filled with a nerve growth factor, and what happens is that over a month or two, axons actually grow into this cone and lock it into place. That's very important because it stops movement: if you have movement of an electrode in the brain, you get problems such as gliosis, which is scar tissue building up around the electrode and stopping the electrode from picking up signals. In this case the wires are actually inside a protective glass cone, and no scar tissue builds up inside the cone, so it's a permanent electrode; you can implant it and record from it for many years. When we did the project I'll talk about, the electrode had been in the subject's brain for over three and a half years.
The electrode location was chosen in this case by having the subject attempt to produce speech while in an fMRI scanner, and what we noticed was that the brain activity is relatively normal: it looks like the brain activity of a neurologically normal person trying to produce speech, and in particular there's a blob of activity on the precentral gyrus, which is the location of the motor cortex, in the region we expect for speech. So I'm going to refer to this as the speech motor cortex; this is where the electrode was implanted. This is an fMRI scan performed before implantation; here is a CT scan afterwards, where you can see, in the same brain area, the wires of the electrode coming out. This bottom picture is a 3D CT scan showing the skull, where you can see the craniotomy where the electrode was inserted; you can see the wires coming out, and the wires go into a package of electronics that is located under the skin.
These electronics amplify the signals and then send them as radio signals across the scalp. We attach antennas, basically just antenna coils, to the scalp, so the subject has a normal-looking head: he has hair on his head, and there's nothing sticking out of it. When he comes into the lab, we attach these antennas to the scalp, we tune them to just the right frequencies, and they pick up the two signals that are generated from our electrode. The signals are then routed to a recording system and then to a computer, where we can operate on those signals in real time.
Well, Kennedy had implanted the patient several years before we got involved in the project,
but they were having trouble decoding the signals, and part of the problem is that if you look in motor cortex, there's nothing obvious that corresponds to a word, or for that matter a syllable or a phoneme. You don't see neurons turn on when the subject produces a particular syllable and then shut off when the subject is done. Instead you see that all the neurons are just subtly changing their activity over time, so it appears that there's some sort of continuous representation here in the motor cortex; there's not a representation of just words and phonemes, at least at the motor level. Kennedy's group contacted us because we had a model of what these brain areas are doing, and so we collaborated on decoding these signals and routing them to a speech synthesizer so that the subject could actually control some speech output.
Well, the tricky question here is: what is the neural code for speech in the motor cortex? The problem, of course, is that there are no prior studies; people don't normally go into a human motor cortex and record, and monkeys don't speak, nor do other animals, so we don't have any single-cell data about what's going on in the motor cortex during speech. We have data from arm movements, and we use insights from those data, but we also used insights from what we saw in human speech movements to determine what the variables were that these people were controlling, what the motor system cares about: does it mostly care about muscle positions, or does it care about the sound signal?
There is some available data from stimulation studies of the motor cortex. These come from the work of Penfield, who worked with epilepsy patients who were having surgery to remove portions of the cortex that were causing epileptic fits. Before the removal, they would actually stimulate in the cortex to see what parts of the brain were doing what; in particular, they wanted to avoid parts of the brain involved in speech. They mapped out, along the motor cortex, areas that cause movements of the speech articulators, for example, and other areas that caused interruptions of speech, and so forth. These studies were informative, and we used them to help us determine where to localise some of the neurons in the model, but they don't really tell you what kind of representation is being used by the neurons. When you stimulate a portion of cortex, you are stimulating hundreds of neurons at a minimum; they were using something like two volts for stimulation, while the maximum activity of a neuron is around fifty-five millivolts, so the stimulation signal was dramatically bigger than any natural signal, and it activates a large area of cortex. So you see a gross, poorly formed movement coming out, and speech-related responses tended to be things like a vocalisation cry: the subject might produce a crude vocal sound, just a gross movement, not really a speech sound; they don't produce any words or anything like that. From these sorts of studies it's next to impossible to determine what sort of representation is being used in the motor cortex.
However, we do have our model, which provides the first explicit characterisation of what the response properties of speech motor cortical cells should be. We have actual speech motor cortical cells in the model; they are tuned to particular things. So what we did was use the model to guide our search for information in this part of the brain. I want to point out that the characterisation provided by the model was something we spent twenty years refining; we ran a large number of experiments testing different possibilities about how speech is controlled, and we ended up with a particular representational format in the model, and that's no coincidence: it's because we spent a lot of time looking at it. Here is the result of one such study, which highlights the fact that in motor planning, sound appears to be more important than where your tongue is actually located.

This is the study of the phoneme /r/ that I mentioned before. Just to describe what you're going to see here: each of the lines you see represents a tongue shape, and there are two tongue shapes in each panel. There's a dashed line; this is the tip of the tongue, this is the centre of the tongue, and this is the back of the tongue. We're actually measuring the positions of transducers located on the tongue using electromagnetic articulometry. The dashed lines show the tongue shape that occurs seventy-five milliseconds before the centre of the /r/, which happens to be the minimum of the F3 trajectory, and the dark bold lines show the tongue shape at the centre of the /r/, at that F3 minimum. So in this panel you can see the speaker used an upward movement of the tongue tip to produce the /r/.
What we have over here are two separate subjects, with measurements from the subject in the top row and productions of the model in the bottom row, and the model was actually using a speaker-specific vocal tract in this case. What we did was take the subject and collect a number of MRI scans while they were producing different phonemes. We did principal components analysis to pull out their main movement degrees of freedom; we had their acoustic signals, and so we built a synthesiser that had their vocal tract shape and produced their formant frequencies. Then we had the DIVA model learn to control their vocal tract: we put this vocal tract synthesiser in place of the Maeda synthesizer, babbled the vocal tract around, had it learn to produce /r/s, and then went back and had it produce the stimuli in the study. In this case the people were producing utterances in which the /r/ was preceded either by a vowel, by a /d/, or by a /g/.
What we see is that the subject produces very different movements in these three cases. In the vowel context the subject uses an upward movement of the tongue tip, like we see over here, but in the /d/ context the subject actually moves their tongue backwards to produce the /r/, and in the /g/ context they move their tongue downward to produce the /r/. So they're using three completely different gestures, or articulatory movements, to produce the /r/, and yet they're producing pretty much the same F3 trace; the F3 traces are very similar in these cases. If we take the model and have it produce /r/s with the speaker-specific vocal tract, we see that the model, because it cares primarily about the acoustic signal and is trying to hit this F3 target, also uses different movements in the different contexts, and in fact the movements reflect the movements of the speaker: here the model uses an upward movement of the tongue tip, here the model uses a backward movement of the tongue, and here the model uses a downward movement of the tongue to produce the /r/. So what we see is that with a very simple model that's just going to the appropriate position in formant frequency space, we can capture this complicated variability in the articulator movements of the actual speaker.
Another thing to note here: this is the second speaker. Again the model replicates the movements, and the model also captures speaker-specific differences. The first speaker used a small upward tongue tip movement to produce the /r/, but this speaker, for reasons having to do with the morphology of their vocal tract, had to make a much bigger movement of the tongue tip to produce the /r/ in the vowel context, and again the model produces a bigger movement in this speaker's case than in the other speaker's case. So this provides pretty solid evidence that speakers are really concentrating on the formant frequency trajectories of their speech output, more so than on where the individual articulators are located. And so we made a prediction that we should see formant frequency representations in the speech motor cortical area, if we're able to look at what's going on during speech.
This slide, I'm sure, everybody here follows: these are actually the formant frequency traces for "good doggy"; this is what I used as the target for the simulations I showed you earlier. Down here I show the first two formant frequencies, in what's called the formant plane, and the important point is that if we can just change F1 and F2, we can produce pretty much all of the vowels of the language, because they are differentiated by their first two formant frequencies. So formant frequency space provides a very low-dimensional continuous space for the planning of movements, and that's crucial for the development of the brain-computer interface.
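To give a sense of how well this two-dimensional plane separates vowels, here are approximate F1/F2 values for a few American English vowels, in the spirit of classic published formant tables; these numbers are illustrative and are not values from this talk.

```python
# Approximate (F1, F2) targets in Hz, adult male averages; illustrative only.
VOWEL_TARGETS = {
    "iy": (270, 2290),   # as in "heed"
    "uw": (300, 870),    # as in "who'd"
    "aa": (730, 1090),   # as in "hod"
    "ae": (660, 1720),   # as in "had"
}
```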
Okay, and why is that crucial? Well, there have been a number of brain-computer interfaces that involve implants in the hand area of the motor cortex, and what they usually do is decode cursor position on the screen from neural activities in the hand area, and people learn to control the movement of a cursor by activating the neurons in their hand motor cortex. Now, when they build these interfaces, they don't try to decode all of the joint angles of the arm and then determine where the cursor would be based on where the mouse would be; instead they go directly to the output space, in this case the two-dimensional cursor space. The reason they do that is that we're dealing with a very small number of neurons in these sorts of studies relative to the entire motor system; there are hundreds of millions of neurons involved in your motor system, and in the best case you might get a hundred neurons in a brain-computer interface. We were actually getting far fewer than that; we had a very old implant that had only two electrode wires, so we had fewer than ten neurons, maybe as few as two or three. We could pull out more signals than that, but they weren't single-neuron activities.
If you tried to pull out a high-dimensional representation of the arm configuration from a small number of neurons, you would have a tremendous amount of error, and this is why they don't do that; instead they pull out a very low-dimensional thing, this 2D cursor position. We're doing the analogous thing here: instead of trying to pull out all of the articulator positions that determine the shape of the vocal tract, we're simply going to the output space, which is the formant frequency space, and for vowel production that can be as simple as a two-dimensional signal.
Okay, so what we're doing is basically decoding an intended sound position in this 2D formant frequency space, generated from motor cortical cells, which is a much lower-dimensional thing than the entire vocal tract shape.
Well, the first thing we needed to do was verify that this formant frequency information was actually in this part of the brain, and the way we did this was to have the subject try to imitate a minute-long vowel sequence, something like "ah... ee... oo...". This lasted a minute, and the subject was told to do it in synchrony with the stimulus. This is crucial because we don't otherwise know when he's trying to speak, since no speech comes out. So we record the neural activities during this minute-long attempted utterance, and then we try to map them onto the formant frequencies that the subject was trying to imitate. The square-wave-like trace here is the actual F2 going up and down, and here's the actual F1 going up and down for the different vowels, and the solid, non-bold squiggly line here is the decoded signal. It's not great, but it's actually highly statistically significant; we did cross-validated training and testing and we had a very highly significant representation of the formant frequencies, with R values of 0.69 for F1 and 0.68 for F2. So this verifies that there is indeed formant frequency information in your primary motor cortex.
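Here is a minimal sketch of that kind of cross-validated decoding analysis. The data shapes and the ridge-regression choice are assumptions for illustration; the actual decoder used in the project may well have differed (for example, a filter-based decoder).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

def fit_formant_decoder(firing_rates, formants):
    """Cross-validated linear decode from neural features to (F1, F2).
    `firing_rates`: array of shape (n_frames, n_units); `formants`: (n_frames, 2).
    Returns the fitted model and the per-formant correlation between decoded
    and actual tracks, analogous to the R values reported above."""
    model = Ridge(alpha=1.0)
    predicted = cross_val_predict(model, firing_rates, formants, cv=5)
    r = [np.corrcoef(predicted[:, k], formants[:, k])[0, 1] for k in range(2)]
    model.fit(firing_rates, formants)
    return model, r
```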
So the next step was simply to use this information to try to produce speech output. Just as a review for most of you: formant synthesis of speech has been around for a long time. Gunnar Fant, for example, in nineteen fifty-three used this very large piece of electronic equipment here, with a stylus on a two-dimensional pad. He would move the stylus around on the pad, and the location of the stylus was a location in the F1-F2 space, so he was basically moving around in the formant plane, and just by moving this cursor around in this two-dimensional space he was able to produce intelligible speech. So here's an example.
i
So the good news here is that with just two dimensions, some degree of speech output can be produced. Consonants are very difficult, and I'll get back to that at the end, but vowels are certainly possible with this sort of synthesis.
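For anyone curious how little machinery two-formant synthesis needs, here is a minimal sketch: a pulse-train source passed through two second-order resonators tuned to F1 and F2. This is a generic textbook formant synthesizer, not Fant's hardware or the synthesizer used in this project; the formant values and bandwidth are assumed.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.io import wavfile

fs = 16000           # sample rate (Hz)
f0 = 120             # pitch of the pulse-train source (Hz)
dur = 0.5            # seconds
f1, f2 = 700, 1200   # rough formant values for an /a/-like vowel (assumed)
bw = 80              # formant bandwidth in Hz (assumed)

# Impulse-train source: one pulse per pitch period.
n = int(fs * dur)
source = np.zeros(n)
source[::fs // f0] = 1.0

def resonator(signal, freq, bandwidth, fs):
    """Second-order all-pole resonator centred on `freq`."""
    r = np.exp(-np.pi * bandwidth / fs)
    theta = 2 * np.pi * freq / fs
    a = [1.0, -2 * r * np.cos(theta), r * r]
    b = [1.0 - r]                     # crude gain normalisation
    return lfilter(b, a, signal)

# Cascade the two formant resonators, then normalise and save.
out = resonator(resonator(source, f1, bw, fs), f2, bw, fs)
out /= np.max(np.abs(out))
wavfile.write("vowel_demo.wav", fs, (out * 32767).astype(np.int16))
```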
So what we did was take that approach. Here is a schematic: our electrode sits in the speech motor cortex, and the signals it records are picked up, amplified, and then sent wirelessly across the scalp. We record the signals and run them through a neural decoder, and what the neural decoder does is predict what formant frequencies are being attempted based on the activities. It's trained up on one of these one-minute-long sequences, and once you train it up it can take a set of neural activities and translate that into a predicted first and second formant frequency, which we then send through a speech synthesizer back to the subject. The delay from brain activity to sound output was fifty milliseconds in our system, and that's approximately the same delay as from your motor cortical activity to your own sound output. This is crucial because, if the subject is going to be able to learn to use this synthesizer, you need to have a natural feedback delay. If you delay speech feedback by a hundred milliseconds in a normal speaker, they start to become highly disfluent: they go through stuttering-like behaviour when they talk, and it's very disruptive. So it's important that this thing operates very quickly and produces the feedback in a natural time frame.
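The control loop itself is conceptually simple. Below is a hedged sketch of a 50 ms frame loop: read the latest neural features, decode F1 and F2, and hand them to a synthesizer. The functions read_neural_frame and synthesize_formants, and the zero-initialised weights, are placeholders invented for illustration; the real system's wireless acquisition, decoder, and synthesizer are far more involved.

```python
import time
import numpy as np

FRAME_S = 0.050   # 50 ms from brain activity to sound output, as described in the talk

# Stand-in decoder weights (10 units + bias -> F1, F2); a real system would load
# the weights fitted from the one-minute training sequence.
W = np.zeros((11, 2))

def read_neural_frame():
    """Placeholder: return the latest smoothed firing rates for 10 units plus a bias term."""
    return np.append(np.random.rand(10), 1.0)

def synthesize_formants(f1, f2):
    """Placeholder: hand (F1, F2) to the formant synthesizer / audio output."""
    print(f"F1={f1:6.1f} Hz  F2={f2:6.1f} Hz")

next_tick = time.monotonic()
for _ in range(20):                                   # a one-second demo run
    rates = read_neural_frame()
    f1, f2 = rates @ W + np.array([500.0, 1500.0])    # decode into formant space
    synthesize_formants(f1, f2)
    next_tick += FRAME_S                              # hold a steady 50 ms frame rate
    time.sleep(max(0.0, next_tick - time.monotonic()))
```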
Now what I'm going to show is the subject's performance with the speech BCI. We had him do a vowel task: the subject would start out at a central vowel, and his task on each trial was to go to the vowel we told him to go to. So in the video I'll play, you'll hear the computer say "listen", then it'll say something like "ee", then it'll say "speak", and then he's supposed to say "ee" with the synthesizer. You'll hear his sound output as produced by the synthesizer as he attempts to produce the vowel that was presented, and you'll see the target vowels in green here. The cursor you'll see is the subject's location in the formant frequency space. On most of the trials we did not provide visual feedback; the subject didn't need it, and we saw no increase in performance from visual feedback. He instead used the auditory feedback produced by the synthesizer to produce better and better speech, or vowel sounds at least. So here are five examples, five consecutive productions in a block.
"Speak." So that's a direct hit; he goes very quickly to the target. Here he goes off a little; there's the error, and he kind of steers it back into the target. Another direct hit on the next trial. On this one it looks like he gets there just before the timeout.
So, straight to the target. What we saw were two sorts of behaviours: often it was straight to the target, but other times he would go off a little bit, and then you would see him, once he heard the feedback going off, presumably trying in his head to change the shape of his tongue, to try to actually say the sound. So he's trying to reshape where that sound is going, and you'll see him kind of steer toward the target in those cases.
What's happening in these panels is that I'm showing the hit rate here as a function of block. In any given session we would have four blocks of trials, with about five to ten productions per block, so over the course of a session he would produce somewhere between about ten and twenty repetitions, roughly five to ten repetitions of each vowel. When he first starts, his hit rate is just below fifty percent; that's above chance, but it's not great. With practice it gets better with each block, and by the end he has improved his hit rate to over seventy percent on average; in fact, in the later sessions he was able to get up to about a ninety percent hit rate. If we look at the endpoint error as a function of block, which is how far away he was from the target in formant space when the trial ended, so zero on a success and nonzero on a miss, we see that it drops off pretty much linearly over the course of a forty-five-minute session. And his movement time also improves a little bit.
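For reference, the two performance measures just described, hit rate and endpoint error, are straightforward to compute from trial logs. Here is a small sketch under an assumed data layout, where each trial records the final decoded position plus a target centre and radius in formant space; the field names and numbers are hypothetical, not the study's.

```python
import numpy as np

# Hypothetical trial records: final decoded (F1, F2), target centre, target radius (Hz).
trials = [
    {"end": (310, 2300), "target": (300, 2250), "radius": 150},   # hit
    {"end": (700, 1400), "target": (500, 1600), "radius": 150},   # miss
    {"end": (520, 1580), "target": (500, 1600), "radius": 150},   # hit
]

def endpoint_error(trial):
    """Distance from the trial's final position to the target region (0 if inside)."""
    d = np.linalg.norm(np.subtract(trial["end"], trial["target"]))
    return max(0.0, d - trial["radius"])

hits = [endpoint_error(t) == 0.0 for t in trials]
print("hit rate:", np.mean(hits))
print("mean endpoint error (Hz):", np.mean([endpoint_error(t) for t in trials]))
```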
This slide shows what happens over many sessions; these are twenty-five sessions, and it's the endpoint error we're looking at. One thing to note is that there's a lot of variability from day to day, and I'll be happy to talk about that: we had to train up a new decoder every day because we weren't sure we had the same neurons every day. So some days the decoder worked very well, like here, and other days it didn't work so well. What we saw on average over the sessions is that the subject got better and better at learning to use the synthesizers, meaning that even though he was given a brand new synthesizer on the twenty-fifth session, it didn't take him nearly as long to get good at using it.
To summarize, then, for the speech brain-computer interface: there are several novel aspects of this interface. It was the first real-time speech brain-computer interface, so this was the first attempt to actually decode ongoing speech, as opposed to pulling out words or moving a cursor to choose words on a screen. It was the first real-time control using a wireless system, and wireless is very important here, because if you have a connector coming out of your head, which is the case for some patients who get this sort of surgery, an infection can build up around that connector, and that's a constant problem for people with this sort of system. Wireless systems are the way of the future. We were able to use a wireless system because we only had two channels of information; current systems usually have a hundred channels or more, and the wireless technology is still catching up, so these hundred-channel systems typically still have connectors coming out of the head. And finally, our project was the first real-time control with an electrode that had been implanted for this long: the electrode had been in for over three years, which highlights the utility of this sort of electrode for permanent implantation. The speech that came out was extremely rudimentary, as you saw, but keep in mind that we had two tiny wires of information coming out of the brain, pulling out information from at most ten neurons out of the hundreds of millions of neurons involved in the system, and yet the subject was still able to learn to use the system and improve his speech over time. There are a number of things we're working on now to improve this.
Most notably, we're working on improving the synthesis: we are developing two-dimensional synthesizers that can produce both vowels and consonants and that sound much more natural than a straight formant synthesizer. A number of groups are working on smaller electronics and more electrodes; the state of the art now, as I mentioned, is probably ten times the information we were able to get out of this brain-computer interface, so we would expect a dramatic improvement in performance with a modern system. And we're spending a lot of time working on improved decoding techniques as well. The initial decoder you give these subjects is very rough; it just gets them in the ballpark, because there's not nearly enough information to tune up a decoder properly from one training sample. So what people are working on, including people in our lab, are decoders that tune themselves while the subject is trying to use the prosthesis, so that not only is the subject's motor system adapting to use the prosthesis, but the prosthesis itself is helping that adaptation by cutting the error down a little on each production, very slowly over time, to help the system adapt.
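As a sketch of what a decoder that tunes itself during use might look like, here is a minimal online update rule: after each production, nudge the decoder weights a small step in the direction that would have reduced the formant error on that trial. The learning rate and the simple gradient step are my own illustrative choices, not the lab's actual adaptation algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units = 10
W = rng.normal(scale=0.1, size=(n_units + 1, 2))   # rough initial decoder (with bias row)
eta = 1e-4                                          # deliberately small: adapt slowly

def decode(W, rates):
    return np.append(rates, 1.0) @ W                # predicted (F1, F2)

def online_update(W, rates, target_formants, eta):
    """One gradient step on the squared formant error for a single production."""
    x = np.append(rates, 1.0)
    err = decode(W, rates) - target_formants        # formant error in Hz
    return W - eta * np.outer(x, err)

# Toy session: repeated attempts at one vowel target; the decoder error shrinks slowly.
target = np.array([300.0, 2200.0])
rates = rng.normal(loc=1.0, size=n_units)
for trial in range(5):
    err = np.linalg.norm(decode(W, rates) - target)
    print(f"trial {trial}: formant error {err:8.1f} Hz")
    W = online_update(W, rates, target, eta)
```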
And with that I'd like to again thank my collaborators, and also thank the NIDCD and the NSF for the funds that supported this research.
Okay, so we have time for two questions. Morgan?
Yeah.
Really interesting talk. There's a pretty strong emphasis on formants in this model, and there's more to speech than that. Is there other work you're doing with stop consonants, or figuring out a way to put things like that in?
Right, so I largely focused on formants for simplicity during the talk.
The somatosensory feedback control system in the model actually does a lot of the work for stop consonants. For example, for a /b/ we have a target for the closure itself, so in addition to the formant representation we have tactile dimensions that supplement the targets. Somatosensory feedback is, in our model, secondary to auditory feedback, largely because during development we get auditory targets in their entirety from the people around us, but we can't tell what's going on inside their mouths. So early development, we believe, is largely driven by auditory dimensions; the somatosensory system learns what goes on when you properly produce the sound, and then it contributes to production later, once you've built up that somatosensory target.
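To give a flavour of what supplementing the formant representation with tactile dimensions could look like, here is a toy sketch of a sound target that combines auditory (formant) ranges with a tactile lip-closure dimension. This is a simplified rendering for illustration, not the model's actual target parameterisation.

```python
from dataclasses import dataclass

@dataclass
class SpeechTarget:
    """A target region over auditory and somatosensory dimensions (toy example)."""
    f1_range: tuple           # acceptable F1 range in Hz
    f2_range: tuple           # acceptable F2 range in Hz
    lip_closure_range: tuple  # acceptable tactile lip-closure degree, 0 (open) to 1 (closed)

    def satisfied(self, f1, f2, lip_closure):
        return (self.f1_range[0] <= f1 <= self.f1_range[1]
                and self.f2_range[0] <= f2 <= self.f2_range[1]
                and self.lip_closure_range[0] <= lip_closure <= self.lip_closure_range[1])

# A /b/-like closure target leans on the tactile dimension: the formants are barely
# constrained during the closure, but the lips must be fully shut.
b_closure = SpeechTarget(f1_range=(0, 400), f2_range=(0, 2500), lip_closure_range=(0.9, 1.0))
print(b_closure.satisfied(f1=250, f2=1000, lip_closure=0.95))   # True
print(b_closure.satisfied(f1=250, f2=1000, lip_closure=0.2))    # False
```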
One other quick note: another simplification here is that formant frequencies, strictly speaking, are very different for women, children, and men, so when we are using different voices we use a normalized formant frequency space, where we actually use ratios of the formant frequencies, to help accommodate that.
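As a small illustration of that kind of normalisation, here is a sketch that maps raw formants into a log-ratio space; the particular ratios chosen are an assumption for illustration, not necessarily the exact normalised space used in the model.

```python
import numpy as np

def formant_ratios(f1, f2, f3):
    """Map raw formants (Hz) into a scale-invariant log-ratio space."""
    return np.log(f2 / f1), np.log(f3 / f2)

# The "same" vowel spoken by an adult male and a child: very different raw
# formants, but much closer together once expressed as ratios.
adult = (700.0, 1200.0, 2600.0)    # illustrative /a/-like values
child = (1000.0, 1700.0, 3700.0)

print("adult ratios:", formant_ratios(*adult))
print("child ratios:", formant_ratios(*child))
```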
In the debate between auditory and articulatory representations, I just want to understand where you're coming from, because we've been working with people who have data that look very similar to yours, and there, if you look at the articulatory information, you can see that a gesture was actually made, in "perfect memory" for example.
Yeah, okay, so in my view the gestural score is more or less equivalent to a feedforward motor command, and that feedforward command is tuned up to hit auditory targets. So we do have an analogue of the gestural score in the form of a feedforward motor command, and if you produce speech very rapidly that whole feedforward motor command will get read out, but it won't necessarily make the right sounds if you push it to the limit. So for example, in the "perfect memory" case, the model would still make the gesture for the /t/ if it's producing it very rapidly; the /t/ may not come out acoustically, but the model would presumably hear a slight error and try to correct for it a little bit in later productions. To make a long story short, my view is that the gestural score, which I think does exist, is something equivalent to a feedforward motor command, and the model does show how you tune up that gestural score, how you keep it tuned over time, and things like that. Okay.
Thanks, a really amusing talk. It seems to me that auditory and somatosensory feedback don't really tell you about what the words mean. Does meaning, or that sort of feedback, play any role in speech production?
It absolutely does, but we do not have anything like that in the model. We purposely focused on motor control; speech is a motor control problem, and the words are meaningless to the model. That is of course a simplification, for tractability, so that we could actually characterize the system computationally. We are working at a higher level on connecting this model, which is a fairly low-level motor control model if you will, with higher-level models of the sequencing of syllables, and we're starting to think about how these sequencing areas of the brain interact with areas that represent meaning. The middle frontal gyrus, for example, is very commonly associated with word meaning, along with temporal lobe areas, and these feed the sequencing system, but we have not yet modelled that. So in our view we're working our way up from the bottom, where the bottom is motor control and the top is language, and we're not that far up there yet.
So, that was a really inspiring talk. I'm kind of wondering, thinking about the beginning of your talk and the babbling and imitation phase: one thing that's pretty apparent there is that you're effectively starting out with your model having an adult vocal tract, listening to external stimuli that are also matched to it. I've worked a lot on thinking about things like normalisation, so I'm curious what your take is on how things change when, say, you have a six-month-old and their vocal tract grows and so on. How do you see that fitting into the model?
Well, I think that highlights the fact that formants, strictly speaking, are not the representation that's used for this transformation. When the child hears an adult sample, they're hearing a normalized version of it that their own vocal tract can imitate, because the raw frequencies themselves they can't imitate. So we've looked at a number of representations that involve things like logs of the ratios of the formants and so forth, and those improve the model's abilities and work well in some cases, but we haven't found what exactly that representation is. Where I think it lives in the brain is the planum temporale and the higher-order auditory areas; that's probably where you're representing speech in this speaker-independent manner. But what exactly those dimensions are I can't say for sure. It's some normalized formant representation, but the ones we've tried, for example Miller's space from his nineteen eighty-nine paper, are not fully satisfactory: they do a lot of the normalisation, but they don't work that well for controlling movements.
I mean, one of the things I was thinking about is that Keith Johnson, for example, really feels that this normalisation is a learned phenomenon. So it feels like you have some of the machinery there: instead of positing that it's some fixed operation, you could imagine having an adaptive system that actually learns that normalisation.
It's possible. There are examples like parrots being able to imitate speech and so forth, so I think there's something about the mammalian auditory system such that the dimensions it pulls out naturally are largely speaker-independent already. I mean, it pulls out all kinds of information, but for the speech system I think that's what it's using. I wish I could give you a more satisfactory answer.
A question about the formants you're using: is it just the first three?
We use the first three or the first two, depending. For the prosthesis project we just used the first two; for the simulations I showed in the rest of the talk we use the first three.
Okay, because in recent work, for example, you can tell something about which particular tongue shape was used if you look at the higher formants, and it would be great if you could include something like that.
I was just going to say, we can look at that: by controlling F1 through F3 we can see what F4 and F5 would be for different articulator configurations. We haven't looked at that yet, but my view is that they're perceptually not very important, or even salient. Of course, the physics will make the higher formants slightly different if your tongue shapes are different, but I think that what speakers perceive
is largely limited to the lower formants.
I think I've heard the argument, and Brad Story and some others have done work showing that the higher formants actually give you colouring: you can add speaker-specific information there and make it sound like a different person, depending on what the values are.
I see. So we just fix those formants in our model
at fixed values for all sounds, and you can hear the sounds properly, but you're right, the voice quality may well change if we allowed them to vary.
Just to continue for a moment: the more you could add, and the more you could determine what the acoustic features are in these various cases, the better, because you get the right targets for the vowels but also what happens in between. That would be great information for speaker identification and speaker characteristics, and for speaker recognition systems, as well as for speech therapy and pronunciation tools. So that's just something to think about.
We'll revisit that.
Okay, so we're going to close the session there, because I don't want to take too much out of the break, but let's thank our speaker once again.