0:00:13 | so it is my honour and privilege this morning to introduce our keynote speaker frank guenther |
---|
0:00:22 | he is a computational and cognitive neuroscientist specialising in speech and sensorimotor control |
---|
0:00:30 | he is from the |
---|
0:00:34 | department of speech language and hearing sciences and biomedical engineering at boston university where he also obtained his phd |
---|
0:00:43 | and his research combines theoretical modelling |
---|
0:00:46 | with behavioural and neuroimaging experiments to characterise the neural computations underlying speech and language so this is a |
---|
0:00:55 | fascinating research field |
---|
0:00:58 | which we thought would be advantageous to inform our own research |
---|
0:01:04 | and so without further ado |
---|
0:01:06 | i'd like you to help me welcome professor frank guenther |
---|
0:01:19 | good morning thanks for showing up at eight thirty in the morning i'd like to start by thanking the organisers for inviting me to |
---|
0:01:25 | this conference in such a beautiful location |
---|
0:01:28 | and i'd also like to acknowledge my collaborators before i get started the main collaborators on the work i'll talk |
---|
0:01:35 | about today include |
---|
0:01:36 | people from my lab at boston university including jason tourville jonathan brumberg |
---|
0:01:42 | satra ghosh alfonso nieto-castanon maya peeva elisa golfinopoulos and oren |
---|
0:01:48 | civier |
---|
0:01:50 | but in addition we collaborate a lot with outside labs and i'll be talking about a number of projects that |
---|
0:01:56 | involve collaborations with people at mit including joseph perkell melanie matthies and harlan lane |
---|
0:02:02 | we've worked with shinji maeda to create a speech synthesizer we use for much of our modelling work |
---|
0:02:09 | and philip kennedy and his colleagues at neural signals who work with us on our neural prosthesis project which i'll |
---|
0:02:16 | talk about at the end of the lecture |
---|
0:02:20 | the research program in our laboratory has the following goals |
---|
0:02:25 | we are interested in understanding the brain first and foremost and |
---|
0:02:29 | we're in particular interested in elucidating the neural processes that underlie normal speech learning and production |
---|
0:02:37 | but we are also interested in looking at disorders and our goal is to provide a mechanistic model-based account |
---|
0:02:44 | and by model here i mean a neural network model that mimics the brain processes that are underlying speech and |
---|
0:02:52 | using this model to understand communication disorders problems that happen when part of the circuit is broken |
---|
0:03:00 | and i'll talk a bit about communication disorders today but i'll focus on the last part of our work which |
---|
0:03:06 | is developing technologies that aid individuals with severe communication disorders and i'll talk a bit about a project involving a patient |
---|
0:03:14 | with locked-in syndrome who was |
---|
0:03:16 | given a brain implant in order to try to restore some speech processing |
---|
0:03:22 | the methods we use include neural network modelling we use very simple neural networks the neurons in our |
---|
0:03:29 | models are simply adders that have a nonlinear thresholding of the output |
---|
0:03:36 | we have other equations that define synaptic weights between the neurons |
---|
0:03:41 | and we adjust these weights in a learning process that i'll describe in a bit |
---|
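A minimal sketch of the kind of unit just described, assuming a sigmoid as the nonlinear threshold and a Hebbian-style rule as a stand-in for the model's actual learning equations (both are illustrative assumptions, not the DIVA implementation):

```python
import numpy as np

def unit_activity(inputs, weights, threshold=0.5, gain=10.0):
    """Rate-coded unit: weighted sum of inputs passed through a sigmoid threshold."""
    net = float(np.dot(weights, inputs))
    return 1.0 / (1.0 + np.exp(-gain * (net - threshold)))

def weight_update(weights, pre, post, lr=0.01):
    """Toy Hebbian-style adjustment of the synaptic weights (illustrative only)."""
    return weights + lr * post * pre

rng = np.random.default_rng(0)
pre = rng.random(5)                # presynaptic activities
w = 0.1 * rng.random(5)            # initial synaptic weights
post = unit_activity(pre, w)       # postsynaptic activity
w = weight_update(w, pre, post)    # weights adjusted during learning
```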
0:03:45 | we test the model using a number of different types of experiments we use motor and auditory psychophysics experiments |
---|
0:03:52 | to look at speech to look at the formant frequencies for example during different speech tasks |
---|
0:03:57 | and we also use functional brain imaging including fmri but also MEG and EEG to try to |
---|
0:04:04 | verify the model or help us improve the model by pointing out weaknesses in the model |
---|
0:04:10 | and the final set of things we do given that we're a computational neuroscience department is we're interested in |
---|
0:04:17 | producing technologies that are capable of helping people with communication disorders and i'll talk about one project that involves |
---|
0:04:24 | the development of a neural prosthesis for allowing people who have problems with their speech output to speak |
---|
0:04:34 | the studies we carry out are largely organised around one particular model which we call the diva model and this |
---|
0:04:41 | is a neural network model of speech acquisition and production that we've developed over the past twenty years in our |
---|
0:04:46 | lab |
---|
0:04:48 | so in today's talk i'll first give you an overview of the diva model including a description of the process |
---|
0:04:53 | of learning that allows the model to tune up so that it can produce speech sounds |
---|
0:04:57 | i'll talk a bit about how we extract simulated fmri activity from the model fmri is functional magnetic resonance imaging |
---|
0:05:05 | and this is a technique for measuring blood flow in the brain and areas of the brain that are active |
---|
0:05:11 | during a task |
---|
0:05:12 | have increased blood flow and so we can identify from fmri what parts of the brain are most active for |
---|
0:05:17 | a task and differences in activity for different task |
---|
0:05:22 | conditions |
---|
0:05:23 | this allows us to test the model and i'll show an example of this where we use auditory perturbation of |
---|
0:05:28 | speech in real time so that a speaker is saying a word but they hear something slightly different |
---|
0:05:33 | and we use this to test a particular aspect of the model which involves auditory feedback control of speech |
---|
0:05:40 | and then i'll end the talk with a presentation of a project that involves |
---|
0:05:46 | communication disorders in this case an extreme communication disorder in a patient with locked-in syndrome who was completely paralysed and |
---|
0:05:54 | unable to move |
---|
0:05:56 | and so we are working on prostheses for people in this condition to help restore their ability to speak |
---|
0:06:03 | so that they can communicate with people around them |
---|
0:06:08 | this slide shows a schematic of the diva model i will not be talking about the full model much i will |
---|
0:06:14 | use a simplified schematic in a minute |
---|
0:06:16 | what i want to point out is that the different blocks in this diagram correspond to different brain regions |
---|
0:06:23 | that include different |
---|
0:06:25 | what we call neural maps a neural map in our terminology is simply a set of neurons that represent a |
---|
0:06:32 | particular type of information so in motor cortex for example down here in the ventral motor cortex part of the |
---|
0:06:38 | model we have articulator velocity and position maps |
---|
0:06:42 | these are neurons basically that command the positions of speech articulators in an articulatory synthesizer |
---|
0:06:51 | which is schematised here so the output of our model is a set of commands to an articulatory |
---|
0:06:56 | synthesizer this is just a piece of software to which you provide a set of articulator positions as input the |
---|
0:07:04 | synthesiser we use the most was created by shinji maeda and involves |
---|
0:07:09 | seven articulatory degrees of freedom there's a jaw degree of freedom three tongue degrees of freedom two lip degrees of |
---|
0:07:16 | freedom for opening and protrusion |
---|
0:07:18 | and a larynx height degree of freedom and together once you specify the positions of these articulators you can create |
---|
0:07:26 | a vocal tract area function and you can use that area function to synthesise an acoustic signal that would |
---|
0:07:32 | be produced by a vocal tract of that shape |
---|
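As a concrete picture of that interface, here is a hypothetical wrapper around such a synthesizer; the seven field names and the stubbed-out synthesis step are assumptions for illustration, not the actual Maeda implementation:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ArticulatorFrame:
    """One time step of the seven articulatory degrees of freedom named in the talk."""
    jaw: float
    tongue1: float
    tongue2: float
    tongue3: float
    lip_aperture: float
    lip_protrusion: float
    larynx_height: float

    def as_vector(self) -> np.ndarray:
        return np.array([self.jaw, self.tongue1, self.tongue2, self.tongue3,
                         self.lip_aperture, self.lip_protrusion, self.larynx_height])

def synthesize(frames: list, sample_rate: int = 16000) -> np.ndarray:
    """Stub: a real synthesizer converts each frame to a vocal tract area function
    and renders the acoustic signal that tract shape would produce."""
    n_samples = int(len(frames) / 100 * sample_rate)   # assume 100 frames per second
    return np.zeros(n_samples)                          # placeholder silence
```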
0:07:36 | the model's |
---|
0:07:39 | productions are fed back to the model in the form of auditory and somatosensory information that go to maps |
---|
0:07:45 | for auditory state and somatosensory state located in auditory cortical areas in heschl's gyrus and the posterior superior temporal gyrus |
---|
0:07:54 | and in somatosensory cortical areas in the ventral somatosensory cortex and supramarginal gyrus |
---|
0:08:01 | each of the large boxes here represents a map in the cerebral cortex |
---|
0:08:05 | and the smaller boxes represent subcortical components of the model most notably a basal ganglia loop |
---|
0:08:13 | for initiating speech output |
---|
0:08:16 | and a cerebellar loop |
---|
0:08:18 | which contributes to several aspects of production i'm going to focus on the cortical components of the model today for |
---|
0:08:24 | clarity |
---|
0:08:26 | and so i'll use this simplified version of the model which doesn't have all the components but it has all |
---|
0:08:32 | the main processing levels that we'll need for today's talk so the highest level of processing in the model |
---|
0:08:40 | is what we call the speech sound map |
---|
0:08:42 | and this corresponds to cells in the left ventral premotor cortex and inferior frontal gyrus |
---|
0:08:49 | in what is commonly called broca's area and then the premotor cortex immediately behind broca's area |
---|
0:08:57 | in the model each one of these cells comes to represent a different speech sound and a speech sound in |
---|
0:09:03 | the model can be either a phoneme or a syllable or even a multisyllabic phrase the key thing here is |
---|
0:09:10 | that it's something that's produced |
---|
0:09:11 | very frequently so that there's a stored motor program for that speech sound and the canonical sort of speech sound |
---|
0:09:18 | that we use |
---|
0:09:19 | is the syllable so for the remainder of the talk i'll talk mostly about syllable production when referring to the |
---|
0:09:24 | speech sound map |
---|
0:09:26 | so cells in the speech sound map project |
---|
0:09:30 | both to the primary motor cortex through what we call a feedforward pathway which is a set of learned |
---|
0:09:37 | commands for producing these speech sounds and they activate associated cells in the motor cortex that command the right articulator |
---|
0:09:44 | movements |
---|
0:09:45 | but also the speech sound map cells project to sensory areas |
---|
0:09:49 | and what they do is they send |
---|
0:09:51 | targets to those sensory areas so if i want to produce a particular syllable such as bah |
---|
0:09:57 | when i say bah i expect to hear certain things i expect certain formant frequencies as a function of |
---|
0:10:03 | time and that information is represented by synaptic projections from the speech sound map over to what we call an |
---|
0:10:10 | auditory error map |
---|
0:10:11 | where this target is compared to incoming auditory information |
---|
0:10:16 | similarly when we produce a syllable we expect it to feel a particular way when i say bah for example i |
---|
0:10:22 | expect my lips to touch for the b and then to release |
---|
0:10:25 | for the vowel this sort of information is represented in a somatosensory target that projects over to the somato |
---|
0:10:32 | sensory cortical areas where it is compared to incoming somatosensory information |
---|
0:10:37 | these targets are learned as is the feedforward command during a learning process that i'll describe briefly in just a minute |
---|
0:10:45 | the arrows in the diagram represent synaptic projections from one type of representation to another |
---|
0:10:52 | so you can think of these synaptic projections as basically transforming information from one sort of representation frame into another |
---|
0:10:59 | representation frame and the main representations we focus on here are |
---|
0:11:04 | phonetic representations in the speech sound map |
---|
0:11:06 | motor representations in the articulator velocity and position maps |
---|
0:11:11 | auditory representations in the auditory maps and finally somatosensory representations in the somatosensory maps |
---|
0:11:18 | the auditory dimensions we use in the model typically correspond to formant frequencies and i'll talk about that |
---|
0:11:25 | quite a bit as i go on in the talk |
---|
0:11:27 | whereas the somatosensory targets correspond to things like |
---|
0:11:31 | pressure and tactile information from the lips and the tongue while you're speaking as well as muscle information about |
---|
0:11:40 | lengths of muscles that give you a read of where your articulators are in the vocal tract |
---|
0:11:47 | okay so just to give you a feel of what the model does i'm gonna show the synthesizer the articulatory |
---|
0:11:54 | synthesizer with just purely random movements now so this is |
---|
0:11:58 | what we do in the very early stages of learning in the model we randomly move the speech articulators |
---|
0:12:05 | that creates auditory information and somatosensory information |
---|
0:12:09 | from the speech and we can associate the auditory information and the somatosensory information with each other and with the |
---|
0:12:16 | motor information that was used to produce the movements of speech so these movements don't sound anything like speech as |
---|
0:12:23 | you'll see here |
---|
0:12:25 | so this is just randomly activating the seven dimensions of movement |
---|
0:12:32 | so this is what the model does for the first forty five minutes we call this a babbling cycle it takes |
---|
0:12:37 | about forty five minutes real time to go through this |
---|
0:12:40 | and what the model does is it tunes up many of the projections between the different areas so here for |
---|
0:12:45 | example in red are the projections that are tuned during this random babbling cycle |
---|
0:12:50 | so the key things being learned here are relationships between motor commands |
---|
0:12:56 | somatosensory feedback and auditory feedback |
---|
0:12:59 | and in particular what the model needs to learn for producing sounds later is how to correct for sensory errors |
---|
0:13:06 | and so what the model is learning largely is if i need to change my first formant frequency in an |
---|
0:13:13 | upward direction for example because i'm too low |
---|
0:13:16 | then i need to activate a particular set of motor commands and this will flow through a feedback |
---|
0:13:21 | control map to the motor cortex |
---|
0:13:24 | and will translate this auditory error into a corrective motor command |
---|
0:13:29 | and similarly if i feel that my lips are not closing enough for a b there will be a somatosensory |
---|
0:13:36 | error representing that and that somatosensory error will then be mapped into a corrective motor command in the motor |
---|
0:13:41 | cortex |
---|
0:13:43 | these arrows in red here are the transformations basically the synaptic weights are encoding these transformations and they're tuned up |
---|
0:13:51 | during this babbling cycle |
---|
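To make that babbling phase concrete, here is a rough numpy sketch: random articulator movements generate paired motor and auditory changes, and a least-squares fit of the auditory-to-motor mapping plays the role of the red feedback projections being tuned. The linear "synthesizer" is a stand-in so the sketch runs; it is not the DIVA code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in synthesizer: a fixed (unknown to the model) linear map from the
# seven articulator dimensions to three formant frequencies.
TRUE_MAP = rng.normal(size=(3, 7))

def formants(artic):
    return TRUE_MAP @ artic

# Babbling: small random articulator movements, recording how the formants
# change along with each motor change.
motor_deltas, auditory_deltas = [], []
artic = np.zeros(7)
for _ in range(2000):
    d_motor = 0.05 * rng.normal(size=7)
    d_aud = formants(artic + d_motor) - formants(artic)
    motor_deltas.append(d_motor)
    auditory_deltas.append(d_aud)
    artic = np.clip(artic + d_motor, -1.0, 1.0)

# Learn the auditory-error -> motor-correction mapping by least squares; this
# is the relationship the feedback controller will later use to fix formant errors.
M, *_ = np.linalg.lstsq(np.array(auditory_deltas), np.array(motor_deltas), rcond=None)
correction = np.array([40.0, 0.0, 0.0]) @ M   # motor change that raises F1 by ~40 units
```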
0:13:54 | well |
---|
0:13:55 | after the babbling cycle so at this point the model still has no sense of speech sounds this corresponds to |
---|
0:14:01 | very early babbling in infants |
---|
0:14:04 | up to about six months of age before they start really learning and producing sounds from a particular language and |
---|
0:14:11 | the next stage of the model handles the learning of speech sounds from a particular language and this is the |
---|
0:14:16 | imitation process in the model |
---|
0:14:18 | and what happens in the imitation process is we provide the model with an auditory target so we give it |
---|
0:14:23 | a sound file of somebody producing a word or phrase |
---|
0:14:28 | the formant frequencies are extracted and are used as the auditory target for the model |
---|
0:14:34 | and the model then attempts to produce the sound by reading out whatever feedforward commands it might have if |
---|
0:14:41 | it just heard the sound for the first time it will not have any feedforward |
---|
0:14:46 | commands because it hasn't yet produced the sound it doesn't know what commands are necessary to produce the sound |
---|
0:14:51 | and so in this case it's going to rely largely on auditory feedback control in order to produce the sound |
---|
0:14:57 | because all it has is an auditory target |
---|
0:14:59 | the model attempts to produce the sound it makes some errors but it does some things correctly due to the |
---|
0:15:05 | feedback control and it takes whatever commands are generated on the first attempt and uses them as the feedforward |
---|
0:15:11 | command for the next attempt |
---|
0:15:13 | so the next attempt now has |
---|
0:15:16 | a better feedforward command so there will be fewer errors and less of a correction |
---|
0:15:22 | but again both the |
---|
0:15:24 | feedforward command and the correction added together that's the total output that's then |
---|
0:15:29 | turned into the feedforward command for the next iteration and with each iteration the error gets smaller and smaller |
---|
0:15:35 | due to the incorporation of these corrective motor commands into the feedforward command |
---|
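The update rule just described is simple enough to state in a few lines: each attempt's output is the current feedforward command plus a feedback correction, and that total becomes the next attempt's feedforward command, so the residual error shrinks with every attempt. A schematic sketch with a made-up feedback gain and a toy F1 target:

```python
import numpy as np

def imitation(target, n_attempts=6, feedback_gain=0.8):
    """Fold each attempt's feedback correction into the next feedforward command."""
    feedforward = np.zeros_like(target)                       # no motor program on attempt 1
    errors = []
    for _ in range(n_attempts):
        correction = feedback_gain * (target - feedforward)   # feedback control
        total = feedforward + correction                      # what actually gets produced
        errors.append(float(np.abs(target - total).mean()))
        feedforward = total                                   # consolidate for next attempt
    return errors

target_f1 = np.array([700.0, 650.0, 600.0, 580.0])   # toy F1 trajectory in Hz
print(imitation(target_f1))                           # error shrinks on each attempt
```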
0:15:41 | just to give you an example of what that sounds like so here is an example that was presented to |
---|
0:15:46 | the model for learning |
---|
0:15:50 | the dog |
---|
0:15:52 | this is a speaker saying good doggy and |
---|
0:15:54 | here it is once more |
---|
0:15:57 | a dog |
---|
0:15:58 | and what the model is going to now try to do is it's going to try to mimic this with |
---|
0:16:03 | initially no feedforward command just using auditory feedback control the auditory feedback control system was tuned up during the |
---|
0:16:11 | earlier babbling stage |
---|
0:16:13 | and so it does a reasonable rendition but it's kind of sloppy |
---|
0:16:17 | i |
---|
0:16:18 | this is the second attempt it'll be significantly improved because the feedback commands from the first attempt have been |
---|
0:16:25 | now moved into the feedforward command |
---|
0:16:31 | i |
---|
0:16:32 | and then by the sixth attempt the model has perfectly learned the sound meaning that there are no errors |
---|
0:16:39 | in its formant frequencies which is all you can hear from the sound pretty much and so it sounds like |
---|
0:16:44 | this |
---|
0:16:47 | this was the original |
---|
0:16:49 | a dog |
---|
0:16:50 | so what you can hear is that the formant frequencies pretty much track the original formant frequencies in this case |
---|
0:16:55 | they're tracked perfectly we looked at just the first three formant frequencies of the speech sound |
---|
0:17:01 | when doing this and so in this case we would say the model has learned to produce this phrase now |
---|
0:17:06 | so it would have a speech sound map cell devoted to that phrase if we activate that cell it reads |
---|
0:17:12 | the phrase out now with no errors |
---|
0:17:16 | well an important aspect of this model is that it's a neural network and the reason we chose the neural |
---|
0:17:22 | network construction is so that we could |
---|
0:17:25 | investigate brain function in more detail so what we've done is we've taken each of the neurons in the model |
---|
0:17:31 | and we've localised them in a standard brain space a stereotactic space |
---|
0:17:37 | that is commonly used for analysing neuroimaging results from experiments such as fmri experiments and so here these orange |
---|
0:17:46 | dots represent the different components of the model |
---|
0:17:50 | here for example this is the central sulcus in the brain where the motor cortex is in front of |
---|
0:17:55 | the central sulcus and the somatosensory cortex is behind it |
---|
0:17:58 | and we have representations of the speech articulators in this region in both hemispheres |
---|
0:18:03 | the auditory cortical areas include state cells and auditory error cells which was a novel prediction we made from the |
---|
0:18:11 | model that these cells would reside somewhere in the higher level auditory cortical areas and i'll talk about testing that |
---|
0:18:17 | prediction in a minute |
---|
0:18:19 | we have somatosensory cells in the somatosensory cortical areas of the supramarginal gyrus here |
---|
0:18:26 | and these include somatosensory error cells also crucial to |
---|
0:18:30 | feedback control |
---|
0:18:32 | and so forth so in general the representations in the model are bilateral meaning the neurons |
---|
0:18:40 | representing the lips are located in both hemispheres but the highest level of the model the speech sound map |
---|
0:18:47 | is left lateralized and the reason it's left lateralized is that |
---|
0:18:52 | a large amount of data from the neurology literature suggests that |
---|
0:18:57 | the left hemisphere is where we store our speech motor programs |
---|
0:19:01 | in particular if there is damage to the left ventral premotor cortex or adjoining broca's area here in the inferior |
---|
0:19:09 | frontal gyrus |
---|
0:19:10 | speakers have what's referred to as apraxia of speech and this is an inability to read out the motor |
---|
0:19:17 | programs for speech sounds so they hear the sound they understand what the word is and |
---|
0:19:24 | they try to say it but they just can't get the syllables to come out and this in our view is |
---|
0:19:30 | because their motor programs represented by the speech sound map cells |
---|
0:19:34 | are damaged due to the stroke if you have a stroke in the right hemisphere in the corresponding location there |
---|
0:19:41 | is no apraxia speech is largely spared |
---|
0:19:45 | and in our view this is because the right hemisphere as i'll describe in a bit is more involved in |
---|
0:19:51 | feedback control than feedforward control |
---|
0:19:54 | an important insight is that once an adult speaker learns to produce the speech sounds of his or her language |
---|
0:20:01 | and their speech articulators have largely stopped growing |
---|
0:20:04 | they don't need feedback control very often because their feedforward commands are already accurate |
---|
0:20:10 | and if you for example listen to the speech of somebody who became deaf as an adult for many |
---|
0:20:16 | years their speech remains largely intelligible presumably because these motor programs are intact |
---|
0:20:23 | and they by themselves are enough to produce the speech properly |
---|
0:20:27 | i |
---|
0:20:28 | in an adult however if we do something novel to the person such as |
---|
0:20:32 | block their jaw while they try to speak or we perturb the auditory feedback of their speech then we |
---|
0:20:38 | should reactivate the feedback control system by first activating sensory error cells that detect that the sensory feedback isn't what |
---|
0:20:46 | it should be |
---|
0:20:47 | and then motor correction takes place through the feedback control pathways of the model |
---|
0:20:54 | okay so just to highlight the |
---|
0:20:58 | use of these locations what i'll show you now is a typical simulation where we have the model produce an |
---|
0:21:05 | utterance in this case it's saying how the |
---|
0:21:08 | and what you'll see you'll hear first the production in our model the activities of the neurons correspond to electrical |
---|
0:21:15 | activity in the brain |
---|
0:21:17 | fmri actually measures blood flow in the brain and blood flow is a function of the electrical activity but it's |
---|
0:21:23 | quite slow compared to the activity it peaks four or five seconds after the speech has started and so what you'll see |
---|
0:21:32 | is |
---|
0:21:33 | the brain activity starting to build up in terms of blood flow over time after the utterance is produced |
---|
0:21:41 | so here the utterance was at the beginning but only later do you see the hemodynamic response and this is actually |
---|
0:21:46 | quite useful for us because we can do neuroimaging experiments |
---|
0:21:50 | where people speak in silence |
---|
0:21:53 | and then we collect data after they're done speaking at the peak of this blood flow so what we would |
---|
0:21:58 | do is basically have them speak in silence and |
---|
0:22:03 | at this point we would take scans the fmri scanner is very loud which would interrupt the speech if |
---|
0:22:09 | it was going on during the speech but in this case we're able to scan after the speech is completed |
---|
0:22:14 | and get a measure of what brain regions were active and how active they were during speech |
---|
0:22:21 | production |
---|
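A small sketch of how one can turn simulated neural activity into a simulated fMRI signal, using a generic double-gamma hemodynamic response (the specific HRF shape, timing, and utterance length here are illustrative assumptions rather than the DIVA simulation parameters):

```python
import numpy as np
from scipy.stats import gamma

def hrf(t):
    """Generic double-gamma hemodynamic response peaking around five seconds."""
    return gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)

dt = 0.1
t = np.arange(0, 30, dt)
neural = np.zeros_like(t)
neural[(t >= 1.0) & (t <= 2.5)] = 1.0     # a short utterance starting at t = 1 s

bold = np.convolve(neural, hrf(t))[:len(t)] * dt   # predicted blood-flow response
print(f"BOLD peaks at t = {t[np.argmax(bold)]:.1f} s, well after the speech has ended")
```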
0:22:23 | okay so that's an overview of the model next what i'll do is go into a little more detail about |
---|
0:22:28 | the functioning of the feedback control system |
---|
0:22:31 | and my main goal here is simply to give you a feel for the type of experiment we do we've |
---|
0:22:36 | done many experiments of this sort to test and refine the model over the years |
---|
0:22:41 | and the experiment i'll talk about in this case is an experiment involving auditory perturbation of the speech signal while the subject |
---|
0:22:48 | is speaking in an MRI scanner |
---|
0:22:51 | so just to review then the model has the feedforward control system shown on the left here and the |
---|
0:22:59 | feedback control system shown on the right |
---|
0:23:01 | and feedback control has both an auditory and a somatosensory component |
---|
0:23:06 | so during production of speech when we activate this speech sound map cell to produce the speech sound |
---|
0:23:13 | in the feedback control system we read out these targets to the somatosensory system and to the auditory |
---|
0:23:18 | system and those targets are compared to the incoming auditory and somatosensory information |
---|
0:23:25 | the targets take the form of regions so there's an acceptable region that F one can be in |
---|
0:23:30 | if it's anywhere within this region it's okay but if it goes outside of the region an error cell is |
---|
0:23:35 | activated and that will drive the error |
---|
0:23:38 | down by driving articulator movements that will move it back into the appropriate target region |
---|
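The region-shaped target amounts to a dead zone in the error signal: no error-cell activity while F1 stays inside the target bounds, and a signed error pointing back toward the region once it leaves. A tiny sketch with made-up bounds:

```python
def f1_error(f1_heard, f1_min, f1_max):
    """Auditory error-cell activity for F1: zero inside the target region,
    otherwise the signed distance back to the nearest edge."""
    if f1_heard < f1_min:
        return f1_min - f1_heard     # positive: F1 must be driven upward
    if f1_heard > f1_max:
        return f1_max - f1_heard     # negative: F1 must be driven downward
    return 0.0

# toy target region for a vowel, in Hz
print(f1_error(620.0, 650.0, 750.0))   # 30.0 -> corrective command needed
print(f1_error(700.0, 650.0, 750.0))   # 0.0  -> inside the region, no correction
```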
0:23:44 | so |
---|
0:23:45 | if we have an error arising in one of these maps and in particular we're gonna be focusing on the |
---|
0:23:51 | auditory error map |
---|
0:23:53 | what happens next in the model is that the error gets transformed |
---|
0:23:56 | through a feedback control map in the right ventral premotor cortex |
---|
0:24:01 | and then projected to the motor cortex in the form of a corrective motor command and so what the model |
---|
0:24:07 | has essentially learned is how to take auditory errors and correct them with motor movements |
---|
0:24:13 | in terms of mathematics this corresponds to a pseudoinverse of the jacobian matrix that relates the articulatory |
---|
0:24:20 | and auditory spaces |
---|
0:24:22 | and this can be learned during babbling simply by moving the articulators around and seeing what changes in |
---|
0:24:28 | somatosensory and auditory state take place |
---|
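Written out, that transformation is just the Jacobian pseudoinverse applied to the auditory error; the sketch below uses a random Jacobian purely for illustration, whereas in the model the equivalent mapping is acquired from babbling data as sketched earlier:

```python
import numpy as np

rng = np.random.default_rng(1)
J = rng.normal(size=(3, 7))     # d(formants)/d(articulators), stand-in Jacobian

def corrective_command(auditory_error, gain=1.0):
    """Map a 3-dimensional formant error to a 7-dimensional articulator correction."""
    return gain * np.linalg.pinv(J) @ auditory_error

err = np.array([40.0, 0.0, 0.0])          # F1 is 40 Hz too low
d_artic = corrective_command(err)
print(J @ d_artic)                         # approximately [40, 0, 0]: error removed
```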
0:24:31 | the fact that we have this feedback control map in the right ventral premotor cortex now in the model |
---|
0:24:36 | was partially the result of the experiment that i'll be talking about this was not originally in the model originally |
---|
0:24:42 | these projections went to the primary motor cortex |
---|
0:24:44 | i'll show the experimental result that caused us to change that component of the model |
---|
0:24:50 | okay |
---|
0:24:52 | so based on this feedback control system we can make some explicit predictions about brain activity during speech |
---|
0:24:59 | and in particular we made some predictions about what would happen if we shifted your first formant frequency during speech |
---|
0:25:07 | so that when we send it back to you over earphones in fifty milliseconds you hear something slightly different than |
---|
0:25:14 | what you're actually producing |
---|
0:25:16 | well according to our model this should cause activity of cells in the auditory error map which we have localised to |
---|
0:25:24 | the posterior superior temporal gyrus and the adjoining planum temporale these regions in the sylvian fissure |
---|
0:25:31 | on the temporal lobe |
---|
0:25:32 | so we should see increased activity there if we perturb the speech |
---|
0:25:38 | and also we should see some motor corrective activity because according to our model the feedback control system will kick |
---|
0:25:45 | in when it hears this error even during the perturbed utterance |
---|
0:25:48 | and it will try to correct if the utterance is long enough it will try to correct the error that |
---|
0:25:54 | it hears |
---|
0:25:56 | now keep in mind that auditory feedback takes time to get back up to the brain so the time from |
---|
0:26:02 | motor cortical activity to movement and sound output to then hearing that sound output and |
---|
0:26:09 | projecting it back up to your auditory cortex is somewhere in the neighbourhood of a hundred to a hundred and fifty milliseconds |
---|
0:26:16 | and so we should see a corrective command kicking in not at the instant that the perturbation starts |
---|
0:26:22 | but about a hundred or a hundred and twenty five milliseconds later because that's how long it takes to process this auditory |
---|
0:26:28 | feedback |
---|
0:26:30 | so what we did was we developed a digital signal processing system that allowed us to shift the first formant |
---|
0:26:37 | frequency in real time meaning that a subject hears the sound with a sixty millisecond delay which is pretty much unnoticeable |
---|
0:26:46 | to the subject |
---|
0:26:47 | even unperturbed speech has that same sixty millisecond delay so they're always hearing |
---|
0:26:52 | a slightly delayed version of their speech over headphones we play it rather loud over the headphones and they speak quietly |
---|
0:26:59 | as a result of this and the reason we do that is we want to minimize things like bone conduction |
---|
0:27:04 | of the actual speech |
---|
0:27:06 | and make them focus on the auditory feedback that we're providing them which is the perturbed auditory feedback |
---|
0:27:12 | and what we do in particular is we take the first formant frequency and in one fourth of the utterances |
---|
0:27:18 | we will perturb it either up or down so three out of every four utterances are unperturbed |
---|
0:27:25 | one in four is perturbed well excuse me one in eight is perturbed up and one in eight is perturbed |
---|
0:27:32 | down so |
---|
0:27:33 | they get these perturbations randomly distributed they can't predict them because first of all the direction changes all the time |
---|
0:27:42 | and secondly because many of the productions are not perturbed |
---|
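The perturbation schedule itself is easy to write down: three quarters of the utterances are left alone, one in eight has F1 scaled up, one in eight has it scaled down, in an order the subject cannot predict. A sketch of that design (the 30 percent shift size is an assumed value for illustration):

```python
import random

def make_schedule(n_trials, seed=42):
    """3/4 unshifted, 1/8 shifted up, 1/8 shifted down, randomly ordered."""
    n_up = n_down = n_trials // 8
    trials = ["none"] * (n_trials - n_up - n_down) + ["up"] * n_up + ["down"] * n_down
    random.Random(seed).shuffle(trials)
    return trials

def shift_f1(f1_hz, condition, fraction=0.30):
    """Scale the first formant before it is played back over the headphones."""
    if condition == "up":
        return f1_hz * (1.0 + fraction)
    if condition == "down":
        return f1_hz * (1.0 - fraction)
    return f1_hz

schedule = make_schedule(64)
print(schedule.count("none"), schedule.count("up"), schedule.count("down"))   # 48 8 8
print(shift_f1(700.0, "up"))                                                   # 910.0
```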
0:27:46 | and here's what this sounds like so the people were producing vowels |
---|
0:27:52 | with the vowel eh and so the words that they would produce were words like bet and peck and |
---|
0:27:59 | pet |
---|
0:28:00 | and here's an example of unshifted speech before the perturbation |
---|
0:28:08 | i |
---|
0:28:09 | and here is a case where we've shifted F one upward an upward shift of F one corresponds to a more |
---|
0:28:16 | open mouth and that should make the pet |
---|
0:28:19 | vowel sound a little bit more like a pat |
---|
0:28:22 | and so if you hear the perturbed version of that production |
---|
0:28:27 | i |
---|
0:28:27 | it sounds more like pat than pet in this case so the original |
---|
0:28:33 | sorry |
---|
0:28:38 | i |
---|
0:28:40 | hi |
---|
0:28:41 | so it's consciously noticeable to you now when i play it to you like this but most subjects don't notice what's |
---|
0:28:46 | going on during the experiment we ask them afterwards if they noticed anything sometimes they'll say |
---|
0:28:52 | occasionally my speech sounded a little odd but usually they didn't really notice that much of anything going on with |
---|
0:28:59 | their speech and yet their brains are definitely picking up this difference and we found that with the fmri |
---|
0:29:07 | we also look at their formant frequencies so what i'm showing here is |
---|
0:29:13 | a normalized F one |
---|
0:29:16 | and what normalized means in this case is that the F one in a baseline unperturbed utterance |
---|
0:29:22 | is what we expect to see so we'll take the F one in a given utterance and compare it to that |
---|
0:29:29 | baseline |
---|
0:29:30 | if it's exactly the same then we'll have a value of one so if they're producing the exact same thing as |
---|
0:29:36 | they do in the baseline they would stay flat at this value of one |
---|
0:29:39 | on the other hand if they're increasing their F one then we'll see the normalized F one go above one |
---|
0:29:46 | and if they're decreasing F one we'll see it go below one |
---|
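In other words the plotted quantity is the trial's F1 trajectory divided, time point by time point, by the mean F1 trajectory of the unperturbed baseline utterances, so 1 means no change. A small sketch, assuming trajectories sampled at matching time points:

```python
import numpy as np

def normalized_f1(trial_f1_hz, baseline_f1_hz):
    """Trial F1 divided by the mean baseline F1 at each time point (1.0 = no change)."""
    return np.asarray(trial_f1_hz) / np.asarray(baseline_f1_hz)

baseline = np.array([700.0, 695.0, 690.0, 688.0])          # mean of unshifted trials
downshift_trial = np.array([700.0, 706.0, 722.0, 738.0])   # subject raising F1 to compensate
print(normalized_f1(downshift_trial, baseline))             # climbs above 1.0 over time
```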
0:29:50 | the |
---|
0:29:51 | gray shaded areas here are the ninety five percent confidence intervals of the subjects' productions in the experiment |
---|
0:29:59 | and what we see for the down shift is that over time the subjects increase their F one to try |
---|
0:30:06 | to correct for the decrease of F one that we've |
---|
0:30:09 | given them with the perturbation |
---|
0:30:12 | and in the case where we up shift their speech they decrease F one as shown by this confidence interval |
---|
0:30:18 | here |
---|
0:30:19 | the split between the two occurs right about where we expect which is somewhere around a hundred to a hundred and |
---|
0:30:26 | fifty milliseconds after the first sound comes out that they hear with the perturbation |
---|
0:30:33 | the solid lines here are the results of simulations of the diva model producing the same speech sounds under perturbed |
---|
0:30:40 | conditions |
---|
0:30:41 | and so the black dashed line here shows the model's productions in the up shift condition it waits about |
---|
0:30:47 | a hundred and twenty five in this case actually it only waits about eighty milliseconds our delay loop was short here |
---|
0:30:52 | and then it starts to compensate for the utterance |
---|
0:30:56 | similarly in the down shift case it goes for about eighty milliseconds until it starts to hear the error and |
---|
0:31:03 | then it compensates in an upward direction |
---|
0:31:05 | and we can see that the model's productions fall within the confidence intervals of the subjects' productions so the model |
---|
0:31:11 | produces a good fit to the behavioural data |
---|
0:31:16 | but we also took a look at the neuroimaging data and on the bottom what i'm showing is the results |
---|
0:31:23 | of a simulation that we ran before the study where we generated predictions of fmri activity |
---|
0:31:30 | when we compare shifted speech to non-shifted speech as i mentioned when we shift the speech that should turn |
---|
0:31:37 | these auditory error cells on and we've localised them to these posterior areas of the superior temporal gyrus here |
---|
0:31:44 | when those error cells become active they should lead to a motor correction and these are shown by activities in |
---|
0:31:51 | the motor cortex here in the model simulation |
---|
0:31:55 | now we also see a little bit of cerebellar activity here in the model but i'll skip that for |
---|
0:32:00 | today's |
---|
0:32:01 | talk |
---|
0:32:02 | here on the top is what we actually got from our experimental results for the shift minus no shift contrast |
---|
0:32:08 | the auditory error cells were pretty much where we expected them so first of all there are auditory error cells there |
---|
0:32:15 | are cells in your brain that detect the difference between what you're saying and what you expect it to sound |
---|
0:32:20 | like even as an adult |
---|
0:32:22 | these auditory error cells become active but what we noticed is that the motor corrective activity we saw was actually |
---|
0:32:29 | right lateralized and it was premotor it wasn't bilateral and primary motor as we predicted it's farther forward in |
---|
0:32:36 | the brain it's in a more premotor cortical area |
---|
0:32:39 | and it's right lateralized so one of the things we learned from this experiment was that auditory feedback control appears |
---|
0:32:46 | to be right lateralized in the frontal cortex |
---|
0:32:49 | and so we modified the model to have an auditory feedback |
---|
0:32:53 | or sorry a feedback control map in the right ventral premotor cortex area corresponding with this region here |
---|
0:33:01 | we actually ran a parallel experiment where we perturbed speech with a balloon in the mouth so we actually |
---|
0:33:09 | we built a machine that |
---|
0:33:11 | perturbed your jaw while you were speaking so you would be saying something like ah pee and during |
---|
0:33:16 | the ah this balloon would blow up very rapidly it was actually the finger of a latex glove |
---|
0:33:21 | that would blow up to about a centimetre and a half and would block your jaw from closing so that when |
---|
0:33:26 | you were |
---|
0:33:27 | done with the ah and getting ready to say the consonant and the final vowel ee then the jaw was blocked |
---|
0:33:33 | the jaw couldn't move as much subjects compensated again |
---|
0:33:37 | and we saw in that experiment activity in the somatosensory cortical areas corresponding to the somatosensory error map |
---|
0:33:45 | but we also saw right lateralized motor cortical activity and so based on these two experiments |
---|
0:33:51 | we modified the model to include a right lateralized feedback control map that we did not have in the original |
---|
0:33:57 | model |
---|
0:34:02 | okay so |
---|
0:34:03 | the other thing we can do is we can look at connectivity in brain activity using techniques such as structural |
---|
0:34:10 | equation modelling very briefly in a structural equation modelling analysis what we would do is we would use a |
---|
0:34:18 | predefined model of connectivity in the brain and then we would go and look at the fmri data and |
---|
0:34:24 | see how much of the covariance matrix of the fmri data we had can be captured by this model |
---|
0:34:31 | if we optimize the connections and so what SEM does is it |
---|
0:34:36 | produces connection strengths that are optimised in that model and gives you goodness of fit data |
---|
0:34:41 | and in addition to being able to fit the data very well meaning that our connections in the model are |
---|
0:34:47 | in the right place |
---|
0:34:49 | we also noted an increase in what's called effective connectivity so an increase in the strength of the |
---|
0:34:56 | effect of these |
---|
0:34:57 | auditory areas on the motor areas in the right hemisphere when the speech was perturbed so the interpretation of that |
---|
0:35:05 | is when we perturb your speech with an auditory perturbation like this |
---|
0:35:09 | the error cells are active that drives activity in the right ventral premotor cortex and so we have an |
---|
0:35:14 | increased effect on the motor cortex from the auditory areas in this case |
---|
0:35:19 | and so this is further support for the structure in the model and the feedback control system that we just |
---|
0:35:28 | described |
---|
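As a rough intuition only: "increased effective connectivity" here means the auditory regions' signal predicts the right premotor signal more strongly on shifted trials than on unshifted trials. The actual analysis is structural equation modelling over the full covariance structure; the simple regression slope below is a stand-in, not SEM, and the data are synthetic:

```python
import numpy as np

def path_strength(source_ts, target_ts):
    """Stand-in for an SEM path coefficient: regression slope of target on source."""
    s = source_ts - source_ts.mean()
    t = target_ts - target_ts.mean()
    return float(s @ t / (s @ s))

rng = np.random.default_rng(3)
aud = rng.normal(size=200)                        # auditory ROI time course
motor_noshift = 0.2 * aud + rng.normal(size=200)  # weak coupling, unperturbed trials
motor_shift = 0.8 * aud + rng.normal(size=200)    # stronger coupling, perturbed trials
print(path_strength(aud, motor_noshift), path_strength(aud, motor_shift))
```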
0:35:30 | okay so that's one example of an experimental test we've done a very large number of tests of this |
---|
0:35:36 | sort |
---|
0:35:36 | we've tested predictions of kinematics in the model so we work with people who measure articulator |
---|
0:35:45 | movements using |
---|
0:35:46 | electromagnetic articulometry this is a technique where you basically glue receiver coils on the tongue and the lips and the |
---|
0:35:55 | jaw and you can measure very accurately the positions of these points on the articulators |
---|
0:36:03 | in the midsagittal plane and from this you can estimate quite accurately in time the positions of the speech articulators |
---|
0:36:10 | and compare them to |
---|
0:36:12 | productions that we see in the model we've done a lot of work looking at for example phonetic context effects |
---|
0:36:19 | in r production which i'll come back to later r is a phoneme in english that is produced with a |
---|
0:36:24 | very wide range of articulatory variability |
---|
0:36:27 | the acoustic cues for r are very stable this has been shown by people such as boyce and espy-wilson |
---|
0:36:34 | and what you see if you produce movements with the model is that |
---|
0:36:40 | the model will also produce very different articulations for r in different phonetic contexts and this has to do with |
---|
0:36:45 | the fact that it's starting from different initial positions and it's simply going to the closest point to |
---|
0:36:51 | the acoustic target |
---|
0:36:53 | that it can get to and that point will be in different parts of the articulator space depending on where |
---|
0:36:58 | you start |
---|
0:37:00 | we've looked at a large number of experiments on other types of articulatory movements both in |
---|
0:37:08 | normal hearing and hearing impaired individuals we look at what happens when you put a bite block in we look |
---|
0:37:13 | at what happens when you noise mask the speakers and we've also looked at what happens over time in the |
---|
0:37:21 | speech of people with cochlear implants for example so |
---|
0:37:24 | in the case of a cochlear implant recipient who was an adult who had already learned to speak |
---|
0:37:29 | when they first |
---|
0:37:31 | receive the cochlear implant they hear sounds that are not the same as the sounds that they used to hear |
---|
0:37:38 | so their auditory targets don't match |
---|
0:37:41 | what's coming in from the cochlear implant and it actually impairs their speech for a little while for about |
---|
0:37:48 | a month or so before they start to improve their speech and by a year they show very strong |
---|
0:37:54 | improvements in their speech |
---|
0:37:56 | and according to the model this is occurring because they have to retune their auditory feedback control system to deal |
---|
0:38:02 | with the new feedback and only when that auditory feedback control system is tuned can they start to retune the |
---|
0:38:07 | movements to produce more distinct speech sounds |
---|
0:38:12 | we've also done a number of neuroimaging experiments for example we predicted that the left ventral premotor cortex |
---|
0:38:21 | contains syllabic motor programs |
---|
0:38:24 | and we use a technique called repetition suppression in fmri where you present stimuli that change in some |
---|
0:38:32 | dimensions but don't change in other dimensions |
---|
0:38:35 | and with this technique you can find out what it is about the stimuli that a particular brain region cares |
---|
0:38:41 | about and using this technique we were able to show that in fact the only region in the brain that |
---|
0:38:46 | we found that had |
---|
0:38:47 | a syllabic sort of representation was the left ventral premotor cortex where we believe these syllabic motor programs are located |
---|
0:38:54 | highlighting the fact that the syllable is a particularly important entity for motor control |
---|
0:39:00 | and this we believe is because our syllables are very highly practised and well tuned motor programs |
---|
0:39:07 | that we can read out we don't have to produce the individual phonemes we read out the whole syllable as |
---|
0:39:12 | a motor program that we've stored in memory |
---|
0:39:16 | finally we've been able fortunately to even test the model's predictions electrophysiologically and this was in |
---|
0:39:24 | a case |
---|
0:39:25 | of a patient with locked-in syndrome that i'll speak about in a bit and i'll talk about exactly what |
---|
0:39:30 | we were able to verify using electrophysiology in this case actual recordings from neurons in the cortex |
---|
0:39:39 | okay so |
---|
0:39:40 | the last part of my talk now will start to focus on using the model to investigate communication disorders |
---|
0:39:47 | and we've done a number of studies of this sort as i mentioned we look at speech in normal hearing |
---|
0:39:54 | and hearing impaired populations |
---|
0:39:57 | we are now doing quite a bit of work on stuttering which is a very common speech disorder that affects |
---|
0:40:03 | about one percent of the population stuttering is a very complicated disorder it's been known |
---|
0:40:10 | since the beginning of time basically every culture seems to have people who stutter and within that culture people have |
---|
0:40:17 | been trying to cure stuttering forever and we've been unable to do so and the brains of people who |
---|
0:40:23 | stutter are actually |
---|
0:40:24 | really similar to the brains of people who don't stutter unless you look very closely and if you start looking |
---|
0:40:30 | very closely you start to see things like white matter differences |
---|
0:40:35 | and grey matter thickness differences in the brain and these tend to be localised around the basal ganglia thalamo |
---|
0:40:41 | cortical loop and so our view of stuttering is that several different problems can occur in this loop different |
---|
0:40:48 | people who stutter |
---|
0:40:51 | can have different locations of damage or of an anomaly in their basal ganglia thalamocortical loop and this can |
---|
0:40:59 | lead all of these can lead to stuttering and the complexity of this disorder is partly because |
---|
0:41:05 | it's a system level disorder where different parts of the system can cause problems it's not always the same part |
---|
0:41:11 | of the system that's problematic in different people who stutter and so one of the important areas of research |
---|
0:41:19 | for stuttering is |
---|
0:41:20 | computational modelling of this loop to get a much better understanding of what's going on and how these different problems |
---|
0:41:25 | can lead to similar sorts of behaviour |
---|
0:41:29 | we're also looking at spasmodic dysphonia which is a vocal fold problem similar to dystonia |
---|
0:41:36 | it's a |
---|
0:41:37 | problem where typically the vocal folds are too tense during speech |
---|
0:41:42 | again it appears to be basal ganglia loop related |
---|
0:41:46 | apraxia of speech which involves left hemisphere frontal damage and childhood apraxia of speech which is actually |
---|
0:41:52 | a different disorder from acquired apraxia of speech this tends to involve more widespread |
---|
0:42:00 | kind of lesser damage but in a more widespread portion of the brain |
---|
0:42:05 | and so forth and the project i'll talk most about here will be a project involving a neural prosthesis for locked |
---|
0:42:12 | in syndrome and this is a project that we've done with phil kennedy from neural signals |
---|
0:42:19 | who developed technology for implanting the brains of people with locked-in syndrome and we helped them build a prosthesis |
---|
0:42:28 | from that technology |
---|
0:42:31 | so typically our studies where we're looking at disorders involve some sort of damaged version of the model it's a |
---|
0:42:37 | neural network so we can go in and we can mess up white matter projections which are these synaptic projections |
---|
0:42:42 | we can mess up |
---|
0:42:43 | neurons in a particular area we can even adjust things like levels of neurotransmitters some studies suggest that there may |
---|
0:42:53 | be an excess of dopamine in some people who stutter |
---|
0:42:56 | well we have added dopamine receptors to our basal ganglia loop so we can go in and we |
---|
0:43:01 | can start changing dopamine levels and seeing how that changes both the behaviour of the model and also the |
---|
0:43:07 | brain activities of the model |
---|
0:43:09 | and what we're doing now is running a number of imaging studies involving people who stutter we've made predictions |
---|
0:43:15 | based on several possible |
---|
0:43:19 | loci of damage in the brain that may result in stuttering and we're testing those predictions both by seeing |
---|
0:43:25 | if the model is capable of producing stuttering behaviour but also seeing if the brain activities |
---|
0:43:31 | match up with what we see in people who stutter there are many different ways to invoke stuttering in the |
---|
0:43:36 | model but each way causes a different pattern of brain activity to occur |
---|
0:43:42 | so by having both the behavioural results and the neuroimaging results we can do a much more detailed |
---|
0:43:49 | treatment of what exactly is going on in this population |
---|
0:43:54 | the example i'm gonna spend the rest of the talk describing is a bit different where in this case the |
---|
0:44:01 | speech motor system of the patient was |
---|
0:44:05 | intact |
---|
0:44:06 | but the patient was suffering from locked-in syndrome due to a brain stem stroke |
---|
0:44:11 | locked-in syndrome is a syndrome where |
---|
0:44:15 | patients have intact cognition and sensation but they're completely unable to perform voluntary movement so it's a case of being |
---|
0:44:23 | almost kind of |
---|
0:44:25 | buried alive in your own body the patients sometimes have eye movements the patient we worked with could very slowly |
---|
0:44:33 | move his eyes up and down his eyelids actually to answer yes or no questions |
---|
0:44:39 | this was the only form of communication he had |
---|
0:44:42 | and so prior to our involvement in the project he was implanted as part of a project developing technologies for |
---|
0:44:51 | locked in patients to control computers or external devices |
---|
0:44:56 | these technologies are referred to by several different names brain computer interface or brain machine interface or neural prosthesis |
---|
0:45:05 | and in this case we were focusing on a neural prosthesis for speech restoration |
---|
0:45:10 | locked-in syndrome is typically caused by either a brain stem stroke in the ventral pons or more commonly people become |
---|
0:45:19 | locked in through neurodegenerative diseases such as ALS which attack the motor system |
---|
0:45:25 | people who suffer from ALS |
---|
0:45:27 | go through a stage in the later stages of the disease where they are basically locked in they're unable |
---|
0:45:34 | to move or speak |
---|
0:45:35 | but still fully conscious and with sensation |
---|
0:45:41 | well the electrode that was developed by our colleague phil kennedy is schematised here and here's a photograph of |
---|
0:45:49 | it it's a tiny glass cone that is open on both ends the cone is about a millimetre long there are |
---|
0:45:56 | three gold wires inside the cone |
---|
0:45:59 | they're coated with an insulator except at the very end where the wire is cut off and that acts as |
---|
0:46:07 | a recording site so there are three recording sites within the cone one is used as a reference and the |
---|
0:46:12 | other two are used as recording channels |
---|
0:46:15 | and this electrode is inserted into the cerebral cortex here i've got a schematic of the cortex |
---|
0:46:23 | which consists of six layers of cell types |
---|
0:46:28 | the goal is to get this near layer five of the cortex |
---|
0:46:32 | where the output neurons are these are the motor neurons in the motor cortex these are |
---|
0:46:39 | neurons that project to the periphery to cause movement |
---|
0:46:42 | but it doesn't matter too much where you go because the cone is filled with a nerve growth factor and |
---|
0:46:47 | what happens is |
---|
0:46:49 | over a month or two axons actually grow into this cone and lock it into place that's very important because |
---|
0:46:55 | it stops movement if you have movement of an electrode in the brain |
---|
0:47:00 | you get problems such as gliosis which is scar tissue building up around the electrode and stopping the |
---|
0:47:06 | electrode from picking up signals |
---|
0:47:08 | in this case the wires are actually inside a protective glass cone and no gliosis builds up inside the cone |
---|
0:47:16 | so it's a permanent electrode you can implant this electrode and record from it for many years and |
---|
0:47:22 | when we did the project i'll talk about the electrode had been in the subject's brain for over three and |
---|
0:47:28 | a half years |
---|
0:47:31 | so |
---|
0:47:33 | the electrode location was chosen in this case by having the subject attempt to produce speech while in an fmri |
---|
0:47:40 | scanner |
---|
0:47:41 | and what we noticed was that the brain activity is relatively normal it looks like the brain activity |
---|
0:47:49 | of a neurologically normal person trying to produce speech and in particular there's a blob of activity on |
---|
0:47:57 | the precentral gyrus which is the location of the motor cortex |
---|
0:48:01 | in the region where we expect it for speech so i'm going to refer to this region as speech motor cortex |
---|
0:48:08 | this is where the electrode was implanted so this is an fmri scan performed before implantation here is actually |
---|
0:48:15 | a CT scan afterwards where you can see in the same brain area the wires of the electrode coming |
---|
0:48:21 | out |
---|
0:48:22 | this bottom picture is a three d CT scan showing the skull where you can see |
---|
0:48:29 | the craniotomy where the electrode was inserted you can see the wires coming out and the wires |
---|
0:48:34 | go into a package of electronics that is located under the skin |
---|
0:48:39 | and these electronics amplify the signal and then send it as radio signals across the scalp |
---|
0:48:44 | we attach antennas basically just antenna coils to the scalp so the subject has a normal looking head |
---|
0:48:52 | he has hair on his head there's nothing sticking out of his head |
---|
0:48:56 | when he comes into the lab we attach these antennas to the scalp and we tune them to just the |
---|
0:49:02 | right frequencies and they pick up the two signals that we are generating from our electrode |
---|
0:49:08 | the signals are then routed to a recording system and then to a computer where we can operate on those |
---|
0:49:13 | signals |
---|
0:49:14 | in real time |
---|
0:49:17 | well |
---|
0:49:18 | oh |
---|
0:49:19 | kennedy had implanted the patient several years before we got involved in the project |
---|
0:49:27 | but they were having trouble decoding the signals and part of the problem is |
---|
0:49:31 | that if you look in motor cortex there's nothing obvious that corresponds to a word or for that matter a syllable or |
---|
0:49:38 | phoneme you don't see neurons turn on when the subject produces a particular syllable and then shut off when the |
---|
0:49:46 | subject's done |
---|
0:49:47 | you see instead that all the neurons are just subtly changing their activity over time so it appears |
---|
0:49:53 | that there's some sort of continuous representation here in the motor cortex there's not a representation of just words and |
---|
0:49:59 | phonemes at least at the motor level |
---|
0:50:02 | kennedy's group contacted us because we had a model of what these brain areas are doing and so |
---|
0:50:09 | we collaborated on decoding these signals and routing them to a speech synthesizer so the subject could actually control some |
---|
0:50:17 | speech output |
---|
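Conceptually the decoding step maps slowly varying firing rates onto a continuous acoustic trajectory that can drive a synthesizer. The ridge-regression decoder below, with synthetic two-formant training data, is a simplified sketch of that idea and is not the decoder actually used in the prosthesis:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic training data: firing rates of six recorded units per time bin,
# paired with the intended first two formant frequencies (Hz).
rates = rng.poisson(lam=20, size=(500, 6)).astype(float)
formants = np.array([500.0, 1500.0]) + rates @ rng.normal(scale=15.0, size=(6, 2))

# Fit a linear (ridge) decoder from firing rates to formants.
X = np.hstack([rates, np.ones((len(rates), 1))])     # bias column
W = np.linalg.solve(X.T @ X + 1.0 * np.eye(X.shape[1]), X.T @ formants)

def decode(rate_vector):
    """Instantaneous firing rates -> (F1, F2) to send to the speech synthesizer."""
    return np.append(rate_vector, 1.0) @ W

print(decode(rates[0]), formants[0])                  # decoded vs. intended formants
```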
0:50:19 | well |
---|
0:50:20 | the tricky question here is what is the neural code for speech in the motor cortex |
---|
0:50:26 | and the problem of course is that there are no prior studies people don't go into a human motor cortex |
---|
0:50:33 | and record normally |
---|
0:50:35 | and monkeys don't speak nor do other animals so we don't have any single cell data about what's going |
---|
0:50:41 | on in the motor cortex during speech we have data from arm movements and we use the insights from this |
---|
0:50:48 | data |
---|
0:50:48 | but we also used insights from what we saw in human speech movements to determine what the |
---|
0:50:54 | variables were that these people were controlling what was the motor system caring about |
---|
0:50:59 | does it care about muscle positions or does it care about the sound signal |
---|
0:51:04 | and there is some available data from stimulation studies of the motor cortex these come from |
---|
0:51:11 | the work by penfield who worked with epilepsy patients who were having surgeries to remove portions of the |
---|
0:51:18 | cortex that were |
---|
0:51:19 | causing a epileptic fits |
---|
0:51:22 | before they did the removal what they would do is actually stimulate the cortex to see what
---|
0:51:30 | parts of the brain were doing what in particular what they wanted to do was avoid parts of the brain
---|
0:51:35 | involved in speech and they mapped out along the motor cortex areas that cause movements of the speech articulators for
---|
0:51:41 | example and other areas that caused interruptions of speech and so forth
---|
0:51:46 | and these studies were informative and we used them to help us determine where to localise some of
---|
0:51:52 | the neurons in the model but they don't really tell you about what kind of representation is being used by |
---|
0:51:57 | the neurons when you stimulate a portion of cortex you are stimulating hundreds of neurons minimally they were using something like
---|
0:52:04 | two volts for stimulation the maximum activity of a neuron is fifty five millivolts so the stimulation signal was dramatically
---|
0:52:11 | bigger than any natural signal |
---|
0:52:13 | and it activates a large area of cortex and so you see a gross |
---|
0:52:17 | crudely formed movement coming out and the speech movements tended to be things like a movement of the lips or the
---|
0:52:22 | subject might vocalise
---|
0:52:24 | something like this it's just a gross movement it's not really a speech sound they don't produce any words or anything
---|
0:52:30 | like that |
---|
0:52:31 | and from these sorts of studies it's next to impossible to determine what sort of representation is going on in |
---|
0:52:37 | the motor cortex |
---|
0:52:39 | however we do have our model which does provide the first explicit characterisation of what these response properties should
---|
0:52:46 | be of speech motor cortical cells we have actual speech motor cortical cells in the model they are tuned to |
---|
0:52:52 | particular things |
---|
0:52:54 | and so what we did was we used the model to guide our search for information in this part of
---|
0:53:00 | the brain |
---|
0:53:01 | and i want to point out that the characterisation provided by the model was something that we spent twenty years |
---|
0:53:08 | refining so we ran a large number of experiments testing different possibilities about how speech was controlled
---|
0:53:15 | and we ended up with a particular format in the model and that's no coincidence that's because we spent a |
---|
0:53:22 | lot of time looking at that and here is the result of one such study which highlights the fact
---|
0:53:28 | that in motor planning |
---|
0:53:30 | sound appears to be more important than where your tongue is actually located and this is a study of the
---|
0:53:37 | phoneme R that i mentioned before just to describe what you're going to see here each of
---|
0:53:43 | these lines you see represents a tongue shape |
---|
0:53:47 | and there are two tongue shapes in each panel there's a dashed line
---|
0:53:52 | so this is the tip of the tongue this is the centre of the tongue and this is the back of the tongue
---|
0:53:56 | we're actually measuring the positions of these transducers that are located on the tongue using an electromagnetic articulometer
---|
0:54:01 | and the dashed lines show the tongue shape that occurs seventy five milliseconds before |
---|
0:54:09 | the centre of the R which happens to be the minimum of the F three trajectory
---|
0:54:14 | and the dark bold lines show the tongue shape at the centre of the R at that F three
---|
0:54:21 | minimum so in this case you can see the speaker used
---|
0:54:24 | an upward movement of the tongue tip to produce the R
---|
0:54:28 | in this panel |
---|
0:54:30 | so what we have over here are two separate subjects where we have measurements from the subject on the
---|
0:54:36 | top row and then productions of the model represented in the bottom row and the model was actually using speaker-specific
---|
0:54:43 | vocal tracts in this case so
---|
0:54:45 | what we did was we took the subject and collected a number of MRI scans while they
---|
0:54:50 | were producing different phonemes |
---|
0:54:52 | we did principal components analysis to pull out their main movement degrees of freedom we had their acoustic signals and |
---|
0:54:58 | so we built a synthesiser that had their vocal tract shape and produced their formant frequencies
---|
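As a concrete illustration of this step, here is a minimal sketch of pulling out the main movement degrees of freedom with principal components analysis; the data, array shapes, and variable names below are invented for illustration, and this is not the lab's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: each row is one time sample of transducer coordinates
# (e.g. x/y positions of flesh points on the tongue), stacked across utterances.
n_samples, n_coords = 2000, 14           # e.g. 7 flesh points x (x, y)
latent = rng.normal(size=(n_samples, 3)) # pretend there are 3 underlying "gestures"
mixing = rng.normal(size=(3, n_coords))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_coords))

# Centre the data and take principal components via the SVD of the centred matrix.
X_centred = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)

# Keep the few components that explain most of the variance: these play the role
# of the "main movement degrees of freedom" of this synthetic vocal tract data.
explained = (S ** 2) / np.sum(S ** 2)
k = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)
components = Vt[:k]                      # k x n_coords basis of tongue shapes
scores = X_centred @ components.T        # low-dimensional articulatory state

print(f"{k} components explain 95% of the variance")
```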
0:55:04 | then we had the diva model learn to control their vocal tract so we put this vocal tract synthesiser in
---|
0:55:10 | place of the maeda synthesizer we babbled the vocal tract around had it learn to produce R's and
---|
0:55:16 | then we went back and had it |
---|
0:55:18 | produce the stimuli in the study and in this case the people were producing nonsense utterances
---|
0:55:24 | in which
---|
0:55:25 | the R was either preceded by an ah sound a D or a G
---|
0:55:34 | what we see is that the subject produces very different movements in these three cases so in the ah context the
---|
0:55:40 | subject uses an upward movement of the tongue tip like we see over here
---|
0:55:44 | but in the D context the subject actually moved their tongue backwards to produce the R
---|
0:55:49 | in the G context they moved their tongue downward to produce the R so they're using three completely different gestures
---|
0:55:55 | or articulatory movements to produce the R and yet they're producing pretty much the same acoustic trace the F
---|
0:56:01 | three traces are very similar in these cases |
---|
0:56:04 | if we take the model and we have it produce R's with the speaker-specific vocal tract we see that the |
---|
0:56:11 | model because it cares about the acoustic signal primarily is trying to hit this F three target
---|
0:56:17 | and the model also uses different movements in the different contexts and in fact the movements reflect the movements of the
---|
0:56:23 | speaker so here the model uses an upward movement of the tongue tip here the model uses a backward movement
---|
0:56:29 | of the tongue and here the model uses a downward movement of the tongue to produce the R so
---|
0:56:34 | what we see is that with a very simple model that's just going to the appropriate position in formant frequency
---|
0:56:39 | space we can capture this complicated variability in the articulator movements |
---|
0:56:45 | of the actual speaker |
---|
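The context-dependent articulations described above are what you get when the controller works in acoustic space and maps formant error back onto the articulators through a pseudoinverse. The toy sketch below illustrates that idea with a made-up linear "vocal tract"; it is not the DIVA implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy forward model: articulator configuration (6-D) -> formants (F1..F3).
# A fixed random linear map stands in for a real vocal tract synthesiser.
J = rng.normal(size=(3, 6))
def formants(artic):
    return J @ artic

target_F = np.array([500.0, 1200.0, 1700.0])   # desired F1, F2, F3 (Hz)

def reach_target(artic_start, steps=50, gain=0.2):
    artic = artic_start.copy()
    for _ in range(steps):
        err = target_F - formants(artic)       # acoustic (formant) error
        # Map formant error to articulator velocities via the pseudoinverse:
        # many articulations satisfy the same acoustic goal, so the final
        # articulation depends on where the articulators start (context).
        artic += gain * np.linalg.pinv(J) @ err
    return artic

# Two different starting configurations ("preceding phoneme contexts")
a1 = reach_target(rng.normal(size=6))
a2 = reach_target(rng.normal(size=6))
print("final formant error:", np.round(formants(a1) - target_F, 3))
print("articulations differ:", not np.allclose(a1, a2))
```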
0:56:47 | another thing to note here this is the second speaker again the model replicates the movements and the
---|
0:56:53 | model also captures speaker-specific differences in this case the speaker used a small upward tongue tip movement to produce
---|
0:57:01 | the R
---|
0:57:02 | but this speaker for reasons having to do with the morphology of their vocal tract had to do a
---|
0:57:06 | much bigger movement of the tongue tip to produce the R in the ah context
---|
0:57:11 | and again the model produces a bigger movement in this speaker's case than in the other speaker's case so
---|
0:57:17 | this provides pretty solid evidence that speakers are really concentrating on
---|
0:57:21 | the formant frequency trajectories of their speech output more so than where the individual articulators were located |
---|
0:57:29 | and so we made a prediction that we should see formant frequency representations in the speech motor cortical area if
---|
0:57:38 | we're able to look at what's going on during speech |
---|
0:57:42 | this slide i'm sure everybody here follows these are actually the formant frequency traces for good doggie this
---|
0:57:51 | is what i used as the target for the
---|
0:57:54 | simulations i showed you earlier and down here i show the first two formant frequencies what's called the formant
---|
0:58:00 | plane and the important point here is that if we can just change F one and
---|
0:58:06 | F two we can produce pretty much all of the vowels |
---|
0:58:09 | of the language because they are differentiated by their first two formant frequencies and so formant frequency space provides a |
---|
0:58:18 | very low dimensional continuous space for the planning of movements |
---|
0:58:22 | and that's crucial for the development of the brain computer interface |
---|
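To make the low-dimensional point concrete, here is a tiny sketch of the F1/F2 vowel plane with a nearest-target lookup; the formant centres are rough textbook-style values for an adult male voice, not numbers from the talk.

```python
import math

# Rough, illustrative F1/F2 centres (Hz) for a few American English vowels.
VOWELS = {
    "iy": (270, 2290),   # as in "heed"
    "ae": (660, 1720),   # as in "had"
    "aa": (730, 1090),   # as in "hod"
    "uw": (300, 870),    # as in "who'd"
}

def nearest_vowel(f1, f2):
    """Return the vowel whose (F1, F2) centre is closest to the input point."""
    return min(VOWELS, key=lambda v: math.hypot(f1 - VOWELS[v][0],
                                                f2 - VOWELS[v][1]))

# Any point in this 2-D plane corresponds to some vowel-like sound, which is
# what makes the formant plane a convenient continuous control space.
print(nearest_vowel(310, 2200))   # -> "iy"
print(nearest_vowel(700, 1150))   # -> "aa"
```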
0:58:27 | okay and why is it crucial well
---|
0:58:31 | there have been a number of brain computer interfaces that involve implants in the hand area
---|
0:58:37 | of the motor cortex |
---|
0:58:39 | and what they do usually is they decode cursor position on the screen from neural activities in the hand area |
---|
0:58:46 | and people learn to control movement of a cursor by activating the neurons in their hand motor cortex
---|
0:58:55 | now when they build these interfaces they don't try to decode all of the joint angles of the arm
---|
0:59:01 | and then determine where the cursor would be based on where the mouse would be instead they go directly to |
---|
0:59:06 | the output space in this case the two dimensional cursor space |
---|
0:59:11 | and the reason they do that is we're dealing with a very small number of neurons in these sorts of
---|
0:59:15 | studies relative to the entire motor system there are hundreds of millions of neurons involved in your motor system |
---|
0:59:21 | and in the best case you might get a hundred neurons in the brain computer interface we were actually getting |
---|
0:59:26 | far fewer than that we had a very old implant that only had two electrode wires
---|
0:59:32 | so we had less than ten neurons maybe as few as two or three neurons
---|
0:59:38 | we could pull out more signals than that but they weren't single neuron activities
---|
0:59:42 | well if we tried to pull out a high dimensional representation of the arm configuration from a small number of |
---|
0:59:49 | neurons we would have a tremendous amount of error and this is why they don't do that instead they try
---|
0:59:54 | to pull out a very low dimensional thing which is this two D cursor position |
---|
0:59:58 | well we're doing the analogous thing here instead of trying to pull out all of the articulator positions that determine |
---|
1:00:04 | the shape of the vocal tract we're simply going to the output space which is the formant frequency space which |
---|
1:00:10 | for vowel production can be as simple as a two-dimensional signal
---|
1:00:16 | okay so what we're doing is basically decoding an intended sound position in this two D formant frequency space
---|
1:00:23 | that's generated from motor cortical cells but is a much lower dimensional thing than the entire vocal tract shape
---|
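One way to picture this decoding step is a simple regularised linear regression from a handful of unit firing rates to the two formant coordinates. The sketch below uses synthetic data and is only an illustration of the dimensionality argument, not the decoder used in the actual system.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in data: firing rates of a few units (columns) over time,
# and the intended F1/F2 trajectory they are assumed to encode linearly.
T, n_units = 600, 4
true_W = rng.normal(size=(n_units, 2))
rates = rng.poisson(lam=5.0, size=(T, n_units)).astype(float)
formants = rates @ true_W + rng.normal(scale=2.0, size=(T, 2))

# Ridge regression: with so few units, decoding a 2-D output is feasible,
# whereas a full articulatory configuration would be hopelessly underdetermined.
lam = 1.0
X = np.hstack([rates, np.ones((T, 1))])            # add a bias column
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ formants)

decoded = X @ W
rmse = np.sqrt(np.mean((decoded - formants) ** 2))
print(f"in-sample RMSE: {rmse:.2f} (arbitrary units)")
```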
1:00:32 | well the first thing we needed to do was verify that this formant frequency information was actually in this part
---|
1:00:37 | of the brain and the way we did this was we had the subject try to imitate a minute long
---|
1:00:44 | vowel sequence
---|
1:00:46 | this lasted a minute and the subject was told to do this in synchrony
---|
1:00:56 | with the stimulus |
---|
1:00:58 | this is crucial because otherwise we don't know when he's trying to speak because no speech comes out and
---|
1:01:04 | so what we do is we record the neural activities during this minute long attempted utterance |
---|
1:01:08 | and then we try to map them into the formant frequencies that the subject was trying to imitate so the |
---|
1:01:14 | square wave like trace here is the actual F two in this case
---|
1:01:21 | going up and down and here's the actual F one going up and down for the different vowels
---|
1:01:27 | and the non-bold squiggly line here is the decoded signal it's not great but it's actually
---|
1:01:35 | highly statistically significant we did cross validated training and testing and we had a very highly significant |
---|
1:01:42 | representation of the formant frequencies with R values of point six nine for F one and point six eight for F
---|
1:01:48 | two and so this verifies that there is indeed formant frequency information in your primary motor cortex |
---|
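The evaluation logic, cross-validated training and testing with a correlation between decoded and intended formants, can be sketched as follows; the fold structure, decoder, and data here are assumptions for illustration.

```python
import numpy as np

def pearson_r(a, b):
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cross_validated_r(rates, formant, n_folds=5, lam=1.0):
    """Train a ridge decoder on k-1 folds, correlate predictions on the held-out fold."""
    T = len(formant)
    idx = np.array_split(np.arange(T), n_folds)
    X = np.hstack([rates, np.ones((T, 1))])
    rs = []
    for k in range(n_folds):
        test = idx[k]
        train = np.concatenate([idx[j] for j in range(n_folds) if j != k])
        W = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(X.shape[1]),
                            X[train].T @ formant[train])
        rs.append(pearson_r(X[test] @ W, formant[test]))
    return float(np.mean(rs))

# Synthetic check with a weak linear relationship plus noise.
rng = np.random.default_rng(3)
rates = rng.poisson(5.0, size=(600, 4)).astype(float)
f1 = rates @ rng.normal(size=4) + rng.normal(scale=5.0, size=600)
print(f"cross-validated r for F1: {cross_validated_r(rates, f1):.2f}")
```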
1:01:55 | and so the next step was simply to use this information to try to produce speech output |
---|
1:02:00 | just as a review for most of you formant synthesis of speech has been around for a long time gunnar
---|
1:02:07 | fant for example in nineteen fifty three used this very large piece of electronic equipment here
---|
1:02:14 | with a stylus on a two-dimensional pad and what he did was he would move the stylus around on the
---|
1:02:20 | pad and the location of the stylus was a location in the F one F two space
---|
1:02:26 | so he's basically moving around in the formant plane and just by moving this cursor around in this two dimensional
---|
1:02:32 | space he's able to produce
---|
1:02:33 | intelligible speech so here's an example |
---|
1:02:39 | i |
---|
1:02:41 | so the good news here is that with just two dimensions some degree of speech output can be produced |
---|
1:02:47 | consonants are very difficult i'll get back to that at the end but certainly vowels are possible with this sort
---|
1:02:53 | of synthesis |
---|
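For readers who have not seen formant synthesis, a bare-bones vowel synthesiser can be written as an impulse-train source filtered through two resonators; this generic sketch stands in for the idea, not for the OVE machine or the synthesiser used in the project.

```python
import numpy as np

FS = 16000  # sample rate (Hz)

def resonator(x, freq, bw):
    """Second-order digital resonator applied to signal x (Klatt-style coefficients)."""
    c = -np.exp(-2 * np.pi * bw / FS)
    b = 2 * np.exp(-np.pi * bw / FS) * np.cos(2 * np.pi * freq / FS)
    a = 1 - b - c
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = a * x[n] + b * y[n - 1] + c * y[n - 2]
    return y

def vowel(f1, f2, dur=0.4, f0=110.0):
    """Synthesise a steady vowel by filtering a glottal impulse train through
    two formant resonators centred at (f1, f2)."""
    n = int(dur * FS)
    source = np.zeros(n)
    source[::int(FS / f0)] = 1.0          # crude impulse-train voicing source
    out = resonator(resonator(source, f1, 90.0), f2, 110.0)
    return out / np.max(np.abs(out))

# Moving the (F1, F2) "cursor" to different points yields different vowels.
ee = vowel(270, 2290)
ah = vowel(730, 1090)
print(len(ee), len(ah), "samples synthesised")
```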
1:02:55 | so what we did was we took this system so here is a schematic our electrode in the
---|
1:03:01 | speech motor cortex
---|
1:03:03 | the activity is recorded by this picked up and amplified and then sent across the scalp
---|
1:03:08 | we record the signals and we then run them through a neural decoder and what the neural decoder does is |
---|
1:03:15 | it predicts what formant frequencies are being attempted based on the activities so it's trained up on one of these |
---|
1:03:21 | one minute long sequences |
---|
1:03:23 | and once you train it up then it can take a set of neural activities and translate that into
---|
1:03:30 | a predicted first and second formant frequency which we can then send through a speech synthesiser to the subject
---|
1:03:36 | the delay from the brain activity to the sound output was fifty milliseconds in our system and this is approximately |
---|
1:03:42 | the same delay as |
---|
1:03:43 | your motor cortical activity to your sound output and this is crucial because if the subject is going to be |
---|
1:03:49 | able to learn to use this synthesiser you need to have a natural feedback delay if you delay speech feedback
---|
1:03:55 | by a hundred milliseconds in a normal speaker
---|
1:03:58 | they start to become highly disfluent they go through some stuttering like behaviour as they talk it's very disruptive so
---|
1:04:08 | it's important that this thing operates very quickly
---|
1:04:11 | and produces this feedback in a natural time frame |
---|
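The real-time constraint can be pictured as a fixed-rate loop that reads a short window of neural features, decodes (F1, F2), and hands the result to the synthesiser inside the 50 ms frame budget. The skeleton below is schematic; the function names and decoder are placeholders, not the actual software.

```python
import time
import numpy as np

FRAME_S = 0.050          # ~50 ms from brain activity to sound output
rng = np.random.default_rng(4)
W = rng.normal(size=(5, 2))   # placeholder decoder weights (4 units + bias)

def read_firing_rates():
    """Stand-in for the wireless acquisition step (returns 4 unit rates)."""
    return rng.poisson(5.0, size=4).astype(float)

def decode_formants(rates):
    return np.append(rates, 1.0) @ W          # linear read-out to (F1, F2)

def send_to_synth(f1, f2):
    pass                                       # here: route values to audio output

def run(n_frames=20):
    for _ in range(n_frames):
        t0 = time.monotonic()
        f1, f2 = decode_formants(read_firing_rates())
        send_to_synth(f1, f2)
        # Sleep off the remainder of the frame so feedback latency stays
        # close to the natural motor-to-sound delay.
        time.sleep(max(0.0, FRAME_S - (time.monotonic() - t0)))

run()
```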
1:04:17 | now what i'm gonna show is the subject's performance with the speech BCI so we had him perform a vowel
---|
1:04:24 | task the subject would start out at a centre vowel
---|
1:04:29 | and then his
---|
1:04:31 | task on each trial was to go to a vowel that we told him to go to so in the video
---|
1:04:36 | i'll play you'll hear the computer say
---|
1:04:38 | listen
---|
1:04:39 | and it'll say something like ee
---|
1:04:42 | and it'll say speak and then he's supposed to say ee with the synthesiser so you'll hear his sound output
---|
1:04:49 | as produced by the synthesizer as he attempts to produce the vowel that was presented and you'll see
---|
1:04:56 | the target vowel in green here
---|
1:04:59 | the cursor you'll see is the subject's location in the formant frequency space
---|
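Here is a minimal sketch of the trial logic as described, where a production counts as a hit if the decoded position enters the target region before a timeout; the target radius, timeout, and formant centres are invented for illustration.

```python
import math

TARGETS = {"iy": (270, 2290), "aa": (730, 1090), "uw": (300, 870)}  # rough F1/F2 (Hz)

def run_trial(cued_vowel, decoded_trajectory, radius=150.0, timeout_frames=60):
    """Return (hit, endpoint_error) for one production attempt.

    decoded_trajectory: sequence of (F1, F2) points, one per decoder frame.
    """
    tf1, tf2 = TARGETS[cued_vowel]
    endpoint = decoded_trajectory[min(len(decoded_trajectory), timeout_frames) - 1]
    for f1, f2 in decoded_trajectory[:timeout_frames]:
        if math.hypot(f1 - tf1, f2 - tf2) <= radius:
            return True, 0.0          # success: endpoint error counted as zero
    return False, math.hypot(endpoint[0] - tf1, endpoint[1] - tf2)

# A made-up trajectory drifting toward "aa".
traj = [(500 + 10 * k, 1500 - 15 * k) for k in range(40)]
print(run_trial("aa", traj))
```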
1:05:04 | in most of the trials we did not provide visual feedback the subject didn't need visual feedback and we saw
---|
1:05:10 | no increase in performance from visual feedback he instead used the auditory feedback that we produced from the synthesiser to
---|
1:05:17 | produce better and better speech
---|
1:05:20 | or vowel sounds at least and so here are five examples five consecutive productions in a block
---|
1:05:31 | we speak |
---|
1:05:35 | so that's a direct hit he moves very quickly to the target
---|
1:05:40 | so |
---|
1:05:45 | here he goes off a little he hears the error and kind of steers it back into the target
---|
1:05:55 | another direct hit the next trial isn't here you'll see him make it just before the timeout
---|
1:06:12 | but nobody around here |
---|
1:06:18 | so straight to the target so what we saw were |
---|
1:06:21 | two sorts of behaviours often times it was straight to the target but other times he would go off a little
---|
1:06:26 | bit and then you would see him once he heard the feedback going off you would see him
---|
1:06:31 | and presumably in his head he's trying to change the shape of his tongue to try to you
---|
1:06:35 | know |
---|
1:06:36 | try to actually say the sound so he's trying to reshape where that sound is going and so you'll see |
---|
1:06:41 | him kind of steer toward the target in those cases so what's
---|
1:06:48 | happening in these slides or these panels is i'm showing the hit rate here
---|
1:06:54 | as a function of block so in any given session we would have
---|
1:06:58 | four blocks of trials there were about five to ten productions per block so during the course of
---|
1:07:06 | a session he would produce anywhere between about
---|
1:07:09 | ten and twenty repetitions about five to ten repetitions of each vowel
---|
1:07:16 | and when he first starts his hit rate is just below fifty percent that's above chance but it's not great |
---|
1:07:23 | but we see with practise it gets better with each block and by the end he's improved his hit rate
---|
1:07:29 | to over seventy percent
---|
1:07:31 | on average and in fact in the later sessions he was able to get up to about ninety percent
---|
1:07:36 | hit rate if we look at the end point error as a function of block this is how far away |
---|
1:07:41 | he was from the target in formant space when the trial ended
---|
1:07:45 | so if it was a success it would be zero if it's not a success there's an error and we
---|
1:07:50 | see that this pretty much linearly drops off over the course of a forty five minute session
---|
1:07:56 | and the movement time also improves a little bit
---|
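The per-block summaries described here, hit rate and endpoint error as a function of block, amount to simple aggregation over trials; a small sketch with made-up trial outcomes:

```python
from statistics import mean

def summarise_blocks(trials_by_block):
    """trials_by_block: list of blocks, each a list of (hit, endpoint_error).

    Returns per-block hit rate (%) and mean endpoint error, mirroring the
    kind of summary plot described in the talk (values below are made up)."""
    summary = []
    for block in trials_by_block:
        hits = [h for h, _ in block]
        errs = [e for _, e in block]
        summary.append((100.0 * mean(hits), mean(errs)))
    return summary

blocks = [
    [(False, 400), (True, 0), (False, 350), (True, 0), (False, 300)],
    [(True, 0), (True, 0), (False, 250), (True, 0), (True, 0)],
    [(True, 0), (True, 0), (True, 0), (False, 180), (True, 0)],
    [(True, 0), (True, 0), (True, 0), (True, 0), (True, 0)],
]
for i, (hit_rate, err) in enumerate(summarise_blocks(blocks), 1):
    print(f"block {i}: hit rate {hit_rate:.0f}%, mean endpoint error {err:.0f} Hz")
```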
1:07:59 | this slide shows what happens over many sessions so these are twenty five sessions |
---|
1:08:04 | this is the endpoint error we're looking at and one thing to note is
---|
1:08:09 | that there's a lot of variability from day to day i'll be happy to talk about that we had to |
---|
1:08:13 | train up a new decoder every day because we weren't sure we had the same neurons every day
---|
1:08:17 | so some days the decoder worked very well like here and other days it didn't work so well what we
---|
1:08:24 | saw on average over the sessions is that the subject got better and better at learning to use the synthesiser
---|
1:08:30 | meaning that |
---|
1:08:31 | even though he was given a brand new synthesiser on the twentieth session it didn't take him nearly as
---|
1:08:36 | long to get good at using that synthesiser
---|
1:08:42 | well to summarise then for the speech brain computer interface here
---|
1:08:45 | there are several novel aspects of this interface it was the first real time speech brain computer interface so
---|
1:08:52 | this is the first attempt to actually decode ongoing speech as opposed to pulling out words or moving a cursor |
---|
1:08:59 | to choose words on the screen |
---|
1:09:02 | it was the first real time control using a wireless system and wireless is very important for this because
---|
1:09:10 | if you have a connector coming out of your head which is the case for some patients who get this
---|
1:09:15 | sort of surgery |
---|
1:09:17 | that connector actually can have an infection build up around it and this is a constant problem
---|
1:09:24 | for people with this sort of system wireless systems are the way of the future
---|
1:09:29 | we were able to do a wireless system because we only had two channels of information current systems have
---|
1:09:36 | usually a hundred channels or more of information and the wireless technology is still catching up so these hundred channel systems
---|
1:09:44 | typically still have |
---|
1:09:45 | connectors coming out of the head |
---|
1:09:48 | and finally our project was the first real time control with an electrode that had been implanted for this long
---|
1:09:53 | the electrode had been in for over three years this highlights the utility of the sort of electrode we use
---|
1:10:00 | for permanent implantation the speech that came out was extremely rudimentary as you saw but keep in mind that
---|
1:10:08 | we had two tiny wires of information coming out of the brain
---|
1:10:13 | pulling out information from at most ten neurons
---|
1:10:17 | out of the hundreds of millions of neurons involved in the system and yet the subject was still able to |
---|
1:10:22 | learn to use the system and improve the speech over time there are a number of things we're working on now to improve
---|
1:10:28 | this |
---|
1:10:29 | most notably we're working on improving synthesis we are developing two-dimensional synthesisers that can produce both vowels and
---|
1:10:37 | consonants and that sound much more natural than a straight formant synthesiser |
---|
1:10:41 | a number of groups are working on smaller electronics and more electrodes |
---|
1:10:45 | the state-of-the-art now as i mentioned is probably ten times the information that we were able to get out of |
---|
1:10:51 | this brain computer interface so we would expect a dramatic improvement |
---|
1:10:56 | in performance with a modern system
---|
1:10:59 | and we're spending a lot of time working on improved decoding techniques as well the initial decoder
---|
1:11:07 | that you give these subjects is very rough it just gets him in the ballpark and that's because there's
---|
1:11:13 | not nearly enough information |
---|
1:11:15 | to tune up a decoder properly from a training sample and so what people are working on including people in
---|
1:11:22 | our lab are decoders that actually tune while the subject is trying to use the prosthesis so not only is
---|
1:11:29 | the subject's motor system adapting to use the prosthesis
---|
1:11:32 | but the prosthesis itself is helping that adaptation by cutting the error down on each production very slowly over time
---|
1:11:40 | to help the system adapt over time
---|
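One way to picture a decoder that tunes itself while the prosthesis is in use is an online update that nudges the weights toward the cued target on each frame, plus an assistance term that is slowly annealed away. This is purely a schematic sketch of that idea, not the lab's algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)

class AdaptiveFormantDecoder:
    """Linear decoder with a small online update toward the known cue target."""

    def __init__(self, n_units, lr=1e-3, assist=0.5):
        self.W = rng.normal(scale=0.1, size=(n_units + 1, 2))
        self.lr = lr          # learning rate for online weight updates
        self.assist = assist  # fraction of the error corrected "for free"

    def step(self, rates, cue_target):
        x = np.append(rates, 1.0)
        pred = x @ self.W
        err = cue_target - pred
        # During cued trials we know what the subject is *trying* to say,
        # so the decoder can adapt its weights toward that target ...
        self.W += self.lr * np.outer(x, err)
        # ... and can also blend a shrinking amount of assistance into the
        # output, annealed so that the subject ends up doing the work.
        out = pred + self.assist * err
        self.assist *= 0.999
        return out

dec = AdaptiveFormantDecoder(n_units=4)
for _ in range(200):
    dec.step(rng.poisson(5.0, 4).astype(float), np.array([730.0, 1090.0]))
print("remaining assistance fraction:", round(dec.assist, 3))
```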
1:11:43 | and with that i'd like to |
---|
1:11:45 | again thank my collaborators and also thank the N I D C D and N S F for the funds
---|
1:11:51 | that funded this research |
---|
1:12:05 | okay so we have time for two questions |
---|
1:12:08 | morgan |
---|
1:12:11 | yeah |
---|
1:12:12 | really interesting talk
---|
1:12:15 | you put a pretty strong emphasis on formants but much of speech isn't like that
---|
1:12:21 | when you had the playback of good doggie the
---|
1:12:24 | g sound
---|
1:12:26 | wasn't so great so is there other work that you're doing with stop consonants are you figuring out a way
---|
1:12:32 | to put things like that in right so i largely focused on formants for simplicity during the talk
---|
1:12:39 | the somatosensory feedback control system in the model actually does
---|
1:12:44 | a lot of the work for stop consonants so for example for a B we have a target for the |
---|
1:12:49 | closure itself so in addition to the formant representation we have tactile dimensions that supplement the targets
---|
1:13:02 | somatosensory feedback is in our model secondary to auditory feedback largely because during development we get auditory targets
---|
1:13:12 | in their entirety from people around us |
---|
1:13:14 | but we can't
---|
1:13:16 | tell what's going on in their mouth so early development we believe is largely driven by auditory dimensions |
---|
1:13:21 | the somatosensory system learns what goes on when you properly produce the sound and then it later contributes
---|
1:13:27 | to the production once you build up this somatosensory target
---|
1:13:31 | one other quick note another simplification here is that
---|
1:13:35 | formant frequencies strictly speaking are very different for women and children and men and so when we are
---|
1:13:43 | using different voices we use a normalized formant frequency space where we actually use ratios of the formant frequencies
---|
1:13:51 | to
---|
1:13:52 | help accommodate that
---|
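The "ratios of formant frequencies" idea can be sketched by representing a vowel with log formant ratios, which cancels much of the overall scaling between long and short vocal tracts; a simplified illustration (real child formants do not scale perfectly uniformly), not the exact normalisation used in the model.

```python
import math

def ratio_representation(f1, f2, f3):
    """Represent a vowel by log formant ratios instead of raw frequencies.

    If a shorter vocal tract scales all formants by roughly the same factor,
    these ratios stay approximately constant across speakers."""
    return (math.log(f2 / f1), math.log(f3 / f2))

# The "same" vowel from an adult male and (crudely) a child whose formants
# are all scaled up by ~40%: raw values differ a lot, ratios barely move.
adult = (730, 1090, 2440)
child = tuple(1.4 * f for f in adult)
print("adult:", [round(v, 3) for v in ratio_representation(*adult)])
print("child:", [round(v, 3) for v in ratio_representation(*child)])
```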
1:13:55 | i |
---|
1:13:57 | the question |
---|
1:13:59 | right well i think i know where you stand but there's a
---|
1:14:04 | long running debate between auditory and articulatory gestural accounts and
---|
1:14:13 | i just want to understand where you're coming from on it because we've been working with
---|
1:14:20 | people who have
---|
1:14:21 | data that look very similar to the data you showed here
---|
1:14:27 | where you can look at the articulatory record and
---|
1:14:31 | you can see that the articulatory information is there the gestures were actually made as in perfect memory for example
---|
1:14:38 | yeah okay well so in my view
---|
1:14:44 | the |
---|
1:14:45 | gestural score is more or less equivalent to a feed forward motor command and that feed forward command is tuned up to
---|
1:14:52 | hit auditory targets so
---|
1:14:54 | we do have in effect a gestural score in the form of a feed-forward motor command and so
---|
1:14:59 | if you produce speech very rapidly that whole feed-forward motor command will get read out but it won't necessarily
---|
1:15:07 | make the right sounds if you push it to the limit
---|
1:15:10 | so for example in the perfect memory case the model would you know do the gesture for
---|
1:15:15 | the t if it's producing it very rapidly
---|
1:15:18 | the t may not come out but it would presumably hear a slight error and try to correct
---|
1:15:25 | for that a little bit in later productions but
---|
1:15:28 | to make a long story short my view is that the gestural score which i think does exist is something |
---|
1:15:34 | that is equivalent to a feed-forward motor command
---|
1:15:37 | and |
---|
1:15:38 | and the diva model shows how you tune up that gestural score how you keep it tuned over time
---|
1:15:43 | and |
---|
1:15:44 | things like that okay |
---|
1:15:46 | yeah |
---|
1:15:48 | and then |
---|
1:15:50 | thanks a really amusing talk and |
---|
1:15:53 | oh |
---|
1:15:54 | it seems to me that auditory or somatosensory feedback doesn't really tell you about what those words
---|
1:16:02 | mean
---|
1:16:03 | does meaning play a role is there that sort of higher level feedback in speech
---|
1:16:09 | production |
---|
1:16:10 | it absolutely does but we do not have anything like that in the model so we purposely focused on motor |
---|
1:16:16 | control speech is a motor control problem and |
---|
1:16:19 | the words are meaningless to the model that's of course a simplification
---|
1:16:25 | for tractability for us to be able to study a system that we could actually characterise computationally we are now working
---|
1:16:32 | at a higher level connecting this model which is kind of a low level motor control model if you
---|
1:16:38 | will with higher level models of
---|
1:16:41 | sequencing of syllables and we're starting to think about how these |
---|
1:16:46 | sequencing areas of the brain interact with areas that represent meaning and so the middle frontal gyrus for example is
---|
1:16:56 | very commonly associated with word meaning and temporal lobe areas you know
---|
1:17:02 | interact with the sequencing system but we have not yet modelled that so
---|
1:17:10 | in our view we're working our way up from the bottom where the bottom is motor control and the
---|
1:17:15 | top is language
---|
1:17:17 | we're not that far up there yet
---|
1:17:25 | so it was a really inspiring talk
---|
1:17:28 | i'm |
---|
1:17:30 | kind of wondering thinking about
---|
1:17:32 | the beginning of your talk and the babbling and imitation
---|
1:17:36 | phase
---|
1:17:37 | one of the things that is pretty
---|
1:17:41 | apparent from that is that you're effectively starting out your model with an adult vocal tract
---|
1:17:49 | and it's listening to external stimuli which are also kind of matched right so what is your take on
---|
1:17:56 | this i worked a lot back then on thinking about things like normalisation i'm kinda curious what
---|
1:18:02 | your take is on
---|
1:18:05 | how things change as you know you get a six month old and their vocal tract grows and
---|
1:18:12 | stuff like that how do you see that fitting into the model well so
---|
1:18:16 | i think that highlights the fact that |
---|
1:18:18 | formants strictly speaking are not the representation that's used for this transformation because you know when the child hears
---|
1:18:25 | an adult sample they're hearing a normalized version of it that their vocal tract can imitate because
---|
1:18:33 | the frequencies themselves they can't imitate so we've looked at a number of representations that involve things like
---|
1:18:40 | ratios of the formants and so forth
---|
1:18:43 | and those improve its abilities and they work
---|
1:18:47 | well in some cases but we haven't found the answer to what exactly
---|
1:18:52 | that representation is
---|
1:18:54 | where i think it is in the brain i think in planum temporale in the higher order auditory areas
---|
1:18:59 | that's probably where you're representing speech in this
---|
1:19:02 | speaker independent manner
---|
1:19:03 | but what exactly those dimensions are i can't say for sure it's some normalized formant representation but the ones
---|
1:19:12 | we tried we tried miller's space for example
---|
1:19:17 | from his eighty nine paper they're not fully satisfactory they do a lot of the normalisation but they don't work
---|
1:19:23 | that well for controlling movements |
---|
1:19:27 | oh i mean one of the things that i was thinking about is that keith johnson for example really |
---|
1:19:32 | feels like this normalisation is actually a learned phenomenon so it feels like you have some of the machinery
---|
1:19:38 | there instead of i mean positing
---|
1:19:40 | that it's
---|
1:19:41 | you know some built in operation
---|
1:19:43 | you could imagine
---|
1:19:45 | having an adaptive
---|
1:19:47 | system that actually you know learns that normalisation
---|
1:19:50 | it's possible but there are examples like parrots being able to speak and so forth so i think that
---|
1:19:56 | there's something about the mammalian auditory system such that the dimensions it pulls out naturally are
---|
1:20:03 | largely speaker-independent already i mean it pulls out all kinds of information but for the speech system i think
---|
1:20:09 | that's what it's using but
---|
1:20:11 | i wish i could give you a more satisfactory answer
---|
1:20:15 | nor did you have a great for a while |
---|
1:20:18 | a question about the synthesis you're using
---|
1:20:22 | is it just the first three that you're using we have the first three or first two depending so for the prosthesis
---|
1:20:28 | project we just used the first two for the simulations i showed for the rest of the talk we used
---|
1:20:33 | the first three okay because we've seen in recent work for example that for R
---|
1:20:39 | you can tell which particular tongue shape was used if you look at
---|
1:20:45 | the higher formants right and when that is available
---|
1:20:49 | it would be great if you could include something like that any thoughts on that
---|
1:20:55 | i was just gonna say we can look at that so by controlling F one through F three we can |
---|
1:20:59 | see what F |
---|
1:21:01 | four and F five would be for different R configurations we haven't looked at that yet but
---|
1:21:07 | my view is that they're perceptually not very important or even salient so of course the physics will
---|
1:21:13 | make you know the formants slightly different if your tongue shapes
---|
1:21:16 | are different especially for the higher formants
---|
1:21:19 | but i think that what speakers you know perceive
---|
1:21:24 | is largely limited to the lower formants i think some of your earlier work
---|
1:21:29 | was just about this
---|
1:21:31 | and i've heard this argument that the higher formants carry speaker colouring brad
---|
1:21:39 | story and some more recent work suggest that they give you voice colouring
---|
1:21:45 | i mean you can add speaker-specific information and make it sound like a different person depending
---|
1:21:50 | on what the values are i see so yeah we just fix those formants in our model
---|
1:21:57 | at fixed values for all sounds and
---|
1:22:00 | you can hear the sounds properly but like you say the voice quality may well change if we allowed
---|
1:22:04 | them to vary
---|
1:22:06 | just one more thing to continue if you were able to add more and
---|
1:22:11 | determine what the acoustic features are in these various cases because you get the right places
---|
1:22:17 | for the vowels
---|
1:22:18 | and the stops but you also get the continuous trajectory in between right that would be great information for
---|
1:22:24 | speaker independent processing and you know
---|
1:22:27 | speaker identification and characteristics right and speaker recognition systems
---|
1:22:32 | as well as speech therapy and pronunciation tools
---|
1:22:37 | so that's just something to think about we'll revisit that
---|
1:22:40 | okay so we're gonna close the session because i don't want to take too much out of
---|
1:22:45 | the break so let's thank our speaker once again
---|