0:00:16Well, the main thing I'm grateful for is the award and this wonderful medal. It's an amazing honor.
0:00:27And it's particularly pleasing to me because I love this community. I love the Interspeech community and the Interspeech conferences.
0:00:38Some people in the audience — I don't know who ??, but she knows — are aware that I'm particularly proud of my ISCA, previously ESCA, membership number being thirty.
0:00:50And here is a list of the conferences in the Interspeech series, starting with the predecessor of the first Eurospeech, which was the meeting in Edinburgh in 1988.
0:01:02All of the Eurospeech conferences, all of the ICSLP conferences, and since 2000 the Interspeech conferences; the ones ?? are the ones I was actually at.
0:01:14And in another four you will find my name in the program as a co-author, committee member or area chair.
0:01:24So that leaves only three that you see I had nothing to do with: Geneva, Pittsburgh and Budapest.
0:01:32I have actually been to Pittsburgh, and I've been to Geneva. Pity about Budapest.
0:01:38Such a lovely city, and I'll probably never get the chance. I missed it in 1999.
0:01:44However, I love these conferences, and it's the interdisciplinary nature that I particularly appreciate.
0:01:58You heard from the introduction that interdisciplinarity is ... well, it's at the heart of psycholinguistics; we are an interdisciplinary undertaking.
0:02:11But I loved from the beginning the idea of bringing all the speech communities together in a single organization and a single conference series.
0:02:23And I think the founding fathers of the organisations — the founding members of Eurospeech, quite a broad team there, and the founding father, or founding fellow, because we never knew which it was, of ICSLP, that was Hiroya Fujisaki —
0:02:44these people were visionaries, back in the 1980s and early 90s, and the continuing success of this conference series is a tribute to their vision.
0:02:58And that's why I'm very proud to be part of this community, this interdisciplinary community.
0:03:10I love the conferences, and I'm just tremendously grateful for the award of this medal, so thank you very much to everybody involved.
0:03:21So, back to my title slide.
0:03:27I'm afraid it's a little messy, with all my affiliations on it. Tanja already mentioned most of them. You would think, wouldn't you, that the various people involved would at least have chosen the same shade of red.
0:03:43Down on the right-hand side is my primary affiliation at the moment, the MARCS Institute at the University of Western Sydney. My previous European affiliations, where I still have emeritus positions, are at the bottom left,
0:03:58and it's the upper layer of logos there that I want to call your attention to, for a practical reason.
0:04:07On the right is the Centre of Excellence for the Dynamics of Language, which is an enormous grant actually — it's the big prize in the Australian grant landscape — and it's going to run for seven years. It has just started; in fact, if I'm not mistaken, today is actually the first day of its operation. It was just awarded, we've been setting it up over the last six months, and it's starting off today.
0:04:36And it's a grant worth some 28 million Australian dollars over seven years.
0:04:42On the left of that is another big grant, running in the Netherlands — it's been going for about a year and a half now — Language in Interaction, and that's a similar kind of undertaking; again, it's 27 million euros over a period of ten years.
0:05:01And it is remarkable that two government organizations, two government research councils, on different sides of the world, more or less simultaneously saw that it was really important to put some serious funding into language research — speech and language research.
0:05:22Okay, now the practical reason that I wanted to draw your attention to these two is that they both have websites.
0:05:29If you have bright undergraduates looking for a PhD place at the moment, please go to the Language in Interaction website, where every six months, for at least the next six years, a bunch of new PhD positions will be advertised.
0:05:47We are looking worldwide for bright PhD candidates. It's being run mainly as a training ground, so there are mainly PhD positions on this grant.
0:05:57And on the right: if you know somebody looking for a postdoc position, in the Centre of Excellence we are about to advertise a very large number of postdoctoral positions — many of them require a linguistics background — so please go and look at that website too, if you or your students or anybody you know is looking for such a position.
0:06:19Okay. On to my title, Learning about Speech — why did I choose that?
0:06:25As Tanja rubbed in, there weren't many topics that I could have chosen.
0:06:33In choosing this one I was guided by first looking at the abstracts for the other keynote talks in this conference.
0:06:45And I discovered that there is a theme: two of the others actually have learning in the title, and all of them address some form of learning about speech. And I thought, well, okay,
0:07:00it would be really useful, in the spirit of encouraging interdisciplinary communication and integration across the various Interspeech areas, if I took the same kind of general theme and started by sketching what I think are some of the most important basic attributes of human learning about speech. Namely:
0:07:33That it starts at the very earliest possible moment — no kidding; I will illustrate that in a second.
0:07:42That it actually shapes the processing: it engineers the algorithms that are going on in your brain. That is, the speech you learn about sets up the processing that you're going to be using for the rest of your life. This was also foreshadowed in what Tanja just told you about me.
0:08:04And that it never stops — it never stops learning.
0:08:07Okay, so on to the first part of that.
0:08:13So let's listen to something. Warning: you won't be able to understand it. Well, at least I hope not.
0:08:40Okay, I see several people in the audience making ... movements to show that they have understood what was going on.
0:08:57Because what we know now is that infants start learning about speech as soon as the auditory system that they have is functional.
0:09:10And the auditory system becomes functional in the third trimester of a mother's pregnancy. That is to say, for the last three months before you are born, you are already listening to speech.
0:09:26And when a baby is born, the baby already shows a preference for the native language over another language. Mind you, you can't tell the difference between individual languages — for instance, it's known that you can't tell the difference between Dutch and English on the day you're born — but if you were exposed to an environment speaking one of those languages, you have a preference for that kind of language.
0:09:49So what did you think was in that audio that I just played — I mean, what did it sound like? Speech, right? But what language was that? Do you have any idea? What language might that have been? Was it Chinese?
0:10:18I think this is an easy question for you guys, come on. Well, were they speaking Chinese in that? No!
0:10:26Sorry? Yeah — but it was English; it was Canadian English, actually.
0:10:32So the point is, you can't tell, and the baby can't tell, before birth. It's a recording taken by a Canadian team, who made the recording in the womb at about eight and a half to nine months of pregnancy, right? So you can put a little microphone in —
0:11:02and let's not think too much about this —
0:11:06you can actually make a recording within the womb, and that's the kind of audio that you get. So that kind of audio is presented to babies before they're even born, and that's why they are born with a preference, with knowing something about the general shape of the language. So you can tell that it's a stress-based language, right? That was a stress-based language you were listening to.
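What a fetus hears is, roughly, speech with the fine detail filtered away but the rhythm and intonation preserved. Here is a minimal sketch of that idea; the 400 Hz cutoff is an illustrative assumption (not a measured value), and "speech.wav" is a placeholder file name, not material from the talk.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

# Rough "womb-like" filtering: keep the low-frequency prosody (rhythm,
# intonation), discard most of the segmental detail. The cutoff and the
# file name are assumptions for illustration only.
fs, speech = wavfile.read("speech.wav")
sos = butter(4, 400, btype="low", fs=fs, output="sos")
womb_like = sosfiltfilt(sos, speech.astype(float))
wavfile.write("womb_like.wav", fs, womb_like.astype(np.int16))
```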
0:11:39So: learning about speech starts as early as possible.
0:11:47We also know now — another thing that many people in this audience will know — that infant speech perception is one of the most rapidly growing areas in speech research at the moment.
0:12:02When I set up a lab fifteen years ago in the Netherlands, it was the first modern infant speech perception lab in the Netherlands; now there are half a dozen.
0:12:15And people — PhD students — who graduate in this topic have no trouble finding a position. Everybody in the U.S. is hiring; every psychology and linguistics department wants somebody doing infant speech perception at the moment. Good job; a good area for students to get into.
0:12:33But the recent explosion of research in this area has meant that we've actually overturned some of the initial ideas we had. So we now know that infant speech learning is really grounded in social communication: it's the social interactions with the caregivers that actually motivate the child to continue learning.
0:13:11We also know that we don't teach individual words to babies; in this very early period they are mainly exposed to continuous speech input, and they learn from it.
0:13:26And that they construct vocabulary and phonology together. It was first thought, because of the results that we had, that you had to learn the phoneme repertoire of your language first, and only then could you start building a vocabulary.
0:13:46Well, successful building of vocabulary is slow, but nevertheless the very first access to meaning can now be shown to occur as early as the very first access to sound contrasts.
0:14:05And the latest finding, also from my colleagues in Sydney, is about — sorry, you know what it is — the kind of speech called Motherese, the special way you talk to babies. You know, you see a baby and you start talking in a special way.
0:14:25And it turns out that part of this is under the infant's control: it's the infant who actually elicits this kind of speech by responding positively to it, and who also trains caregivers to start doing one kind of speech — with enhanced phonemic contrasts — and then to stop doing that later and start presenting individual words, and so on. So that's all under the babies' control.
0:15:02So what we tried to do in the lab that I set up in Nijmegen, the Netherlands, some fifteen years ago, was to look, using the electrophysiological techniques of brain science — using event-related potentials in the infant brain —
0:15:29for the signature of word recognition in an infant brain. That's what we were looking for: we decided to go and look for what word recognition looks like in an infant's brain.
0:15:41And we found it. So here's an infant in our lab — so sweet, right?
0:15:47You don't have to stick the electrodes on their heads separately; we just have a little cap, and they are quite happy to wear a little cap.
0:16:00And so what we usually do is familiarize them with speech — it could be words in isolation or it could be sentences — and then we continue playing some speech, as it might be continuous sentences containing the words that they've already heard, or containing some other words. Okay?
0:16:29And what we find is a particular kind of response — this is the word recognition response: a more negative response to familiarized words compared to the unfamiliarized words.
0:16:42It's in the left side of the brain — this is word onset here, right — and you'll see it's about half a second after word onset.
0:17:00And so this is the word recognition effect that you can see in an infant's brain.
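To make that measure concrete, here is a minimal sketch of how such a familiarity effect could be quantified from ERP epochs. The sampling rate, channel selection, 400–700 ms window and the random placeholder arrays are illustrative assumptions, not the parameters or data of the study described in the talk.

```python
import numpy as np

# Hypothetical ERP epochs time-locked to word onset: trials x channels x samples.
fs = 500                                      # assumed sampling rate (Hz)
familiar = np.random.randn(40, 32, 500)       # placeholder data: 40 trials,
unfamiliar = np.random.randn(40, 32, 500)     # 32 channels, 1-second epochs

left_channels = [0, 1, 2, 3]                  # assumed left-hemisphere electrode indices
window = slice(int(0.4 * fs), int(0.7 * fs))  # ~400-700 ms after word onset

def mean_amplitude(epochs):
    # average over trials, the selected channels, and the analysis window
    return epochs[:, left_channels, window].mean()

# The word recognition effect is the familiarized-minus-unfamiliarized difference;
# a negative value is the expected negative-going response to familiar words.
effect = mean_amplitude(familiar) - mean_amplitude(unfamiliar)
print(f"familiarity effect: {effect:.3f} (negative = recognition response)")
```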
0:17:09So we know, as I said, that in the first year of life infants mainly hear continuous speech. So are they able to learn words from continuous speech? In this experiment we used only continuous speech.
0:17:34And this was with ten-month-old infants. Now, they don't understand any of this — and you don't have to understand it either; whatever, it's in Dutch.
0:17:43It just shows what the materials were like: in a particular trial you'd have eight different sentences, and all the sentences have one word in common — in this case the word "drummer", which happens to be the same in English, right?
0:18:01And then later on you switch to hearing four sentences, and the trick is that, of course, all of these things occur in pairs, so for every infant that hears eight sentences with "drummer" there's going to be another infant that hears eight sentences with "fakir".
0:18:23Okay — and so then you have two sets of these sentences, and what you expect is that you get a more negative response to whichever word you have actually already heard. And that's exactly what we found. This one has just been published, as you see.
0:18:42And so what we have is proof that just exposing infants to a word in continuous-speech contexts is enough for them to recognize that same word form. And they don't understand anything at ten months, right — they're not understanding anything about it. They're pulling words out of continuous speech at this early age.
0:19:06Okay. Now, given the fact that the input to infants is mainly continuous speech, it is of course vital that they can do this, right?
0:19:27And another important finding that has come from this series of experiments using the infant word recognition effect is that it predicts your later language performance as a child.
0:19:49So if you show that negative-going response that I've just talked about already at seven months, which is very early — if it's a nice big effect that you get, a big difference, and if it's a nice clean response in the brain — then, for instance: here I've sorted two groups of infants, those who had a negative response at the age of seven months, and those who, in the same experiment, did not have a negative response.
0:20:31And at age three, look at their comprehension scores, their sentence production scores, their vocabulary size scores. The blue guys — the ones who showed that word recognition effect in continuous speech at age seven months — are already performing much better. So it's vital for your later development of speech and language competence.
0:20:57Here is an actual participant-by-participant correlation between the size of the response — remember that we're looking at a negative response, so the bigger it is, the further down it is here — and your scores: the more negative the response, the more words you know at age one, and the more words you can speak at age two. Both correlate significantly, so this is really important.
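Here is a minimal sketch of that kind of participant-by-participant correlation. The numbers below are invented for illustration, and scipy's Pearson correlation simply stands in for whatever statistic was actually reported.

```python
import numpy as np
from scipy import stats

# Hypothetical per-infant values: mean amplitude of the familiarity response
# at 7 months (more negative = larger effect) and later vocabulary scores.
erp_amplitude = np.array([-2.1, -1.4, -0.3, 0.2, -1.8, -0.9, 0.5, -2.5])
words_age_one = np.array([60, 45, 20, 15, 55, 35, 10, 70])
words_age_two = np.array([310, 250, 140, 120, 290, 200, 90, 340])

# Because the effect is negative-going, a more negative amplitude should go
# with higher vocabulary scores, i.e. a negative Pearson correlation.
for name, score in [("words known at age 1", words_age_one),
                    ("words spoken at age 2", words_age_two)]:
    r, p = stats.pearsonr(erp_amplitude, score)
    print(f"{name}: r = {r:.2f}, p = {p:.3f}")
```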
0:21:32Okay. So: starting early, listening just to real continuous speech, and recognizing that what it consists of is recurring items that you can pull out of that speech signal and store for later use — that is setting up a vocabulary, and starting early on that really launches your language skill.
0:22:01And we're currently working on just how long that effect lasts.
0:22:08So the second major topic that I want to talk about is how learning shapes processing.
0:22:17You'll know already from Tanja's introduction that this has actually been the guiding theme of my research for — well, I don't think we're going to count how many years it is now — for a long time.
0:22:34And I could easily stand here and talk for the whole hour about this topic alone — or I could talk for a month about this topic alone — but I'm not going to. I am going to take one particular, really cool, very small example of how it works.
0:22:53So the point is that the way you actually deal with the speech signal — the actual processes that you apply — differs depending on the language you grew up speaking, your primary language, right? So those of you out there for whom English is not your primary language are going to have different processes going on in your heads than what I have.
0:23:25Okay, now I'm going to take this really tiny form of processing. Take a fricative sound, right — s or f. Now these are pretty simple sounds. How do we recognise them? How do we identify such a sound?
0:23:43For these fricatives, do we actually just analyze the frication noise, which is different for sss and fff? You can hear the difference: sss has high-frequency energy, right? fff is lower.
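As an aside on what "analyzing the frication noise" could mean in practice, here is a minimal sketch comparing the spectral centroid of an [s]-like and an [f]-like noise burst. The synthetic noise bursts, the crude filters and the sampling rate are stand-ins for illustration; with real recordings you would excise the frication portion of each token.

```python
import numpy as np

def spectral_centroid(frication, fs=16000):
    """Crude spectral centroid (Hz) of a frication-noise segment."""
    windowed = frication * np.hanning(len(frication))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(windowed), d=1.0 / fs)
    return float((freqs * spectrum).sum() / spectrum.sum())

# Stand-ins for recorded fricatives: high-passed noise for [s] (energy up high),
# low-passed noise for [f] (energy lower down).
rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
s_like = np.diff(noise)                                      # crude high-pass
f_like = np.convolve(noise, np.ones(16) / 16, mode="valid")  # crude low-pass

print(f"[s]-like centroid: {spectral_centroid(s_like):6.0f} Hz")
print(f"[f]-like centroid: {spectral_centroid(f_like):6.0f} Hz")
```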
0:24:00Or do we also analyze the surrounding information, the information in the vowels? There is always transitional information in any speech signal between sounds — so are we using it in identifying s and f?
0:24:17Well, maybe we shouldn't, because s and f are tremendously common sounds across languages, and their pronunciation is very similar across languages, so we'd probably expect them to be processed in much the same way across languages.
0:24:32But we can test whether vowel information is used, in the following way. You ask: is it going to be harder to identify a particular sound — this works for any sound, right, but now we're talking about s and f — if you insert it into a context that was originally produced with another sound? Okay.
0:24:58So in the experiment I'm going to tell you about, your task is just to detect a sound, which in this experiment might be s or f, okay? And it's nonsense you're listening to — dokubapi, pekida, tikufa — and your task would then be to press the button when you hear f, as in the f of tikufa.
0:25:20And the crucial thing is that every one of those target sounds is going to come from another recording — every one of them — either another recording which originally had the same sound, or not.
0:25:39The f in tikufa is either going to have come from another utterance of tikufa, or the tiku_a is going to come from tikusa and have the f put into it, right? So you're going to have mismatching vowel cues if it was originally tikusa, and congruent vowel cues if it was another utterance of tikufa.
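A minimal sketch of the quantity this design yields — the cross-splicing cost per fricative, i.e. how much slower detection is with mismatching vowel cues than with congruent ones. The reaction times below are invented; only the match/mismatch logic follows the design just described.

```python
import numpy as np

# Hypothetical detection times (ms) for targets spliced into a congruent
# context (another token of the same item) versus a mismatching context
# (a token originally produced with the other fricative).
rt = {
    ("f", "match"):    np.array([520, 540, 510, 530]),
    ("f", "mismatch"): np.array([590, 610, 580, 600]),
    ("s", "match"):    np.array([500, 515, 505, 510]),
    ("s", "mismatch"): np.array([505, 512, 508, 515]),
}

def splicing_cost(fricative):
    # cross-splicing cost = mean RT with mismatching vowel cues
    #                       minus mean RT with congruent vowel cues
    return rt[(fricative, "mismatch")].mean() - rt[(fricative, "match")].mean()

for fric in ("f", "s"):
    print(f"cross-splicing cost for /{fric}/: {splicing_cost(fric):5.1f} ms")

# The quantity plotted later in the talk is the difference of the two costs:
print(f"cost for f minus cost for s: {splicing_cost('f') - splicing_cost('s'):5.1f} ms")
```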
0:26:04Now, some of you who teach speech science may recognise this experiment, because it's a very old experiment, right? Anybody recognise it? It was originally published in 1958 — a really old experiment — and first done with American English.
0:26:25And the result was very surprising, because what was found was different for f and s. In the case of f, if the tiku_a was originally tikusa — if you put the f into a different context — it was much harder to detect, whereas if you did it with the s there was zero effect of the cross-splicing. No effect whatsoever for s, but a big effect for f.
0:26:59So listeners were only using vowel context for f; they weren't using it for s. So this just seemed like a bit of a puzzle at the time. But you know, since 1958 these results have been in the textbooks for years — it's in the textbooks.
0:27:15And the explanation was: well, you know, it's the high-frequency energy in s that makes it clearer; you don't need to listen to anything else, the vowels — you can identify s on the frication noise alone — but f is not so clear, so you need something else.
0:27:32Wrong, as you will see.
0:27:37So I'm going to tell you about some thesis work of my student A. Wagner from a few years ago.
0:27:46And she first replicated this experiment. What I'm going to plot up here is the cross-splicing effect for f minus the effect for s; so, you know that there's a bigger effect for f than for s — we just saw that, right?
0:28:07And so she replicated that. The original one was American English; she did it with British English and got exactly the same effect: a huge effect for f and very little effect for s. So the size of the effect for f is bigger.
0:28:27And she did it in Spanish and got exactly the same result, okay. So it's looking good for the original hypothesis, right?
0:28:36And then she did it in Dutch. Nothing. In fact there was no effect for either s or f in Dutch — or in Italian (she did it in Italian), or in German (she did it in German). So, okay.
0:28:51Audience response time again, right? I missed something — I didn't tell you one crucial bit of information here. The Spanish listeners were in Madrid, so this is Castilian Spanish.
0:29:05So what do English — think, now — what do English and Castilian Spanish have that Dutch and German and Italian, Chinese, or whatever languages don't have?
0:29:21You're good, you're really good. That's right.
0:29:27So here is the reason. You see, the original explanation ?? — that s is clearer — accounts for the results for English and Spanish, but it doesn't account for the results for Dutch and Italian and German, right?
0:29:43But the explanation that you need extra information for f because it's so like θ does account for them — f and θ are about the most confusable phonemes in any phoneme repertoire, as the confusion matrices of English certainly show us.
0:30:04So you need the extra information for f just because there is another sound in your phoneme repertoire which it's confusable with. But how do you test that explanation?
0:30:16Well, you need — now, you know I'm not going to ask you to guess what's coming up, because you can see it on the slide — you need a language which has a lot of different s-like sounds, right? Because then the effect should reverse.
0:30:33If you find a language with a lot of other sounds like s — and yes, Polish is such a language — then what you should find in that cross-splicing experiment is a big effect of mismatching vowel cues for s, and nothing much for f, if you don't also have θ in the language.
0:30:55And that's exactly what you find in Polish. Very nice result. How cool is that — overturning the textbooks in your PhD?
0:31:03So: we listen to different sources of information in different languages, right? We learn to process the signal differently, even though s and f are really articulated much the same across languages. In Spanish and English you have fricatives that resemble f, and in Polish you have fricatives that resemble s, so you have to pay extra attention to the surrounding — well, it helps to pay extra attention to the surrounding speech information to identify them.
0:31:39The information that surrounds intervocalic consonants is always going to be there. There is always information in the vowel, which you only use if it helps you.
0:31:51Okay, on to the third point that I want to make. Learning about speech never stops.
0:32:01Even if we were only to speak one language, even if we knew every word of that language so that we didn't have to learn any new words, even if we always heard speech spoken in clean conditions — there would still be learning to be done, especially whenever we meet a new talker, which we can do every day. Especially at a conference.
0:32:22When we do meet new talkers, we adapt quickly. That's one of the most robust findings in human speech recognition, right? We have no problem walking into a shop and engaging in conversation with somebody behind the counter we've never spoken to before.
0:32:40And this kind of talker adaptation also begins very early in infancy, and it continues throughout life.
0:32:50So, as I already said, you know about particular talkers: you can tell your mother's speech from other talkers at birth.
0:33:03These are experiments that people do at birth — I mean literally within the first couple of hours after an infant is born. In some labs they present newborns with speech and see if they show a preference; and they show a preference by sucking harder — you've got a pacifier with a transducer in it — to keep the speech signal going. And you find that infants will suck longer to hear their own mother's voice than other voices.
0:33:36But when do they tell the difference between unfamiliar talkers? So you have new talkers: when can an infant tell whether they're the same or not?
0:33:52Well, you can test discrimination easily in infants, and it's a habituation test method that we use.
0:34:03So what you do is you have the baby sitting on the caretaker's — the mother's — lap. And the mother is listening to something else; you give her a music tape or something, so the mother can't hear what the baby is hearing. The baby is hearing speech coming over loudspeakers and is looking at a pattern on the screen, and if they look away the speech will stop, right. Sorry.
0:34:36What happens is you play them a repeating stimulus of some kind. So in this experiment that I'm going to talk about, the repeating stimulus is just some sentences that they wouldn't understand, spoken by three different speakers, interchanging: one speaker will say a sentence, the next one will say a couple of sentences, the first one will say a couple of sentences again, and a third speaker also says a sentence. These are just sentences that the babies can't actually understand.
0:35:04These babies are actually seven months old — younger than the baby in the picture there.
0:35:10And as the stimulus keeps repeating, the infant keeps listening. And the stimulus keeps repeating, and the infant keeps listening, and the stimulus keeps repeating — and eventually the baby gets bored and looks away, right.
0:35:33And at that point you change the input, right. And then you want to know — and that's the way you test discrimination — does the baby look back? Does it look back at the screen, perk up, and continue to look at the screen, thereby keeping the speech going?
0:35:57Well, these were seven-month-olds, as I said, so really they don't understand anything — no words yet. Maybe they recognise their own name; that's about it.
0:36:10And we have three different voices — three different young women with reasonably similar voices — talking away and saying sentences that are, you know, way beyond seven-month-olds' comprehension, like: "Artists are attracted to life in the capital."
0:36:30And then at the point at which the infant loses attention you bring in a fourth voice, a new voice, and the question is: does the infant notice?
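Here is a minimal sketch of the habituation–dishabituation logic just described. The 50% looking-time criterion, the trial counts and the looking times are all assumptions for illustration; the actual procedure's parameters are not given in the talk.

```python
# Habituation: the infant counts as habituated when looking time drops well
# below its initial level; discrimination shows up as recovery of looking
# when the novel (fourth) voice is introduced.
def habituation_trial(looking_times, criterion=0.5, baseline_trials=2):
    """Return the index of the trial at which the infant habituated, or None."""
    baseline = sum(looking_times[:baseline_trials]) / baseline_trials
    for i, t in enumerate(looking_times[baseline_trials:], start=baseline_trials):
        if t < criterion * baseline:        # interest has dropped off
            return i
    return None

# Hypothetical looking times (seconds) to the repeating familiar voices,
# then to the test trial with the new voice.
familiar_phase = [18.0, 16.5, 14.0, 9.0, 6.5]
novel_voice_look = 15.0

habituated_at = habituation_trial(familiar_phase)
if habituated_at is not None and novel_voice_look > familiar_phase[habituated_at]:
    print("looking time recovered -> the infant noticed the new voice")
```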
0:36:43Okay. So these are Dutch babies — this was run in Nijmegen. And yes, they do. They really do notice the difference — as long as it's in Dutch.
0:36:56We also did the experiment with four people talking Japanese, and four people talking Italian, and there was no significant discrimination in those cases. So it's only in the native language — that is to say, the language of the environment that they have been exposed to.
0:37:14So this is important, because it's not whether the speech is understood that matters here, it's whether the sound is familiar: what infants are doing between six and nine months is building up their knowledge of the phonology of their language and building up their first store of words.
0:37:38And then this is important: some of you probably know the literature from forensic speech science on this, and you know that if you're trying to do a voice lineup and pick out a speaker you heard in a criminal context or something, and that speaker is speaking a language you don't know very well, you're much poorer at making a judgement than if they're speaking your native language.
0:38:10And this appears to be based on exactly the same basic phonological adjustment that we see happening in the first year of life.
0:38:24And we can do it in the laboratory: we can show adaptation to new talkers and strange speech sounds in a perceptual learning experiment that we first ran about eleven years ago, and which has been replicated in many languages and in many labs around the world since.
0:38:47And in this paradigm what we do is start with a learning phase. Now, there are many different kinds of things you can do in this learning phase, but one of them is to ask people to decide — while they're listening to individual tokens — is this a real word or not? And that's called a lexical decision task.
0:39:09So here's somebody doing lexical decision, and they're hearing: cushion; astopa; fireplace — yes, that's a word; magnify — yes; heno — no, that's not a word; devilish — yes; defa — no, that's not a word; and so on, just going through pressing the button. Yes, no, yes, no, and so on.
0:39:26Right. Now, the crucial thing in this experiment is that we're changing one of the sounds in the experiment, okay? And we're going to stick with s and f here, just to keep things simple — but again, we've done it with a lot of different sounds.
0:39:45So say, for instance, you had a sound that was halfway between s and f: we create a sound along a continuum between s and f that's halfway in between, in the middle, and we stick it on the end of a word like what would have been "giraffe" — but then it sounds like this.
0:40:08No — like here. Can you hear that it's a blend of f and s?
0:40:16And there are a dozen other words in the experiment which all should have an f in them, and if they had an s they would be non-words. So we expose a group of people to learning that the way this speaker says f is this strange thing, which is a bit more s-like.
0:40:39Meanwhile, there's another group that's doing the same experiment, and they're hearing things like this. That's exactly the same sound, at the end of what should be "horse".
0:40:54Right — so they have been trained to hear that particular strange sound and identify it as s, where the other group identifies it as f.
0:41:06And then you do a standard phoneme categorization experiment, where everybody hears exactly the same continuum — some of the steps are better s and some are better f, but none of them is a really good s or a really good f.
0:41:36The point is that you make a categorization function out of an experiment like that, which goes from one of those sounds to the other. Under normal conditions you would get a baseline categorization function like the one shown up there; but if your f category was expanded you might get that function, and if your s category was expanded you might get that function, okay?
0:42:07So that's what we're going to look at as the result of our experiment, which took one group of people and expanded their f category, and another group of people and expanded their s category. And that's exactly what you get: completely different functions for identical continua, right.
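Here is a minimal sketch of how those two categorization functions could be summarized numerically; the response proportions are invented, and the logistic fit is just one standard way to estimate where each group's category boundary lies, not necessarily the analysis used in the study.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical proportions of "f" responses at each step of an 8-step
# [f]-to-[s] continuum, for the group trained to hear the ambiguous sound
# as f and the group trained to hear it as s.
steps = np.arange(8.0)
p_f = {
    "f-trained": np.array([0.98, 0.97, 0.95, 0.90, 0.78, 0.55, 0.20, 0.05]),
    "s-trained": np.array([0.97, 0.93, 0.80, 0.55, 0.25, 0.10, 0.04, 0.02]),
}

def logistic(x, boundary, slope):
    # proportion of "f" responses falls off as the continuum moves toward [s]
    return 1.0 / (1.0 + np.exp(slope * (x - boundary)))

for group, y in p_f.items():
    (boundary, slope), _ = curve_fit(logistic, steps, y, p0=[3.5, 1.0])
    print(f"{group}: category boundary at step {boundary:.2f}")
# A boundary shifted toward the [s] end for the f-trained group (and the
# reverse for the s-trained group) is the expanded-category pattern above.
```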
0:42:26Okay, so we exposed these people to a changed sound in just a few words. We had up to twenty words in our experiments, but people have been tested with many fewer, and obviously in real life, with a new talker, it probably works with one occurrence.
0:42:48And it only works if you can work out what the sound was supposed to be — that is, with real words: if we do the same thing with non-words there's no significant shift, and those functions are both exactly equivalent to the baseline function.
0:43:02So that's basically what we're doing: adapting to talkers we've just met by adjusting our phoneme boundaries especially for them.
0:43:13Now this, as I've already said, has spawned a huge number of follow-up experiments, not only in our lab.
0:43:23We know that it generalizes across the vocabulary: you don't have to have the same sound in a similar context.
0:43:33We know that lots of different kinds of exposure can bring about the adaptation: it doesn't have to be a lexical decision task, you don't have to be making any decision about the word, you can just have passive exposure, and you can have nonsense words if their phonotactic constraints force you to choose one particular sound.
0:43:57And we know that it's pretty much speaker-specific — that is, at least, the adjustment is bigger for the speaker you actually heard.
0:44:06And we've done it across many different languages, and I brought along some results from Mandarin, because Mandarin gives us something really beautiful: namely, that you can do the same adjustment, the same experiment, with segments and with tones — different kinds of speech sounds.
0:44:29So not only the same segments that I used in that experiment — here they are again, f and s, in Mandarin; same result. Very new data.
0:44:39And there is the result when you do it with tone one and tone two in Mandarin in exactly the same way: make an ambiguous stimulus halfway between tone one and tone two, and you get the same adjustment.
0:44:54You can use this kind of adaptation effectively in a second language, which is good — at least, in this experiment by colleagues of mine in Nijmegen, using the same Dutch input, Dutch listeners get exactly the same shift.
0:45:19And German students — now, German and Dutch are very close languages, and German students come to study in the Netherlands, in Nijmegen. They take — imagine this, those of you who've gone to study in another country which doesn't speak your L1 (first language) — they take a course in Dutch for five weeks, and at the end of those five weeks they just go into the lectures, which are in Dutch, and they're treated like anybody else.
0:45:56So that's how long it takes to get up to speed: if you're German, that's how long it takes to get up to speed with Dutch, okay.
0:46:05So, not surprisingly, a huge effect — the same effect, in the same experiment — with the German students in the Netherlands.
0:46:14I have to say that this is my current research, one of my current research projects, and the news isn't a hundred percent good on this topic after all, because I brought along some data which is actually just from a couple of weeks ago — we've only just got it in — and this is adaptation in two languages, in the same individuals. Now, you've just seen that graph: that's the Mandarin listeners doing the task in Mandarin.
0:46:53And what I'm trying to do in one of my current projects is to look at the processing of different languages by the same person, because I want to track down what the source of native-language listening advantages is in various different contexts. So what I'm trying to do now is to look at the same people doing the same kind of task — it might be listening in noise, it might be perceptual learning for speakers, and so on — in their different languages.
0:47:30So here are the same Mandarin listeners doing the English experiment.
0:47:39Not so good. So it looks — and these were tested in China, so they are not in an immersion situation; it is their second language, and they are living in their L1 environment — so that's not quite as hopeful as the previous study.
0:48:05However, one thing we know about this adaptation to talkers — we've already seen that discrimination between talkers is something that even seven-month-old listeners can do — so what about this kind of lexically based adaptation to strange pronunciations? We decided to test this in children.
0:48:29We couldn't really use a lexical decision experiment, because you can't really ask kids — they don't know a lot of words. So we did a picture verification experiment with them: a giraffe, and the one on the right is a platypus. So the first one ends with f and the second one ends with s — we're doing the s/f thing again.
0:48:56And then we had a name continuum for our phoneme categorization. Again, you don't want to be asking young kids to decide whether they're hearing f or s — that's not a natural task — but if you teach them that the guy on the left is called Fimpy and the guy on the right is called Simpy, and then you give them something that's halfway between Fimpy and Simpy, then you can get a phoneme categorization experiment.
0:49:27And we first of all had to validate the task with adults; needless to say, the adults could just press a button, so they didn't have to point to the character and so on.
0:49:45But we get the same shift again for the adults, and we get it with twelve-year-olds, and we get it with six-year-olds. And the important difference between twelve-year-olds and six-year-olds is that twelve-year-olds can already read, and six-year-olds can't read.
0:50:01And there is a certain school of thought that believes that you get phoneme categories from reading. But you don't get phoneme categories from reading; you have your phoneme categories in place very early in life.
0:50:15So that's exactly the same effect — as I say, very early in life, even at age six, you're using your perceptual learning to understand new talkers.
0:50:28And I think I saw ?? over there, so I'm going to show some of ??'s data — so we know, yes, there you are.
0:50:38This is some of the work with older listeners, so we know that this kind of perceptual learning goes on throughout life. I brought along this particular result, which is again with s and f and was presented at Interspeech 2012 — so I hope you were all there and heard it — but they also have a 2013 paper with a different phoneme continuum, which I urge you to look at as well.
0:51:13So even when you're losing your hearing you're still doing this perceptual learning and adapting to new talkers. Learning about new talkers is just something that human listeners do throughout the lifespan.
0:51:32So that brings me to my final slide.
0:51:36This has been a quick tour through some highlights of some really important issues in human learning about speech: namely, that it starts as early as it possibly can, that it actually trains up the nature of the processing, and that it never actually stops.
0:51:58When I was preparing this, I thought: well, actually, you know, I love these conferences because they're interdisciplinary, because we get to talk about the same topic from different viewpoints. So what, after preparing this talk, do I actually think is the biggest difference you could put your finger on between human learning about speech and machine learning about speech?
0:52:27I have been talking about this during the week, and I'll give you that question to take to all the other keynotes and think about too.
0:52:39But if you say it starts at the earliest possible moment — well, so would a good machine learning algorithm, right?
0:52:50It shapes the processing, it actually changes the algorithms that you're using — that's not the usual way, because in programming a machine learning system we usually start with the algorithm. You don't actually change the algorithm as a result of the input. But you could; I mean, there's no logical reason why that can't be done, I think.
0:53:19And "it never stops" — that's not the difference, is it? No, that's not a difference; you can run any machine learning algorithm as long as you like.
0:53:27I think buried in one of my very early slides is something which is crucially important, and that is the social reward, which we now know to be a really important factor in early human learning about speech. You can think of humans as machines that really want to learn about speech.
0:53:50I'd be very happy to talk about this at any time during the rest of this week, or at any other time too, and I thank you very much for your attention.
0:54:29Hi — fascinating talk. A quick question: your boundaries, the ?? — do they change as a function of the adjacent vowels? So far versus fa, sa versus fa? ??
0:54:46We've always used... whatever was the constant context. So you're talking about the perceptual learning experiments? The last set of experiments, right?
0:54:59We've always tried to use a varying context, so I can't answer that question. If we had used only a... — or, hang on, we did use a constant context in the non-word experiment with phonotactic constraints, but that was different in many other ways — so no, I can't answer that question.
0:55:36But there is a tangential answer — information from another lab, which has shown that people can learn, in this way, a dialect feature that is only applied in a certain context. So the answer would be yes: people would be sensitive to that if it was consistent, yes.
0:56:07Tanja? There are two in the same row. Caroline.
0:56:19Have you found any sex-specific differences in the infants' responses?
0:56:24Have we found sex-specific differences in the infants' responses? There are some sex-specific differences, but we have not found them in this speech segmentation work: in the word recognition in continuous speech we've actually always looked, and never found a significant difference between boys and girls.
0:56:52That was a short one. So — are there any other questions?
0:57:02With respect to the negative responses to the words that were presented in the experiment, and that at age three the children were...
0:57:17Right.
0:57:17The size of the negative-going brain potential, right?
0:57:25Is that — would you say that could be useful for detecting pathology?
0:57:34Yes, definitely. And the person whose name you saw on the slides as first author, Caroline Junge, is actually starting a new personal career development award project in Amsterdam and — sorry, in Utrecht, in Utrecht — where she will actually look at that.
0:57:56Okay, so thank you so much again for delivering this wonderful keynote, and congratulations again on being our ISCA medalist. I am happy that you're around, so we have our medalist with us for the whole duration of the Interspeech conference. Thank you, Anne.