0:00:12 | it's my great honour and pleasure to announce our distinguished |
---|
0:00:17 | invited speaker today, Shri Narayanan, who will talk about behavioral signal processing |
---|
0:00:23 | so Shri is the Andrew Viterbi Professor at USC |
---|
0:00:27 | his research focuses on human-centred information processing and communication technologies |
---|
0:00:33 | and in doing that he seems to be the kind of person who holds four professorial appointments |
---|
0:00:38 | i was very impressed to see that: in electrical engineering and computer science, but also linguistics |
---|
0:00:43 | and psychology |
---|
0:00:44 | and i don't live in the US, but someone told me that he is also a regular guest on US |
---|
0:00:48 | television so |
---|
0:00:50 | so please help me welcome Shri; we're really looking forward to the talk |
---|
0:01:03 | thank you |
---|
0:01:04 | right, i |
---|
0:01:05 | am really honoured to be here, and it was great to see a lot of friends of mine i haven't seen |
---|
0:01:10 | in a long time, and to kind of come back to speech, at least to check it out |
---|
0:01:14 | so they were asking me, you know, to say what crazy, fringy, funny things i've been up to |
---|
0:01:21 | so that's this talk today |
---|
0:01:23 | and |
---|
0:01:24 | the only little problem i have with this is that i haven't done very much on this topic yet |
---|
0:01:30 | but i will share whatever we've been up to in the last couple of years |
---|
0:01:35 | hopefully i won't disappoint and will be able to share something of interest |
---|
0:01:38 | so the title is behavioral signal processing; i will momentarily define what i mean by that |
---|
0:01:44 | a case can be made for this term, so let me at least say what it is |
---|
0:01:49 | so |
---|
0:01:51 | so this work concerns human behaviour, which as we all know is very complex and multifaceted |
---|
0:01:58 | it involves very complex and intricate mind-body relations |
---|
0:02:03 | it is shaped by the environment and by interaction with other people |
---|
0:02:11 | and it's reflected in how we communicate, emote, express our personality and interact with other |
---|
0:02:18 | people |
---|
0:02:19 | and also it's characterised by the generation and processing of multimodal cues |
---|
0:02:24 | and it often characterises typicality and atypicality, disorder and so on |
---|
0:02:29 | so one wonders what the role of signal processing, or signal processing people, is in this |
---|
0:02:36 | business |
---|
0:02:38 | so across a number of domains behaviour analysis is, either explicitly or implicitly, essential |
---|
0:02:46 | starting from customer care: you want to know whether a person is |
---|
0:02:51 | frustrated or very satisfied with the services that have been rendered, and you want to sell more things |
---|
0:02:57 | you know, you want to |
---|
0:02:58 | read behaviour at the level of an individual or a group, and so on |
---|
0:03:02 | in learning and education, not only do you want to know whether someone is getting a particular answer |
---|
0:03:09 | right or wrong, you want to know how they got it and how confident they are |
---|
0:03:13 | and how you can actually adapt; personalised learning is one of these grand challenges |
---|
0:03:18 | of engineering, so to be able to do that we have to understand |
---|
0:03:23 | behaviour patterns and the like |
---|
0:03:25 | but more importantly, and something i have developed an increasing passion about, is this whole area of mental health and |
---|
0:03:31 | wellbeing, which i'll try to touch on today with a couple of my examples |
---|
0:03:35 | where |
---|
0:03:37 | behaviour analysis figures very centrally, whether observation based or by other means |
---|
0:03:43 | but when you look across these domains, while computational tools are used, mostly the analysis is very human based |
---|
0:03:50 | so i thought before we go further i'd also show some videos as examples of some of the |
---|
0:03:57 | typical problems one could ask about |
---|
0:03:58 | so here you're going to see kids playing with a computer game, actually talking to it |
---|
0:04:03 | the question is, can we tell something about the child's cognitive state |
---|
0:04:09 | you know, whether they are confident or |
---|
0:04:11 | not |
---|
0:04:12 | so let's look at this little girl |
---|
0:04:18 | right |
---|
0:04:20 | or you can you |
---|
0:04:22 | and mute audio please |
---|
0:04:36 | alright let's try again |
---|
0:04:43 | hold on i checked many times |
---|
0:04:47 | something about you people an idea |
---|
0:04:50 | let's see |
---|
0:04:53 | it's still a |
---|
0:04:54 | okay |
---|
0:05:06 | or answer |
---|
0:05:09 | yeah |
---|
0:05:16 | i |
---|
0:05:20 | where is this |
---|
0:05:28 | well i |
---|
0:05:30 | oh |
---|
0:05:31 | oh |
---|
0:05:33 | i |
---|
0:05:36 | oh |
---|
0:05:43 | i |
---|
0:05:45 | so just looking at this |
---|
0:05:50 | we see that from, sort of, the vocal cues, the language they're using, the |
---|
0:05:55 | visual cues, looking around and looking away, you can say something, at least that these two are different |
---|
0:06:02 | and one of the questions we ask is, okay, can we actually formalise some of |
---|
0:06:06 | these problems of measurement |
---|
0:06:09 | so the next example |
---|
0:06:10 | is from marital therapy, or what is classically called marriage counselling |
---|
0:06:16 | so what you're going to see is a couple interacting |
---|
0:06:21 | and the people in this field, the psychologists doing this kind of research and the people |
---|
0:06:29 | who are actually trying to help these couples, look for a lot of things, |
---|
0:06:34 | characterising aspects of the dynamics |
---|
0:06:37 | looking at who's blaming whom, trying to figure out what that is, and trying to plan treatment based on |
---|
0:06:43 | that; so let's look at this video |
---|
0:06:45 | should i tried again |
---|
0:06:48 | i |
---|
0:06:50 | okay |
---|
0:06:58 | no it's not me |
---|
0:07:01 | you know |
---|
0:07:02 | no |
---|
0:07:03 | the right leg |
---|
0:07:06 | right |
---|
0:07:10 | alright |
---|
0:07:45 | oh |
---|
0:07:46 | used car |
---|
0:07:48 | yeah but what you |
---|
0:07:50 | again |
---|
0:07:52 | but |
---|
0:07:53 | the one of these things |
---|
0:07:57 | or we try to make |
---|
0:08:01 | this is an example from |
---|
0:08:06 | the autism domain |
---|
0:08:07 | where a clinician is actually |
---|
0:08:10 | in an interaction with a child |
---|
0:08:12 | a sort of semi-structured interaction following a particular diagnostic test |
---|
0:08:23 | so that is engaged |
---|
0:08:28 | one |
---|
0:08:28 | one |
---|
0:08:29 | trying to figure out |
---|
0:08:33 | things |
---|
0:08:34 | or |
---|
0:08:35 | everything |
---|
0:08:37 | prosody to |
---|
0:08:39 | sure |
---|
0:08:41 | right |
---|
0:08:42 | you know get price that characterising |
---|
0:08:45 | if you ask |
---|
0:08:47 | six |
---|
0:08:49 | so |
---|
0:08:52 | i |
---|
0:08:53 | i |
---|
0:08:56 | oh right |
---|
0:09:01 | right |
---|
0:09:03 | i |
---|
0:09:06 | right |
---|
0:09:08 | i |
---|
0:09:10 | so one thing you probably observed is that the child, you know, never clearly |
---|
0:09:16 | glanced back or looked at the person; across the session, he was just |
---|
0:09:22 | doing the task, swaying to and fro |
---|
0:09:25 | and eye contact and so on; these things, i'll talk a little later on some |
---|
0:09:32 | scales that have been developed, in the ADOS |
---|
0:09:35 | which try to codify this |
---|
0:09:37 | so all these are some of the things that are happening; as you can see it's very observation based |
---|
0:09:41 | but where people are looking at multimodal cues and trying to render judgements |
---|
0:09:48 | so when you look at these human behaviour signals, they kind of provide |
---|
0:09:53 | a window into these high-level processes; what it reveals depends on how big or |
---|
0:09:58 | small the window is |
---|
0:09:59 | some are overtly observable, like the vocal and facial expressions and body posture; others are covert, we |
---|
0:10:06 | don't have direct access to them, but nonetheless they tell us a lot in special cases |
---|
0:10:10 | things like heart rate, electrodermal response, or even brain activity; and from a signal point of |
---|
0:10:16 | view this kind of information resides at different time scales for these different cues |
---|
0:10:22 | but the ability to process and, you know, sort of interpret and decode these signals can provide us |
---|
0:10:28 | some insights into understanding mind-body relations |
---|
0:10:31 | but also, more importantly, how people process other people's behaviour patterns; that's a fine distinction: both how behaviours |
---|
0:10:40 | are generated and also how they are processed and perceived |
---|
0:10:45 | and the measurement and quantification of these kinds of human behaviours, from both the production and perception perspectives, is a |
---|
0:10:51 | fairly challenging problem, i believe |
---|
0:10:55 | so here's my operational definition for what i call behavioral signal processing: basically it refers to computational methods that try to |
---|
0:11:03 | model human behavioral signals |
---|
0:11:05 | that are manifested in either overt and/or covert signals |
---|
0:11:09 | and are processed by humans explicitly or implicitly |
---|
0:11:13 | and that eventually help facilitate human analysis and decision making |
---|
0:11:19 | so |
---|
0:11:20 | the outcome is behavioral informatics, which can be useful across domains, whether to inform diagnostics, |
---|
0:11:26 | to plan treatments, or to fire up an autonomous system to do personalised teaching, |
---|
0:11:34 | and so on |
---|
0:11:35 | but in all of these, what behavioral signal processing tries to do, at varying levels, is to quantify |
---|
0:11:40 | this human felt sense |
---|
0:11:42 | and |
---|
0:11:44 | it's challenging along a lot of different dimensions, and i'll try to |
---|
0:11:50 | at least impress upon you some of those |
---|
0:11:54 | so if you think about it, of course technology has already helped in this domain quite |
---|
0:12:00 | a bit; its role in all of this relies on the significant foundational advances that have been made |
---|
0:12:05 | in a number of domains, things that have happened and been discussed |
---|
0:12:10 | deeply in this conference: audio-video diarization, speech recognition, understanding what was spoken |
---|
0:12:17 | to things like what the first talk was about, visual activity recognition: everything from low-level descriptions |
---|
0:12:23 | of head pose and orientation to |
---|
0:12:26 | complex |
---|
0:12:29 | classification of activity |
---|
0:12:31 | to physiological aspects of signal processing |
---|
0:12:34 | but the difference is that, using these as building blocks, what you want to do is |
---|
0:12:39 | to try to map them to more abstract, domain-relevant behaviours, and that means new multimodal |
---|
0:12:46 | modeling approaches |
---|
0:12:48 | oh |
---|
0:12:49 | so people have started to work on this already, solving various parts of this puzzle |
---|
0:12:55 | right from sensing: people have been trying to ask, how do you actually measure human behaviour in an |
---|
0:13:01 | ecologically valid way, that is, without disturbing the process that we're trying to measure |
---|
0:13:06 | from instrumenting environments with cameras and microphones and other types of things, to actually instrumenting |
---|
0:13:13 | people with sensors, body-computing types of techniques |
---|
0:13:16 | in speech, increasingly people are doing richer and richer processing at large: what's |
---|
0:13:23 | been said, by whom, and |
---|
0:13:24 | how |
---|
0:13:25 | in affective computing you see a lot of papers being published in this area |
---|
0:13:30 | and also in the newer area of social signal processing: modeling individual and group interaction, turn-taking dynamics, non-verbal cue processing |
---|
0:13:39 | and so on; these are all kind of essential building blocks for BSP |
---|
0:13:44 | so |
---|
0:13:47 | in summary, the ingredients for being able to do this: of course people are working |
---|
0:13:52 | in signal processing areas on acquisition, how you acquire these signals and build these types of systems in a |
---|
0:13:58 | meaningful way; the measurements you might want to make, the kinds of behaviour you want to track, |
---|
0:14:04 | might not happen in a clinic or lab; you might want to do it in the wild |
---|
0:14:09 | so to speak, in playgrounds, in classrooms, at home |
---|
0:14:13 | for example, monitoring and modeling behaviour patterns of the elderly |
---|
0:14:17 | and also body computing; there's lots of interesting signal processing challenges there; in analysis, |
---|
0:14:23 | what features kind of tell you more about particular behaviour patterns of interest |
---|
0:14:29 | and how do you do this robustly, the questions that we ask here about noise and so on |
---|
0:14:33 | and more importantly, also modeling these behavioural constructs as they are described by the experts |
---|
0:14:40 | and providing the capability of both descriptive and predictive modeling |
---|
0:14:47 | so this is kind of not easy, because |
---|
0:14:51 | one, the observations of these behaviour patterns carry large amounts of uncertainty and are |
---|
0:14:58 | at best partial |
---|
0:14:59 | there's lots of, you know, as was mentioned in the computer vision talk about representations, the question of |
---|
0:15:07 | what the representations are that we |
---|
0:15:10 | have to define |
---|
0:15:11 | to compute these things in the first place; they mentioned an experiment where they gave people visual scenes and asked them to describe them; |
---|
0:15:18 | so imagine now a psychologist observing a couple interacting: one of the things to ask is what they're |
---|
0:15:24 | looking for and how they describe it, before we even set out to actually map |
---|
0:15:29 | observable cues to some representation |
---|
0:15:32 | that itself is a first-class research problem: what kind of representations should be specified |
---|
0:15:36 | and given we are talking about human behaviour, there's vast heterogeneity |
---|
0:15:42 | that is, basically differences in the behaviour patterns of people over time and across people |
---|
0:15:49 | and variability in how these data are generated and used |
---|
0:15:53 | so |
---|
0:15:54 | what do people do in each of these domains? if you look at them, and i'll show you |
---|
0:16:00 | some examples, they have their own specific constructs; for example in speech and language assessment, or in a |
---|
0:16:06 | learning situation, say literacy |
---|
0:16:08 | when they try to figure out what kind of help a little child needs when they're |
---|
0:16:13 | learning to read, they're looking not just to know if a child is making a particular sound error; |
---|
0:16:18 | a number of things come into play; disfluencies, in fact the rate |
---|
0:16:23 | of disfluencies and hesitation, play an |
---|
0:16:25 | implicit role, as we found when we did some experiments |
---|
0:16:28 | in pediatric obesity, for example, not only are they monitoring physical activity |
---|
0:16:33 | but also emotional state, and they want to model decision making |
---|
0:16:38 | and so on |
---|
0:16:39 | and there are a lot of common features, because after all the kinds of sensing we have access to are limited: |
---|
0:16:46 | we have audio, microphones, video, and some wearable physiological sensors |
---|
0:16:51 | and so the approach tends to be, at least at the lower levels, |
---|
0:16:56 | the same |
---|
0:16:57 | but the important part is to see |
---|
0:17:00 | how experts, human experts, observe these signals, to learn from that, and to see how we |
---|
0:17:05 | can augment their capabilities |
---|
0:17:07 | so that's what i think the hallmark of the way i look at behavioral signal processing is: |
---|
0:17:12 | to provide supporting tools that would help the human expert, and not to supplant them with total automation, |
---|
0:17:19 | replacing what they're doing; i think that's probably not the most beneficial thing to do |
---|
0:17:24 | so |
---|
0:17:26 | pictorially, if you look at this particular chart, this is what happens today: people observe |
---|
0:17:32 | or |
---|
0:17:35 | there are phenomena that they're trying to observe, say for example a child interacting with a teacher; they |
---|
0:17:41 | get a lot of data, listen to it, look at the child, see how confident the child is, make some judgements |
---|
0:17:45 | about how the child is reading, and provide appropriate scaffolding or intervention |
---|
0:17:52 | what we're saying is that perhaps signal processing and machine learning and other computational tools can |
---|
0:17:58 | come in handy: one, by trying to sort of decode what human experts do, to learn |
---|
0:18:04 | what the features are that they use, either explicitly or implicitly, and |
---|
0:18:09 | build models that can help with some of these predictive capabilities; certain things are beyond human |
---|
0:18:15 | processing capabilities, for example looking at fine pitch dynamics, or comparing what happened at the |
---|
0:18:21 | beginning of the session and the end of the session; some things |
---|
0:18:24 | computational models can do better |
---|
0:18:27 | they can provide feedback, and hopefully these can reinforce each other nicely, and the outcome can be used |
---|
0:18:34 | as informatics; so that's kind of the idea here |
---|
0:18:37 | so |
---|
0:18:39 | with that kind of background, what i'm going to do in the rest of the talk is to quickly point to |
---|
0:18:43 | some of these building blocks that we need |
---|
0:18:46 | but mostly focus on a couple of examples; i'm going to show two examples here, one from the |
---|
0:18:51 | marital therapy domain |
---|
0:18:53 | and one, quickly, on the autism domain, just to highlight some of the possibilities and challenges that |
---|
0:19:00 | there are |
---|
0:19:02 | so |
---|
0:19:03 | as i mentioned already, lots of work is happening in multimodal signal acquisition and |
---|
0:19:08 | processing: everything from smart rooms |
---|
0:19:11 | and other instrumented spaces |
---|
0:19:13 | to actually instrumenting people to sense a lot of different things: sensing the user, sensing the environment in |
---|
0:19:20 | which things are happening, because context becomes important |
---|
0:19:23 | and doing this in a variety of locations |
---|
0:19:27 | from the laboratory to actual classrooms and clinics and playgrounds and so on |
---|
0:19:32 | one of the important things we learned is that depending on the environment there are lots of constraints that |
---|
0:19:37 | come into play; for example when we do our work at the hospital with kids |
---|
0:19:41 | with autism, there are restrictions on where we can place cameras and where we can put the microphones |
---|
0:19:47 | either |
---|
0:19:49 | so it doesn't interrupt what's happening there, what the psychologist or clinician is trying to do, or disturb the structure for |
---|
0:19:56 | the child, because these children are sensitive to certain things and these can be distracting and so on |
---|
0:20:01 | so |
---|
0:20:02 | even though we'd like to capture the 3D environment with like ten or fifteen cameras, it's just not possible |
---|
0:20:08 | so we have to work with these kinds of restrictions, and hence robustness issues in audio processing, |
---|
0:20:15 | language processing, and behaviour processing |
---|
0:20:17 | are real; we can't just solve them by better sensing |
---|
0:20:22 | likewise in instrumenting people we can do a lot of different things, but we also have to worry about |
---|
0:20:26 | not only the technological constraints but also the corresponding ethical and privacy constraints, all these things |
---|
0:20:33 | so it's a challenging area |
---|
0:20:40 | i |
---|
0:20:42 | so those are two actors |
---|
0:20:44 | speaking different parts |
---|
0:20:45 | so we've been collecting data using actors to study behaviours, in addition to working with actual |
---|
0:20:51 | populations |
---|
0:20:53 | because there are certain things we can do in the lab, theory and data collection going |
---|
0:20:57 | hand in hand; so this is a more formal motion-capture database of dyadic interactions |
---|
0:21:03 | with a lot of different emotional content that's been annotated and rated; if you're interested, look it up |
---|
0:21:08 | sure |
---|
0:21:09 | likewise, using actors, we are collaborating with people in the theatre school, doing full-body dyadic |
---|
0:21:17 | interaction; in each of these cases the scenarios were chosen to be rich enough that they cover |
---|
0:21:24 | the entire gamut |
---|
0:21:27 | from them playing Shakespeare and Chekhov to actually doing improv, so there's rich audio-video motion-capture data |
---|
0:21:34 | to ask different questions |
---|
0:21:36 | it looks like this |
---|
0:21:42 | so this is the actress |
---|
0:21:45 | so that kind of data is very important; data acquisition and collection, that's the point there; the next |
---|
0:21:50 | point, and this kind of summarises whatever happens at ASRU: people have |
---|
0:21:55 | been working on not only recognising the spoken words, |
---|
0:21:59 | but on a number of different things, extracting a variety of metadata features which may help the speech understanding |
---|
0:22:05 | problem, the dialogue management problem, the speaker ID problem; all this is important for doing BSP |
---|
0:22:12 | as well |
---|
0:22:13 | there's also a lot of work on emotion recognition, again from speech |
---|
0:22:18 | and from other modalities; one important question there is how do you represent emotions: do we |
---|
0:22:24 | do categorical representations, like say happy or sad, or do more dimensional ones, like how positive or negative |
---|
0:22:30 | it is, or how aroused the person is |
---|
0:22:34 | through to actually having profiles or statistical distributions of emotional behaviour |
---|
0:22:41 | actually now people want to do continuous tracking of emotional state variation; these are all sort of ongoing questions in the community |
---|
0:22:48 | and people try to map those representations from multiple modalities; that is important there also |
---|
0:22:56 | for example, the interplay between visual and vocal features is pretty well known; it's a |
---|
0:23:02 | very complex interplay, and one could in fact learn things about how prosody and head motion are related and how |
---|
0:23:09 | they encode |
---|
0:23:10 | for example not only linguistic information but also this para-linguistic information |
---|
0:23:16 | and you see a number of studies, involving analyses that show both the complementarity and |
---|
0:23:24 | redundancy in how information about emotions is coded in all these modalities |
---|
0:23:29 | for example, if you run an emotion recognizer with speech and facial expression, you can show that |
---|
0:23:35 | with speech there's lots of confusion between anger and, sort of, happiness |
---|
0:23:40 | but if you use the face, that goes away; if you put them together, of course, like any multimodal experiment, |
---|
0:23:45 | you again see a boost in performance; but the point here again is, when you're trying to model these abstract |
---|
0:23:51 | types of behaviours |
---|
0:23:53 | the more of the information that encodes these types of constructs you can get a handle on, the |
---|
0:24:01 | better it is for your computational model |
---|
0:24:05 | so going back to that example i showed of those kids being uncertain or not: sure enough, with things like |
---|
0:24:10 | measured lexical and nonverbal vocalisations, like that "mm" that little boy said when he was hesitating, you can |
---|
0:24:18 | detect and model those, and with the visual cues of hand and head motion you can |
---|
0:24:25 | actually come fairly close to human agreement about whether the child is certain or not in context; so by |
---|
0:24:32 | integrating these you can do things of that sort |
---|
0:24:36 | in fact, in many real-life situations, interactions of course depend on |
---|
0:24:42 | the other people there, who it is you're interacting with; so the idea is, if you model the humans |
---|
0:24:48 | that are there, the mutual influence between, say, two people interacting dyadically, you can do better in |
---|
0:24:54 | predicting what would come next; so for example in the dyadic interaction we can model both |
---|
0:25:02 | people that are in it, husband and wife, as sort of a dyadic unit |
---|
0:25:06 | and you can show that by modeling the cross dependencies between these people, using not only what they themselves |
---|
0:25:12 | did before but also what the other person did before, you can predict the upcoming state slightly better; |
---|
0:25:18 | this type of thing can be done with the existing machinery, with a number of different things |
---|
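To make that mutual-influence idea concrete, here is a minimal hedged sketch (toy data and a toy coupled process of my own invention, not the talk's actual model or corpus): it compares predicting one partner's next state from their own history alone versus from both partners' histories.

```python
# Sketch (not the talk's exact model): predict a speaker's next
# binary state from own history only vs. both partners' histories,
# illustrating the cross-dependency / mutual-influence idea.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy dyad: 0 = negative, 1 = positive; the wife's next state depends
# on both her own and the husband's previous states (coupled dynamics).
T = 2000
husband = rng.integers(0, 2, T)
wife = np.zeros(T, dtype=int)
for t in range(1, T):
    p = 0.2 + 0.4 * wife[t - 1] + 0.3 * husband[t - 1]
    wife[t] = rng.random() < p

X_self = wife[:-1].reshape(-1, 1)                    # own history only
X_dyad = np.column_stack([wife[:-1], husband[:-1]])  # both histories
y = wife[1:]

for name, X in [("self only", X_self), ("self + partner", X_dyad)]:
    clf = LogisticRegression().fit(X[:1500], y[:1500])
    print(f"{name}: test accuracy = {clf.score(X[1500:], y[1500:]):.2f}")
```

With coupled dynamics like these, the dyadic model scores higher, mirroring the "slightly better" prediction described above.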
0:25:23 | so |
---|
0:25:24 | so that was a kind of broad, very high-level overview of some of the computational things that are |
---|
0:25:29 | happening in our field |
---|
0:25:30 | so now we come to the actual goal, what i'm asking here |
---|
0:25:34 | how can these types of things be applied to the problems people are asking about in these various domains; they've been |
---|
0:25:41 | doing this without us, you know; marital therapy research, for example, has been going |
---|
0:25:46 | on for decades; they want to predict things like how long the |
---|
0:25:51 | marriage will last, or whether it can be mended, those kinds of questions |
---|
0:25:54 | so we come there and say, well, we have some computational ideas, and maybe we can help |
---|
0:26:00 | so that's right |
---|
0:26:02 | so psychology research depends a lot on observational judgements; many times they in fact record these |
---|
0:26:09 | interactions and the coders go through |
---|
0:26:13 | a very painstaking and careful coding of these behaviours, based on the theoretical research frameworks that a particular lab |
---|
0:26:23 | might have |
---|
0:26:24 | and they develop a lot of coding standards and so on |
---|
0:26:28 | so |
---|
0:26:31 | yeah i'll show you some examples of |
---|
0:26:37 | earlier |
---|
0:27:00 | i |
---|
0:27:03 | so those were various couples interacting; okay, this is actually not real clinical data |
---|
0:27:10 | what i'm going to talk about later is actually based on clinical trial data |
---|
0:27:15 | so they created these manuals, but this manual coding process on which the analyses rest is kind of not very scalable; it |
---|
0:27:20 | takes a lot of time, and training coders is also hard; typically students in psychology and linguistics |
---|
0:27:28 | are recruited, and it's not always very reliable |
---|
0:27:31 | inter-coder reliability is also tough |
---|
0:27:34 | and so we asked the very simplistic question of whether technology can help to code these kinds of |
---|
0:27:39 | audio-visual data, these behavioural sorts of characterizations |
---|
0:27:43 | and there are measures that are in fact very difficult for humans to make where it can help, |
---|
0:27:49 | all these |
---|
0:27:49 | measurements of timing; even diarization, measuring how long a person speaks, is actually very important; |
---|
0:27:55 | as i'll show later on |
---|
0:27:57 | that tells you quite a bit |
---|
0:27:58 | and we can consistently quantify at least some of these low-level aspects of |
---|
0:28:05 | human behaviour |
---|
0:28:07 | so here's the same kind of chart; here for example we are interested in a couple discussing a problem; |
---|
0:28:13 | we want to know, for example |
---|
0:28:17 | how much blame one spouse is putting on the other spouse |
---|
0:28:22 | and it's not necessarily symmetric |
---|
0:28:25 | so this is what we want to do; to help with that we have a big corpus: |
---|
0:28:31 | one hundred and thirty-four distressed couples were enrolled in a clinical trial |
---|
0:28:38 | and received couples therapy; so we have access to one hundred hours of data or so, not intended for |
---|
0:28:45 | this automated processing: no transcription and so on; it also has video, you saw some examples, and this |
---|
0:28:54 | is what we start with |
---|
0:28:55 | so |
---|
0:28:56 | and it also has, which is very nice for us, expert ratings of these interactions at the session level |
---|
0:29:04 | every couple had a ten-minute-long problem-solving interaction |
---|
0:29:09 | and they coded for a number of things, a number of behavioural patterns that were of interest to researchers in this |
---|
0:29:14 | domain, for example |
---|
0:29:16 | one global code was |
---|
0:29:18 | "is the husband showing acceptance"; so a pretty abstract question, and the description that corresponds to that code |
---|
0:29:26 | was: indicates understanding and acceptance of the partner's views, |
---|
0:29:30 | feelings and behaviours; listens to the partner with an open mind and a positive attitude, and so on; so this is what |
---|
0:29:35 | the coders had to internalise and rate on a scale of one to nine |
---|
0:29:40 | okay |
---|
0:29:41 | so these are the kinds of behaviours we try to see whether we can predict with these signal |
---|
0:29:47 | cues; so we start with the most obvious or simplest thing we know how to do |
---|
0:29:52 | we said, well, let's focus on a few of those codes, like acceptance, blame, positive affect, |
---|
0:29:58 | negative affect, sadness |
---|
0:30:00 | and so on, each marked for |
---|
0:30:03 | both the husband and the wife |
---|
0:30:04 | and with the ratings one through nine, there are histograms of the ratings that were given by |
---|
0:30:10 | the coders |
---|
0:30:11 | and to make it even simpler for us, we said, well, let's just focus on the |
---|
0:30:16 | top twenty percent and the bottom twenty percent |
---|
0:30:19 | separating the extremes |
---|
0:30:21 | and see what we can do with this |
---|
0:30:23 | "'kay" |
---|
0:30:24 | starting from, say, things that we know how to do, like measuring speech properties, transcribing it and asking, can |
---|
0:30:32 | words tell me something |
---|
0:30:33 | and then seeing how successful we can be in predicting these codes that the humans gave; that was |
---|
0:30:39 | the problem |
---|
0:30:40 | so here's a chart; it's busy, but what it says is what most of us here do: |
---|
0:30:46 | we get the audio, get rid of the parts that are hopeless, and then we |
---|
0:30:51 | do speech signal processing; we do voice activity detection and the like |
---|
0:30:56 | and measure things like pitch and intensity and MFCCs, and derive lots of different statistical functionals |
---|
0:31:05 | at the utterance level and at different levels of temporal granularity |
---|
0:31:10 | and throw them into our favourite machine learning tool |
---|
0:31:15 | and try to predict the particular category we're interested in |
---|
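As a rough illustration of that pipeline (the feature dimensions, functionals, and random stand-in data here are my own assumptions, not the actual system from the talk), one can compute utterance-level statistical functionals over frame-level features and feed them to a binary high/low classifier:

```python
# Minimal sketch: statistical functionals of frame-level prosodic
# features per utterance, then a binary top-20%/bottom-20% classifier.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def functionals(frames):
    """Summarise a (num_frames, num_features) array of frame-level
    features (e.g. pitch, intensity, MFCCs) into one utterance vector."""
    return np.concatenate([
        frames.mean(axis=0), frames.std(axis=0),
        frames.min(axis=0), frames.max(axis=0),
        np.percentile(frames, 75, axis=0) - np.percentile(frames, 25, axis=0),
    ])

# Stand-ins for real data: one frame matrix per utterance, plus a
# session-level label (1 = top 20% "blame" sessions, 0 = bottom 20%).
rng = np.random.default_rng(0)
utterances = [rng.normal(size=(rng.integers(50, 300), 14)) for _ in range(200)]
labels = rng.integers(0, 2, len(utterances))

X = np.stack([functionals(u) for u in utterances])
print(cross_val_score(SVC(kernel="rbf"), X, labels, cv=5).mean())
```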
0:31:19 | likewise we can also do transcription, generate lattices, and then use those with code-specific models |
---|
0:31:27 | for classification |
---|
0:31:29 | "'kay" so that's |
---|
0:31:30 | exactly what we did; so here's a transcript of an interaction, an example of what it looks like; |
---|
0:31:37 | you can see everything there; where the money is spent is one of the things that |
---|
0:31:41 | couples like |
---|
0:31:42 | this worry about and fight about |
---|
0:31:45 | another thing is that |
---|
0:31:46 | you'll see it when you look at the results |
---|
0:31:50 | and in fact one of the other important things is the detection of all these non-verbal vocalisations and cues, |
---|
0:31:56 | which are information bearing; at least that's what the algorithms tell us |
---|
0:32:00 | so as i mentioned |
---|
0:32:03 | a lot of prosodic and acoustic features and simple binary classification; and here are the results; just from a very simple |
---|
0:32:09 | classifier with the acoustic features, for many of these constructs, like blame and |
---|
0:32:16 | positive and negative behaviour, we can do much better than chance |
---|
0:32:21 | just from these vocal features, and that was very encouraging |
---|
0:32:26 | well, certain things like sadness and humour are harder to do just from acoustics, and the reason is because |
---|
0:32:33 | we weren't capturing any contextual cues or lexical cues or visual cues or anything at all |
---|
0:32:39 | so then we said, well, okay, now let's throw in lexical information; if you look at the transcripts, there are |
---|
0:32:44 | a lot of words that scream at you saying, hey, this guy's really mad at that person, they're |
---|
0:32:49 | blaming each other; for example in this transcript we highlight |
---|
0:32:53 | how they kept saying "it's aggravating" and why |
---|
0:32:56 | and so we said, well, can we automatically capture these kinds of salient words from the text |
---|
0:33:02 | so, again simply, we built code-specific language models |
---|
0:33:06 | and you can score an utterance X against these models to figure out which particular condition |
---|
0:33:15 | these words correspond to |
---|
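A toy version of that scoring idea (assuming simple unigram models with add-one smoothing and made-up example sentences; the actual system used richer models and ASR lattices) might look like this:

```python
# Toy code-specific language-model scoring: train one unigram model on
# high-blame text and one on low-blame text; label a new utterance by
# which model assigns it the higher log-likelihood.
import math
from collections import Counter

def train_unigram(texts):
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1
    # add-one smoothing so unseen words get non-zero probability
    return lambda w: math.log((counts[w] + 1) / (total + vocab))

high_blame = ["you never listen to me", "it is your fault again"]
low_blame = ["i see what you mean", "we can work on this together"]

lm_high = train_unigram(high_blame)
lm_low = train_unigram(low_blame)

def classify(utterance):
    words = utterance.lower().split()
    score_high = sum(lm_high(w) for w in words)
    score_low = sum(lm_low(w) for w in words)
    return "high blame" if score_high > score_low else "low blame"

print(classify("you always do this, it is your fault"))
```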
0:33:22 | and you can do this not necessarily just with utterances; but the interesting thing is |
---|
0:33:27 | the kinds of things that pop out of these models are very informative; even with very simple things |
---|
0:33:32 | like, okay, in the blame situation you can look at the extremes of the hyperplane weights |
---|
0:33:38 | and in the high-blame words, "you", the second person, |
---|
0:33:44 | is actually correlated with high blame quite a bit, in fact very consistent with what psychologists |
---|
0:33:50 | predict and hypothesize |
---|
0:33:52 | compared to the first person; but you also see words like "cleaning" |
---|
0:33:56 | because cleaning seems to be a big deal; the fighting |
---|
0:34:01 | comes around to cleaning |
---|
0:34:03 | quite a bit |
---|
0:34:05 | so these are simple things we can do, but that's just not |
---|
0:34:10 | enough |
---|
0:34:11 | there are a lot of challenges in this problem domain |
---|
0:34:15 | first of all, any particular single feature stream provides just a small window, as |
---|
0:34:20 | i pointed out, and it's noisy |
---|
0:34:22 | so of course we want to do it multimodally, and you also want to do it |
---|
0:34:26 | in a context-sensitive fashion |
---|
0:34:29 | a more important thing is, many of these ratings, in many domains, are done at the |
---|
0:34:34 | session level; they want to get an overall gestalt of that particular thing |
---|
0:34:37 | but what is not clear is what, in that particular unfolding of the interaction, led to this particular perceptual |
---|
0:34:44 | judgement by the people |
---|
0:34:46 | so you want to know what was salient |
---|
0:34:48 | so we tried, as a first cut, using sort of multiple instance learning, to |
---|
0:34:54 | see whether such things are possible |
---|
0:34:58 | another point is that when these ratings are done, it's not |
---|
0:35:02 | the more typical sort of categorisation; many times they are posed as |
---|
0:35:10 | a rank-ordered list, that is, one is |
---|
0:35:13 | sort of less than two, which is less than three, and so on; so ordinal methods |
---|
0:35:17 | can be used |
---|
0:35:19 | and we want our models to integrate this |
---|
0:35:22 | so these are the kinds of things where we try to do more efficiently what people are doing |
---|
0:35:28 | then there are things that are more in the "felt sense" category |
---|
0:35:31 | people hypothesize that when two people interact there's something about |
---|
0:35:37 | synchrony in their interaction that tells you how smoothly the interaction proceeds |
---|
0:35:43 | so if you are able to quantify this aspect, what is called entrainment, then that'll be |
---|
0:35:48 | useful; you want to know whether we can build signal models that actually try to do this |
---|
0:35:54 | then another point is that when people look for a particular behaviour pattern, looking for |
---|
0:36:00 | it |
---|
0:36:01 | different experts, even trained people, look at it differently, and they respond to different portions of the |
---|
0:36:07 | data |
---|
0:36:08 | so you want to know how we can actually capture this data-dependent human diversity in behaviour processing |
---|
0:36:18 | in our models |
---|
0:36:19 | so doing simple plurality- or majority-voting-based machine learning techniques might not necessarily work well for |
---|
0:36:27 | these kinds of abstract-construct |
---|
0:36:29 | processing |
---|
0:36:30 | so the first thing, the easiest thing: we had the language and acoustic information |
---|
0:36:35 | work together; of course it's going to do better; at least that's how all these experiments turn out, including |
---|
0:36:41 | ours |
---|
0:36:42 | and in our case, for one, our ASR really was bad |
---|
0:36:47 | because we didn't have enough data to tune the language models for the couples domain; but what was encouraging is that |
---|
0:36:53 | even with something like a thirty-five percent word error rate ASR, the information from the |
---|
0:37:00 | from the language models |
---|
0:37:03 | from the lattices that we generated, put together with the acoustic-based classifiers, provided a fairly |
---|
0:37:11 | decent prediction of these codes, and the psychologists were very excited about that |
---|
0:37:17 | but to actually make it more multimodal, we really needed to have information about the nonverbal |
---|
0:37:23 | cues; so we rigged up our lab with a real couch |
---|
0:37:27 | for the therapy |
---|
0:37:28 | and several microphone arrays, and |
---|
0:37:33 | synchronised about ten HD cameras and a motion-capture camera with them to provide data of that sort; so |
---|
0:37:39 | it's very useful for doing a more careful study of human |
---|
0:37:43 | vocal and nonverbal behaviour in interactions |
---|
0:37:46 | so you get data like this |
---|
0:38:05 | oh |
---|
0:38:06 | so goes the conversation; and you can do a lot of things here; since we are collecting data with |
---|
0:38:10 | microphone arrays, we can localise and do things of that sort quite well |
---|
0:38:15 | so we asked some questions like, okay |
---|
0:38:18 | describe approach-avoidance behaviour, which is very important; you saw |
---|
0:38:25 | in the husband-wife couple interaction that this guy was leaning back quite a bit, which in effect expresses displeasure in the interaction |
---|
0:38:31 | very subtle cues, just like those folks that come on TV, the body-language experts; we tried to do |
---|
0:38:36 | this |
---|
0:38:37 | with signal processing |
---|
0:38:40 | so approach-avoidance is actually moving toward or away from events or objects |
---|
0:38:46 | and it actually relates, in psychology theory, to emotion and motivation, and particularly in the couples domain to relationship |
---|
0:38:53 | commitment |
---|
0:38:55 | so people are very interested in whether we can quantify that: using vocal and visual cues, can we |
---|
0:39:01 | actually predict or model this |
---|
0:39:02 | so that was a problem we took on; we said, okay, we can pose this as follows: we had psychologists |
---|
0:39:09 | rate this on an ordinal scale, minus four to plus four, a scale of nine |
---|
0:39:14 | and we posed this as sort of an ordinal regression problem: basically we broke it down into a series of |
---|
0:39:20 | binary classifiers, one versus the rest, two versus the rest, and then we put a logistic |
---|
0:39:27 | regression model on top of that |
---|
0:39:29 | with these multimodal features, both acoustic and visual features |
---|
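One common way to realise this series-of-binary-classifiers construction (a sketch under assumed details, not necessarily the exact published formulation) is to train one "rating > k" classifier per threshold and count the thresholds judged exceeded:

```python
# Ordinal regression via threshold-wise binary classifiers (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_ordinal(X, y, levels):
    # One binary "y > k" classifier per internal threshold.
    return {k: LogisticRegression().fit(X, (y > k).astype(int))
            for k in levels[:-1]}

def predict_ordinal(models, X, levels):
    # Predicted level = lowest level + number of thresholds judged
    # exceeded with probability above 0.5.
    greater = np.stack([models[k].predict_proba(X)[:, 1] > 0.5
                        for k in levels[:-1]], axis=1)
    return levels[0] + greater.sum(axis=1)

# Synthetic stand-in for the rated data: ratings 0..8 (cf. the nine-point
# minus-four-to-plus-four scale) driven by one informative feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = np.clip(np.round(X[:, 0] * 2 + 4), 0, 8).astype(int)

levels = list(range(9))
models = fit_ordinal(X[:800], y[:800], levels)
pred = predict_ordinal(models, X[800:], levels)
print("mean absolute error:", np.abs(pred - y[800:]).mean())
```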
0:39:34 | computer vision is tough, so we just took the motion-capture data instead of actual video data |
---|
0:39:41 | so we could get very clean measurements of head and body orientation, |
---|
0:39:47 | folding of the arms, how much they're leaning, and so on; so at least to get an upper-bound |
---|
0:39:51 | idea of what kinds of visual features are important for measuring |
---|
0:39:55 | approach-avoidance |
---|
0:39:56 | and the usual audio features that i don't need to tell you guys about |
---|
0:40:00 | pitch and MFCCs and all that stuff |
---|
0:40:03 | so interestingly, we showed that this ordinal formulation, published by a couple of my |
---|
0:40:10 | students |
---|
0:40:12 | i guess |
---|
0:40:13 | this ordinal formulation was actually very helpful, instead of just formulating it as a plain classification problem |
---|
0:40:21 | and the chart here shows the difference between using an ordinal SVM versus just a plain |
---|
0:40:26 | SVM |
---|
0:40:27 | and higher is better; the bars mean just the difference in the error rates; so with audio plus video it's actually |
---|
0:40:35 | better |
---|
0:40:36 | but again, multimodality, and in all of this i'm again sort of preaching to the choir, is important; and we |
---|
0:40:42 | can actually use these audiovisual cues to measure something like this |
---|
0:40:47 | what psychologists perceive as approach-avoidance behaviour; and that was encouraging |
---|
0:40:53 | so the point so far is that a multimodal approach is important |
---|
0:40:58 | the next computational thing i want to share is this whole notion of, okay, they often make these |
---|
0:41:04 | sorts of gestalt judgements on data, and you want to know what led to them; or, from a |
---|
0:41:12 | pure learning point of view |
---|
0:41:14 | how to make it more robust, that is, how do you choose or sample the data so that you can |
---|
0:41:18 | maximise the insights; you can pose this in |
---|
0:41:21 | two different ways |
---|
0:41:22 | so i will show a little study here |
---|
0:41:26 | so we used multiple instance learning, again using this case study of the behavioural interaction of these couples, to |
---|
0:41:33 | say, well, can we identify the speaker turns |
---|
0:41:36 | that are salient, given only the session-level code; so you have a ten-minute-long session, husband and wife |
---|
0:41:42 | taking turns talking about whatever they're talking about, and we have |
---|
0:41:47 | a rating; so you want to know which of these turns would most explain that observed rating; okay, that's the problem |
---|
0:41:56 | so as usual you extract all the features from the signals, and you want to identify the turns that make the |
---|
0:42:04 | difference; so we used an approach called diverse density SVM as the support for doing this; the |
---|
0:42:11 | whole problem is |
---|
0:42:13 | as follows |
---|
0:42:14 | a very simple idea; so you have this whole notion of bags, positive bags: high-blame sessions, low-blame sessions, |
---|
0:42:22 | high-acceptance and low-acceptance sessions, and the data from those |
---|
0:42:25 | so you |
---|
0:42:27 | you create your feature space here, so an acoustic feature space |
---|
0:42:31 | then you build this diverse density and select the local maxima, saying that these must be the prototypes from |
---|
0:42:36 | your data; and then when you're ready to evaluate an incoming session, you compute the distance, |
---|
0:42:45 | the minimum distance, to these prototypes and use those |
---|
0:42:48 | as your features rather than all the raw ones |
---|
0:42:51 | a simple idea |
---|
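Here is a much-simplified sketch of that diverse-density prototype idea on toy two-dimensional "turns" (the crude scoring function and the data are illustrative assumptions, not the published method's exact form):

```python
# Crude diverse-density-like prototype selection for multiple instance
# learning: score candidates by closeness to positive bags and distance
# from negative bags, then represent a session by its minimum distance
# to the best-scoring prototype.
import numpy as np

rng = np.random.default_rng(0)

# Toy bags: each "session" is a set of per-turn feature vectors.
pos_bags = [rng.normal(loc=[2, 2], size=(20, 2)) for _ in range(10)]
neg_bags = [rng.normal(loc=[0, 0], size=(20, 2)) for _ in range(10)]

def dd_score(point, pos_bags, neg_bags, sigma=1.0):
    """Higher when some instance in every positive bag is near `point`
    and all negative bags stay far from it."""
    def near(bag):
        d = np.linalg.norm(bag - point, axis=1)
        return np.exp(-(d ** 2) / sigma).max()
    return np.prod([near(b) for b in pos_bags]) * \
           np.prod([1 - near(b) for b in neg_bags])

candidates = np.concatenate(pos_bags)   # search over positive instances
scores = np.array([dd_score(c, pos_bags, neg_bags) for c in candidates])
prototype = candidates[scores.argmax()]

def bag_feature(bag, prototype):
    # Session-level feature: minimum turn-to-prototype distance.
    return np.linalg.norm(bag - prototype, axis=1).min()

print("prototype:", prototype)
print("pos sessions:", [round(bag_feature(b, prototype), 2) for b in pos_bags[:3]])
print("neg sessions:", [round(bag_feature(b, prototype), 2) for b in neg_bags[:3]])
```

The resulting per-session distance features would then feed an ordinary classifier, as described above.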
0:42:53 | so the features that were considered here include lexical features, for example; i put this table here |
---|
0:42:59 | just to again point out that not only are |
---|
0:43:04 | lexical items important, but things like fillers and nonverbal vocalisations |
---|
0:43:10 | seem to pop up quite a bit under information-gain selection; so they are important for these kinds of |
---|
0:43:15 | behavioural |
---|
0:43:17 | signal processing tasks |
---|
0:43:18 | and so we had all these different informative |
---|
0:43:22 | features |
---|
0:43:25 | and created a feature vector per session based on the distances to the diverse-density prototypes |
---|
0:43:29 | and here are some results for the acceptance problem; we could show that these MIL-selected |
---|
0:43:37 | features, compared with using all the features, |
---|
0:43:41 | were not only |
---|
0:43:45 | sort of meaningful, but they also kind of boosted the performance; so the way we interpreted that is that these are |
---|
0:43:50 | reasonable ways of selecting these |
---|
0:43:53 | salient instances; that's our definition of saliency, tied to discrimination |
---|
0:43:58 | but when we added intonation features, for this problem at least, for some of these constructs, it didn't really help |
---|
0:44:03 | maybe the way we added these intonation features, |
---|
0:44:07 | as contours, probably wasn't right, or maybe they don't bear any information for these kinds of constructs |
---|
0:44:13 | and that was true with this multiple-instance-based learning for many of |
---|
0:44:18 | the behavioural descriptions we were looking at, and that was encouraging |
---|
0:44:23 | but what we haven't done |
---|
0:44:25 | is really validate whether these sort of |
---|
0:44:30 | machine-hypothesized instances are in fact consistent with what humans would pick if we asked them |
---|
0:44:39 | whether they're salient or not |
---|
0:44:41 | so one thing we are interested in doing, and human experiments |
---|
0:44:48 | are underway, is to make this part of active learning: the machine proposes certain things, and humans |
---|
0:44:53 | can either correct them or not |
---|
0:44:54 | and so on; that's interesting stuff |
---|
0:44:56 | and you could throw in other features also |
---|
0:44:59 | so |
---|
0:45:00 | the next topic i want to talk about, again moving along this line of getting more abstract, |
---|
0:45:06 | is the modeling of entrainment |
---|
0:45:08 | so entrainment, you know |
---|
0:45:10 | also called interaction synchrony, kind of refers to this naturally occurring coordination between |
---|
0:45:17 | interacting |
---|
0:45:19 | people, at multiple levels and along multiple communication channels |
---|
0:45:24 | if you were at Interspeech this year, Julia Hirschberg gave a fantastic talk on this |
---|
0:45:29 | on lexical entrainment |
---|
0:45:32 | and people have hypothesized that humans use this to achieve efficiency in |
---|
0:45:37 | communicating and |
---|
0:45:40 | increasing mutual understanding and so on; it's been extensively studied in psychology and psycholinguistics |
---|
0:45:47 | so what we wanted to see is, okay, given these kinds of behaviour patterns |
---|
0:45:52 | can measurements of this sort of thing |
---|
0:45:56 | be done, and can they inform these high-level sorts of behaviour characterizations |
---|
0:46:02 | so |
---|
0:46:04 | but the thing is, here you can't really ask a human, hey, are these people entraining or not |
---|
0:46:09 | it's very difficult to do, particularly at the level of these sort of signal-cue-based things |
---|
0:46:14 | and also, unlike many settings where they measure synchrony, where you have parallel signals and can compute mutual |
---|
0:46:20 | information or correlation measures |
---|
0:46:21 | here, because of the turn-taking structure, things are not aligned in time, so we have to think |
---|
0:46:26 | about other clever ways of computing this |
---|
0:46:29 | and of course it's also directional: how much i entrain toward you is not necessarily the same as how much you entrain |
---|
0:46:34 | toward me, so |
---|
0:46:36 | that's what we tried to figure out: how to compute how alike two people sound, in the vocal |
---|
0:46:42 | entrainment case |
---|
0:46:44 | as usual we measure acoustic features; let me tell you more about it |
---|
0:46:49 | what we came up with here was to actually construct what we call these PCA |
---|
0:46:55 | vocal characteristics spaces, and then measure the similarity between these spaces, or project the data onto the spaces, to find some |
---|
0:47:01 | similarity measure; that was the basic idea |
---|
0:47:05 | the features are, as usual, pitch, loudness, and spectral features for the vocal data, at the |
---|
0:47:13 | word level |
---|
0:47:14 | and the PCA spaces are constructed both at the level of the turn and at the level of the whole session; so |
---|
0:47:21 | we have that |
---|
0:47:22 | and then you can calculate various similarity measures |
---|
0:47:25 | doing the PCA basically means you're transforming the data |
---|
0:47:30 | to a different coordinate space; so if the two speakers are alike, these components are aligned with each other, and measuring the angle |
---|
0:47:38 | can give you some notion of a similarity metric; you can also weight those components by the variance they explain |
---|
0:47:45 | and you can use that as one kind of similarity metric |
---|
0:47:49 | or you can project the data onto these PCA spaces and calculate likelihood-level measures |
---|
0:47:56 | and calculate a number of different similarity metrics |
---|
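A minimal sketch of the angle-based PCA-space similarity (assumed details: matched component ordering and variance weighting; the actual measures in the work may differ):

```python
# Build a PCA basis from each speaker's word-level vocal features and
# compare the two bases with a variance-weighted sum of |cos(angle)|
# between corresponding principal directions.
import numpy as np

def pca_basis(features):
    centered = features - features.mean(axis=0)
    # SVD gives principal directions (rows of vt) and singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2 / (s ** 2).sum()
    return vt, var

def vocal_similarity(feats_a, feats_b):
    basis_a, var_a = pca_basis(feats_a)
    basis_b, _ = pca_basis(feats_b)
    # |cos angle| between matched components (abs handles sign flips),
    # weighted by the variance each component explains for speaker a.
    cosines = np.abs(np.sum(basis_a * basis_b, axis=1))
    return float(np.sum(var_a * cosines))

rng = np.random.default_rng(0)
husband = rng.normal(size=(120, 6))                   # toy word-level features
wife_like = husband + 0.1 * rng.normal(size=(120, 6))
wife_unlike = rng.normal(size=(120, 6))

print(vocal_similarity(husband, wife_like))    # closer to 1
print(vocal_similarity(husband, wife_unlike))  # smaller
```

Because the measure is computed from one speaker's space toward the other's, it is naturally directional, matching the point made above.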
0:47:59 | and then you ask questions: hey, what does this mean |
---|
0:48:02 | so the first thing we thought, well, as a sanity check: for real dialogues, hopefully, there |
---|
0:48:07 | must be something that these measures reflect compared with artificial dialogues |
---|
0:48:11 | so we constructed artificial dialogues, randomizing data from other people, and compared |
---|
0:48:17 | just as a sanity check, to make sure that these measures can |
---|
0:48:22 | separate these things out; it doesn't tell you this is entrainment or not, but at least it tells you they capture |
---|
0:48:28 | something reflecting real dialogues; so that was the first thing |
---|
0:48:32 | the second, and this is where we sort of reflected on the literature in this |
---|
0:48:38 | domain, is that they feel entrainment is actually a useful construct for |
---|
0:48:43 | characterising how felicitously these couples' interactions proceed |
---|
0:48:46 | it's believed that it's a precursor to, you know, empathy and so on; so you want to |
---|
0:48:51 | say that entrainment was higher in positive sorts of interactions than in negative interactions |
---|
0:48:57 | so that was sort of indirect validation we were trying of these entrainment measures |
---|
0:49:02 | so |
---|
0:49:03 | and, encouragingly, with just these entrainment measures, these similarity measures, as features |
---|
0:49:09 | we were able to distinguish, in a statistically significant way, between these positive and negative interactions |
---|
0:49:16 | that was very encouraging, so of course we immediately wanted to build a prediction model; so |
---|
0:49:22 | we put these features in a factorial HMM model and tried to see, just using these entrainment features, |
---|
0:49:29 | nothing else, how well you can predict how negative or positive the interaction was |
---|
0:49:36 | so |
---|
0:49:37 | we could do, you know |
---|
0:49:39 | quite a bit better than chance, and that's pretty encouraging |
---|
0:49:44 | again |
---|
0:49:46 | here again there are open questions; this is just a small look at what is a pretty tough problem, with |
---|
0:49:51 | a lot of open questions: how we can actually show entrainment across modalities |
---|
0:49:59 | and how do you actually do this in a dynamic framework, what other ways there are of quantifying this |
---|
0:50:05 | and how to actually evaluate it better than just doing it indirectly; lots of very open, both theoretical and |
---|
0:50:11 | computational, questions |
---|
0:50:14 | finally, let me quickly say that |
---|
0:50:17 | human annotators are the reference in a number of cases |
---|
0:50:21 | and oftentimes we do fusion of various sorts, whether of human classifiers or machine classifiers |
---|
0:50:26 | and |
---|
0:50:27 | we |
---|
0:50:28 | rely on the diversity of these classifiers, so that when you combine them you get a better result |
---|
0:50:35 | so what we want to know is how we can actually build mathematical models that reflect these differences in people; |
---|
0:50:41 | for example, people have studied reliability-weighted combinations of classifier models |
---|
0:50:48 | and they have shown that this is |
---|
0:50:50 | better than just doing simple plurality voting |
---|
0:50:52 | and my student Kartik did some work on actually modeling this in an EM framework, and that's |
---|
0:50:58 | very encouraging |
---|
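In the spirit of that EM-based annotator modeling (this is a generic Dawid-Skene-style sketch, not the exact published model), one can alternate between estimating hidden true labels and per-annotator reliabilities:

```python
# EM for reliability-weighted fusion of binary annotations: reliable
# annotators end up with more influence than simple majority voting.
import numpy as np

def em_fuse(votes, n_iter=20):
    """votes: (items, annotators) matrix of binary labels."""
    n_items, n_annot = votes.shape
    acc = np.full(n_annot, 0.7)            # initial annotator accuracies
    for _ in range(n_iter):
        # E-step: posterior P(true = 1) given current accuracies.
        like1 = np.prod(np.where(votes == 1, acc, 1 - acc), axis=1)
        like0 = np.prod(np.where(votes == 0, acc, 1 - acc), axis=1)
        p = like1 / (like1 + like0)
        # M-step: accuracy = expected agreement with the true label.
        agree = votes * p[:, None] + (1 - votes) * (1 - p[:, None])
        acc = agree.mean(axis=0).clip(0.01, 0.99)
    return p, acc

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, 100)
accs = np.array([0.9, 0.85, 0.55])          # two experts, one noisy rater
votes = np.stack([np.where(rng.random(100) < a, truth, 1 - truth)
                  for a in accs], axis=1)

p, est_acc = em_fuse(votes)
print("estimated accuracies:", est_acc.round(2))
print("fused accuracy:", ((p > 0.5).astype(int) == truth).mean())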
0:50:59 | so the point you i wanna know these data using a lot of different things about the wisdom of crowds |
---|
0:51:04 | in you know that wisdom of experts in all these things really i think particularly for modeling abstract things we |
---|
0:51:09 | have to bring |
---|
0:51:11 | explicit models of the evaluators into |
---|
0:51:14 | the |
---|
0:51:16 | that the classification problems to learning problems |
---|
0:51:20 | so |
---|
0:51:21 | so these are just you know some of the challenges that i just mentioned you know while attacking these types |
---|
0:51:26 | of behavior questions as many others but i just want to keep a feel for |
---|
0:51:30 | so very quickly, you know, I know that Frank is showing the time sign
---|
0:51:35 | I want to share some things about the autism field in just a few slides
---|
0:51:40 | so autism, as you know, is something we've been hearing a lot about in the news
---|
0:51:44 | lately, the statistics on how many children are being diagnosed and so on; so we're asking what technology can
---|
0:51:53 | do here, particularly, you know, for people working in speech and signal processing and related areas
---|
0:51:58 | one thing we can do is develop computational techniques and tools to help better understand all these various, you know,
---|
0:52:03 | communication and social patterns in children; one of the biggest hallmarks
---|
0:52:07 | is
---|
0:52:08 | difficulty in social communication, for example prosody
---|
0:52:12 | so perhaps we can better describe, define, and quantify these kinds of deficits
---|
0:52:17 | and the second thing is of course building interfaces that can elicit and increase specific social communication behaviors
---|
0:52:24 | for example. So to pursue these kinds of questions we've been collecting data on child-
---|
0:52:30 | psychologist interactions; that will be about
---|
0:52:33 | ninety kids to date, with both audio and video data transcribed
---|
0:52:38 | and you can ask questions of various sorts with these types of data |
---|
0:52:43 | like the ADOS
---|
0:52:44 | so in these ADOS interactions the psychologist, you know, interacts with the child and rates the child along a
---|
0:52:50 | number of dimensions, you know, everything from showing empathy to shared enjoyment to prosody and so on
---|
0:52:58 | and we looked at very simple measures on these interactions, just looking at how much
---|
0:53:05 | speech is produced by the child relative to the psychologist
---|
0:53:08 | and that tells you something about the codes that are provided, which is very interesting; you know, of the thirty-three ratings that
---|
0:53:14 | the psychologist provided, some were explained by just these simple measures
---|
0:53:19 | it's very interesting because it's observation based
---|
0:53:22 | and this can be done sort of, you know, consistently
---|
0:53:25 | too
---|
0:53:26 | the other thing is speaking rate: just looking at, you know, normalized speaking rate explains other codes
---|
0:53:31 | so
---|
0:53:32 | even with the simple techniques that you have in hand, and with the kinds of behavioral constructs people are interested in, you can
---|
0:53:38 | actually provide tools to support these efforts
---|
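For concreteness, here is a minimal sketch of the kind of simple measures being described, computed from a turn-segmented transcript; the input format and field names are illustrative assumptions rather than the study's actual pipeline.

```python
# Sketch: simple session-level measures from a turn-segmented transcript
# (illustrative field names; not the exact pipeline from the study).
import numpy as np

def session_measures(turns):
    """turns: list of dicts like
    {"speaker": "child" | "psychologist", "dur_s": float, "n_words": int}.
    Returns child speaking-time ratio and child words-per-second rate."""
    child_t = sum(t["dur_s"] for t in turns if t["speaker"] == "child")
    psych_t = sum(t["dur_s"] for t in turns if t["speaker"] == "psychologist")
    child_w = sum(t["n_words"] for t in turns if t["speaker"] == "child")
    ratio = child_t / (child_t + psych_t + 1e-8)   # child's share of talk time
    rate = child_w / (child_t + 1e-8)              # normalized speaking rate
    return ratio, rate

def correlate_with_codes(measures, codes):
    """Pearson correlation between a session-level measure and clinical codes."""
    m, c = np.asarray(measures, float), np.asarray(codes, float)
    m, c = m - m.mean(), c - c.mean()
    return float((m * c).sum() /
                 (np.sqrt((m ** 2).sum() * (c ** 2).sum()) + 1e-8))
```

The point of the sketch is only that a fully observation-based, automatically computable quantity can be lined up against human-coded ratings session by session.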
0:53:42 | of course you can also use dialogue systems and, you know, interfaces that a number of colleagues are developing
---|
0:53:48 | to actually elicit
---|
0:53:49 | interactions in a very systematic and reproducible way
---|
0:53:53 | because human interaction is, you know, variable; psychologists, even though they're doing a structured interaction, are not going to
---|
0:53:59 | be the same
---|
0:54:00 | and we wanted to see whether children would in fact interact naturally with these kinds of characters
---|
0:54:06 | and we built this with the CSLU toolkit, which was robust, and encouragingly we had a number
---|
0:54:11 | of different emotional reasoning games, storytelling, and so on, like this one here
---|
0:54:17 | [video demo of the child-computer interaction plays]
---|
0:54:24 | and so on. So, you know, we have collected data where each child came
---|
0:54:28 | four times, four hours each, for a total of fifty hours of data
---|
0:54:32 | I think
---|
0:54:33 | and very encouragingly, we could actually see, from what we extracted, how they would interact, how the parents'
---|
0:54:39 | interaction changed; we even have physiological data. So there are a lot of very interesting questions we can pursue: we can
---|
0:54:44 | measure speech
---|
0:54:45 | parameters, language parameters, visual things
---|
0:54:48 | and so on; a lot of interesting questions to supplement what people are doing otherwise, so a number of interesting
---|
0:54:54 | possibilities there. I'll cut the slides there, so anyway, some other time. What I wanted
---|
0:55:00 | to emphasize at this point is, you know,
---|
0:55:03 | what I showed with a couple of examples: there are so many open challenges in these domains, you know, where a community
---|
0:55:09 | like
---|
0:55:10 | ours can no doubt contribute, everything from, you know, robust capture and processing of these multimodal signals, to actually deriving
---|
0:55:18 | and finding appropriate representations for computing
---|
0:55:22 | and, you know, doing signal processing: what kinds of features, what feature engineering helps, some that are data-driven,
---|
0:55:29 | some that are inspired by human-like processing
---|
0:55:32 | different modeling schemes, mathematical schemes that can bring some quantitative insight to these kinds of
---|
0:55:37 | very subjective human-based assessments
---|
0:55:40 | to actually, you know, helping with questions of data privacy
---|
0:55:46 | so lots of interesting possibilities
---|
0:55:48 | and, you know, in our lab we've been fortunate to work on a number of different mental health domains; in
---|
0:55:52 | fact I just touched upon one here, and a little bit on the arts, and so on
---|
0:55:58 | but there's lots more one could talk about here; it's a fascinating area
---|
0:56:03 | so in conclusion |
---|
0:56:05 | you know, human behavior can be described in many ways: the same people interacting, or
---|
0:56:11 | two different sets of people, can be described from different perspectives depending on what one wants to look for
---|
0:56:17 | so that offers a lot of both challenges and opportunities as far as developing, indeed,
---|
0:56:23 | computational advances, you know, in sensing, processing, modeling, validation; but I think what's most exciting for me is this
---|
0:56:30 | opportunity for interdisciplinary, sort of collaborative, scholarship
---|
0:56:34 | here
---|
0:56:35 | and so in sum
---|
0:56:37 | obviously signal processing, you know, on the one hand helps us do things that people know how
---|
0:56:43 | to do well, perhaps more efficiently and consistently
---|
0:56:46 | but what is tantalizing is that, you know, we can actually provide new tools and data
---|
0:56:53 | to offer insights that we haven't had before, not yet anyway; so I think that's the exciting part here
---|
0:56:59 | so I'd like to thank you, and all my collaborators, there are like hundreds of them, who helped with this work
---|
0:57:05 | and my sponsors
---|
0:57:08 | so with that I'll conclude, and I'll show you something funny since it's the holiday season
---|
0:57:14 | [funny video plays]
---|
0:57:39 | this was actually a fellow who was a rapper
---|
0:57:41 | so I convinced him to do this; don't ask how
---|
0:57:45 | you can see him busting moves
---|
0:57:47 | so thank you again |
---|
0:57:56 | yeah, thank you very much for this very interesting, very enlightening talk; we have something like four minutes for questions
---|
0:58:02 | so i would like to open the floor |
---|
0:58:09 | a question on multimodal signal processing: as we know, people
---|
0:58:20 | maintain, you know,
---|
0:58:23 | a sort of comfortable distance for communication, but it differs across people
---|
0:58:28 | proxemics, you mean? Yes, and you know, in fact,
---|
0:58:32 | the
---|
0:58:33 | body language data I showed very quickly, of these actors, gives us distance measures
---|
0:58:39 | that are estimated both from video and also from full body motion capture
---|
0:58:44 | there are a couple of papers I can share on this body language business and how that reflects, and
---|
0:58:51 | can tell you something about, this
---|
0:58:54 | I think the dynamics of interaction and
---|
0:58:58 | proxemics also sort of feature in now
---|
0:59:01 | approach-avoidance
---|
0:59:02 | as when they're trying to come together or move away
---|
0:59:06 | in fact, even just a little leaning or moving away from the center of that interaction
---|
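As a sketch of how such distance and approach-avoidance measures might be computed, assuming tracked torso positions per frame from video or motion capture; the input format, units, and smoothing window are illustrative assumptions, not the exact measures from those papers.

```python
# Sketch: interpersonal distance and approach-avoidance dynamics from
# tracked body positions (assumed input format; window size illustrative).
import numpy as np

def interpersonal_distance(pos_a, pos_b):
    """pos_a, pos_b: (n_frames, 2 or 3) torso coordinates per frame.
    Returns per-frame Euclidean distance between the two participants."""
    return np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b), axis=1)

def approach_avoidance(dist, fps=30.0, win_s=1.0):
    """Smoothed rate of change of interpersonal distance:
    negative values = approaching, positive = moving away."""
    win = max(1, int(win_s * fps))
    kernel = np.ones(win) / win                    # simple moving average
    smooth = np.convolve(dist, kernel, mode="valid")
    return np.gradient(smooth) * fps               # distance units per second
```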
0:59:12 | well, that's, you know, culturally mediated; that's an important question. I think what you're alluding to is, what are the
---|
0:59:17 | cultural sort of underpinnings of these types of features, and how to demonstrate them. We haven't
---|
0:59:22 | even had data from different cultures in these studies, except what we have
---|
0:59:28 | in the autism study: we have data from kids growing up in, you know, families in Los Angeles, and Los
---|
0:59:35 | Angeles is very multicultural
---|
0:59:37 | and
---|
0:59:38 | we have some data, but we haven't had enough information to marginalize those effects yet
---|
0:59:45 | so the only thing we have
---|
0:59:47 | on body language is the data from the actors
---|
0:59:49 | so far
---|
0:59:51 | thanks
---|
0:59:54 | do we have another question |
---|
0:59:58 | okay, so, well, I have a question then
---|
1:00:01 | you mentioned crowdsourcing very briefly, so I'm kind of interested: what's your view on what kind of
---|
1:00:06 | role crowdsourcing could
---|
1:00:08 | play here, especially since we rely a lot on subjective measurements and so on
---|
1:00:13 | yeah, so we use it, you know, for the more obvious things, right, things like transcription, or judgments of things
---|
1:00:21 | that are defined better
---|
1:00:23 | asking people to rate those is easier, but what I'm finding difficult is to define these abstract tasks for ratings from
---|
1:00:31 | a lot of people
---|
1:00:33 | we're trying right now to do sarcasm
---|
1:00:37 | sarcasm, or snarkiness even
---|
1:00:41 | where we're trying to see if we can use the wisdom of crowds, but at least
---|
1:00:45 | the biggest challenge is to see how we can
---|
1:00:47 | partition these crowds so that the ratings come from people who get the concept; so we're pursuing all these questions
---|
1:00:53 | but for behavior processing the bigger challenge is that all these data are protected by all kinds of restrictions, so
---|
1:01:01 | we can't farm them out for crowdsourcing types of things; with the actors' data, though, we are able to
---|
1:01:07 | do some things
---|
1:01:09 | but we still haven't figured out how to do the abstract things, because we have to make
---|
1:01:15 | these concepts be internalized by the people that are annotating
---|
1:01:19 | so simpler tasks are easier to do, I think
---|
1:01:25 | okay |
---|
1:01:26 | are there any more questions from the floor
---|
1:01:34 | that was great, thank you
---|
1:01:36 | so a couple of years ago Julia Hirschberg gave a really interesting
---|
1:01:41 | summary overview of what is being done on detecting lying
---|
1:01:46 | with obvious applications, of course
---|
1:01:49 | and one of the main conclusions is that, in fact,
---|
1:01:54 | with detecting lying you really need to
---|
1:01:57 | know the person's baseline anyway
---|
1:01:59 | and if you don't, it's still
---|
1:02:02 | it's a step beyond
---|
1:02:04 | the earlier question about contradiction |
---|
1:02:07 | and I wondered if you've come across any evidence for this kind of thing with the kind of
---|
1:02:12 | data you're looking at |
---|
1:02:14 | yeah, you know, in fact this is actually a very important question, how we can actually individualize and personalize; in
---|
1:02:19 | fact I believe that's one of the strong points of this type of computation
---|
1:02:25 | if we have enough data we can actually learn particular, individual-specific patterns fairly well
---|
1:02:33 | in fact, in autism, right, that's actually what people always talk about: it's very heterogeneous, right,
---|
1:02:40 | because the symptomatology varies across children, and even within a child it varies depending on
---|
1:02:46 | context
---|
1:02:47 | but the way that they present themselves is fairly individual-specific; there are gaps and there are strengths
---|
1:02:54 | in every individual
---|
1:02:55 | and you can learn these patterns from data fairly well over time, which is not necessarily captured by
---|
1:03:01 | these forty-five-minute sets of interactions with, you know, a therapist or a clinician
---|
1:03:07 | I do believe in
---|
1:03:09 | the ability to individualize models; you know, people talk about adaptation in language
---|
1:03:16 | modeling, all these things; all these techniques actually lend themselves here
---|
1:03:22 | so |
---|
1:03:23 | cultural aspects are, you know, slightly harder, not because we can't try, but because it's very hard to collect
---|
1:03:30 | data in a systematic, controlled way so you can say this effect is because of that and not this; but
---|
1:03:37 | individual-level models are easier, I believe, and
---|
1:03:41 | in fact that's why one of the things we did with this
---|
1:03:44 | computer-character-based interaction was to bring the same child over and over again, because they loved interacting with computer characters
---|
1:03:51 | and having dialogues with these characters
---|
1:03:53 | and so
---|
1:03:55 | we have several hours of data from the same child, and you also have them interact with the parents
---|
1:04:00 | and with an unknown person, like a sort of random person; so you also have these human interactions, with familiar and
---|
1:04:07 | unfamiliar persons, and human-computer interaction
---|
1:04:09 | so you can actually start to characterize a child fairly well, whether by their
---|
1:04:17 | lexical use, you know, what kind of initiative they take, things like that
---|
1:04:23 | we can begin to do that even with the simple little speech and entropy ideas that, you know, we
---|
1:04:29 | can bring to the table
---|
1:04:31 | but lying and stuff, I don't know
---|
1:04:34 | but I'm working on it
---|
1:04:36 | yeah, I'm afraid we're out of time, so let's thank the speaker again
---|
1:04:41 | thanks |
---|