0:00:12 | it's my great honour and pleasure to announce our distinguished |
---|
0:00:17 | invited speaker today, Shri Narayanan, who will talk about behavioral signal processing |
---|
0:00:23 | so Shri is the Andrew Viterbi Professor at USC |
---|
0:00:27 | his research focuses on human-centred information processing and communication technologies |
---|
0:00:33 | and in doing that he seems to be the kind of person who holds four professorial appointments |
---|
0:00:38 | i was very impressed to see that: in electrical engineering and computer science, but also linguistics |
---|
0:00:43 | and psychology |
---|
0:00:44 | and i don't live in the US, but someone told me that he is also a regular guest on US |
---|
0:00:48 | television so |
---|
0:00:50 | so please help me welcome Shri; we're really looking forward to the talk |
---|
0:01:03 | thank you |
---|
0:01:04 | right, i |
---|
0:01:05 | am really honoured to be here, and it was great to see a lot of friends of mine i haven't seen |
---|
0:01:10 | in a long time, and to kind of come back to speech, at least to check it out |
---|
0:01:14 | so they were asking me, you know, to say what crazy, fringy, funny things i've been up to |
---|
0:01:21 | so that's this talk today |
---|
0:01:23 | and |
---|
0:01:24 | the only little problem i have with this is that i haven't done very much on this topic yet |
---|
0:01:30 | but i will share whatever we've been up to in the last couple of years |
---|
0:01:35 | hopefully i won't disappoint and will be able to share something of interest |
---|
0:01:38 | so the title is behavioral signal processing; i will momentarily define what i mean by that |
---|
0:01:44 | a case can be made for this term, so let me at least say what it is |
---|
0:01:49 | so |
---|
0:01:51 | so this work concerns human behaviour, which as we all know is very complex and multifaceted |
---|
0:01:58 | it involves very complex and intricate mind-body relations |
---|
0:02:03 | it is shaped by the environment and by interaction with other people |
---|
0:02:11 | and it's reflected in how we communicate, emote, express our personality and interact with other |
---|
0:02:18 | people |
---|
0:02:19 | and also it's characterised by the generation and processing of multimodal cues |
---|
0:02:24 | and it often characterises typicality and atypicality, disorder and so on |
---|
0:02:29 | so one wonders what the role of signal processing, or signal processing people, is in this |
---|
0:02:36 | business |
---|
0:02:38 | so across a number of domains behaviour analysis is, either explicitly or implicitly, essential |
---|
0:02:46 | starting from customer care: you want to know whether a person is |
---|
0:02:51 | frustrated or very satisfied with the services that have been rendered, and you want to sell more things |
---|
0:02:57 | you know, you want to |
---|
0:02:58 | read behaviour at the level of an individual or a group, and so on |
---|
0:03:02 | in learning and education, not only do you want to know whether someone is getting a particular answer |
---|
0:03:09 | right or wrong, you want to know how they got it and how confident they are |
---|
0:03:13 | and how you can actually adapt; personalised learning is one of these grand challenges |
---|
0:03:18 | of engineering, so to be able to do that we have to understand |
---|
0:03:23 | behaviour patterns and the like |
---|
0:03:25 | but more importantly, and something i have developed an increasing passion about, is this whole area of mental health and |
---|
0:03:31 | wellbeing, which i'll try to touch on today with a couple of my examples |
---|
0:03:35 | where |
---|
0:03:37 | behaviour analysis figures very centrally, whether observation based or by other means |
---|
0:03:43 | but when you look across these domains, while computational tools are used, mostly the analysis is very human based |
---|
0:03:50 | so i thought before we go further i'd also show some videos as examples of some of the |
---|
0:03:57 | typical problems one could ask about |
---|
0:03:58 | so here you're going to see kids playing with a computer game, actually talking to it |
---|
0:04:03 | the question is, can we tell something about the child's cognitive state |
---|
0:04:09 | you know, whether they are confident or |
---|
0:04:11 | not |
---|
0:04:12 | so let's look at this little girl |
---|
0:04:18 | right |
---|
0:04:20 | or you can you |
---|
0:04:22 | and mute audio please |
---|
0:04:36 | alright let's try again |
---|
0:04:43 | hold on i checked many times |
---|
0:04:47 | something about you people an idea |
---|
0:04:50 | let's see |
---|
0:04:53 | it's still a |
---|
0:04:54 | okay |
---|
0:05:06 | or answer |
---|
0:05:09 | yeah |
---|
0:05:16 | i |
---|
0:05:20 | where is this |
---|
0:05:28 | well i |
---|
0:05:30 | oh |
---|
0:05:31 | oh |
---|
0:05:33 | i |
---|
0:05:36 | oh |
---|
0:05:43 | i |
---|
0:05:45 | so just looking at this |
---|
0:05:50 | we see that from, sort of, the vocal cues, the language they're using, the |
---|
0:05:55 | visual cues, looking around and looking away, you can say something, at least that these two are different |
---|
0:06:02 | and one of the questions we ask is, okay, can we actually formalise some of |
---|
0:06:06 | these problems of measurement |
---|
0:06:09 | so the next example |
---|
0:06:10 | is from marital therapy, or what is classically called marriage counselling |
---|
0:06:16 | so what you're going to see is a couple interacting |
---|
0:06:21 | and the people in this field, the psychologists doing this kind of research and the people |
---|
0:06:29 | who are actually trying to help these couples, look for a lot of things, |
---|
0:06:34 | characterising aspects of the dynamics |
---|
0:06:37 | looking at who's blaming whom, trying to figure out what that is, and trying to plan treatment based on |
---|
0:06:43 | that; so let's look at this video |
---|
0:06:45 | should i tried again |
---|
0:06:48 | i |
---|
0:06:50 | okay |
---|
0:06:58 | no it's not me |
---|
0:07:01 | you know |
---|
0:07:02 | no |
---|
0:07:03 | the right leg |
---|
0:07:06 | right |
---|
0:07:10 | alright |
---|
0:07:45 | oh |
---|
0:07:46 | used car |
---|
0:07:48 | yeah but what you |
---|
0:07:50 | again |
---|
0:07:52 | but |
---|
0:07:53 | the one of these things |
---|
0:07:57 | or we try to make |
---|
0:08:01 | this is an example from |
---|
0:08:06 | the autism domain |
---|
0:08:07 | where a clinician is actually |
---|
0:08:10 | in an interaction with a child |
---|
0:08:12 | a sort of semi-structured interaction following a particular diagnostic test |
---|
0:08:23 | so that is engaged |
---|
0:08:28 | one |
---|
0:08:28 | one |
---|
0:08:29 | trying to figure out |
---|
0:08:33 | things |
---|
0:08:34 | or |
---|
0:08:35 | everything |
---|
0:08:37 | prosody to |
---|
0:08:39 | sure |
---|
0:08:41 | right |
---|
0:08:42 | you know get price that characterising |
---|
0:08:45 | if you ask |
---|
0:08:47 | six |
---|
0:08:49 | so |
---|
0:08:52 | i |
---|
0:08:53 | i |
---|
0:08:56 | oh right |
---|
0:09:01 | right |
---|
0:09:03 | i |
---|
0:09:06 | right |
---|
0:09:08 | i |
---|
0:09:10 | so one thing you probably observed is that the child, you know, never clearly |
---|
0:09:16 | glanced back or looked at the person; across the session, he was just |
---|
0:09:22 | doing the task, swaying to and fro |
---|
0:09:25 | and eye contact and so on; these things, i'll talk a little later on some |
---|
0:09:32 | scales that have been developed, in the ADOS |
---|
0:09:35 | which try to codify this |
---|
0:09:37 | so all these are some of the things that are happening; as you can see it's very observation based |
---|
0:09:41 | but where people are looking at multimodal cues and trying to render judgements |
---|
0:09:48 | so when you look at these human behaviour signals, they kind of provide |
---|
0:09:53 | a window into these high-level processes; what it reveals depends on how big or |
---|
0:09:58 | small the window is |
---|
0:09:59 | some are overtly observable, like the vocal and facial expressions and body posture; others are covert, we |
---|
0:10:06 | don't have direct access to them, but nonetheless they tell us a lot in special cases |
---|
0:10:10 | things like heart rate, electrodermal response, or even brain activity; and from a signal point of |
---|
0:10:16 | view this kind of information resides at different time scales for these different cues |
---|
0:10:22 | but the ability to process and, you know, sort of interpret and decode these signals can provide us |
---|
0:10:28 | some insights into understanding mind-body relations |
---|
0:10:31 | but also, more importantly, how people process other people's behaviour patterns; that's a fine distinction: both how behaviours |
---|
0:10:40 | are generated and also how they are processed and perceived |
---|
0:10:45 | and the measurement and quantification of these kinds of human behaviours, from both the production and perception perspectives, is a |
---|
0:10:51 | fairly challenging problem, i believe |
---|
0:10:55 | so here's my operational definition for what i call behavioral signal processing: basically it refers to computational methods that try to |
---|
0:11:03 | model human behavioral signals |
---|
0:11:05 | that are manifested in either overt and/or covert signals |
---|
0:11:09 | and are processed by humans explicitly or implicitly |
---|
0:11:13 | and that eventually help facilitate human analysis and decision making |
---|
0:11:19 | so |
---|
0:11:20 | the outcome is behavioral informatics, which can be useful across domains, whether to inform diagnostics, |
---|
0:11:26 | to plan treatments, or to fire up an autonomous system to do personalised teaching, |
---|
0:11:34 | and so on |
---|
0:11:35 | but in all of these, what behavioral signal processing tries to do, at varying levels, is to quantify |
---|
0:11:40 | this human felt sense |
---|
0:11:42 | and |
---|
0:11:44 | it's challenging along a lot of different dimensions, and i'll try to |
---|
0:11:50 | at least impress upon you some of those |
---|
0:11:54 | so if you think about it, of course technology has already helped in this domain quite |
---|
0:12:00 | a bit; its role in all of this relies on the significant foundational advances that have been made |
---|
0:12:05 | in a number of domains, things that have happened and been discussed |
---|
0:12:10 | deeply in this conference: audio-video diarization, speech recognition, understanding what was spoken |
---|
0:12:17 | to things like what the first talk was about, visual activity recognition: everything from low-level descriptions |
---|
0:12:23 | of head pose and orientation to |
---|
0:12:26 | complex |
---|
0:12:29 | classification of activity |
---|
0:12:31 | to physiological aspects of signal processing |
---|
0:12:34 | but the difference is that, using these as building blocks, what you want to do is |
---|
0:12:39 | to try to map them to more abstract, domain-relevant behaviours, and that means new multimodal |
---|
0:12:46 | modeling approaches |
---|
0:12:48 | oh |
---|
0:12:49 | so people have started to work on this already, solving various parts of this puzzle |
---|
0:12:55 | right from sensing: people have been trying to ask, how do you actually measure human behaviour in an |
---|
0:13:01 | ecologically valid way, that is, without disturbing the process that we're trying to measure |
---|
0:13:06 | from instrumenting environments with cameras and microphones and other types of things, to actually instrumenting |
---|
0:13:13 | people with sensors, body-computing types of techniques |
---|
0:13:16 | in speech, increasingly people are doing richer and richer processing at large: what's |
---|
0:13:23 | been said, by whom, and |
---|
0:13:24 | how |
---|
0:13:25 | in affective computing you see a lot of papers being published in this area |
---|
0:13:30 | and also in the newer area of social signal processing: modeling individual and group interaction, turn-taking dynamics, non-verbal cue processing |
---|
0:13:39 | and so on; these are all kind of essential building blocks for BSP |
---|
0:13:44 | so |
---|
0:13:47 | in summary, the ingredients for being able to do this: of course people are working |
---|
0:13:52 | in signal processing areas on acquisition, how you acquire these signals and build these types of systems in a |
---|
0:13:58 | meaningful way; the measurements you might want to make, the kinds of behaviour you want to track, |
---|
0:14:04 | might not happen in a clinic or lab; you might want to do it in the wild |
---|
0:14:09 | so to speak, in playgrounds, in classrooms, at home |
---|
0:14:13 | for example, monitoring and modeling behaviour patterns of the elderly |
---|
0:14:17 | and also body computing; there's lots of interesting signal processing challenges there; in analysis, |
---|
0:14:23 | what features kind of tell you more about particular behaviour patterns of interest |
---|
0:14:29 | and how do you do this robustly, the questions that we ask here about noise and so on |
---|
0:14:33 | and more importantly, also modeling these behavioural constructs as they are described by the experts |
---|
0:14:40 | and providing the capability of both descriptive and predictive modeling |
---|
0:14:47 | so this is kind of not easy, because |
---|
0:14:51 | one, the observations of these behaviour patterns carry large amounts of uncertainty and are |
---|
0:14:58 | at best partial |
---|
0:14:59 | there's lots of, you know, as was mentioned in the computer vision talk about representations, the question of |
---|
0:15:07 | what the representations are that we |
---|
0:15:10 | have to define |
---|
0:15:11 | to compute these things in the first place; they mentioned an experiment where they gave people visual scenes and asked them to describe them; |
---|
0:15:18 | so imagine now a psychologist observing a couple interacting: one of the things to ask is what they're |
---|
0:15:24 | looking for and how they describe it, before we even set out to actually map |
---|
0:15:29 | observable cues to some representation |
---|
0:15:32 | that itself is a first-class research problem: what kind of representations should be specified |
---|
0:15:36 | and given we are talking about human behaviour, there's vast heterogeneity |
---|
0:15:42 | that is, basically differences in the behaviour patterns of people over time and across people |
---|
0:15:49 | and variability in how these data are generated and used |
---|
0:15:53 | so |
---|
0:15:54 | what do people do in each of these domains? if you look at them, and i'll show you |
---|
0:16:00 | some examples, they have their own specific constructs; for example in speech and language assessment, or in a |
---|
0:16:06 | learning situation, say literacy |
---|
0:16:08 | when they try to figure out what kind of help a little child needs when they're |
---|
0:16:13 | learning to read, they're looking not just to know if a child is making a particular sound error; |
---|
0:16:18 | a number of things come into play; disfluencies, in fact the rate |
---|
0:16:23 | of disfluencies and hesitation, play an |
---|
0:16:25 | implicit role, as we found when we did some experiments |
---|
0:16:28 | in pediatric obesity, for example, not only are they monitoring physical activity |
---|
0:16:33 | but also emotional state, and they want to model decision making |
---|
0:16:38 | and so on |
---|
0:16:39 | and there are a lot of common features, because after all the kinds of sensing we have access to are limited: |
---|
0:16:46 | we have audio, microphones, video, and some wearable physiological sensors |
---|
0:16:51 | and so the approach tends to be, at least at the lower levels, |
---|
0:16:56 | the same |
---|
0:16:57 | but the important part is to see |
---|
0:17:00 | how experts, human experts, observe these signals, to learn from that, and to see how we |
---|
0:17:05 | can augment their capabilities |
---|
0:17:07 | so that's what i think the hallmark of the way i look at behavioral signal processing is: |
---|
0:17:12 | to provide supporting tools that would help the human expert, and not to supplant them with total automation, |
---|
0:17:19 | replacing what they're doing; i think that's probably not the most beneficial thing to do |
---|
0:17:24 | so |
---|
0:17:26 | pictorially, if you look at this particular chart, this is what happens today: people observe |
---|
0:17:32 | or |
---|
0:17:35 | there are phenomena that they're trying to observe, say for example a child interacting with a teacher; they |
---|
0:17:41 | get a lot of data, listen to it, look at the child, see how confident the child is, make some judgements |
---|
0:17:45 | about how the child is reading, and provide appropriate scaffolding or intervention |
---|
0:17:52 | what we're saying is that perhaps signal processing and machine learning and other computational tools can |
---|
0:17:58 | come in handy: one, by trying to sort of decode what human experts do, to learn |
---|
0:18:04 | what the features are that they use, either explicitly or implicitly, and |
---|
0:18:09 | build models that can help with some of these predictive capabilities; certain things are beyond human |
---|
0:18:15 | processing capabilities, for example looking at fine pitch dynamics, or comparing what happened at the |
---|
0:18:21 | beginning of the session and the end of the session; some things |
---|
0:18:24 | computational models can do better |
---|
0:18:27 | they can provide feedback, and hopefully these can reinforce each other nicely, and the outcome can be used |
---|
0:18:34 | as informatics; so that's kind of the idea here |
---|
0:18:37 | so |
---|
0:18:39 | with that kind of background, what i'm going to do in the rest of the talk is to quickly point to |
---|
0:18:43 | some of these building blocks that we need |
---|
0:18:46 | but mostly focus on a couple of examples; i'm going to show two examples here, one from the |
---|
0:18:51 | marital therapy domain |
---|
0:18:53 | and one, quickly, on the autism domain, just to highlight some of the possibilities and challenges that |
---|
0:19:00 | there are |
---|
0:19:02 | so |
---|
0:19:03 | as i mentioned already, lots of work is happening in multimodal signal acquisition and |
---|
0:19:08 | processing: everything from smart rooms |
---|
0:19:11 | and other instrumented spaces |
---|
0:19:13 | to actually instrumenting people to sense a lot of different things: sensing the user, sensing the environment in |
---|
0:19:20 | which things are happening, because context becomes important |
---|
0:19:23 | and doing this in a variety of locations |
---|
0:19:27 | from the laboratory to actual classrooms and clinics and playgrounds and so on |
---|
0:19:32 | one of the important things we learned is that depending on the environment there are lots of constraints that |
---|
0:19:37 | come into play; for example when we do our work at the hospital with kids |
---|
0:19:41 | with autism, there are restrictions on where we can place cameras and where we can put the microphones |
---|
0:19:47 | either |
---|
0:19:49 | so it doesn't interrupt what's happening there, what the psychologist or clinician is trying to do, or disturb the structure for |
---|
0:19:56 | the child, because these children are sensitive to certain things and these can be distracting and so on |
---|
0:20:01 | so |
---|
0:20:02 | even though we'd like to capture the 3D environment with like ten or fifteen cameras, it's just not possible |
---|
0:20:08 | so we have to work with these kinds of restrictions, and hence robustness issues in audio processing, |
---|
0:20:15 | language processing, and behaviour processing |
---|
0:20:17 | are real; we can't just solve them by better sensing |
---|
0:20:22 | likewise in instrumenting people we can do a lot of different things, but we also have to worry about |
---|
0:20:26 | not only the technological constraints but also the corresponding ethical and privacy constraints, all these things |
---|
0:20:33 | so it's a challenging area |
---|
0:20:40 | i |
---|
0:20:42 | so those are two actors |
---|
0:20:44 | speaking different parts |
---|
0:20:45 | so we've been collecting data using actors to study behaviours, in addition to working with actual |
---|
0:20:51 | populations |
---|
0:20:53 | because there are certain things we can do in the lab, theory and data collection going |
---|
0:20:57 | hand in hand; so this is a more formal motion-capture database of dyadic interactions |
---|
0:21:03 | with a lot of different emotional content that's been annotated and rated; if you're interested, look it up |
---|
0:21:08 | sure |
---|
0:21:09 | likewise, using actors, we are collaborating with people in the theatre school, doing full-body dyadic |
---|
0:21:17 | interaction; in each of these cases the scenarios were chosen to be rich enough that they cover |
---|
0:21:24 | the entire gamut |
---|
0:21:27 | from them playing Shakespeare and Chekhov to actually doing improv, so there's rich audio-video motion-capture data |
---|
0:21:34 | to ask different questions |
---|
0:21:36 | it looks like this |
---|
0:21:42 | so this is the actress |
---|
0:21:45 | so that kind of data is very important; data acquisition and collection, that's the point there; the next |
---|
0:21:50 | point, and this kind of summarises whatever happens at ASRU: people have |
---|
0:21:55 | been working on not only recognising the spoken words, |
---|
0:21:59 | but on a number of different things, extracting a variety of metadata features which may help the speech understanding |
---|
0:22:05 | problem, the dialogue management problem, the speaker ID problem; all this is important for doing BSP |
---|
0:22:12 | as well |
---|
0:22:13 | there's also a lot of work on emotion recognition, again from speech |
---|
0:22:18 | and from other modalities; one important question there is how do you represent emotions: do we |
---|
0:22:24 | do categorical representations, like say happy or sad, or do more dimensional ones, like how positive or negative |
---|
0:22:30 | it is, or how aroused the person is |
---|
0:22:34 | through to actually having profiles or statistical distributions of emotional behaviour |
---|
0:22:41 | actually now people want to do continuous tracking of emotional state variation; these are all sort of ongoing questions in the community |
---|
0:22:48 | and people try to map those representations from multiple modalities; that is important there also |
---|
0:22:56 | for example, the interplay between visual and vocal features is pretty well known; it's a |
---|
0:23:02 | very complex interplay, and one could in fact learn things about how prosody and head motion are related and how |
---|
0:23:09 | they encode |
---|
0:23:10 | for example not only linguistic information but also this para-linguistic information |
---|
0:23:16 | and you see a number of studies, involving analyses that show both the complementarity and |
---|
0:23:24 | redundancy in how information about emotions is coded in all these modalities |
---|
0:23:29 | for example, if you run an emotion recognizer with speech and facial expression, you can show that |
---|
0:23:35 | with speech there's lots of confusion between anger and, sort of, happiness |
---|
0:23:40 | but if you use the face, that goes away; if you put them together, of course, like any multimodal experiment, |
---|
0:23:45 | you again see a boost in performance; but the point here again is, when you're trying to model these abstract |
---|
0:23:51 | types of behaviours |
---|
0:23:53 | the more of the information that encodes these types of constructs you can get a handle on, the |
---|
0:24:01 | better it is for your computational model |
---|
0:24:05 | so going back to that example i showed of those kids being uncertain or not: sure enough, with things like |
---|
0:24:10 | measured lexical and nonverbal vocalisations, like that "mm" that little boy said when he was hesitating, you can |
---|
0:24:18 | detect and model those, and with the visual cues of hand and head motion you can |
---|
0:24:25 | actually come fairly close to human agreement about whether the child is certain or not in context; so by |
---|
0:24:32 | integrating these you can do things of that sort |
---|
0:24:36 | in fact, in many real-life situations, interactions of course depend on |
---|
0:24:42 | the other people there, who it is you're interacting with; so the idea is, if you model the humans |
---|
0:24:48 | that are there, the mutual influence between, say, two people interacting dyadically, you can do better in |
---|
0:24:54 | predicting what would come next; so for example in the dyadic interaction we can model both |
---|
0:25:02 | people that are in it, husband and wife, as sort of a dyadic unit |
---|
0:25:06 | and you can show that by modeling the cross dependencies between these people, using not only what they themselves |
---|
0:25:12 | did before but also what the other person did before, you can predict the upcoming state slightly better; |
---|
0:25:18 | this type of thing can be done with the existing machinery, with a number of different things |
---|
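To make that mutual-influence idea concrete, here is a minimal hedged sketch (toy data and a toy coupled process of my own invention, not the talk's actual model or corpus): it compares predicting one partner's next state from their own history alone versus from both partners' histories.

```python
# Sketch (not the talk's exact model): predict a speaker's next
# binary state from own history only vs. both partners' histories,
# illustrating the cross-dependency / mutual-influence idea.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy dyad: 0 = negative, 1 = positive; the wife's next state depends
# on both her own and the husband's previous states (coupled dynamics).
T = 2000
husband = rng.integers(0, 2, T)
wife = np.zeros(T, dtype=int)
for t in range(1, T):
    p = 0.2 + 0.4 * wife[t - 1] + 0.3 * husband[t - 1]
    wife[t] = rng.random() < p

X_self = wife[:-1].reshape(-1, 1)                    # own history only
X_dyad = np.column_stack([wife[:-1], husband[:-1]])  # both histories
y = wife[1:]

for name, X in [("self only", X_self), ("self + partner", X_dyad)]:
    clf = LogisticRegression().fit(X[:1500], y[:1500])
    print(f"{name}: test accuracy = {clf.score(X[1500:], y[1500:]):.2f}")
```

With coupled dynamics like these, the dyadic model scores higher, mirroring the "slightly better" prediction described above.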
0:25:23 | so |
---|
0:25:24 | so that was a kind of broad, very high-level overview of some of the computational things that are |
---|
0:25:29 | happening in our field |
---|
0:25:30 | so now we come to the actual goal, what i'm asking here |
---|
0:25:34 | how can these types of things be applied to the problems people are asking about in these various domains; they've been |
---|
0:25:41 | doing this without us, you know; marital therapy research, for example, has been going |
---|
0:25:46 | on for decades; they want to predict things like how long the |
---|
0:25:51 | marriage will last, or whether it can be mended, those kinds of questions |
---|
0:25:54 | so we come there and say, well, we have some computational ideas, and maybe we can help |
---|
0:26:00 | so that's right |
---|
0:26:02 | so psychology research depends a lot on observational judgements; many times they in fact record these |
---|
0:26:09 | interactions and the coders go through |
---|
0:26:13 | a very painstaking and careful coding of these behaviours, based on the theoretical research frameworks that a particular lab |
---|
0:26:23 | might have |
---|
0:26:24 | and they develop a lot of coding standards and so on |
---|
0:26:28 | so |
---|
0:26:31 | yeah i'll show you some examples of |
---|
0:26:37 | earlier |
---|
0:27:00 | i |
---|
0:27:03 | so those were various couples interacting; okay, this is actually not real clinical data |
---|
0:27:10 | what i'm going to talk about later is actually based on clinical trial data |
---|
0:27:15 | so they created these manuals, but this manual coding process on which the analyses rest is kind of not very scalable; it |
---|
0:27:20 | takes a lot of time, and training coders is also hard; typically students in psychology and linguistics |
---|
0:27:28 | are recruited, and it's not always very reliable |
---|
0:27:31 | inter-coder reliability is also tough |
---|
0:27:34 | and so we asked the very simplistic question of whether technology can help to code these kinds of |
---|
0:27:39 | audio-visual data, these behavioural sorts of characterizations |
---|
0:27:43 | and there are measures that are in fact very difficult for humans to make where it can help, |
---|
0:27:49 | all these |
---|
0:27:49 | measurements of timing; even diarization, measuring how long a person speaks, is actually very important; |
---|
0:27:55 | as i'll show later on |
---|
0:27:57 | that tells you quite a bit |
---|
0:27:58 | and we can consistently quantify at least some of these low-level aspects of |
---|
0:28:05 | human behaviour |
---|
0:28:07 | so here's the same kind of chart; here for example we are interested in a couple discussing a problem; |
---|
0:28:13 | we want to know, for example |
---|
0:28:17 | how much blame one spouse is putting on the other spouse |
---|
0:28:22 | and it's not necessarily symmetric |
---|
0:28:25 | so this is what we want to do; to help with that we have a big corpus: |
---|
0:28:31 | one hundred and thirty-four distressed couples were enrolled in a clinical trial |
---|
0:28:38 | and received couples therapy; so we have access to one hundred hours of data or so, not intended for |
---|
0:28:45 | this automated processing: no transcription and so on; it also has video, you saw some examples, and this |
---|
0:28:54 | is what we start with |
---|
0:28:55 | so |
---|
0:28:56 | and it also has, which is very nice for us, expert ratings of these interactions at the session level |
---|
0:29:04 | every couple had a ten-minute-long problem-solving interaction |
---|
0:29:09 | and they coded for a number of things, a number of behavioural patterns that were of interest to researchers in this |
---|
0:29:14 | domain, for example |
---|
0:29:16 | one global code was |
---|
0:29:18 | "is the husband showing acceptance"; so a pretty abstract question, and the description that corresponds to that code |
---|
0:29:26 | was: indicates understanding and acceptance of the partner's views, |
---|
0:29:30 | feelings and behaviours; listens to the partner with an open mind and a positive attitude, and so on; so this is what |
---|
0:29:35 | the coders had to internalise and rate on a scale of one to nine |
---|
0:29:40 | okay |
---|
0:29:41 | so these are the kinds of behaviours we try to see whether we can predict with these signal |
---|
0:29:47 | cues; so we start with the most obvious or simplest thing we know how to do |
---|
0:29:52 | we said, well, let's focus on a few of those codes, like acceptance, blame, positive affect, |
---|
0:29:58 | negative affect, sadness |
---|
0:30:00 | and so on, each marked for |
---|
0:30:03 | both the husband and the wife |
---|
0:30:04 | and with the ratings one through nine, there are histograms of the ratings that were given by |
---|
0:30:10 | the coders |
---|
0:30:11 | and to make it even simpler for us, we said, well, let's just focus on the |
---|
0:30:16 | top twenty percent and the bottom twenty percent |
---|
0:30:19 | separating the extremes |
---|
0:30:21 | and see what we can do with this |
---|
0:30:23 | "'kay" |
---|
0:30:24 | starting from, say, things that we know how to do, like measuring speech properties, transcribing it and asking, can |
---|
0:30:32 | words tell me something |
---|
0:30:33 | and then seeing how successful we can be in predicting these codes that the humans gave; that was |
---|
0:30:39 | the problem |
---|
0:30:40 | so here's a chart; it's busy, but what it says is what most of us here do: |
---|
0:30:46 | we get the audio, get rid of the parts that are hopeless, and then we |
---|
0:30:51 | do speech signal processing; we do voice activity detection and the like |
---|
0:30:56 | and measure things like pitch and intensity and MFCCs, and derive lots of different statistical functionals |
---|
0:31:05 | at the utterance level and at different levels of temporal granularity |
---|
0:31:10 | and throw them into our favourite machine learning tool |
---|
0:31:15 | and try to predict the particular category we're interested in |
---|
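As a rough illustration of that pipeline (the feature dimensions, functionals, and random stand-in data here are my own assumptions, not the actual system from the talk), one can compute utterance-level statistical functionals over frame-level features and feed them to a binary high/low classifier:

```python
# Minimal sketch: statistical functionals of frame-level prosodic
# features per utterance, then a binary top-20%/bottom-20% classifier.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def functionals(frames):
    """Summarise a (num_frames, num_features) array of frame-level
    features (e.g. pitch, intensity, MFCCs) into one utterance vector."""
    return np.concatenate([
        frames.mean(axis=0), frames.std(axis=0),
        frames.min(axis=0), frames.max(axis=0),
        np.percentile(frames, 75, axis=0) - np.percentile(frames, 25, axis=0),
    ])

# Stand-ins for real data: one frame matrix per utterance, plus a
# session-level label (1 = top 20% "blame" sessions, 0 = bottom 20%).
rng = np.random.default_rng(0)
utterances = [rng.normal(size=(rng.integers(50, 300), 14)) for _ in range(200)]
labels = rng.integers(0, 2, len(utterances))

X = np.stack([functionals(u) for u in utterances])
print(cross_val_score(SVC(kernel="rbf"), X, labels, cv=5).mean())
```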
0:31:19 | likewise we can also do transcription, generate lattices, and then use those with code-specific models |
---|
0:31:27 | for classification |
---|
0:31:29 | "'kay" so that's |
---|
0:31:30 | exactly what we did; so here's a transcript of an interaction, an example of what it looks like; |
---|
0:31:37 | you can see everything there; where the money is spent is one of the things that |
---|
0:31:41 | couples like |
---|
0:31:42 | this worry about and fight about |
---|
0:31:45 | another thing is that |
---|
0:31:46 | you'll see it when you look at the results |
---|
0:31:50 | and in fact one of the other important things is the detection of all these non-verbal vocalisations and cues, |
---|
0:31:56 | which are information bearing; at least that's what the algorithms tell us |
---|
0:32:00 | so as i mentioned |
---|
0:32:03 | a lot of prosodic and acoustic features and simple binary classification; and here are the results; just from a very simple |
---|
0:32:09 | classifier with the acoustic features, for many of these constructs, like blame and |
---|
0:32:16 | positive and negative behaviour, we can do much better than chance |
---|
0:32:21 | just from these vocal features, and that was very encouraging |
---|
0:32:26 | well, certain things like sadness and humour are harder to do just from acoustics, and the reason is because |
---|
0:32:33 | we weren't capturing any contextual cues or lexical cues or visual cues or anything at all |
---|
0:32:39 | so then we said, well, okay, now let's throw in lexical information; if you look at the transcripts, there are |
---|
0:32:44 | a lot of words that scream at you saying, hey, this guy's really mad at that person, they're |
---|
0:32:49 | blaming each other; for example in this transcript we highlight |
---|
0:32:53 | how they kept saying "it's aggravating" and why |
---|
0:32:56 | and so we said, well, can we automatically capture these kinds of salient words from the text |
---|
0:33:02 | so, again simply, we built code-specific language models |
---|
0:33:06 | and you can score an utterance X against these models to figure out which particular condition |
---|
0:33:15 | these words correspond to |
---|
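A toy version of that scoring idea (assuming simple unigram models with add-one smoothing and made-up example sentences; the actual system used richer models and ASR lattices) might look like this:

```python
# Toy code-specific language-model scoring: train one unigram model on
# high-blame text and one on low-blame text; label a new utterance by
# which model assigns it the higher log-likelihood.
import math
from collections import Counter

def train_unigram(texts):
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1
    # add-one smoothing so unseen words get non-zero probability
    return lambda w: math.log((counts[w] + 1) / (total + vocab))

high_blame = ["you never listen to me", "it is your fault again"]
low_blame = ["i see what you mean", "we can work on this together"]

lm_high = train_unigram(high_blame)
lm_low = train_unigram(low_blame)

def classify(utterance):
    words = utterance.lower().split()
    score_high = sum(lm_high(w) for w in words)
    score_low = sum(lm_low(w) for w in words)
    return "high blame" if score_high > score_low else "low blame"

print(classify("you always do this, it is your fault"))
```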
0:33:22 | and you can do this not necessarily just with utterances; but the interesting thing is |
---|
0:33:27 | the kinds of things that pop out of these models are very informative; even with very simple things |
---|
0:33:32 | like, okay, in the blame situation you can look at the extremes of the hyperplane weights |
---|
0:33:38 | and in the high-blame words, "you", the second person, |
---|
0:33:44 | is actually correlated with high blame quite a bit, in fact very consistent with what psychologists |
---|
0:33:50 | predict and hypothesize |
---|
0:33:52 | compared to the first person; but you also see words like "cleaning" |
---|
0:33:56 | because cleaning seems to be a big deal; the fighting |
---|
0:34:01 | comes around to cleaning |
---|
0:34:03 | quite a bit |
---|
0:34:05 | so these are simple things we can do, but that's just not |
---|
0:34:10 | enough |
---|
0:34:11 | there are a lot of challenges in this problem domain |
---|
0:34:15 | first of all, any particular single feature stream provides just a small window, as |
---|
0:34:20 | i pointed out, and it's noisy |
---|
0:34:22 | so of course we want to do it multimodally, and you also want to do it |
---|
0:34:26 | in a context-sensitive fashion |
---|
0:34:29 | a more important thing is, many of these ratings, in many domains, are done at the |
---|
0:34:34 | session level; they want to get an overall gestalt of that particular thing |
---|
0:34:37 | but what is not clear is what, in that particular unfolding of the interaction, led to this particular perceptual |
---|
0:34:44 | judgement by the people |
---|
0:34:46 | so you want to know what was salient |
---|
0:34:48 | so we tried, as a first cut, using sort of multiple instance learning, to |
---|
0:34:54 | see whether such things are possible |
---|
0:34:58 | another point is that when these ratings are done, it's not |
---|
0:35:02 | the more typical sort of categorisation; many times they are posed as |
---|
0:35:10 | a rank-ordered list, that is, one is |
---|
0:35:13 | sort of less than two, which is less than three, and so on; so ordinal methods |
---|
0:35:17 | can be used |
---|
0:35:19 | and we want our models to integrate this |
---|
0:35:22 | so these are the kinds of things where we try to do more efficiently what people are doing |
---|
0:35:28 | then there are things that are more in the "felt sense" category |
---|
0:35:31 | people hypothesize that when two people interact there's something about |
---|
0:35:37 | synchrony in their interaction that tells you how smoothly the interaction proceeds |
---|
0:35:43 | so if you are able to quantify this aspect, what is called entrainment, then that'll be |
---|
0:35:48 | useful; you want to know whether we can build signal models that actually try to do this |
---|
0:35:54 | then another point is that when people look for a particular behaviour pattern, looking for |
---|
0:36:00 | it |
---|
0:36:01 | different experts, even trained people, look at it differently, and they respond to different portions of the |
---|
0:36:07 | data |
---|
0:36:08 | so you want to know how we can actually capture this data-dependent human diversity in behaviour processing |
---|
0:36:18 | in our models |
---|
0:36:19 | so doing simple plurality- or majority-voting-based machine learning techniques might not necessarily work well for |
---|
0:36:27 | these kinds of abstract-construct |
---|
0:36:29 | processing |
---|
0:36:30 | so the first thing, the easiest thing: we had the language and acoustic information |
---|
0:36:35 | work together; of course it's going to do better; at least that's how all these experiments turn out, including |
---|
0:36:41 | ours |
---|
0:36:42 | and in our case, for one, our ASR really was bad |
---|
0:36:47 | because we didn't have enough data to tune the language models for the couples domain; but what was encouraging is that |
---|
0:36:53 | even with something like a thirty-five percent word error rate ASR, the information from the |
---|
0:37:00 | from the language models |
---|
0:37:03 | from the lattices that we generated, put together with the acoustic-based classifiers, provided a fairly |
---|
0:37:11 | decent prediction of these codes, and the psychologists were very excited about that |
---|
0:37:17 | but to actually make it more multimodal, we really needed to have information about the nonverbal |
---|
0:37:23 | cues; so we rigged up our lab with a real couch |
---|
0:37:27 | for the therapy |
---|
0:37:28 | and several microphone arrays, and |
---|
0:37:33 | synchronised about ten HD cameras and a motion-capture camera with them to provide data of that sort; so |
---|
0:37:39 | it's very useful for doing a more careful study of human |
---|
0:37:43 | vocal and nonverbal behaviour in interactions |
---|
0:37:46 | so you get data like this |
---|
0:38:05 | oh |
---|
0:38:06 | so goes the conversation; and you can do a lot of things here; since we are collecting data with |
---|
0:38:10 | microphone arrays, we can localise and do things of that sort quite well |
---|
0:38:15 | so we asked some questions like, okay |
---|
0:38:18 | describe approach-avoidance behaviour, which is very important; you saw |
---|
0:38:25 | in the husband-wife couple interaction that this guy was leaning back quite a bit, which in effect expresses displeasure in the interaction |
---|
0:38:31 | very subtle cues, just like those folks that come on TV, the body-language experts; we tried to do |
---|
0:38:36 | this |
---|
0:38:37 | with signal processing |
---|
0:38:40 | so approach-avoidance is actually moving toward or away from events or objects |
---|
0:38:46 | and it actually relates, in psychology theory, to emotion and motivation, and particularly in the couples domain to relationship |
---|
0:38:53 | commitment |
---|
0:38:55 | so people are very interested in whether we can quantify that: using vocal and visual cues, can we |
---|
0:39:01 | actually predict or model this |
---|
0:39:02 | so that was a problem we took on; we said, okay, we can pose this as follows: we had psychologists |
---|
0:39:09 | rate this on an ordinal scale, minus four to plus four, a scale of nine |
---|
0:39:14 | and we posed this as sort of an ordinal regression problem: basically we broke it down into a series of |
---|
0:39:20 | binary classifiers, one versus the rest, two versus the rest, and then we put a logistic |
---|
0:39:27 | regression model on top of that |
---|
0:39:29 | with these multimodal features, both acoustic and visual features |
---|
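One common way to realise this series-of-binary-classifiers construction (a sketch under assumed details, not necessarily the exact published formulation) is to train one "rating > k" classifier per threshold and count the thresholds judged exceeded:

```python
# Ordinal regression via threshold-wise binary classifiers (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_ordinal(X, y, levels):
    # One binary "y > k" classifier per internal threshold.
    return {k: LogisticRegression().fit(X, (y > k).astype(int))
            for k in levels[:-1]}

def predict_ordinal(models, X, levels):
    # Predicted level = lowest level + number of thresholds judged
    # exceeded with probability above 0.5.
    greater = np.stack([models[k].predict_proba(X)[:, 1] > 0.5
                        for k in levels[:-1]], axis=1)
    return levels[0] + greater.sum(axis=1)

# Synthetic stand-in for the rated data: ratings 0..8 (cf. the nine-point
# minus-four-to-plus-four scale) driven by one informative feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = np.clip(np.round(X[:, 0] * 2 + 4), 0, 8).astype(int)

levels = list(range(9))
models = fit_ordinal(X[:800], y[:800], levels)
pred = predict_ordinal(models, X[800:], levels)
print("mean absolute error:", np.abs(pred - y[800:]).mean())
```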
0:39:34 | computer vision is tough, so we just took the motion-capture data instead of actual video data |
---|
0:39:41 | so we could get very clean measurements of head and body orientation, |
---|
0:39:47 | folding of the arms, how much they're leaning, and so on; so at least to get an upper-bound |
---|
0:39:51 | idea of what kinds of visual features are important for measuring |
---|
0:39:55 | approach-avoidance |
---|
0:39:56 | and the usual audio features that i don't need to tell you guys about |
---|
0:40:00 | pitch and MFCCs and all that stuff |
---|
0:40:03 | so interestingly, we showed that this ordinal formulation, published by a couple of my |
---|
0:40:10 | students |
---|
0:40:12 | i guess |
---|
0:40:13 | this ordinal formulation was actually very helpful, instead of just formulating it as a plain classification problem |
---|
0:40:21 | and the chart here shows the difference between using an ordinal SVM versus just a plain |
---|
0:40:26 | SVM |
---|
0:40:27 | and higher is better; the bars mean just the difference in the error rates; so with audio plus video it's actually |
---|
0:40:35 | better |
---|
0:40:36 | but again, multimodality, and in all of this i'm again sort of preaching to the choir, is important; and we |
---|
0:40:42 | can actually use these audiovisual cues to measure something like this |
---|
0:40:47 | what psychologists perceive as approach-avoidance behaviour; and that was encouraging |
---|
0:40:53 | so the point so far is that a multimodal approach is important |
---|
0:40:58 | the next computational thing i want to share is this whole notion of, okay, they often make these |
---|
0:41:04 | sorts of gestalt judgements on data, and you want to know what led to them; or, from a |
---|
0:41:12 | pure learning point of view |
---|
0:41:14 | how to make it more robust, that is, how do you choose or sample the data so that you can |
---|
0:41:18 | maximise the insights; you can pose this in |
---|
0:41:21 | two different ways |
---|
0:41:22 | so i will show a little study here |
---|
0:41:26 | so we used multiple instance learning, again using this case study of the behavioural interaction of these couples, to |
---|
0:41:33 | say, well, can we identify the speaker turns |
---|
0:41:36 | that are salient, given only the session-level code; so you have a ten-minute-long session, husband and wife |
---|
0:41:42 | taking turns talking about whatever they're talking about, and we have |
---|
0:41:47 | a rating; so you want to know which of these turns would most explain that observed rating; okay, that's the problem |
---|
0:41:56 | so as usual you extract all the features from the signals, and you want to identify the turns that make the |
---|
0:42:04 | difference; so we used an approach called diverse density SVM as the support for doing this; the |
---|
0:42:11 | whole problem is |
---|
0:42:13 | as follows |
---|
0:42:14 | a very simple idea; so you have this whole notion of bags, positive bags: high-blame sessions, low-blame sessions, |
---|
0:42:22 | high-acceptance and low-acceptance sessions, and the data from those |
---|
0:42:25 | so you |
---|
0:42:27 | you create your feature space here, so an acoustic feature space |
---|
0:42:31 | then you build this diverse density and select the local maxima, saying that these must be the prototypes from |
---|
0:42:36 | your data; and then when you're ready to evaluate an incoming session, you compute the distance, |
---|
0:42:45 | the minimum distance, to these prototypes and use those |
---|
0:42:48 | as your features rather than all the raw ones |
---|
0:42:51 | a simple idea |
---|
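Here is a much-simplified sketch of that diverse-density prototype idea on toy two-dimensional "turns" (the crude scoring function and the data are illustrative assumptions, not the published method's exact form):

```python
# Crude diverse-density-like prototype selection for multiple instance
# learning: score candidates by closeness to positive bags and distance
# from negative bags, then represent a session by its minimum distance
# to the best-scoring prototype.
import numpy as np

rng = np.random.default_rng(0)

# Toy bags: each "session" is a set of per-turn feature vectors.
pos_bags = [rng.normal(loc=[2, 2], size=(20, 2)) for _ in range(10)]
neg_bags = [rng.normal(loc=[0, 0], size=(20, 2)) for _ in range(10)]

def dd_score(point, pos_bags, neg_bags, sigma=1.0):
    """Higher when some instance in every positive bag is near `point`
    and all negative bags stay far from it."""
    def near(bag):
        d = np.linalg.norm(bag - point, axis=1)
        return np.exp(-(d ** 2) / sigma).max()
    return np.prod([near(b) for b in pos_bags]) * \
           np.prod([1 - near(b) for b in neg_bags])

candidates = np.concatenate(pos_bags)   # search over positive instances
scores = np.array([dd_score(c, pos_bags, neg_bags) for c in candidates])
prototype = candidates[scores.argmax()]

def bag_feature(bag, prototype):
    # Session-level feature: minimum turn-to-prototype distance.
    return np.linalg.norm(bag - prototype, axis=1).min()

print("prototype:", prototype)
print("pos sessions:", [round(bag_feature(b, prototype), 2) for b in pos_bags[:3]])
print("neg sessions:", [round(bag_feature(b, prototype), 2) for b in neg_bags[:3]])
```

The resulting per-session distance features would then feed an ordinary classifier, as described above.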
0:42:53 | so the features that were considered here include lexical features, for example; i put this table here |
---|
0:42:59 | just to again point out that not only are |
---|
0:43:04 | lexical items important, but things like fillers and nonverbal vocalisations |
---|
0:43:10 | seem to pop up quite a bit under information-gain selection; so they are important for these kinds of |
---|
0:43:15 | behavioural |
---|
0:43:17 | signal processing tasks |
---|
0:43:18 | and so we had all these different informative |
---|
0:43:22 | features |
---|
0:43:25 | and created a feature vector per session based on the distances to the diverse-density prototypes |
---|
0:43:29 | and here are some results for the acceptance problem; we could show that these MIL-selected |
---|
0:43:37 | features, compared with using all the features, |
---|
0:43:41 | were not only |
---|
0:43:45 | sort of meaningful, but they also kind of boosted the performance; so the way we interpreted that is that these are |
---|
0:43:50 | reasonable ways of selecting these |
---|
0:43:53 | salient instances; that's our definition of saliency, tied to discrimination |
---|
0:43:58 | but when we added intonation features, for this problem at least, for some of these constructs, it didn't really help |
---|
0:44:03 | maybe the way we added these intonation features, |
---|
0:44:07 | as contours, probably wasn't right, or maybe they don't bear any information for these kinds of constructs |
---|
0:44:13 | and that was true with this multiple-instance-based learning for many of |
---|
0:44:18 | the behavioural descriptions we were looking at, and that was encouraging |
---|
0:44:23 | but what we haven't done |
---|
0:44:25 | is really validate whether these sort of |
---|
0:44:30 | machine-hypothesized instances are in fact consistent with what humans would pick if we asked them |
---|
0:44:39 | whether they're salient or not |
---|
0:44:41 | so one thing we are interested in doing, and human experiments |
---|
0:44:48 | are underway, is to make this part of active learning: the machine proposes certain things, and humans |
---|
0:44:53 | can either correct them or not |
---|
0:44:54 | and so on; that's interesting stuff |
---|
0:44:56 | and you could throw in other features also |
---|
0:44:59 | so |
---|
0:45:00 | the next topic i want to talk about, again moving along this line of getting more abstract, |
---|
0:45:06 | is the modeling of entrainment |
---|
0:45:08 | so entrainment, you know |
---|
0:45:10 | also called interaction synchrony, kind of refers to this naturally occurring coordination between |
---|
0:45:17 | interacting |
---|
0:45:19 | people, at multiple levels and along multiple communication channels |
---|
0:45:24 | if you were at Interspeech this year, Julia Hirschberg gave a fantastic talk on this |
---|
0:45:29 | on lexical entrainment |
---|
0:45:32 | and people have hypothesized that humans use this to achieve efficiency in |
---|
0:45:37 | communicating and |
---|
0:45:40 | increasing mutual understanding and so on; it's been extensively studied in psychology and psycholinguistics |
---|
0:45:47 | so what we wanted to see is, okay, given these kinds of behaviour patterns |
---|
0:45:52 | can measurements of this sort of thing |
---|
0:45:56 | be done, and can they inform these high-level sorts of behaviour characterizations |
---|
0:46:02 | so |
---|
0:46:04 | but the thing is, here you can't really ask a human, hey, are these people entraining or not |
---|
0:46:09 | it's very difficult to do, particularly at the level of these sort of signal-cue-based things |
---|
0:46:14 | and also, unlike many settings where they measure synchrony, where you have parallel signals and can compute mutual |
---|
0:46:20 | information or correlation measures |
---|
0:46:21 | here, because of the turn-taking structure, things are not aligned in time, so we have to think |
---|
0:46:26 | about other clever ways of computing this |
---|
0:46:29 | and of course it's also directional: how much i entrain toward you is not necessarily the same as how much you entrain |
---|
0:46:34 | toward me, so |
---|
0:46:36 | that's what we tried to figure out: how to compute how alike two people sound, in the vocal |
---|
0:46:42 | entrainment case |
---|
0:46:44 | as usual we measure acoustic features; let me tell you more about it |
---|
0:46:49 | what we came up with here was to actually construct what we call these PCA |
---|
0:46:55 | vocal characteristics spaces, and then measure the similarity between these spaces, or project the data onto the spaces, to find some |
---|
0:47:01 | similarity measure; that was the basic idea |
---|
0:47:05 | the features are, as usual, pitch, loudness, and spectral features for the vocal data, at the |
---|
0:47:13 | word level |
---|
0:47:14 | and the PCA spaces are constructed both at the level of the turn and at the level of the whole session; so |
---|
0:47:21 | we have that |
---|
0:47:22 | and then you can calculate various similarity measures |
---|
0:47:25 | doing the PCA basically means you're transforming the data |
---|
0:47:30 | to a different coordinate space; so if the two speakers are alike, these components are aligned with each other, and measuring the angle |
---|
0:47:38 | can give you some notion of a similarity metric; you can also weight those components by the variance they explain |
---|
0:47:45 | and you can use that as one kind of similarity metric |
---|
0:47:49 | or you can project the data onto these PCA spaces and calculate likelihood-level measures |
---|
0:47:56 | and calculate a number of different similarity metrics |
---|
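A minimal sketch of the angle-based PCA-space similarity (assumed details: matched component ordering and variance weighting; the actual measures in the work may differ):

```python
# Build a PCA basis from each speaker's word-level vocal features and
# compare the two bases with a variance-weighted sum of |cos(angle)|
# between corresponding principal directions.
import numpy as np

def pca_basis(features):
    centered = features - features.mean(axis=0)
    # SVD gives principal directions (rows of vt) and singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2 / (s ** 2).sum()
    return vt, var

def vocal_similarity(feats_a, feats_b):
    basis_a, var_a = pca_basis(feats_a)
    basis_b, _ = pca_basis(feats_b)
    # |cos angle| between matched components (abs handles sign flips),
    # weighted by the variance each component explains for speaker a.
    cosines = np.abs(np.sum(basis_a * basis_b, axis=1))
    return float(np.sum(var_a * cosines))

rng = np.random.default_rng(0)
husband = rng.normal(size=(120, 6))                   # toy word-level features
wife_like = husband + 0.1 * rng.normal(size=(120, 6))
wife_unlike = rng.normal(size=(120, 6))

print(vocal_similarity(husband, wife_like))    # closer to 1
print(vocal_similarity(husband, wife_unlike))  # smaller
```

Because the measure is computed from one speaker's space toward the other's, it is naturally directional, matching the point made above.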
0:47:59 | and then you ask questions: hey, what does this mean |
---|
0:48:02 | so the first thing we thought, well, as a sanity check: for real dialogues, hopefully, there |
---|
0:48:07 | must be something that these measures reflect compared with artificial dialogues |
---|
0:48:11 | so we constructed artificial dialogues, randomizing data from other people, and compared |
---|
0:48:17 | just as a sanity check, to make sure that these measures can |
---|
0:48:22 | separate these things out; it doesn't tell you this is entrainment or not, but at least it tells you they capture |
---|
0:48:28 | something reflecting real dialogues; so that was the first thing |
---|
0:48:32 | the second, and this is where we sort of reflected on the literature in this |
---|
0:48:38 | domain, is that they feel entrainment is actually a useful construct for |
---|
0:48:43 | characterising how felicitously these couples' interactions proceed |
---|
0:48:46 | it's believed that it's a precursor to, you know, empathy and so on; so you want to |
---|
0:48:51 | say that entrainment was higher in positive sorts of interactions than in negative interactions |
---|
0:48:57 | so that was sort of indirect validation we were trying of these entrainment measures |
---|
0:49:02 | so |
---|
0:49:03 | and, encouragingly, with just these entrainment measures, these similarity measures, as features |
---|
0:49:09 | we were able to distinguish, in a statistically significant way, between these positive and negative interactions |
---|
0:49:16 | that was very encouraging, so of course we immediately wanted to build a prediction model; so |
---|
0:49:22 | we put these features in a factorial HMM model and tried to see, just using these entrainment features, |
---|
0:49:29 | nothing else, how well you can predict how negative or positive the interaction was |
---|
0:49:36 | so |
---|
0:49:37 | we could do, you know |
---|
0:49:39 | quite a bit better than chance, and that's pretty encouraging |
---|
0:49:44 | again |
---|
0:49:46 | here again there are open questions; this is just a small look at what is a pretty tough problem, with |
---|
0:49:51 | a lot of open questions: how we can actually show entrainment across modalities |
---|
0:49:59 | and how do you actually do this in a dynamic framework, what other ways there are of quantifying this |
---|
0:50:05 | and how to actually evaluate it better than just doing it indirectly; lots of very open, both theoretical and |
---|
0:50:11 | computational, questions |
---|
0:50:14 | finally, let me quickly say that |
---|
0:50:17 | human annotators are the reference in a number of cases |
---|
0:50:21 | and oftentimes we do fusion of various sorts, whether of human classifiers or machine classifiers |
---|
0:50:26 | and |
---|
0:50:27 | we |
---|
0:50:28 | rely on the diversity of these classifiers, so that when you combine them you get a better result |
---|
0:50:35 | so what we want to know is how we can actually build mathematical models that reflect these differences in people; |
---|
0:50:41 | for example, people have studied reliability-weighted combinations of classifier models |
---|
0:50:48 | and they have shown that this is |
---|
0:50:50 | better than just doing simple plurality voting |
---|
0:50:52 | and my student Kartik did some work on actually modeling this in an EM framework, and that's |
---|
0:50:58 | very encouraging |
---|
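In the spirit of that EM-based annotator modeling (this is a generic Dawid-Skene-style sketch, not the exact published model), one can alternate between estimating hidden true labels and per-annotator reliabilities:

```python
# EM for reliability-weighted fusion of binary annotations: reliable
# annotators end up with more influence than simple majority voting.
import numpy as np

def em_fuse(votes, n_iter=20):
    """votes: (items, annotators) matrix of binary labels."""
    n_items, n_annot = votes.shape
    acc = np.full(n_annot, 0.7)            # initial annotator accuracies
    for _ in range(n_iter):
        # E-step: posterior P(true = 1) given current accuracies.
        like1 = np.prod(np.where(votes == 1, acc, 1 - acc), axis=1)
        like0 = np.prod(np.where(votes == 0, acc, 1 - acc), axis=1)
        p = like1 / (like1 + like0)
        # M-step: accuracy = expected agreement with the true label.
        agree = votes * p[:, None] + (1 - votes) * (1 - p[:, None])
        acc = agree.mean(axis=0).clip(0.01, 0.99)
    return p, acc

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, 100)
accs = np.array([0.9, 0.85, 0.55])          # two experts, one noisy rater
votes = np.stack([np.where(rng.random(100) < a, truth, 1 - truth)
                  for a in accs], axis=1)

p, est_acc = em_fuse(votes)
print("estimated accuracies:", est_acc.round(2))
print("fused accuracy:", ((p > 0.5).astype(int) == truth).mean())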
0:50:59 | so the point you i wanna know these data using a lot of different things about the wisdom of crowds |
---|
0:51:04 | in you know that wisdom of experts in all these things really i think particularly for modeling abstract things we |
---|
0:51:09 | have to bring |
---|
0:51:11 | explicit models of the evaluators into |
---|
0:51:14 | the |
---|
0:51:16 | that the classification problems to learning problems |
---|
0:51:20 | so |
---|
0:51:21 | so these are just you know some of the challenges that i just mentioned you know while attacking these types |
---|
0:51:26 | of behavior questions as many others but i just want to keep a feel for |
---|
0:51:30 | so very quickly, you know, I know that Frank is showing the time sign
---|
0:51:35 | I want to share some things about the autism field in just a few slides
---|
0:51:40 | so autism, as you know, is something we've been hearing a lot about in the news
---|
0:51:44 | lately, the statistics on how many children are being diagnosed and so on; so we're asking what technology can
---|
0:51:53 | do here, particularly, you know, for people working in speech and signal processing and related areas
---|
0:51:58 | one thing we can do is develop computational techniques and tools to help better understand all these various, you know,
---|
0:52:03 | communication and social patterns in children; one of the biggest hallmarks
---|
0:52:07 | is
---|
0:52:08 | difficulty in social communication, for example prosody
---|
0:52:12 | so perhaps we can better describe, define, and quantify these kinds of deficits
---|
0:52:17 | and the second thing is of course building interfaces that can elicit and increase specific social communication behaviors
---|
0:52:24 | for example. So to pursue these kinds of questions we've been collecting data on child-
---|
0:52:30 | psychologist interactions; that will be about
---|
0:52:33 | ninety kids to date, with both audio and video data transcribed
---|
0:52:38 | and you can ask questions of various sorts with these types of data |
---|
0:52:43 | like the ADOS
---|
0:52:44 | so in these ADOS interactions the psychologist, you know, interacts with the child and rates the child along a
---|
0:52:50 | number of dimensions, you know, everything from showing empathy to shared enjoyment to prosody and so on
---|
0:52:58 | and we looked at very simple measures on these interactions, just looking at how much
---|
0:53:05 | speech is produced by the child relative to the psychologist
---|
0:53:08 | and that tells you something about the codes that are provided, which is very interesting; you know, of the thirty-three ratings that
---|
0:53:14 | the psychologist provided, some were explained by just these simple measures
---|
0:53:19 | it's very interesting because it's observation based
---|
0:53:22 | and this can be done sort of, you know, consistently
---|
0:53:25 | too
---|
0:53:26 | the other thing is speaking rate: just looking at, you know, normalized speaking rate explains other codes
---|
0:53:31 | so
---|
0:53:32 | even with the simple techniques that you have in hand, and with the kinds of behavioral constructs people are interested in, you can
---|
0:53:38 | actually provide tools to support these efforts
---|
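For concreteness, here is a minimal sketch of the kind of simple measures being described, computed from a turn-segmented transcript; the input format and field names are illustrative assumptions rather than the study's actual pipeline.

```python
# Sketch: simple session-level measures from a turn-segmented transcript
# (illustrative field names; not the exact pipeline from the study).
import numpy as np

def session_measures(turns):
    """turns: list of dicts like
    {"speaker": "child" | "psychologist", "dur_s": float, "n_words": int}.
    Returns child speaking-time ratio and child words-per-second rate."""
    child_t = sum(t["dur_s"] for t in turns if t["speaker"] == "child")
    psych_t = sum(t["dur_s"] for t in turns if t["speaker"] == "psychologist")
    child_w = sum(t["n_words"] for t in turns if t["speaker"] == "child")
    ratio = child_t / (child_t + psych_t + 1e-8)   # child's share of talk time
    rate = child_w / (child_t + 1e-8)              # normalized speaking rate
    return ratio, rate

def correlate_with_codes(measures, codes):
    """Pearson correlation between a session-level measure and clinical codes."""
    m, c = np.asarray(measures, float), np.asarray(codes, float)
    m, c = m - m.mean(), c - c.mean()
    return float((m * c).sum() /
                 (np.sqrt((m ** 2).sum() * (c ** 2).sum()) + 1e-8))
```

The point of the sketch is only that a fully observation-based, automatically computable quantity can be lined up against human-coded ratings session by session.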
0:53:42 | of course you can also use dialogue systems and, you know, interfaces that a number of colleagues are developing
---|
0:53:48 | to actually elicit
---|
0:53:49 | interactions in a very systematic and reproducible way
---|
0:53:53 | because human interaction is, you know, variable; psychologists, even though they're doing a structured interaction, are not going to
---|
0:53:59 | be the same
---|
0:54:00 | and we wanted to see whether children would in fact interact naturally with these kinds of characters
---|
0:54:06 | and we built this with the CSLU toolkit, which was robust, and encouragingly we had a number
---|
0:54:11 | of different emotional reasoning games, storytelling, and so on, like this one here
---|
0:54:17 | [video demo of the child-computer interaction plays]
---|
0:54:24 | and so on. So, you know, we have collected data where each child came
---|
0:54:28 | four times, four hours each, for a total of fifty hours of data
---|
0:54:32 | I think
---|
0:54:33 | and very encouragingly, we could actually see, from what we extracted, how they would interact, how the parents'
---|
0:54:39 | interaction changed; we even have physiological data. So there are a lot of very interesting questions we can pursue: we can
---|
0:54:44 | measure speech
---|
0:54:45 | parameters, language parameters, visual things
---|
0:54:48 | and so on; a lot of interesting questions to supplement what people are doing otherwise, so a number of interesting
---|
0:54:54 | possibilities there. I'll cut the slides there, so anyway, some other time. What I wanted
---|
0:55:00 | to emphasize at this point is, you know,
---|
0:55:03 | what I showed with a couple of examples: there are so many open challenges in these domains, you know, where a community
---|
0:55:09 | like
---|
0:55:10 | ours can no doubt contribute, everything from, you know, robust capture and processing of these multimodal signals, to actually deriving
---|
0:55:18 | and finding appropriate representations for computing
---|
0:55:22 | and, you know, doing signal processing: what kinds of features, what feature engineering helps, some that are data-driven,
---|
0:55:29 | some that are inspired by human-like processing
---|
0:55:32 | different modeling schemes, mathematical schemes that can bring some quantitative insight to these kinds of
---|
0:55:37 | very subjective human-based assessments
---|
0:55:40 | to actually, you know, helping with questions of data privacy
---|
0:55:46 | so lots of interesting possibilities
---|
0:55:48 | and, you know, in our lab we've been fortunate to work on a number of different mental health domains; in
---|
0:55:52 | fact I just touched upon one here, and a little bit on the arts, and so on
---|
0:55:58 | but there's lots more one could talk about here; it's a fascinating area
---|
0:56:03 | so in conclusion |
---|
0:56:05 | you know, human behavior can be described in many ways: the same people interacting, or
---|
0:56:11 | two different sets of people, can be described from different perspectives depending on what one wants to look for
---|
0:56:17 | so that offers a lot of both challenges and opportunities as far as developing, indeed,
---|
0:56:23 | computational advances, you know, in sensing, processing, modeling, validation; but I think what's most exciting for me is this
---|
0:56:30 | opportunity for interdisciplinary, sort of collaborative, scholarship
---|
0:56:34 | here
---|
0:56:35 | and so in sum
---|
0:56:37 | obviously signal processing, you know, on the one hand helps us do things that people know how
---|
0:56:43 | to do well, perhaps more efficiently and consistently
---|
0:56:46 | but what is tantalizing is that, you know, we can actually provide new tools and data
---|
0:56:53 | to offer insights that we haven't had before, not yet anyway; so I think that's the exciting part here
---|
0:56:59 | so I'd like to thank you, and all my collaborators, there are like hundreds of them, who helped with this work
---|
0:57:05 | and my sponsors
---|
0:57:08 | so with that I'll conclude, and I'll show you something funny since it's the holiday season
---|
0:57:14 | [funny video plays]
---|
0:57:39 | this was actually a fellow who was a rapper
---|
0:57:41 | so I convinced him to do this; don't ask how
---|
0:57:45 | you can see him busting moves
---|
0:57:47 | so thank you again |
---|
0:57:56 | yeah, thank you very much for this very interesting, very enlightening talk; we have something like four minutes for questions
---|
0:58:02 | so i would like to open the floor |
---|
0:58:09 | a question on multimodal signal processing: as we know, people
---|
0:58:20 | maintain, you know,
---|
0:58:23 | a sort of comfortable distance for communication, but it differs across people
---|
0:58:28 | proxemics, you mean? Yes, and you know, in fact,
---|
0:58:32 | the
---|
0:58:33 | body language data I showed very quickly, of these actors, gives us distance measures
---|
0:58:39 | that are estimated both from video and also from full body motion capture
---|
0:58:44 | there are a couple of papers I can share on this body language business and how that reflects, and
---|
0:58:51 | can tell you something about, this
---|
0:58:54 | I think the dynamics of interaction and
---|
0:58:58 | proxemics also sort of feature in now
---|
0:59:01 | approach-avoidance
---|
0:59:02 | as when they're trying to come together or move away
---|
0:59:06 | in fact, even just a little leaning or moving away from the center of that interaction
---|
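As a sketch of how such distance and approach-avoidance measures might be computed, assuming tracked torso positions per frame from video or motion capture; the input format, units, and smoothing window are illustrative assumptions, not the exact measures from those papers.

```python
# Sketch: interpersonal distance and approach-avoidance dynamics from
# tracked body positions (assumed input format; window size illustrative).
import numpy as np

def interpersonal_distance(pos_a, pos_b):
    """pos_a, pos_b: (n_frames, 2 or 3) torso coordinates per frame.
    Returns per-frame Euclidean distance between the two participants."""
    return np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b), axis=1)

def approach_avoidance(dist, fps=30.0, win_s=1.0):
    """Smoothed rate of change of interpersonal distance:
    negative values = approaching, positive = moving away."""
    win = max(1, int(win_s * fps))
    kernel = np.ones(win) / win                    # simple moving average
    smooth = np.convolve(dist, kernel, mode="valid")
    return np.gradient(smooth) * fps               # distance units per second
```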
0:59:12 | well, that's, you know, culturally mediated; that's an important question. I think what you're alluding to is, what are the
---|
0:59:17 | cultural sort of underpinnings of these types of features, and how to demonstrate them. We haven't
---|
0:59:22 | even had data from different cultures in these studies, except what we have
---|
0:59:28 | in the autism study: we have data from kids growing up in, you know, families in Los Angeles, and Los
---|
0:59:35 | Angeles is very multicultural
---|
0:59:37 | and
---|
0:59:38 | we have some data, but we haven't had enough information to marginalize those effects yet
---|
0:59:45 | so the only thing we have
---|
0:59:47 | on body language is the data from the actors
---|
0:59:49 | so far
---|
0:59:51 | thanks
---|
0:59:54 | do we have another question |
---|
0:59:58 | okay, so, well, I have a question then
---|
1:00:01 | you mentioned crowdsourcing very briefly, so I'm kind of interested: what's your view on what kind of
---|
1:00:06 | role crowdsourcing could
---|
1:00:08 | play here, especially since we rely a lot on subjective measurements and so on
---|
1:00:13 | yeah, so we use it, you know, for the more obvious things, right, things like transcription, or judgments of things
---|
1:00:21 | that are defined better
---|
1:00:23 | asking people to rate those is easier, but what I'm finding difficult is to define these abstract tasks for ratings from
---|
1:00:31 | a lot of people
---|
1:00:33 | we're trying right now to do sarcasm
---|
1:00:37 | sarcasm, or snarkiness even
---|
1:00:41 | where we're trying to see if we can use the wisdom of crowds, but at least
---|
1:00:45 | the biggest challenge is to see how we can
---|
1:00:47 | partition these crowds so that the ratings come from people who get the concept; so we're pursuing all these questions
---|
1:00:53 | but for behavior processing the bigger challenge is that all these data are protected by all kinds of restrictions, so
---|
1:01:01 | we can't farm them out for crowdsourcing types of things; with the actors' data, though, we are able to
---|
1:01:07 | do some things
---|
1:01:09 | but we still haven't figured out how to do the abstract things, because we have to make
---|
1:01:15 | these concepts be internalized by the people that are annotating
---|
1:01:19 | so simpler tasks are easier to do, I think
---|
1:01:25 | okay |
---|
1:01:26 | are there any more questions from the floor
---|
1:01:34 | that was great, thank you
---|
1:01:36 | so a couple of years ago Julia Hirschberg gave a really interesting
---|
1:01:41 | summary overview of what is being done on detecting lying
---|
1:01:46 | with obvious applications, of course
---|
1:01:49 | and one of the main conclusions is that, in fact,
---|
1:01:54 | with detecting lying you really need to
---|
1:01:57 | know the person's baseline anyway
---|
1:01:59 | and if you don't, it's still
---|
1:02:02 | it's a step beyond
---|
1:02:04 | the earlier question about contradiction |
---|
1:02:07 | and I wondered if you've come across any evidence for this kind of thing with the kind of
---|
1:02:12 | data you're looking at |
---|
1:02:14 | yeah, you know, in fact this is actually a very important question, how we can actually individualize and personalize; in
---|
1:02:19 | fact I believe that's one of the strong points of this type of computation
---|
1:02:25 | if we have enough data we can actually learn particular, individual-specific patterns fairly well
---|
1:02:33 | in fact, in autism, right, that's actually what people always talk about: it's very heterogeneous, right,
---|
1:02:40 | because the symptomatology varies across children, and even within a child it varies depending on
---|
1:02:46 | context
---|
1:02:47 | but the way that they present themselves is fairly individual-specific; there are gaps and there are strengths
---|
1:02:54 | in every individual
---|
1:02:55 | and you can learn these patterns from data fairly well over time, which is not necessarily captured by
---|
1:03:01 | these forty-five-minute sets of interactions with, you know, a therapist or a clinician
---|
1:03:07 | I do believe in
---|
1:03:09 | the ability to individualize models; you know, people talk about adaptation in language
---|
1:03:16 | modeling, all these things; all these techniques actually lend themselves here
---|
1:03:22 | so |
---|
1:03:23 | cultural aspects are, you know, slightly harder, not because we can't try, but because it's very hard to collect
---|
1:03:30 | data in a systematic, controlled way so you can say this effect is because of that and not this; but
---|
1:03:37 | individual-level models are easier, I believe, and
---|
1:03:41 | in fact that's why one of the things we did with this
---|
1:03:44 | computer-character-based interaction was to bring the same child over and over again, because they loved interacting with computer characters
---|
1:03:51 | and having dialogues with these characters
---|
1:03:53 | and so
---|
1:03:55 | we have several hours of data from the same child, and you also have them interact with the parents
---|
1:04:00 | and with an unknown person, like a sort of random person; so you also have these human interactions, with familiar and
---|
1:04:07 | unfamiliar persons, and human-computer interaction
---|
1:04:09 | so you can actually start to characterize a child fairly well, whether by their
---|
1:04:17 | lexical use, you know, what kind of initiative they take, things like that
---|
1:04:23 | we can begin to do that even with the simple little speech and entropy ideas that, you know, we
---|
1:04:29 | can bring to the table
---|
1:04:31 | but lying and stuff, I don't know
---|
1:04:34 | but I'm working on it
---|
1:04:36 | yeah, I'm afraid we're out of time, so let's thank the speaker again
---|
1:04:41 | thanks |
---|