0:00:17 | Good morning, everyone. Welcome to day three of SIGDIAL. I'd like to introduce our third keynote speaker, Professor Helen Meng from the Chinese University of Hong Kong.
0:00:29 | Helen got her PhD from MIT, and she has been a professor at the Chinese University of Hong Kong for some time; let's not count the number of years.
0:00:41 | In addition to her work on various aspects of speech and language processing, language learning, et cetera, she is also involved in university administration, and she has given presentations at the World Economic Forum and other world conferences on AI.
0:01:04 | So she is not just doing research, but actually trying to bring the knowledge about speech and language out to help other people. So without further ado, I'd like to introduce Professor Helen Meng.
0:01:31 | Thank you very much for the kind introduction. Good morning, ladies and gentlemen. I'm really delighted to be here, and I wish to thank the organizers for the very kind invitation.
0:01:42 | I've been working a lot on language learning in recent years, but upon receiving the invitation from SIGDIAL, I thought this was an excellent opportunity for me to take stock of what I've been doing, rather serendipitously, on dialogue.
0:02:01 | So I decided to choose this topic, the many facets of dialogue, for my presentation.
0:02:11 | In fact, the different facets that I'm going to cover include dialogue in teaching and learning, dialogue in e-commerce, and dialogue in cognitive assessment; these first three are more application-oriented.
0:02:28 | The next two are more research-oriented: extracting semantic patterns from dialogues, and modeling user emotion changes in dialogues.
0:02:38 | So here we go. The first one is on dialogue in teaching and learning, where this project is about investigating student discussion dialogues and learning outcomes in flipped classroom teaching.
0:02:54 | This is joint work with my PhD student and our research assistant, and we also have three undergraduate student helpers on this project.
0:03:08 | This project came about because back in 2012 there was actually a sweeping change in university education in Hong Kong, where all the universities had to migrate from a three-year curriculum to a four-year curriculum.
0:03:26 | What that meant was that we were admitting students who are one year younger, and we had to design a curriculum for first-year engineering students which is broad-based, meaning all engineering students need to take those courses.
0:03:44 | Among these is the engineering freshman math course.
0:03:49 | And because it's broad-based admission, we have really big classes. After a few years of teaching these big classes, we realised that we need to serve the students better, especially the elite students.
0:04:05 | So we designed an elite freshman math course, which has a much more demanding curriculum, and of course students can opt in and opt out of this course. It's basically a freshman-year engineering math course.
0:04:22 | For this elite course we have a very dedicated teacher, my colleague Professor Sidharth Jaggi. He's very creative and innovative, and he has been trying out many different ways to teach the elite students, and also many different ways to flip his classroom.
0:04:46 | Eventually he settled upon a mode which I'm going to talk about. In general, flipped classroom teaching involves having students watch online video lectures before they come into class, and then class time is all dedicated to in-class discussions.
0:05:08 | So students are given in-class exercises and they work in teams; they discuss and try to solve these problems, and sometimes a team gets picked to go up to the front and present their solution to their classmates.
0:05:29 | Now this is the setting, and in fact it's in a computer lab, so you can see the computers. I think it would be ideal if we had reconfigurable furniture in the classroom, but hopefully that will come someday.
0:05:45 | As I mentioned, every week the class time is spent on peer-to-peer learning and group discussions, and some groups are selected to present their solutions.
0:05:57 | So I sent my students to record the student group discussions during class. The dots are where the computer monitors are placed in the room, and the red dots are where we put the speech recorders.
0:06:20 | You can see the students in groups, and we actually got consent from most of the groups, except for two, which are shown here, to record their discussions.
0:06:31 | Technically, the contents of an audio file look like this. The lecturer would start the class by addressing the whole class, and of course also close the class, so we have lecture speech at the beginning and at the end.
0:06:49 | At various points in time during the class, sometimes the lecturer will speak and sometimes the TA will speak, again addressing the whole class.
0:07:01 | And there are times when a student group finishes an exercise and they're invited to go up to the front to present their solution. All the other times are open for the student groups to discuss within the team, to try to solve the problem at hand.
0:07:20 | So this is the content of the audio file. We actually have two types of speech: one which is directed at the whole class, and one which is the student group discussions.
0:07:34 | So we devised a methodology for automatic separation between these two types, so that we can filter out the student group discussion speech for further processing and study. This methodology we will be presenting at Interspeech next week.
0:07:55 | Now, within the student group discussions, we segment the audio. This segmentation is based on speaker change, and also, if there's a pause of more than one second in duration, then we'll segment there. We have a lot of student helpers helping us in transcribing the speech.
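As an aside, here is a minimal sketch of the pause-based part of this segmentation, assuming 16 kHz mono recordings and a simple frame-energy threshold; the speaker-change criterion would additionally require a diarization component (not shown), and all parameter values are illustrative.

```python
# Sketch: split an audio file into segments at pauses longer than one second.
# The energy threshold and frame sizes are illustrative assumptions.
import numpy as np
import soundfile as sf

def segment_on_pauses(wav_path, frame_ms=25, hop_ms=10,
                      energy_floor=1e-4, min_pause_s=1.0):
    """Return (start_s, end_s) pairs separated by pauses >= min_pause_s."""
    audio, sr = sf.read(wav_path)
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    # Short-time energy per frame
    energies = np.array([np.mean(audio[i:i + frame] ** 2)
                         for i in range(0, len(audio) - frame, hop)])
    voiced = energies > energy_floor
    min_pause = int(min_pause_s * 1000 / hop_ms)  # pause length in frames
    segments, start, silence = [], None, 0
    for t, v in enumerate(voiced):
        if v:
            if start is None:
                start = t
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_pause:  # pause is long enough: close the segment
                segments.append((start * hop / sr, (t - silence + 1) * hop / sr))
                start, silence = None, 0
    if start is not None:
        segments.append((start * hop / sr, len(audio) / sr))
    return segments
```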
0:08:18 | A typical transcription looks like this. Each segment includes the name; for example, this group gave themselves a nickname, and here are their segments.
0:08:32 | In fact, we teach and lecture in English, but when the students are discussing among themselves, some of them discuss in Putonghua and some of them discuss in Cantonese.
0:08:47 | So here the speech is actually in Chinese, but I've translated it for presentation here. So, just to play for you
0:09:00 | each of these segments in turn: basically the first segment is a male speaker saying it really should be the same, and then the female speaker says no, these pieces are exactly the same, and so on. So I'm going to play for you what the audio sounds like, starting with the first segment.
0:09:21 | So that's the first segment... second segment... third segment... fourth segment... and the last. So it's very noisy, and what we have been working on is the transcription.
0:09:44 | Now, the class exercises generally take one week to solve, and each week has three classes, so together the recordings compose a set.
0:09:55 | We have ten groups, and over a semester where we were able to record over twelve weeks, we end up with 120 weekly group discussion sets, which we denote by WGDS.
0:10:09 | Fifty-two have been transcribed; this is from the previous offering, last year's offering, of the course. The total number of hours of audio is 550 hours, the total hours of discussion is about 280 hours, and we've transcribed about 100 hours.
0:10:29 | What we do here, as a beginning step, is to look at the weekly group discussion sets, examine the discussions of the students, and see whether they are relevant to the course topic, and also what level of activity there was in the communicative exchange.
0:10:52 | Then we try to conduct analyses to tie these with the academic performance of the group in the course.
0:10:59 | If we look at the first of these two measures, relevance to the course topic, in fact we divide that up into two components. The first is the number of matching math terms that occur in the speech.
0:11:16 | So for example, here is a group audio.
0:11:29 | Basically the student says: if there's a circle, then usually we use polar coordinates, and I've used polar coordinates and then used them for the integration, but the variable y has some problems. So that's what he said.
0:11:42 | In this segment we actually see the matching math terms, based on some textbooks and math dictionaries; these are the resources that we have chosen, and so we take note of those terms.
0:12:00 | The next component is content similarity. We figured that because the discussion is there to solve the in-class exercise, the discussion content should bear similarity to the in-class exercise. To measure that, we trained a doc2vec model.
0:12:19 | We use it to compute a segment vector for each segment in the discussion, and we also get a document vector from the in-class exercise, and we measure the cosine similarity.
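As a rough sketch of this content-similarity measure, assuming a gensim doc2vec model trained on tokenized course material (all names and parameter values here are illustrative):

```python
# Sketch: doc2vec vectors for discussion segments and the in-class exercise,
# compared by cosine similarity. Parameters are illustrative assumptions.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def train_doc2vec(tokenized_texts):
    docs = [TaggedDocument(words, [i]) for i, words in enumerate(tokenized_texts)]
    return Doc2Vec(docs, vector_size=100, min_count=2, epochs=40)

def content_similarity(model, segment_tokens, exercise_tokens):
    """Cosine similarity between a discussion segment and the exercise sheet."""
    v_seg = model.infer_vector(segment_tokens)
    v_doc = model.infer_vector(exercise_tokens)
    return float(np.dot(v_seg, v_doc) /
                 (np.linalg.norm(v_seg) * np.linalg.norm(v_doc)))
```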
0:12:33 | Here's an example: high-similarity segments are on top versus low-similarity segments at the bottom. You can see that, upon first glance, the top two segments are indeed about math, and the third one is "which chapter", so it's probably referring to the textbook, whereas the low-similarity segments are general conversation.
0:13:02 | So that has to do with the relevance of the content. We also measure the level of activity in information exchange, and for that we count the number of segments in the discussion dialogue, and also the number of words in the discussion dialogue, where we add Chinese characters and English words together.
0:13:26 | So actually, for a weekly group discussion set we have four features: two relating to relevance to the course topic, and two for information exchange measures, as sketched below.
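Here is a rough sketch of the two information-exchange features; the regular expressions for counting Chinese characters and English words are an assumption about how the mixed-language counting might be implemented:

```python
# Sketch: segment count plus a word count that adds Chinese characters
# and English words together, for one weekly group discussion set.
import re

CJK_CHAR = re.compile(r'[\u4e00-\u9fff]')
EN_WORD = re.compile(r"[A-Za-z']+")

def exchange_features(segments):
    """segments: list of transcribed utterance strings."""
    n_segments = len(segments)
    n_words = sum(len(CJK_CHAR.findall(s)) + len(EN_WORD.findall(s))
                  for s in segments)
    return n_segments, n_words
```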
0:13:39 | The next thing we do is to look at the academic performance. The learning outcome that corresponds to each week's course topic is measured through the relevant question components present in the way we set the midterm paper and the final exam paper.
0:13:59 | Basically we have a score: the final exam counts sixty percent and the midterm counts forty percent, but we have set the questions such that the course content for each week is present in different components of the midterm and final papers respectively.
0:14:19 | Therefore we are able to look at a group's overall performance according to the course content for a particular week.
0:14:29 | So this is the way we did the analysis, and here's a quick summary. Basically we looked at the high-performing groups versus the low-performing groups, and not surprisingly, we can see that the high-performing groups generally have a much higher average proportion of matching math terms in their discussions.
0:14:49 | They also have higher content similarity, so in other words, their discussion content is much more relevant.
0:14:58 | In terms of communicative exchange activity, the high-performing groups have many more total segments exchanged, and more words.
0:15:10 | Note that for the first three measures, matching math terms, content similarity, and number of segments exchanged, we did significance tests and they are significant; the fourth one is at 0.08, but I think it's still relevant and still an important feature.
0:15:32 | What I have presented to you is the first step, where we collected the data and tried to investigate the discussion dialogues in the flipped classroom setting in relation to learning outcomes.
0:15:45 | In terms of further investigation, what our team would like to understand is how the student discussions can become an effective platform for peer-to-peer learning: how the dialogue facilitates learning and enhances learning.
0:16:06 | Furthermore, if there are high-performing teams with very efficient exchanges in their dialogues, we would like to know whether we can use that information to inform group formation.
0:16:09 | Right now the students form groups at the beginning of the semester and stick with them for the entire semester. We're thinking that if the high-performing groups are the result of very effective discussions, maybe if we are able to swap the groups around, the benefits of the dialogue exchange to learning will spread, and maybe, you know, a rising tide raises all boats, so it may enhance learning for the whole class.
0:16:50 | So that's the direction we'd like to take this investigation.
0:16:55 | That was the first section. Now I will move on to the second section, which is on e-commerce. This is actually about the JD (Jingdong) Dialogue Challenge in the summer of 2018.
0:17:06 | I had a summer intern that year, an undergraduate student, and I said, well, maybe you may be interested in joining the JD Dialogue Challenge, but you have no background. Luckily I also had a part-time postdoctoral fellow, and also a recent graduate from my group who is now working for the startup SpeechX Limited.
0:17:33 | In particular, I'd like to thank Dr. Xiaodong He and the team at JD AI for running the JD Dialogue Challenge, from which we've benefited a lot, especially my student, a junior undergraduate student, who learned a lot.
0:17:50 | The goal of this dialogue challenge is to develop a chatbot for e-commerce customer service using JD's very large dataset. They gave us one million Chinese customer service conversation sessions, which amounts to twenty million conversation utterances or turns.
0:18:07 | This data covers ten after-sales topics, and the data are unlabeled. Each of these topics may have further subtopics; for example, the invoice modification topic can have the subtopics of changing the name, changing the invoice type, asking about e-invoices, et cetera.
0:18:27 | The task is the following: we have a context, which consists of the two previous conversation turns, that is, the four utterances from the two previous turns, plus the current query from the user, the customer. The task is to generate a response for this context.
0:18:49 | So it's basically a five-utterance group, and we need to generate a response. The generated response from the system is evaluated by human experts from customer service.
0:19:07 | There are two very well-known approaches: the retrieval-based approach and the generation-based approach, and we take advantage of the training data, with its context and response pairs, in building these.
0:19:20 | Our retrieval-based approach is very standard: basically it's TF-IDF plus cosine similarity.
0:19:26 | Our generation-based approach is also a very standard configuration, where we segment the Chinese context, that is, the two previous dialogue turns together with the current query, and we also segment the response. We feed in those data and model the statistical relation between the context and the response using a seq2seq model with attention. So those are the training and inference phases.
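A minimal sketch of such a retrieval baseline with scikit-learn, indexing training contexts by TF-IDF and ranking their responses by cosine similarity; the class and names are illustrative, not the actual competition system:

```python
# Sketch: TF-IDF retrieval baseline over (context, response) training pairs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class TfidfRetriever:
    def __init__(self, contexts, responses):
        self.responses = responses
        self.vectorizer = TfidfVectorizer()
        self.matrix = self.vectorizer.fit_transform(contexts)  # index contexts

    def retrieve(self, context, n=20):
        """Return the n responses whose training contexts best match `context`."""
        sims = cosine_similarity(self.vectorizer.transform([context]),
                                 self.matrix)[0]
        top = sims.argsort()[::-1][:n]
        return [(self.responses[i], sims[i]) for i in top]
```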
0:19:58 | The system that we eventually submitted is a hybrid model based on a very commonly used rescoring framework.
0:20:08 | What we did was to generate, using the retrieval-based approach, N response alternatives, where we chose N to be twenty, so that there's enough choice but it also won't take too long. Then we use the generation-based approach to rescore these twenty responses.
0:20:29 | The nice thing about this is that the generation-based approach will consider the given context and the chosen response, and the relationship between them.
0:20:42 | We rescore and re-rank the responses, and we check whether the highest-scoring response has exceeded a threshold, which was chosen somewhat arbitrarily at 0.85. If it exceeds the threshold, then we'll output that response.
0:21:02 | Otherwise, we take it as a sign that our retrieval-based model does not have enough information to choose the right response, so we just use the entire seq2seq model to generate a new response.
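Here is a sketch of that hybrid rescoring logic; `seq2seq_score` and `seq2seq_generate` are stand-ins for the attention-based seq2seq model, `retriever` is the TF-IDF baseline sketched earlier, and the 0.85 threshold follows the description above:

```python
# Sketch: retrieve N candidates, rescore them with the seq2seq model,
# and fall back to pure generation when no candidate scores well enough.
def respond(context, retriever, seq2seq_score, seq2seq_generate,
            n=20, threshold=0.85):
    candidates = [r for r, _ in retriever.retrieve(context, n=n)]
    # Rescore each candidate by how well it fits the given context
    scored = [(seq2seq_score(context, r), r) for r in candidates]
    best_score, best_response = max(scored)
    # Below threshold: retrieval lacks the information, so generate instead
    return best_response if best_score >= threshold else seq2seq_generate(context)
```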
0:21:17 | So that's the system, and we got a Technology Innovation Award for it. It has been a very fruitful experience, especially for my undergraduate student: she decided after this JD Dialogue Challenge to pursue a PhD, so she's actually starting her first term as a PhD student in our lab now. We also got valuable data resources from industry during this summer.
0:21:42 | Moving forward, we'd like to look into flexible use of context information for different kinds of user inputs, ranging from chit-chat to one-shot information-seeking enquiries, follow-up questions, multi-intent inputs, et cetera. In fact, yesterday I saw a poster with a very comprehensive decomposition of this problem.
0:22:08 | So that's my second project, and now I'm going to move to the third project, which is looking at dialogue in cognitive screening: investigating spoken language markers in neuropsychological dialogues for cognitive screening. This is actually a recently funded project; it's a very big project, and we have a cross-university team.
0:22:31 | There's the Chinese University team, and we also have colleagues from HKUST and also the Polytechnic University.
0:22:39 | From Chinese University, not only do we have engineers, we also have linguists, psychologists, neurologists, and geriatricians on our team, so I'm really excited about this team.
0:22:54 | We have our teaching hospital, which is the Prince of Wales Hospital, and we are also building a new CUHK teaching hospital, which is a private hospital, so I think we're going to be able to get many subjects to participate in our study.
0:23:10 | This study focuses on neurocognitive disorder (NCD), which is another term for dementia. As is well known, the global population is ageing fast, and actually Hong Kong's population is ageing even faster.
0:23:28 | NCD is very prevalent among older adults. It has an insidious onset, it's chronic and progressive, and there's a general global deterioration in memory, communication, thinking, judgement, and other cognitive functions. It's a most incapacitating disease.
0:23:48 | NCD manifests itself in communicative impairments, such as uncoordinated articulation, as in dysarthria; the subject may lose capability in language use, as in aphasia; and they may have a reduced vocabulary and grammar, and weakened listening, reading, and writing.
0:24:05 | The existing detection methods include brain scans, blood tests, and face-to-face neuropsychological (NP) assessments, which include structured, semi-structured, and free-form dialogues.
0:24:17 | One free-form dialogue is where the participant is invited to do a picture description: they are given a picture and asked to describe it.
0:24:31 | My colleagues in the teaching hospital have been recording their neuropsychological tests; actually, we were allowed to record them, and that provides some initial data for our research.
0:24:48 | The flow of the conversation includes the MMSE, the Mini-Mental State Examination, together with the Montreal Cognitive Assessment (MoCA) test, so it's a combination of both, and there are some overlapping components that are shared.
0:25:06 | We have about 200 hours of conversations between the clinicians and the subjects; it's a one-on-one neuropsychological test.
0:25:15 | Here's an example. We have normal subjects and also others who were cognitively impaired, and here are some excerpts of the conversations. This is from a normal subject who was asked about the commonality between a train and a bicycle, and this is the answer.
0:25:36 | The clinician asks "size is big?" and the subject says "yes, the train is long and the bike is smaller, isn't it", and then the clinician says "okay, but what's common between them?", and the subject says both are vehicles for transport.
0:25:49 | For the cognitively impaired subject, this is more typical; in fact the original dialogue is in Chinese, so we also translated it into English for presentation here. This is the dialogue for a cognitively impaired subject.
0:26:08 | We did a very preliminary analysis based on about twenty individuals, gender-balanced,
0:26:13 | and we looked at the average number of utterances in the NP assessments. You can see that for the males, the total number of utterances drops as we move from the normal to the cognitively impaired, and the same trend holds for the females.
0:26:31 | For the gap time, which is sort of the reaction time, there's a general small increase going from the normal to the cognitively impaired, for both the males and the females.
0:26:42 | Also, the normal subjects tend to speak faster, so they produce a higher average number of characters per minute and average number of words per minute.
0:26:52 | So this is very preliminary data. We are looking at different linguistic features, such as grammatical quality, information density, and fluency, and also acoustic features, in addition to reaction time: duration of pauses, hesitations, pitch, prosody, et cetera. So we'll be looking at a whole spectrum of these features.
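As an illustration of two of these measures, here is a rough sketch computing gap (reaction) times and characters per minute from timestamped dialogue turns; the turn representation is an assumption made for illustration:

```python
# Sketch: gap time between clinician question and subject reply, and the
# subject's speaking rate in characters per minute.
def gap_times(turns):
    """turns: time-ordered list of (speaker, start_s, end_s, text) tuples."""
    gaps = []
    for prev, cur in zip(turns, turns[1:]):
        if prev[0] == 'clinician' and cur[0] == 'subject':
            gaps.append(cur[1] - prev[2])  # reply onset minus question offset
    return gaps

def chars_per_minute(turns):
    subject = [t for t in turns if t[0] == 'subject']
    n_chars = sum(len(t[3]) for t in subject)
    minutes = sum(t[2] - t[1] for t in subject) / 60.0
    return n_chars / minutes if minutes else 0.0
```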
0:27:19 | Also, my student has developed an initial prototype which illustrates how interactive screening may be done, and here's a demonstration video to show you. It starts with a word recall exercise.
0:27:41 | "Please listen carefully. I am going to state three words that I want you to try to remember and repeat back to me. Please repeat the following three words to me: season... kitchen... river. Say a response at the beep."
0:28:05 | Subject: "Well... season... kitchen... river."
0:28:18 | "Good. Please remember the three words that were presented, and recall them later on."
0:28:27 | "Please try your best to describe what is happening in the picture above. Tap on the button below to begin or complete your response."
0:28:42 | Subject: "I see a family of four, sitting in the living room. There is a father... a mother... a girl... and a boy. They are... I can't really see very clearly, I don't know... that's... good."
0:29:16 | "Tap on the Done button if you have completed the task. Tap on the Try Again button to redo the picture description task."
0:29:31 | "Please say the three words I asked you to remember earlier. Recall and say the three words to me. Say a response at the beep."
0:29:47 | Subject: "Season... river... I don't remember the last one... summer?"
0:30:07 | So basically the system tries to show a chart of the results of each of the several tasks, so there are score charts related to, for example, how many correct responses were given, the response time, the length of the gap time, et cetera.
0:30:27 | I need to state clearly that the speech here is based on real data, but it's in Chinese, so my student translated it into English and tried to mimic the pauses, and also the things the subject liked to say, sort of talking to himself; he mimicked that too.
0:30:54 | So that is for illustration only; most of our data will be in Chinese: Cantonese, or maybe Mandarin.
0:31:04 | As a quick summary, spoken dialogue offers easy accessibility and high feature resolution, and I'm talking about even millisecond resolution in terms of reaction time, pause time, et cetera, for cognitive assessment.
0:31:17 | So we want to be able to develop speech, language, and dialogue processing technologies to support holistic assessment of various cognitive functions and domains by combining dialogue interaction with other interactions, and we also want to further develop this platform as a supportive tool for cognitive screening.
0:31:40 | That's the end of the third project, and now I'm going to move away from the application-oriented facets to the more research-oriented facets.
0:31:50 | The fourth project is on extracting semantic patterns from user inputs in dialogues, and we've been developing a convex polytopic model for that. This is work done by my postdoctoral fellow, myself, and my colleague Professor Yeung Yam.
0:32:07 | This study actually used ATIS-2 and ATIS-3, together about five thousand utterances, to support our investigation.
0:32:16 | The convex polytopic model is really an unsupervised approach that is applicable to short texts, and it can help us automatically identify semantic patterns from a dialogue corpus via a geometric technique.
0:32:32 | As shown here with the well-known ATIS examples, we can see a semantic pattern of "show me flights", so this is an intent; another semantic pattern of going from an origin to a destination; and another semantic pattern of "on a certain day".
0:32:54 | We begin with a space of M dimensions, where M is the vocabulary size. Each utterance forms a point in this space, and the coordinates of the point are equal to the sum-normalized word counts of that utterance.
0:33:11 | There are two steps in our approach. The first is to embed the utterances into a low-dimensional affine subspace using principal component analysis; this is a very common technique, and the principal components can capture features that optimally distinguish points by their semantic differences.
0:33:31 | Then we move to the second step, where we try to generate a compact convex polytope to enclose all the embedded utterance points, and this is done using the Quickhull algorithm.
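A minimal sketch of these two steps, assuming a term-count matrix as input; SciPy's ConvexHull wraps the Qhull library, which implements the Quickhull algorithm:

```python
# Sketch: sum-normalize word counts, embed with PCA, then take the convex hull.
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial import ConvexHull

def polytope_vertices(term_counts, n_dims=2):
    """term_counts: (n_utterances, vocab_size) array of word counts."""
    # Each utterance's coordinates are its sum-normalized word counts
    X = term_counts / term_counts.sum(axis=1, keepdims=True)
    # Step 1: embed into a low-dimensional affine subspace
    embedded = PCA(n_components=n_dims).fit_transform(X)
    # Step 2: compact convex polytope enclosing all embedded points (Quickhull)
    hull = ConvexHull(embedded)
    return hull.vertices, embedded  # vertex indices point back to utterances
```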
0:33:46 | As an illustration, this is what we call a normal-type convex polytope, and all these points are utterance points; they illustrate the utterances in the corpus residing in that space, the affine subspace.
0:34:07 | For the compact convex polytope, each of the vertices of the polytope is actually a point from the collection of utterance points, so each vertex also corresponds to an utterance.
0:34:25 | We can then connect the linguistic aspects of the utterances within the corpus to the geometric aspects of the convex polytope.
0:34:37 | You can think of it this way: the utterances in the dialogue corpus become embedded points in the affine subspace; the scope of the corpus is now encompassed by the compact convex polytope, which is delineated by the boundaries connecting the vertices; and the semantic patterns of the language of the corpus are now represented as the vertices of the compact convex polytope.
0:35:09 | Because the vertices represent extreme points of the polytope, each utterance point can also be formed by a linear combination of the polytope's vertices.
0:35:20 | Let's look at the ATIS corpus. As you know, in ATIS we have these intents, and we also colour-code them here. We plot the utterances in the ATIS training corpus in that space; here we show a two-dimensional space so that you can see all the plots on a plane.
0:35:41 | Then we ran the Quickhull algorithm, and it came up with this polytope. This is the most compact one, and you can see that the most compact polytope needs twelve vertices, so V1, V2, all the way to V12.
0:36:04 | Now, each vertex actually also corresponds to an utterance. If you look at vertices one to nine, they're all dark blue in colour, and in fact they all correspond to an utterance with the intent class of flight.
0:36:21 | Vertex ten is light blue, and it actually corresponds to the intent of abbreviation; vertex eleven is also dark blue, as is vertex twelve. So this is an illustration of the convex polytope.
0:36:39 | We can then look at each vertex. Vertices V1 to V9 all correspond to utterances, and you can see V1 to V9 over here; they're very close together, and essentially they are all capturing the semantic pattern of "from some origin to some destination", and these are all utterances with the labeled intent of flight.
0:37:10 | Vertex twelve is very close by, and vertex twelve's constituent utterance is "flights to Baltimore", so it just has the destination.
0:37:23 | We also want to look at vertices ten and eleven, so let's go to the next page. Vertex ten is here in green together with its neighboring utterances, and if you look at the constituent utterances, you can see that they're all questions of "what is [some abbreviation]".
0:37:43 | Then for vertex eleven: the nearest neighbors of vertex eleven basically all capture "show me", "show me some flights".
0:37:53 | So you can see that the vertices, generally together with their nearest neighbors, capture some core semantic patterns.
0:38:02 | For the convex polytope, we don't have any control over the number of vertices, and it's usually unknown until you actually run the algorithm.
0:38:13 | If we want to control the number of vertices, we can use a simplex. Here again we want to plot in two dimensions, so we chose a simplex with three vertices. If we want to constrain it to three vertices, we can use the sequential quadratic programming (SQP) algorithm to come up with the minimum-volume simplex.
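As a sketch of this step, here is a minimum-volume (here, minimum-area) triangle fitted around 2-D points with SLSQP, SciPy's sequential quadratic programming routine; the parameterization and the need for a non-degenerate starting triangle that already contains the points are illustrative assumptions:

```python
# Sketch: fit a 3-vertex simplex of minimum area that contains all points.
import numpy as np
from scipy.optimize import minimize

def area(v):
    a, b, c = v.reshape(3, 2)
    # Triangle area from the 2-D cross product of two edge vectors
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) -
                     (b[1] - a[1]) * (c[0] - a[0]))

def barycentric(v, points):
    # Barycentric coordinates of each point; all >= 0 means "inside"
    a, b, c = v.reshape(3, 2)
    T = np.column_stack([b - a, c - a])
    lam = np.linalg.solve(T, (points - a).T).T
    return np.column_stack([1.0 - lam.sum(axis=1), lam])

def min_volume_simplex(points, v0):
    """v0: a non-degenerate starting triangle that already contains the points."""
    cons = {'type': 'ineq', 'fun': lambda v: barycentric(v, points).ravel()}
    res = minimize(area, v0.ravel(), method='SLSQP', constraints=cons)
    return res.x.reshape(3, 2)
```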
0:38:38 | Just for you to recall, this is the normal-type convex polytope, and you can see it has twelve vertices. Now we want to constrain the number of vertices to three, that is, we want to generate a minimum-volume simplex, and here is the output of the algorithm.
0:38:58 | We can now see that we have the minimum-volume simplex with three vertices. Look at this minimum-volume simplex, with vertices one, two, and three, and compare it with the previous normal-type convex polytope. Vertex one of the simplex corresponds to vertex eleven of the normal-type polytope, and it also happens to coincide with an utterance.
0:39:29 | If we go to vertex three of the simplex, you can see that there's a light blue dot here, and it actually corresponds to vertex ten of the normal-type polytope, which is very close by. So vertex three of the simplex is very close to vertex ten of the normal-type polytope.
0:39:51 | Now what about all the vertices from one to nine, and also vertex twelve? These are all grouped in here, and we cover them a little bit by extending vertex two.
0:40:06 | You can see that for the minimum-volume simplex, since it now has to encompass all the utterances, we are no longer guaranteed that a vertex itself is an utterance point. But we have only three vertices, and the resulting minimum-volume simplex is formed by extrapolating the three lines that join vertices of the previous normal-type bounding convex hull, including V10, V11, V12, and then V8 and V9, into the three lines.
0:40:41 | For this minimum-volume simplex, we can look at each vertex further. For example, for the first vertex, you can look at its ten nearest neighbors, and here is the list of utterances that corresponds to each point in the nearest-neighbor group; they all have the pattern of "show me some flights from someplace to someplace", "show me flights", so that's a semantic pattern.
0:41:11 | Now let's look at vertex two; this is where you can see the patterns are "from an origin to a destination".
0:41:21 | Every vertex also resides in the M-dimensional space, so its coordinates can actually show us the top words, the strongest words that are most representative of the vertex. You can also see the list of the top ten words from the coordinates of each vertex.
0:41:41 | Now let's look at V3; the vertex and its nearest neighbors are shown here, and it's mostly about "what is [some abbreviation]", asking about an abbreviation.
0:41:51 | So the minimum-volume simplex allows us to pick the number of vertices we want to use, and it also shows some of the semantic patterns that are captured. We picked three because we wanted to be able to plot it; in fact we can pick an arbitrary number of higher dimensions.
0:42:12 | We can examine the semantic patterns at a higher dimensionality by analysing the nearest neighbors and also the top words of the vertices. For example, we ran one with sixteen dimensions, so we end up with seventeen vertices, and I list the first ten here, followed by the next seven, so seventeen altogether.
0:42:33 | Here are the top words for each vertex and also the representative nearest neighbors.
0:42:40 | You can see, for example, that vertex four is capturing the semantic pattern "show me something", another vertex "from someplace to someplace", vertex eight "what does [some abbreviation] mean", and vertex nine is asking about ground transportation.
0:43:01 | We also have vertices one, two, and five, which are really related to locations, and I think that's perhaps due to data sparsity. Also, vertex three is about "can I get something", "I would like something", and vertex seven is really a bunch of frequently occurring words, I guess.
0:43:29 | If we look at the next set of vertices: vertex thirteen is about flights from someplace, and maybe to someplace as well; fourteen is "what is something"; sixteen is "list all something"; and again vertices eleven, fifteen, and seventeen are location names.
0:43:51 | Vertex twelve is an airline name; it's actually about either a date or an airline, so I think this is a case where we may have been too aggressive in reducing the subspace dimensions, and I think if we had run the same experiment with more dimensions, hopefully it would separate the date from the airline.
0:44:14 | So basically we're playing around with this convex polytopic model as a tool for exploratory data analysis. I like the geometric nature because it helps me interpret the semantic patterns, and my hope is to extend this from semantic pattern extraction to tracking dialogue states in the future.
0:44:39 | That was section four, and now section five, my last section, which is on affective design for conversational agents: modeling user emotion changes in dialogues.
0:44:51 | This is actually the PhD work of a student from Tsinghua University, who also interned in our lab in Hong Kong for a couple of summers; her direct supervisor is a professor at Tsinghua University.
0:45:07 | This work is conducted in the Tsinghua University - Chinese University of Hong Kong joint research center for media sciences, technologies and systems, which is in Shenzhen, and it is funded by the National Natural Science Foundation of China and the Hong Kong Research Grants Council Joint Research Scheme.
0:45:25 | Our long-term goal is to impart affect sensitivity into conversational agents, which is important for user engagement and also for supporting socially intelligent conversations.
0:45:40 | This work looks at inferring users' emotion changes. The main assumption is that the emotive state change is related to the user's emotive state in the current dialogue turn and also the corresponding system response.
0:45:56 | So the objective is to infer the user's emotive states and also the emotive state change, which can in the future inform the generation of the system response.
0:46:09 | We use the PAD model, the pleasure-arousal-dominance framework, for describing emotions in a three-dimensional continuous space. Pleasure is about positive versus negative emotions, arousal is about mental alertness, and dominance is more about control.
0:46:28 | This is a real dialogue, originally in Chinese, and again I have translated it into English here for presentation. It's a dialogue between a chatbot and a user, and we have annotated the PAD values for each dialogue turn.
0:46:45 | You can see, for example, in dialogue turn two the user said "she broke up with me", and the response from the system is "let it go, you deserve a better one", and you see that from that dialogue turn on, the values of P, A, and D all increase.
0:47:04 | Then, for example, in dialogue turn eight the user just said "actually...", and the system said "you got me", which seemed to amuse the user and also soften the dominance, the value of the dominance.
0:47:18 | So these are the values that we work with in the PAD space, and this is our approach towards inferring emotive state change. On the left is the speech input; on the right is the output of emotion recognition and the prediction of emotive state change.
0:47:35 | We start by integrating the acoustic and lexical features from the speech input. This is basically a multimodal fusion problem, and it is achieved by concatenating the features and then applying a multitask-learning convolutional fusion autoencoder, which goes through different layers of convolution and max pooling.
0:48:01 | Then we also capture the system response as a whole utterance; this is because the holistic message is received by the user, and the entire message plays a role in influencing the user's emotions.
0:48:17 | The system response encoding uses a long short-term memory recurrent autoencoder, and it is trained to map the system response into a sentence-level vector representation.
0:48:30 | Next, the user's input and the system's response are further combined using convolutional fusion, and the framework then performs emotion recognition using stacked hidden layers. The results are further used for inferring the emotive state change.
0:48:49 | For this we use a multitask-learning structured output layer, so that the dependency between the emotive state change and the emotion recognition output is captured. In other words, the emotive state change is conditioned on the recognized emotive state of the current query.
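A minimal sketch of this structured-output idea in PyTorch: the change head is conditioned on the recognition head's output, and the two tasks are trained jointly; the layer sizes and the fused-feature input are illustrative assumptions, not the exact published model:

```python
# Sketch: emotion recognition head plus an emotive-state-change head that is
# conditioned on the recognized PAD state, trained with a multitask loss.
import torch
import torch.nn as nn

class EmotionHeads(nn.Module):
    def __init__(self, fused_dim=256, hidden=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(fused_dim, hidden), nn.ReLU())
        self.pad_head = nn.Linear(hidden, 3)         # current PAD state
        self.change_head = nn.Linear(hidden + 3, 3)  # delta-PAD, sees PAD too

    def forward(self, fused):
        h = self.shared(fused)
        pad = self.pad_head(h)
        delta = self.change_head(torch.cat([h, pad], dim=-1))
        return pad, delta

def multitask_loss(pad, delta, pad_true, delta_true):
    mse = nn.functional.mse_loss
    return mse(pad, pad_true) + mse(delta, delta_true)
```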
0:49:10 | The experimentation is done on IEMOCAP, which is a corpus very widely used in emotion recognition, and also the Sogou Voice Assistant corpus. The Sogou corpus has over four million Putonghua utterances in three domains, transcribed by an ASR engine with a 5.5% word error rate.
0:49:32 | We actually look at the chat dialogues; there are 98,000 such conversations, of between four and forty-nine turns each. We used a pre-trained emotion DNN to filter out the neutral conversations, so we ended up with about nine thousand emotive conversations, with over 52,000 utterances, which were selected for labeling with PAD values. Then we ran the emotion recognition and also the emotive state change prediction.
0:50:10 | We use a whole suite of evaluation criteria on the predicted emotive states in PAD values and also the emotive state changes in PAD values: the unweighted accuracy, the mean accuracy over different emotion categories, the mean absolute error, and also the concordance correlation coefficient.
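For reference, the concordance correlation coefficient is a standard agreement measure for continuous labels; a small helper might look like this:

```python
# Sketch: concordance correlation coefficient (CCC) between two sequences.
import numpy as np

def ccc(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)
```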
0:50:31 | This is a benchmark against other recent work using other methods, for IEMOCAP and also for the Sogou datasets. The proposed approach actually achieves competitive performance in emotion recognition.
0:50:50 | In emotion change prediction, our proposed approach achieves significantly better performance than the other approaches, but there's still room for improvement if you compare with human performance in human annotation.
0:51:07 | To sum up, this is among the first efforts to analyze user input features, both acoustic and lexical, together with the system response, to understand how the user's emotion changes due to the system response in the dialogue.
0:51:24 | We have achieved competitive performance in emotive state change prediction, and we believe that this is a very important step towards having socially intelligent virtual assistants, with the incorporation of affect sensitivity for human-computer interaction.
0:51:44 | so |
---|
0:51:45 | so my talk is in five chunks but this is the overall summary |
---|
0:51:49 | basically |
---|
0:51:51 | when i look back at all these different projects |
---|
0:51:54 | you know with it very |
---|
0:51:57 | tries on the message that |
---|
0:51:58 | much can be gleaned |
---|
0:52:00 | from dialogues |
---|
0:52:01 | to understand many important phenomena including |
---|
0:52:04 | how group discussions may facilitate learning |
---|
0:52:07 | a student would discussions may facilitate learning |
---|
0:52:10 | however the cuffs customer experience can be shaped by chopper responses and also the status |
---|
0:52:15 | of an individual's cognitive health |
---|
0:52:17 | and i guess i'm preaching to the choir here but i really truly believe there's |
---|
0:52:21 | tremendous potential |
---|
0:52:23 | we've only seen |
---|
0:52:24 | the tip of an iceberg |
---|
0:52:25 | and there's tremendous potential with abundant opportunities and a lot of research ahead so thank you very
---|
0:52:30 | much |
---|
0:52:38 | thank you very much do we have questions |
---|
0:52:47 | thank you very much for your talk regarding the topic three cognitive impairment so
---|
0:52:52 | we are also working on that but still
---|
0:52:55 | so severe cognitive impairment is easy to detect with just a
---|
0:53:01 | small conversation we can identify this person as cognitively impaired
---|
0:53:06 | but i think the problem is mild cognitive impairment mci which is
---|
0:53:14 | very difficult to detect
---|
0:53:16 | so i think the final goal of this work maybe is how to estimate the
---|
0:53:22 | degree of cognitive impairment using such features so what do you think
---|
0:53:29 | so thank you very much for the question |
---|
0:53:32 | indeed |
---|
0:53:34 | in our study we will be covering |
---|
0:53:38 | cognitively normal adults and also what they now call
---|
0:53:44 | minor ncd that is the new terminology
---|
0:53:49 | minor ncd means mild
---|
0:53:52 | neurocognitive disorder
---|
0:53:54 | and major ncd means
---|
0:53:56 | major neurocognitive disorder
---|
0:53:58 | and |
---|
0:53:59 | so this is what we learnt from our colleagues in neurology so
---|
0:54:06 | for elderly people we need to be more diligent in engaging them in these |
---|
0:54:14 | cognitive assessments because they are really exercises and there are subjective fluctuations going from one
---|
0:54:23 | exercise to another so therefore the more frequently you can
---|
0:54:28 | take the assessment the better
---|
0:54:29 | and |
---|
0:54:31 | and the issue is not the exact scoring so
---|
0:54:35 | it's obviously more at the personal level and if there are any sudden changes perhaps more
---|
0:54:41 | drastic changes
---|
0:54:43 | in the
---|
0:54:44 | scoring level of the individual
---|
0:54:48 | that would be an important |
---|
0:54:50 | sign |
---|
0:54:51 | and |
---|
0:54:53 | and also tracking |
---|
0:54:55 | frequently is important |
---|
0:54:57 | so indeed minor ncd or mild cognitive impairment is harder
---|
0:55:03 | to detect and also you have to distinguish
---|
0:55:08 | between the natural cognitive decline due to ageing and the pathological cognitive decline
---|
0:55:15 | so it's a complex problem but nevertheless because
---|
0:55:21 | dementia is such a big problem and people talk about
---|
0:55:25 | how dementia is rising with the ageing global population
---|
0:55:30 | and there's no cure
---|
0:55:31 | so we just have to work very hard on how to do early |
---|
0:55:37 | early detection and intervention thank you for the |
---|
0:55:41 | question |
---|
0:55:46 | thank you for this very nice talk many topics really impressive i was wondering especially
---|
0:55:52 | in relation to the classrooms and to the cognitive screening |
---|
0:55:57 | at the moment if i understood right you're
---|
0:55:59 | working on transcriptions right on the basis of transcription have you made any experiments
---|
0:56:04 | with asr and if so what was your experience there what's the likelihood
---|
0:56:10 | of asr being sufficiently good
---|
0:56:12 | so the |
---|
0:56:14 | the classroom |
---|
0:56:16 | it is very difficult |
---|
0:56:18 | that's why we have
---|
0:56:19 | no choice but to work on transcriptions
---|
0:56:22 | but as for
---|
0:56:27 | the way we have recorded these neuropsychological tests
---|
0:56:32 | it's actually between the clinician and the subject
---|
0:56:35 | and the clinicians i think don't want any sensors
---|
0:56:39 | so we just put a phone there
---|
0:56:41 | and we consented the subject of course
---|
0:56:43 | and |
---|
0:56:44 | depending on the device some of it we think is doable
---|
0:56:48 | but we want to have robust
---|
0:56:51 | speaker adaptive training and noise robust
---|
0:56:55 | speech processing
---|
0:56:56 | we need to throw in the kitchen sink to be able to do
---|
0:57:00 | well
---|
0:57:12 | thanks for a great talk
---|
0:57:14 | my question is
---|
0:57:15 | on the cognitive assessment from a discourse structure point of view actually i was wondering |
---|
0:57:22 | what sort of processing you now plan to do on those descriptions that they provide
---|
0:57:27 | apart from you know speech processing and lexical cohesion any thoughts about
---|
0:57:35 | discourse coherence rhetorical relations
---|
0:57:39 | among the sentences that they provide and so on
---|
0:57:42 | so thank you for that wonderful question we must look at that
---|
0:57:45 | we haven't looked at that yet but actually i have
---|
0:57:51 | heard from our colleagues the clinicians that coherence in
---|
0:57:56 | following the
---|
0:57:59 | discourse of a dialogue oftentimes shows problems
---|
0:58:03 | if there's cognitive impairment so that is definitely |
---|
0:58:06 | one aspect that we must look at
---|
0:58:09 | and in fact we would welcome any |
---|
0:58:11 | interest the collaborators to look at that together |
---|
0:58:14 | thank you for your question
---|
0:58:20 | thanks for the very interesting talk i wanted to ask about
---|
0:58:26 | the emotional modeling the pad space modeling is that just based on speech input
---|
0:58:32 | or are you also analysing things like
---|
0:58:37 | nonverbal signals like laughter or sighing little things like that
---|
0:58:43 | right now we don't have that it would be wonderful if we could have
---|
0:58:46 | those features but right now it's really the speech input so acoustic and lexical input
---|
0:58:52 | and also the sentence level of the system's response |
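As a rough illustration of that input representation, and nothing more, here is a sketch of a regressor that fuses acoustic features, lexical features, and a sentence-level embedding of the system response to predict the three PAD dimensions; every dimension size and the concatenation-based fusion are assumptions on my part, not the speaker's model.

```python
# Illustrative sketch (not the speaker's model): fuse the three input streams
# mentioned in the answer and regress the three PAD dimensions.
import torch
import torch.nn as nn

class PADRegressor(nn.Module):
    def __init__(self, acoustic_dim=88, lexical_dim=300, response_dim=300):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(acoustic_dim + lexical_dim + response_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 3),  # pleasure, arousal, dominance
        )

    def forward(self, acoustic, lexical, response):
        # Simple early fusion by concatenation; attention or gating would be
        # equally plausible design choices.
        return self.net(torch.cat([acoustic, lexical, response], dim=-1))

model = PADRegressor()
pad = model(torch.randn(1, 88), torch.randn(1, 300), torch.randn(1, 300))
print(pad.shape)  # torch.Size([1, 3])
```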
---|
0:59:03 | hi my question is about section five
---|
0:59:07 | so you did two prediction tasks you did emotion recognition and the emotive change prediction
---|
0:59:13 | so even though these seem similar i really think there is a subtle but important difference
---|
0:59:17 | between the two |
---|
0:59:19 | so my question is |
---|
0:59:21 | do you use the same features to do both and do you think there are
---|
0:59:26 | features that are more important for the emotive change rather than the emotion recognition
---|
0:59:30 | and |
---|
0:59:32 | what differences have you seen
---|
0:59:34 | between these two |
---|
0:59:36 | so great question so we think that
---|
0:59:41 | for the current query |
---|
0:59:42 | based on the current user input we want to be able to |
---|
0:59:46 | understand the emotion of the user
---|
0:59:49 | but if you think about |
---|
0:59:51 | what comes next so depending on how we respond
---|
0:59:54 | to the user
---|
0:59:56 | with the system response the user's emotion change and the next
---|
1:00:00 | input
---|
1:00:01 | may be different
---|
1:00:03 | right so for example |
---|
1:00:06 | in the
---|
1:00:15 | so here this is a subject talking about a breakup
---|
1:00:22 | and |
---|
1:00:23 | at first the system tries to
---|
1:00:26 | comfort the subject and then at some point you know
---|
1:00:31 | the dialogue goes
---|
1:00:38 | the user says to the assistant are you real or not how can robots know what you
---|
1:00:42 | like
---|
1:00:43 | and it replies i know what you like as i do
---|
1:00:46 | and then |
---|
1:00:46 | the user says something |
---|
1:00:49 | and at this point
---|
1:00:52 | of the dialogue you can respond in various ways but look at the response
---|
1:00:57 | that was used here
---|
1:00:58 | and then it seems that
---|
1:01:02 | the user says you must be real so i think
---|
1:01:06 | the emotive changes depend on the system response
---|
1:01:09 | so if we can |
---|
1:01:11 | model that |
---|
1:01:12 | and the way we've modeled that is through
---|
1:01:17 | multi task training where the
---|
1:01:19 | emotion state change
---|
1:01:22 | is dependent on the
---|
1:01:24 | recognized emotion
---|
1:01:26 | we want to be able to capture this dependency |
---|
1:01:29 | and |
---|
1:01:31 | and to be able to utilize this
---|
1:01:34 | dependency as we choose how to
---|
1:01:37 | in the future choose how to
---|
1:01:39 | reason about how to generate the system response so that you can hopefully drive the
---|
1:01:44 | emotion change in the dialogue
---|
1:01:47 | in the way you want
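To make that dependency concrete, here is a minimal multi-task sketch under my own assumptions (layer sizes, head shapes, and conditioning-by-concatenation are illustrative, not the actual model): one head recognizes the current PAD state, and a second head predicts the PAD change while conditioning on both the recognized emotion and an embedding of the system response.

```python
# Hedged sketch of the multi-task idea described above; all architecture
# details are illustrative assumptions, not the speaker's actual model.
import torch
import torch.nn as nn

class EmotionMultiTask(nn.Module):
    def __init__(self, user_dim=512, response_dim=300, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(user_dim, hidden), nn.ReLU())
        self.recognize = nn.Linear(hidden, 3)                  # current PAD
        self.change = nn.Linear(hidden + 3 + response_dim, 3)  # PAD change

    def forward(self, user_features, response_embedding):
        h = self.encoder(user_features)
        pad = self.recognize(h)  # task 1: recognize the current emotion
        # task 2: predict the emotion change, conditioned on the recognized
        # emotion and on the system response, capturing the stated dependency
        delta = self.change(torch.cat([h, pad, response_embedding], dim=-1))
        return pad, delta

# A joint loss such as L = loss(pad) + lambda * loss(delta) lets the change
# head back-propagate through the recognized emotion.
model = EmotionMultiTask()
pad, delta = model(torch.randn(1, 512), torch.randn(1, 300))
```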
---|