0:00:15 | So, my name is Ryuichiro Higashinaka, and I am from NTT Corporation. |
0:00:20 | Today I am going to talk about role play-based question answering by real users, for building chatbots with consistent personalities. |
0:00:32 | Nowadays we are seeing a lot of chatbots, and people are talking to these characters every day. |
0:00:42 | For example, Rinna, Microsoft's chatbot, is very famous in Japan, and people talk to it every day. We also have Gatebox, |
0:00:53 | where people can talk to virtual characters inside a small box. |
0:00:58 | And we also have more human-like characters, such as androids. |
0:01:06 | So we are having many chatbots, and they need to have consistent personalities: |
0:01:12 | if we want them to be believable, they need to respond consistently, just like this. |
0:01:17 | To generate consistent responses, what is needed is character-specific question-answer pairs, like these. |
0:01:27 | But the creation of such pairs is, as you know, very costly. |
0:01:32 | So the motivation behind this work is that we want to efficiently collect question-answer pairs for characters. |
0:01:41 | In this work, we particularly use the technique called role play-based question answering as a technique for collecting the question-answer pairs. |
0:01:52 | Before going into the details of this work, let me explain what role play-based question answering is. |
0:02:00 | In role play-based question answering, in the middle we have a famous character, and users talk to this famous character. |
0:02:10 | In this case, this is a character who is very famous. |
0:02:16 | Behind the scenes, we have a group of role players who collectively play the role of the famous character. |
0:02:26 | So if a user asks a question to this famous character, like "What food do you like?", the question is broadcast to all the role players, |
0:02:36 | and then one of the role players answers the question by saying something like "I like sweets." |
0:02:42 | This answer is returned to the user, |
0:02:48 | and this question-answer exchange can be collected as a question-answer pair for this character. |
0:02:57 | Since role players can enjoy playing the role of their favourite character, and users can ask questions of their favourite character, |
0:03:06 | users can get highly motivated to provide question-answer pairs. This is how it works. |
0:03:13 | However, there are some problems with this architecture. |
0:03:17 | One is that only a small-scale experiment with paid users was performed to test the concept of role play-based question answering, |
0:03:26 | so it was not clear if this scheme would work with real users. |
0:03:32 | Another problem is that the small-scale experiment did not yield enough data to allow data-driven methods to work, |
0:03:40 | so the applicability of the collected data to the creation of chatbots was not verified. |
0:03:47 | To address these problems, in this work, for the first problem, we verified the effectiveness of role play-based question answering with real users. |
0:03:58 | In this study, we focus on two famous characters in Japan, |
0:04:03 | and we set up websites for role play-based question answering for people to enjoy for free. |
0:04:11 | For the second problem, we created chatbots using the collected data. |
0:04:18 | Specifically, in this paper, we propose a retrieval-based method and evaluate its performance by subjective evaluation. |
0:04:27 | So let me talk about the data collection by real users. |
0:04:34 | We focus on these two characters, who are very famous in Japan. |
0:04:39 | One is Max Murai, who is an actual person: he is a company CEO, |
0:04:45 | and he is also a YouTuber who specialises in live coverage of TV games. |
0:04:51 | The other character is Ayase, who is a fictional character in a novel, |
0:04:58 | and her character is often referred to as a "yandere": |
0:05:01 | according to Wikipedia, yandere characters are mentally unstable and use extreme violence or brutality as an outlet for their emotions. |
0:05:09 | So they are two very distinct, different characters: one is an actual, living male person, and the other is a fictional female character. |
0:05:20 | And we set up websites so that people can enjoy role play-based question answering. |
0:05:27 | Each character has a kind of fan channel on the Japanese video streaming service called Niconico, which is like YouTube, |
0:05:38 | and we set up the sites on their channels for the subscribers to enjoy role play-based question answering. |
0:05:46 | This is how the page looks for Murai: |
0:05:50 | people can post questions here; these are the posted questions, and these are the answers given by the role players. |
0:05:59 | And this is how it looks for Ayase: |
0:06:02 | you can post questions in the text field; this is a question posted by a user, and this is the answer posted by a role player. |
0:06:13 | So this is how it looks. |
0:06:16 | We ran this kind of trial for several months, |
0:06:21 | and this is what we got. This table shows the statistics of the collected data. |
0:06:30 | If you look at these two rows, the number of users who participated and the number of question-answer pairs we obtained, you can see that we had many users: |
0:06:40 | more than three hundred people participated to play the roles of Murai and Ayase, |
0:06:46 | and over ten thousand question-answer pairs were collected for both Murai and Ayase. |
0:06:53 | Also, if you look at the answers, the average number of words shows that Ayase's utterances were much longer. |
0:07:04 | So Ayase was more talkative and Murai was not as talkative, possibly reflecting the characters' actual manner of speaking. |
0:07:15 | This slide shows the efficiency of the data collection process. |
0:07:21 | This table shows how long it took to reach a given number of question-answer pairs. |
0:07:28 | For example, to reach two thousand pairs, it took about seven days for Murai and about one day for Ayase, |
0:07:38 | and to reach ten thousand pairs, it took about three months for Murai and eighteen days for Ayase. |
0:07:47 | So for both characters, it took just about a couple of days to reach two thousand question-answer pairs, |
0:07:53 | and for Ayase, we collected ten thousand question-answer pairs in just eighteen days, which I think is quite fast. |
0:08:00 | This confirms the efficiency of role play-based question answering for data collection. |
0:08:06 | Note that the users were not monetarily compensated for providing the data: they just voluntarily provided the data while enjoying the contents. |
0:08:17 | Next, let us verify the quality of the data and the satisfaction of the users. |
0:08:23 | This table shows the average score for sampled answers. |
0:08:30 | The maximum score is five, and we got very reasonable scores for the posted answers. |
0:08:38 | As for user satisfaction, we had three questionnaire items: usability of the website, willingness for future use, and enjoyment, and we can see that the users really enjoyed role-playing. |
0:08:58 | So we have created more than 10k question-answer pairs with role play-based question answering, and now it is time to create chatbots using the collected data. |
0:09:10 | This is an overview of our proposed method. |
0:09:14 | Basically, we employ a retrieval-based approach: given an input question q, it retrieves question-answer pairs (q', a') from the question-answer pair database that we have collected, |
0:09:26 | and it scores each retrieved pair to judge whether its answer is appropriate or not. |
0:09:33 | The pair with the highest score is selected, and its a' is used as the output of the system. |
0:09:42 | So, for example, if this pair has a score of 0.9 and the other ones have scores below 0.9, then this pair would be selected, and its a' would be used as the output. |
0:09:55 | The important thing here is how we calculate this score. |
0:10:01 | For this purpose, we have a scoring function. It is a weighted sum of six different scores: |
0:10:09 | a search engine score, a question-type match score, a center word score, a translation score, a reverse translation score, and a semantic similarity score. These scores are integrated to calculate the overall score for each question-answer pair. |
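As a rough sketch of what this weighted combination might look like (the scorer names and weights below are placeholders, not the actual modules from the talk):

```python
# Sketch of the overall scoring function: a weighted sum of six
# component scores, each assumed to be already normalized to [0, 1].
# The scorer names are placeholders for the modules described in the talk.
from typing import Callable, Dict

ScoreFn = Callable[[str, str, str], float]  # (q, q_prime, a_prime) -> [0, 1]

def overall_score(q: str, q_prime: str, a_prime: str,
                  scorers: Dict[str, ScoreFn],
                  weights: Dict[str, float]) -> float:
    """Weighted sum of the component scores for one candidate QA pair."""
    return sum(weights[name] * fn(q, q_prime, a_prime)
               for name, fn in scorers.items())
```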
0:10:25 | Let me describe these scores one by one. |
0:10:30 | First, the initial three scores. For the search engine score, |
0:10:34 | this is the score given by the full-text search engine when retrieving q' using the input question q as a query. |
0:10:41 | We used Lucene with its default settings; it uses BM25 as its ranking function. |
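For illustration only, here is roughly what this retrieval step could look like, sketched with the rank_bm25 Python package in place of Lucene, with toy data and naive whitespace tokenization:

```python
# BM25 retrieval of stored questions q' for an input question q,
# sketched with rank_bm25 in place of the Lucene engine used in the talk.
# A real Japanese system would tokenize with a morphological analyzer.
from rank_bm25 import BM25Okapi

stored_questions = ["what food do you like", "what did you eat for lunch"]  # toy data
bm25 = BM25Okapi([q.split() for q in stored_questions])

query = "what do you like to eat".split()
scores = bm25.get_scores(query)  # one BM25 score per stored question
best = max(range(len(scores)), key=lambda i: scores[i])
print(stored_questions[best], scores[best])
```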
0:10:47 | For the question-type match score, |
0:10:49 | the score is calculated on the basis of whether the question type of q matches that of q' and the number of named entities in q' requested by q. |
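A toy sketch of this idea; `question_type` and `named_entity_types` below are hypothetical stand-ins for the system's real question-type classifier and named entity recognizer:

```python
# Toy sketch of the question-type match score. `question_type` and
# `named_entity_types` are hypothetical stand-ins for the real
# question-type classifier and named entity recognizer.
WH_TO_ENTITY = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}

def question_type(text: str) -> str:
    return text.split()[0].lower() if text.strip() else "other"  # crude placeholder

def named_entity_types(text: str) -> set:
    return set()  # placeholder: a real NER would return types like {"PERSON"}

def type_match_score(q: str, q_prime: str) -> float:
    score = 1.0 if question_type(q) == question_type(q_prime) else 0.0
    wanted = WH_TO_ENTITY.get(question_type(q))
    if wanted and wanted in named_entity_types(q_prime):
        score += 1.0  # q' mentions the entity type the question asks about
    return score / 2.0  # keep the result in [0, 1]
```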
0:11:01 | For the center word score, we first extract center words, which are noun phrases representing topics, from both q and q'; if they overlap, a score of one is given. |
0:11:16 | Now for the other three scores. For the translation score, we use a neural translation model that generates a' from q; the generative probability of a' given q is used as the score. |
0:11:29 | The model is pre-trained with 0.5 million in-house question-answer pairs and then fine-tuned with the collected question-answer pairs. |
0:11:38 | For this purpose, we used OpenNMT. |
0:11:42 | The reverse translation score is very similar to the translation score, but the generative probability of q given a' is used as the score. |
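A minimal sketch of these two scores, treating the trained model as a black box: `nmt_log_prob` is a hypothetical wrapper (the talk used OpenNMT), and the per-token length normalization is an assumption, not something stated in the talk:

```python
# Sketch of the translation and reverse translation scores.
# `nmt_log_prob(src, tgt)` is a hypothetical wrapper around a trained
# seq2seq model returning log P(tgt | src); plug in your own model.
import math

def nmt_log_prob(src: str, tgt: str) -> float:
    raise NotImplementedError("wrap a trained seq2seq model here")

def translation_score(q: str, a_prime: str) -> float:
    # generative probability of a' given q (per-token, an assumption here)
    logp = nmt_log_prob(q, a_prime) / max(1, len(a_prime.split()))
    return math.exp(logp)  # maps to (0, 1]

def reverse_translation_score(q: str, a_prime: str) -> float:
    # same idea in the reverse direction: probability of q given a'
    logp = nmt_log_prob(a_prime, q) / max(1, len(q.split()))
    return math.exp(logp)
```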
0:11:51 | Finally, the semantic similarity score. |
0:11:55 | First, sentence vectors are obtained from both q and q' by averaging their word vectors using word2vec; |
0:12:01 | then the cosine similarity between the two sentence vectors is used as the score. |
0:12:07 | The word2vec model is trained on Wikipedia articles. |
0:12:11 | Note that all scores are normalized between zero and one before integrating the scores. |
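A minimal sketch of this score using gensim, assuming a pre-trained word2vec file; the file name and whitespace tokenization are placeholders:

```python
# Sketch of the semantic similarity score: average the word2vec vectors
# of each sentence's tokens, then take the cosine similarity.
import numpy as np
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("wiki_vectors.bin", binary=True)  # assumed file

def sentence_vector(text: str) -> np.ndarray:
    vecs = [kv[w] for w in text.split() if w in kv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(kv.vector_size)

def semantic_similarity(q: str, q_prime: str) -> float:
    u, v = sentence_vector(q), sentence_vector(q_prime)
    denom = float(np.linalg.norm(u) * np.linalg.norm(v))
    return float(u @ v) / denom if denom else 0.0
```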
0:12:17 | This slide shows the overall architecture of the system. |
0:12:22 | A user question comes in and goes into the document retrieval engine, which retrieves question-answer pairs from the question-answer pair database, |
0:12:31 | and the top N candidates are retrieved. |
0:12:33 | For each of the candidates, we calculate the scores by using these modules: |
0:12:39 | the question-type estimation and named entity recognition modules, the center word extraction module, the two translation models, |
0:12:45 | and the word2vec model. |
0:12:47 | We obtain the six scores that I just explained, |
0:12:52 | and we get the final ranking of the tuples; the system outputs the top one, and the answer of the top tuple is used as the response. |
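Putting the pieces together, a compact end-to-end sketch of the pipeline just described; the `retrieve` function and the scorer and weight tables stand in for the real engine and modules:

```python
# End-to-end sketch of the response pipeline: retrieve top-N candidate
# QA pairs, score each candidate with the weighted component scores,
# and return the answer of the best tuple.
from typing import Callable, Dict, List, Tuple

def respond(q: str,
            retrieve: Callable[[str, int], List[Tuple[str, str]]],
            scorers: Dict[str, Callable[[str, str, str], float]],
            weights: Dict[str, float],
            n: int = 10) -> str:
    candidates = retrieve(q, n)  # [(q_prime, a_prime), ...]
    if not candidates:
        return ""
    scored = [(sum(weights[k] * f(q, qp, ap) for k, f in scorers.items()), ap)
              for qp, ap in candidates]
    return max(scored)[1]  # answer of the top-ranked tuple
```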
0:13:05 | Because we have only about 10k question-answer pairs in this database, the coverage of questions may not be enough. |
0:13:13 | So we additionally have another database, which consists of extended question-answer pairs created from the collected question-answer pairs. Let me explain what this is. |
0:13:24 | To extend the question-answer pairs, we first focus on one particular answer in the collected question-answer pairs. |
0:13:35 | We then search for very similar answers, ones whose normalized edit distance to this answer is below 0.1, so that they are very similar on the surface. |
0:13:48 | We then gather all the questions to which these similar answers were given, |
0:13:56 | and we couple these questions with the answer. |
0:14:04 | These new pairs become the extended question-answer pairs; that is how we extend the collected question-answer pairs into the extended question-answer pairs. |
0:14:17 | For Murai, we obtained about one million additional question-answer pairs, and for Ayase as well, we obtained about one million additional question-answer pairs. |
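A self-contained sketch of this extension step, using a plain Levenshtein implementation; the 0.1 threshold is the one mentioned in the talk:

```python
# Sketch of the QA-pair extension: treat answers whose normalized edit
# distance is below 0.1 as the same answer, then pair every question
# that received such an answer with that answer.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similar(a: str, b: str, threshold: float = 0.1) -> bool:
    return edit_distance(a, b) / max(len(a), len(b), 1) < threshold

def extend(qa_pairs):
    """[(question, answer)] -> extended [(question, answer)] (O(n^2) sketch)."""
    extended = set()
    for _, anchor in qa_pairs:
        for q, a in qa_pairs:
            if similar(a, anchor):
                extended.add((q, anchor))
    return sorted(extended)
```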
0:14:31 | Using the proposed method, we did an experiment to verify its effectiveness. |
0:14:40 | We used twenty-six subjects, each for Murai and Ayase, |
0:14:44 | and they were recruited from the channel subscribers; they are very picky about the quality of the utterances since they are fans of the characters. |
0:14:54 | The procedure is that each subject evaluated the answers of the five methods for comparison (I will explain the methods later) on a five-point Likert scale. |
0:15:05 | Test questions, which were held-out data from the collected question-answer pairs, were used as input. |
0:15:16 | We have two evaluation criteria. |
0:15:19 | One is naturalness: not knowing who is talking, whether the answer is appropriate to the input question or not. |
0:15:26 | The other is character-ness: knowing which character is talking, whether the answer is appropriate to the input question or not. |
0:15:35 | Let me describe the methods for comparison; we have five: |
0:15:40 | two baselines, two proposed methods, and an upper bound. |
0:15:46 | Baseline 1 is called AIML. |
0:15:51 | It uses about three hundred thousand general-purpose handcrafted rules written in AIML (Artificial Intelligence Markup Language) for response generation, |
0:16:01 | with the personal pronouns and sentence-end expressions of the rules modified to match those of the characters. |
0:16:08 | This applies the massive set of handcrafted rules that we have been developing and using for response generation. |
0:16:19 | Baseline 2 is called TFIDF. |
0:16:22 | It simply uses the answer of the highest-ranking question-answer pair retrieved by Lucene, which uses BM25, using the input question as a query. |
0:16:32 | Proposed method 1 is called PROP w/o ExtQA: the proposed method without the extended question-answer database, |
0:16:44 | with all the weights in the scoring function set to one. |
0:16:49 | Proposed method 2 is called PROP: this is the proposed method itself, with all the weights likewise set to one. |
0:17:01 | As the upper bound, we have GOLD: the gold responses provided by the role players for the test questions. |
0:17:10 | We then compare these five methods. |
0:17:13 | This figure shows the results for the five methods, for both Murai and Ayase. |
0:17:20 | As you can see, for Murai, the proposed methods performed much better than the baselines: they significantly outperformed the baselines, |
0:17:30 | regardless of whether the extended database was used or not. |
0:17:36 | For Ayase, the proposed method outperformed one of the baselines, namely AIML, |
0:17:42 | and the proposed method was also better than the proposed method without the extended database in naturalness. |
0:17:49 | As expected, for both Murai and Ayase, the upper bound, GOLD, which uses the gold data, was rated the highest. |
0:17:59 | Let me show you some of the examples that are more interesting. For example, this one is for Murai: "What did you have for lunch today?" |
0:18:06 | One baseline returned an answer that had a very high naturalness score but was rated as not very much like Murai, |
0:18:18 | while the proposed method just returned "Ramen," which was short but rated as just like Murai himself. |
0:18:26 | And for Ayase, the user's question was "You're cute," |
0:18:30 | and we had two responses from the proposed methods, like "Th-thank you... that's embarrassing" and "Thank you," and they received very much higher scores. |
0:18:39 | So AIML rules may produce natural responses, but such answers are not necessarily rated highly, |
0:18:46 | and short answers, just like these "Ramen" and "Thank you," can lead to high scores, showing that the content of utterances is very important for character-ness. |
0:18:57 | So, to summarize, we successfully verified the effectiveness of role play-based question answering by using real users, |
0:19:05 | and we successfully created chatbots using the collected question-answer pairs. |
0:19:09 | As future work, we want to improve the quality of the proposed method, and we also want to try additional types of characters as targets for role play-based question answering. |
0:19:20 | Thank you. |
0:19:28 | Are there any questions? |
0:19:49 | So, actually, people can compare the different answers, and that is built into this system: |
0:19:59 | there is a kind of "like" button here, and people can just press this button, |
0:20:04 | and then you can see that one answer was a much better utterance; so it is not exactly a competition, but it is a kind of incentive for comparing them. |
0:20:31 | Yes, they are completely isolated. |
0:20:36 | No, it was just this amount. |
0:21:09 | So we just wanted to make sure that we were not cheating; that was the point. |
0:21:15 | We could have had users type in their own questions and then evaluate the responses, but since we had a dataset, we wanted to do a kind of survey across it, so that is what we did. |
0:21:46 | So we have an agreement with the streaming service, and they have the rights to the characters, |
0:21:56 | so we had the right to put our website on their fan channels, and all of the rights have been cleared. |
0:22:06 | Any other questions? |
0:22:10 | Okay, so let's thank the speaker again. |