0:00:15 So, my name is Ryuichiro Higashinaka, and I'm from NTT Corporation. Today I'm going to talk about role play-based question answering by real users, for building chatbots with consistent personalities.
0:00:32 So, we are now seeing a lot of chatbots, and people are talking to these characters every day. Rinna, Microsoft's chatbot, is very famous in Japan, and people talk to her every day. We also have Gatebox, in which people can talk to a virtual character living in a small box. And we also have more human-like virtual characters appearing these days.
0:01:06 So we are having many chatbots, and they have their own personalities; and if we want them to be believable, they need to have consistent personalities, just like this.
0:01:17 And to generate consistent responses, what we need is a set of character-specific question-answer pairs. But the creation of such pairs is, as you know, very costly.
0:01:32 So the motivation behind this work is that we want to efficiently collect question-answer pairs for characters, and in this work we particularly use the technique called role play-based question answering for collecting the question-answer pairs.
0:01:52 Before going into the details of this work, I'm going to explain what role play-based question answering is.
0:02:00 So, in role play-based question answering, in the middle we have a famous person, and users talk to this famous person; in this case, this is an image of a very famous person. And at the back of this character, we have a bunch of role players who collectively play the role of the famous person.
0:02:26 So if a user asks a question of this famous person, like "What food do you like?", this question is broadcast to all the role players, and then one of the role players answers the question by saying something like "I like sweets." This answer is then returned to the user.
0:02:48 In this manner, question-answer pairs can be collected for this character. Since the role players can enjoy playing the role of their favourite character, and the users can ask questions of their favourite character, users can get highly motivated to provide question-answer pairs. This is how it works.
0:03:13 But there are some problems with this architecture. One is that only a small-scale experiment with paid users had been performed to test the concept of role play-based question answering, so it was not clear if this scheme would work with real users. Another problem is that the small-scale experiment did not yield enough data to allow data-driven methods to work, so the applicability of the collected data to the creation of chatbots had not been verified.
0:03:47 To solve these problems, in this work, for the first problem, we verified the effectiveness of role play-based question answering with real users: for this study we focused on two famous characters in Japan, and we set up websites for role play-based question answering so that people could enjoy it.
0:04:11 And for the second problem, we created chatbots using the collected data: in this paper we propose a retrieval-based method and evaluate its performance by subjective evaluation.
0:04:27 So let me first talk about the data collection with real users.
0:04:34 We focused on these two characters, who are very famous in Japan. Murai is an actual person: he is a company CEO, and he is also a YouTuber who specialises in live coverage of video games.
0:04:51 The other character, Ayase, is a fictional character from a novel and its anime adaptation, and her character is often referred to as "yandere": according to Wikipedia, a yandere is mentally unstable and uses extreme violence or brutality as an emotional outlet.
0:05:09 So they are two very distinct, different characters: one is an actual person and a male character, and the other is a fictional character and a female character.
0:05:20 We set up websites so that people could enjoy this role play-based question answering. Each character has a kind of user channel for the fans on the Japanese video streaming service called Niconico Douga, which is like YouTube, and we set up the sites on their channels so that the subscribers could enjoy role play-based question answering.
0:05:46 This is how the website looks for Murai: people post questions, these are the posted questions, and these are the answers given by the role players.
0:05:59 And this is how it looks for Ayase: you can post questions in the text field, this is a question posted by a user, and this is the answer posted by a role player. So this is how it looks.
0:06:16 We ran this kind of trial for several months, and this is what we got. This table shows the statistics of the collected data.
0:06:30 If you look at these two rows, the number of users who participated and the number of question-answer pairs we obtained, you can see that we obtained many users: more than three hundred people participated to play the roles of Murai and Ayase, and over ten thousand question-answer pairs were collected for both Murai and Ayase.
0:06:53 Also, looking at the average answer length, Murai's answers were much longer and contained more exclamation marks. So Murai was more talkative and Ayase was not as talkative, which reflects their respective personalities.
0:07:15 This slide shows the efficiency of the data collection process; this table shows how long it took to reach a given number of question-answer pairs.
0:07:29 For example, to reach two thousand pairs, it took about seven days for Murai and about one day for Ayase, and to reach ten thousand pairs, it took about three months for Murai and eighteen days for Ayase.
0:07:47 So for both characters, it took just about a couple of days to reach two thousand question-answer pairs, and for Ayase we collected ten thousand question-answer pairs in just eighteen days, which I think is quite fast.
0:08:00 I think this confirms the efficiency of role play-based question answering for collecting question-answer pairs. Note that the users were not paid to provide the data; they voluntarily provided the data while enjoying the interactions.
0:08:17 This slide shows the quality of the data and the satisfaction of the users. This table shows the average score for the sampled answers: the maximum score is five, and we got very reasonable scores, meaning that appropriate utterances were collected for the posted questions.
0:08:38 As for user satisfaction, we had three questionnaire items, usability of the website, willingness for future use, and enjoyment of role play, and we can see that the users really enjoyed role playing.
0:08:58 So we have collected more than ten thousand question-answer pairs through role play-based question answering, and now it's time to create chatbots using the collected data.
0:09:10 This is an overview of our proposed method. Basically, we employ a retrieval-based approach: given an input question q, we retrieve question-answer pairs, q′ and a′, from the question-answer pair database that we have collected, and each retrieved pair is given a score.
0:09:35 The pair with the highest score is selected, and its a′ is used as the output of the system. For example, if this pair has a score of 0.9 and the other ones have scores below 0.9, then this pair would be selected, and its a′ would be used as the output for this question.
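To make the selection step concrete, here is a minimal sketch in Python; retrieve and score are placeholders for the components described below, so treat this as an illustration rather than the actual implementation.

def respond(q, retrieve, score):
    # Retrieve candidate (q', a') pairs for the input question q,
    # then return the answer of the highest-scoring pair.
    candidates = retrieve(q)  # list of (q_prime, a_prime) tuples
    best_q, best_a = max(candidates, key=lambda pair: score(q, pair[0], pair[1]))
    return best_a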
0:09:55 And the important thing here is how we compute this score. For this purpose, we have a scoring function, which is a weighted sum of six different scores: the retrieval score, the question-type match score, the center-word score, the translation score, the reverse translation score, and the semantic similarity score. These scores are integrated to calculate the overall score for each retrieved question-answer pair.
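As a rough illustration of the combination step, here is a minimal weighted-sum sketch, assuming each component scorer has already been normalized to the zero-to-one range, as mentioned later:

def overall_score(q, q_prime, a_prime, scorers, weights):
    # scorers: the six functions f(q, q_prime, a_prime) -> value in [0, 1]
    # (retrieval, question-type match, center-word, translation,
    #  reverse translation, semantic similarity); weights: one per scorer.
    return sum(w * f(q, q_prime, a_prime) for f, w in zip(scorers, weights))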
0:10:25 Let me describe these scores one by one, starting with the first three.
0:10:32 The retrieval score is the score given by the text search engine that retrieves this question-answer pair; we use Solr with default settings, so it uses BM25 for ranking.
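Since the default ranking is BM25, here is a minimal reference sketch of the standard Okapi BM25 scoring formula (with the usual defaults k1 = 1.2 and b = 0.75); this just shows what the retrieval score is based on, not the engine's internal code.

import math
from collections import Counter

def bm25(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len, k1=1.2, b=0.75):
    # Okapi BM25: for each query term, IDF times a saturated term frequency.
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if term not in tf:
            continue
        idf = math.log(1 + (num_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_doc_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score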
0:10:47 For the question-type match score, the score is calculated on the basis of whether the question type of q matches that of q′, and on the number of named entities in q′ that are covered by q.
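The exact formula is not on the slide, so the following is only a plausible sketch, assuming a question-type classifier and a named-entity recognizer are available as classify_type and extract_entities:

def question_type_match_score(q, q_prime, classify_type, extract_entities):
    # 1 if the two questions get the same predicted question type, else 0.
    type_match = 1.0 if classify_type(q) == classify_type(q_prime) else 0.0
    # Fraction of q_prime's named entities that also appear in q.
    entities = extract_entities(q_prime)
    coverage = sum(e in q for e in entities) / len(entities) if entities else 1.0
    return (type_match + coverage) / 2  # simple average, kept in [0, 1]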
0:11:01 For the center-word score, we first extract center words, that is, noun phrases representing topics, from both q and q′, and if they overlap, the score is high.
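One simple way to realize this, using Jaccard overlap as an illustrative choice since the slide only says the score should be high when the center words overlap:

def center_word_score(q, q_prime, extract_center_words):
    # Overlap of topic noun phrases ("center words") between q and q'.
    c_q = set(extract_center_words(q))
    c_qp = set(extract_center_words(q_prime))
    if not c_q or not c_qp:
        return 0.0
    return len(c_q & c_qp) / len(c_q | c_qp)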
0:11:16 Now for the other three scores. For the translation score, we use a neural translation model that, given q, generates a′; the generative probability of a′ given q is used as the score. The model is pre-trained on an in-house collection of 0.5 million question-answer pairs and then fine-tuned with the collected question-answer pairs, and for this purpose we used OpenNMT.
0:11:42 The reverse translation score is very similar to the translation score, but the generative probability of q given a′ is used as the score.
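Here is a sketch of the idea; seq2seq_logprob stands in for the trained model's scoring of a target sequence given a source (an assumed helper, not an actual OpenNMT call), and the log-probability is length-normalized so longer outputs are not unfairly penalized.

import math

def translation_score(src, tgt, seq2seq_logprob):
    # Translation score: src = q, tgt = a'.
    # Reverse translation score: src = a', tgt = q (with the reverse model).
    logp = seq2seq_logprob(src, tgt)  # sum of token log-probabilities (assumed API)
    n_tokens = max(len(tgt.split()), 1)
    return math.exp(logp / n_tokens)  # per-token probability, in (0, 1]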
0:11:51 Finally, for the semantic similarity score, sentence vectors are first obtained for both q and q′ by averaging their word vectors using word2vec, and then the cosine similarity between the two sentence vectors is used as the score. The word2vec model is trained on Wikipedia articles.
0:12:11 Note that all scores are normalized between zero and one before being integrated into the overall score.
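A minimal sketch with gensim word vectors; the whitespace tokenization here is a simplification, since Japanese text would first need a morphological analyzer.

import numpy as np
from gensim.models import KeyedVectors

def semantic_similarity(q, q_prime, wv: KeyedVectors):
    # Average word vectors into a sentence vector, then take cosine similarity.
    def sent_vec(sentence):
        vecs = [wv[w] for w in sentence.split() if w in wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)
    v1, v2 = sent_vec(q), sent_vec(q_prime)
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    return float(v1 @ v2 / denom) if denom else 0.0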
0:12:17 This slide shows the overall architecture of the system. A user question comes in and goes to the document retrieval engine, which retrieves question-answer pairs from the question-answer pair database, and the top-N candidates are returned.
0:12:33 Then, for each of the candidates, we calculate the score by using these modules, namely the question-type estimation and named entity recognition modules, the center-word extraction module, the two translation models, and the word2vec model, and we obtain the six scores that I just explained.
0:12:52 Then we get the final ranking of the tuples, and the system outputs the answer of the top-ranked tuple; that is, we use only the top-one answer as the system's response.
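Putting it together, the end-to-end flow is roughly as follows; the search_engine interface is illustrative, and overall_score is the weighted sum sketched earlier.

def generate_response(q, search_engine, scorers, weights, n=10):
    # 1. Retrieve the top-n candidate QA pairs with the BM25-based engine.
    candidates = search_engine.retrieve(q, n)  # [(q_prime, a_prime), ...]
    # 2. Score each candidate with the six modules, combined by weighted sum.
    ranked = sorted(candidates,
                    key=lambda c: overall_score(q, c[0], c[1], scorers, weights),
                    reverse=True)
    # 3. Output the answer of the top-ranked pair as the response.
    return ranked[0][1]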
0:13:05 Because we have only about ten thousand question-answer pairs in this database, the coverage of possible questions can be low. So we additionally have another database, an extended question-answer pair database created from the collected question-answer pairs. Let me explain how this is created.
0:13:24 To extend the question-answer pairs, we focus on one particular question-answer pair, and we first search, in a feature space, for answers that have very similar content, such that the normalized edit distance is below 0.1, so they should be very similar on the surface.
0:13:48 For this study, we then gather all the questions to which these similar answers were given, and we couple these questions with this pair's answer. These couples become extended question-answer pairs; that's how we expand the collected question-answer pairs into the extended question-answer pair database.
0:14:17 For Murai, we obtained about one million additional question-answer pairs in this way, and we did the same for Ayase.
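A minimal sketch of this expansion, under my reading that near-duplicate answers let questions be shared across pairs; the normalized edit distance function (e.g. Levenshtein distance divided by the longer length) is left as a parameter, and a real implementation would first narrow candidates with a feature-space search rather than the quadratic scan shown here.

def extend_qa_pairs(qa_pairs, norm_edit_distance, threshold=0.1):
    # If two answers are nearly identical on the surface (normalized edit
    # distance below the threshold), the question of one pair can be
    # coupled with the answer of the other to form a new pair.
    extended = []
    for q1, a1 in qa_pairs:
        for q2, a2 in qa_pairs:
            if a1 != a2 and norm_edit_distance(a1, a2) < threshold:
                extended.append((q2, a1))  # new pair: q2 answered by a1
    return extended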
0:14:31 Using the proposed method, we ran an experiment to verify its effectiveness. We used twenty-six subjects each for Murai and Ayase, and they were recruited from the subscribers of the channels, so they are very picky about the quality of the utterances, since they are fans of the characters.
0:14:54 The procedure was that each subject evaluated the answers of five methods for comparison, which I will explain later, on a five-point Likert scale, and test questions, which were held-out data from the collected question-answer pairs, were used as input.
0:15:16 We had two evaluation criteria. One is naturalness: not knowing who is speaking, whether the answer is appropriate to the input question or not. The other is character-ness: knowing which character is speaking, whether the answer is appropriate to the input question or not.
0:15:36 Let me describe the methods for comparison; we have five: two baselines, two proposed methods, and an upper bound.
0:15:46 Baseline one is called RULE, and it uses about three hundred thousand general-purpose hand-crafted rules written in AIML, the Artificial Intelligence Markup Language, for response generation, with the personal pronouns and sentence-end expressions adjusted to match those of the characters. This is a massive set of hand-crafted rules that we have been developing and that we use for response generation in our services.
0:16:19 Baseline two is called SOLR, and it simply uses, as the response, the answer of the highest-ranking tuple retrieved by Solr, which uses BM25, with the input question as the query.
0:16:32 Proposed method one is called PROP without ExtQA: the proposed method without the extended question-answer pair database, and with all the weights in the scoring function set to one.
0:16:51 Proposed method two is called PROP: this is the proposed method itself, including the extended database, with all the weights set to one as well.
0:17:01 And the upper bound is called GOLD: the gold responses provided by the role players for the test questions. We compared these five methods.
0:17:13 This slide shows the results for the five methods, for both Murai and Ayase. As you can see, the proposed methods are much better than the baselines. For Murai, the proposed methods significantly outperformed the baselines, regardless of whether the extended database was used or not.
0:17:36 For Ayase, the proposed method outperformed one of the baselines, which is RULE, and the proposed method was also better than PROP without the extended database on naturalness, which verifies the effectiveness of the extended database. GOLD, the upper bound, of course got the best scores, since these are the gold data.
0:17:59 Let me show you some of the examples that I found interesting. For example, this one is for Murai: "What did you have for lunch today?" The RULE method returned a fluent answer that had a very high naturalness score, but it did not sound very much like him. The proposed method just returned "Ramen! It was hot, but it was good," which is just like himself.
0:18:26 And for Ayase, "You're cute" was the input, and we had the two responses, "That's embarrassing... thank you" and "Thank you," from the proposed methods, and they got very high scores.
0:18:39 So the rules may produce natural utterances, but such answers are not necessarily rated highly, and short answers, just like these "Ramen" and "Thank you" ones, can lead to high scores, showing that the content of the utterances is very important for character-ness.
0:18:57 So, to summarize: we successfully verified the effectiveness of role play-based question answering by using real users, and we successfully created chatbots using the collected question-answer pairs. As future work, we want to improve the quality of the proposed method, and we also want to try additional types of characters as targets for role play-based question answering.
0:19:20 Thank you.
[Audience question, inaudible]
0:19:49 So, actually, this is kind of built into the system: people can compare the different answers. There is a kind of voting button here, and people can just press this button, and then, you know, you can see that this answer was a much better utterance. So it's not exactly a competition, but it's a kind of incentive for comparing the answers.
[Audience question, inaudible]
0:20:31 Yes, they are completely isolated.
0:20:36 No, it was just this amount.
[Audience question, inaudible]
0:21:09 So, we just wanted to make sure that we were not cheating; that was the point. We could have had users type in their own questions and then evaluate the responses, but since we had a dataset, we wanted to do a kind of cross-validation-style evaluation, so that's what we did.
[Audience question, inaudible]
0:21:46 So, we have an agreement with the streaming service about the rights to the collected data: we had the rights to put up our website on their fan channels, and we hold the rights to the data that has been created.
0:22:06 Any other questions?
0:22:10 Okay, so let's thank the speaker again.