so my name is a recharging is not there are some in the operation and

today i'm gonna talk about role play-based question answering by real

users for building chatbots with consistent personalities

so

now we are seeing a lot of

chatbots because these days a lot of people are talking to

these characters every day

for example microsoft's rinna in japan

is very famous and people are talking to her every day and we also have things like

gatebox

where people can talk to a virtual character in a small case

and also we have a

more human like

catherine you mentions in destiny as in david work

so we are having

many chatbots and they have consistent personalities

and if we want them to be believable they need to have consistent personalities

like this

and to generate consistent responses what is needed

is character-specific question answer pairs

like this

but the creation of such pairs is as you know very costly

so the motivation behind this work is that

we want to efficiently

collect

question answer pairs for characters

and in this work we particularly use

the technique called role play-based question answering

as a technique for collecting

the

question answer pairs

and before going into the details of this work i'm gonna explain what

role play-based question answering is

so in role play-based question answering

in the middle we have

a famous person

and people users talk to this famous person

and in this case this is an image and cutting down who is very famous

we've got is a

and

at the back

of this character we have a bunch of role players who collectively play the

role of the famous person

so if this user

asks a question to this famous person like what food do you like

this question is broadcast

to all the role players

and then

one of the role players answers the question by saying something like i like

sweets

then this answer is delivered to the user

and

this question answer pair

can then be collected as a question answer pair for this character

since role players can enjoy playing the role of their favourite character

and also the users can ask questions to their favourite character

users can get highly motivated to provide question answer pairs and this is how it

works

but there are some problems with this architecture

so that is

only a small scale experiment with paid users was performed

to test the concept of role play-based question answering

so it was not clear if this scheme would work with real users

and also another problem is that the small scale experiment

did not yield much data

to allow data driven methods to work

so the applicability of the collected data to the creation of chatbots

was not verified

so to address these problems in this

study for the first problem we verified the

effectiveness of role play-based question answering with real users

in this study we focus on two famous characters in japan

and

we set up websites for role play-based question answering

for people to enjoy the task

and for the second problem we created chatbots using the collected data

and

in this paper we propose a retrieval-based method

and evaluate its performance by subjective evaluation

so let me

talk about

the data collection by real

users

so we focus on these two characters

who are very famous in japan

one is max murai he is an actual person and he's a company ceo and

he's also a youtuber who specialises in live coverage of tv games

and

the other character is ayase aragaki who is a fictional character in a novel

and it does is the company this you

and her character is often referred to as a yandere

according to wikipedia yandere characters are mentally unstable and use extreme violence or

brutality as an outlet

for their emotions so they are two very distinct

different characters one is

an actual person

a male character and the other one is a fictional female character

and we set up websites

so that people can enjoy role play-based question answering

so each character has a channel

a kind of fan channel

user channels for the fans on the japanese

streaming service niconico douga

this is like youtube

and

we set up the sites

on their channels for the subscribers to enjoy role play-based question answering

so this is how the website looks like for murai

people can

post questions these are the posted questions

and these are the answers given by several role players

and this is how it looks like for

ayase

you can post questions in the text field and this is a

question posted by the user and this is the answer posted by the role player

so this is how it looks like

and we ran this kind of trial for several months

and this is what we got this table shows the statistics of

the collected data

if you look at these two

the number of users who participated and the number of question answer pairs we obtained

we obtained many users

as you can see to play the roles of murai and ayase more than three hundred people

participated

and over ten thousand question answer pairs were collected for both

murai and ayase

and also the answers for ayase this is the average

words per answer the answers of ayase were much

longer and contained more words

so ayase was more talkative and murai was not as talkative

that is

just reflecting their personalities

and this slide shows the efficiency

of the data collection process

this

table shows

how long it took to reach this number of question answer pairs

so

for example

to reach two thousand

pairs

if the standard a full scale of the seven day from right and about one

day for ayase and to reach ten thousand pairs

it took about three months for murai and eighteen days for ayase

so for both characters it took just a couple of days to reach two

thousand question answer pairs

and for ayase we collected

ten thousand question answer pairs in just eighteen days i think this is quite

fast

and this confirms the efficiency of role play-based question answering for collecting question answer pairs

please note that the users provided the data voluntarily they just

voluntarily

provided data while enjoying the task

and next is the quality of the data and the satisfaction of the users

so

this table shows the average score for the collected answers

and the maximum score is five and we got very reasonable quality for the

posted answers

and for the satisfaction of the users

we had three questionnaire items usability of the website willingness for future

use and enjoyment of role play and we see that users really enjoyed role playing

so we have collected more than ten k question answer pairs

in

role play-based question answering and now it's time to create chatbots using the collected

data

so this is an overview of our proposed method

basically we employ a retrieval-based approach where given an input question q

question answer pairs are retrieved from the question answer pairs database that we

have collected

and

the score of each retrieved pair is calculated

and the pair q prime and a prime

with the highest score is selected and its a prime

is used as the output of the chatbot

so for example if this pair has a score of zero point nine and the other ones have

scores below

zero point nine then this would be selected and a prime would be used as the

output of the chatbot

and

the important thing here is

how we calculate this score

so for this purpose we have this scoring function

it is a weighted sum of six

different

scores

the search score question type match score center word score translation score

reverse translation score and semantic similarity score and these scores are integrated to

calculate the overall score for each question answer pair
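
The scoring function described here can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the dictionary keys, the candidate format and the example values are assumptions, and the six component scores are assumed to be already normalized to the range zero to one.

```python
# Hypothetical names for the six component scores mentioned in the talk.
SCORE_NAMES = (
    "search", "qtype_match", "center_word",
    "translation", "rev_translation", "semantic_similarity",
)


def overall_score(scores, weights=None):
    """Weighted sum of the six component scores for one candidate pair."""
    if weights is None:
        # Assumption: default to a weight of one for every component.
        weights = {name: 1.0 for name in SCORE_NAMES}
    return sum(weights[name] * scores[name] for name in SCORE_NAMES)


def select_answer(candidates, weights=None):
    """Return the answer a' of the candidate pair with the highest overall score."""
    best = max(candidates, key=lambda c: overall_score(c["scores"], weights))
    return best["answer"]
```

With all weights set to one, a candidate whose six scores are each 0.9 beats a candidate whose scores are each 0.5, so its answer would be used as the response.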

i'll

describe these scores one by one

first the initial three scores

so for the search score

this is what is given by the full text retrieval engine lucene which we

use to search the question answer pairs

and lucene is used with default settings so it uses bm25 as the search algorithm
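
To give a feel for what a BM25 search score looks like, here is a simplified sketch of the Okapi BM25 formula. It is not Lucene's actual code: whitespace tokenization is an assumption (Japanese text would need a proper tokenizer), the function name is hypothetical, and `k1=1.2`, `b=0.75` are the commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score every document against the query with a simplified BM25.

    Assumes `documents` is a non-empty list of whitespace-tokenizable strings.
    """
    docs = [doc.split() for doc in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Stored questions that share more, and rarer, words with the input question receive higher scores, and a stored question with no shared word scores zero.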

and for the question type

match score

the score is calculated on the basis of whether the question type of q

matches that of q prime and the number of named entities in a prime requested by

q

and also for the center word score

we first extract center words and center words mean noun phrases representing topics they are extracted

from both q and q prime and if they overlap a score of one

is given
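
The center word score can be illustrated with a very small sketch. It assumes the center words (topic noun phrases) have already been extracted by some other module, and it assumes a score of zero when there is no overlap, which the talk does not state explicitly. The function name is hypothetical.

```python
def center_word_score(q_center_words, q_prime_center_words):
    """Return 1.0 if the two questions share at least one center word, else 0.0."""
    shared = set(q_center_words) & set(q_prime_center_words)
    return 1.0 if shared else 0.0
```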

for the other three scores

for the translation score we use a neural translation model to calculate p of a

prime given q it is the generative probability of a prime given q that is used as the score

the model is pre-trained using in-house zero point five million question answer pairs and

then fine tuned using the collected question answer pairs

and for this purpose we use opennmt

and the reverse translation score is very similar to the translation score but the probability of q

given a prime is used

as the score
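
As a rough illustration of a generative probability, a sequence-to-sequence model assigns a probability to each output token, and the probability of the whole answer is the product of those token probabilities. The sketch below assumes the per-token probabilities are already available and strictly positive; how the actual toolkit exposes them is not covered in the talk, and the function names are hypothetical.

```python
import math


def sequence_log_prob(token_probs):
    """Sum of log probabilities of the tokens of one generated sequence."""
    return sum(math.log(p) for p in token_probs)


def sequence_prob(token_probs):
    """Probability of the whole sequence, i.e. the product of token probabilities."""
    return math.exp(sequence_log_prob(token_probs))
```

The same two functions would apply to the reverse direction, using the token probabilities of the question given the answer.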

finally the semantic similarity score

first sentence vectors are obtained for both q and q prime by using the averaged word vectors

from word2vec

then the cosine similarity between the two sentence vectors is

used as the score

the word2vec model is trained from wikipedia articles

note that all scores are normalized between zero and one before integrating the scores
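
The semantic similarity score can be sketched in plain Python as below. The word vectors here are a small hand-made dictionary standing in for a trained word2vec model, the function names are hypothetical, and returning zero when no word is known is an assumption.

```python
import math


def sentence_vector(tokens, word_vectors):
    """Average the vectors of the known tokens; None if no token is known."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_similarity(q_tokens, q_prime_tokens, word_vectors):
    """Cosine similarity between the averaged word vectors of two questions."""
    u = sentence_vector(q_tokens, word_vectors)
    v = sentence_vector(q_prime_tokens, word_vectors)
    if u is None or v is None:
        return 0.0
    return cosine_similarity(u, v)
```

Because cosine similarity can be negative for real word vectors, a normalization step of some kind is needed before the value is combined with the other scores.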

so this slide shows the overall flow of the system

so a user question comes in then it goes into the document retrieval engine lucene to retrieve

question answer pairs from the question answer pairs database

and the top n candidates are retrieved

and for each of the candidates

we calculate the score

by using these modules

question type estimation and named entity recognition the center word extraction module the neural translation

models

and the word2vec model

and we obtain the six

scores that i just explained

and

we get the final ranking of the candidates and the system outputs the

top n

in this study we used only the top one

as the chatbot's response
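
The two-stage flow (retrieve the top n candidates, rescore them, output the best) might look like the sketch below. The scorer functions are passed in as plain callables and are hypothetical stand-ins for the modules in the talk; min-max normalization per component and equal weights are assumptions.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1]; all zeros if every value is the same."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(question, qa_pairs, first_stage, scorers, top_n=10):
    """Return the top-n (q', a') pairs reordered by the summed normalized scores."""
    # Stage 1: keep the n pairs with the best first-stage retrieval score.
    ranked = sorted(qa_pairs, key=lambda pair: first_stage(question, pair), reverse=True)
    candidates = ranked[:top_n]

    # Stage 2: compute every component score, normalize it, and add it up.
    totals = [0.0] * len(candidates)
    for scorer in scorers:
        raw = [scorer(question, pair) for pair in candidates]
        for i, value in enumerate(min_max_normalize(raw)):
            totals[i] += value

    order = sorted(range(len(candidates)), key=lambda i: totals[i], reverse=True)
    return [candidates[i] for i in order]
```

The response would then be the answer part of the first returned pair, for example `rerank(q, pairs, first_stage, scorers)[0][1]`.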

and

because we have only about ten k

question answer pairs in this database the coverage of the questions can be

limited

so we additionally have another database which is the extended question answer pairs

created from the question answer pairs i just explained

so to extend the question answer pairs

we first

focus on the answer

in one particular question answer pair

and we first search for very similar

tweets on twitter

which have very similar content where the normalized edit distance is below zero point

one so they should be very similar on the surface

and for these tweets we use

all the tweets

to which they were replies

and we gather these tweets

and

couple them as questions with the answer

and these

couples are the extended question answer pairs that's how we extend the question

answer pairs into the extended question answer pairs

and for murai

we all the thing additional wasn't really on

question answer pairs and for ayase

we obtained

about one million additional question answer pairs
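
The extension step can be sketched as follows. This assumes an external pool of (message, reply) pairs is available, that edit distance is computed over characters, and that it is normalized by the length of the longer string; none of these details are given in the talk, and all names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    """Edit distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def extend_pairs(answer, reply_pairs, threshold=0.1):
    """Pair `answer` with every message whose reply is nearly identical to it."""
    return [
        (message, answer)
        for message, reply in reply_pairs
        if normalized_edit_distance(answer, reply) < threshold
    ]
```

A reply that differs from the collected answer by one character out of twenty has a normalized distance of 0.05, so the message it replied to would become a new question for that answer.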

so by using the proposed method

we did an experiment to verify its effectiveness

we used twenty six subjects

each for murai and ayase

and they were recruited from the subscribers so they are very picky about the quality

of the utterances because they are fans of the characters

and the procedure is that each subject evaluated the outputs

of the five methods for comparison i'll explain the methods later

on a five point likert scale

and

test questions which were the held-out data from the collected question answer pairs

were used as input

we have two evaluation criteria

one is naturalness

not knowing who is talking whether the answer is appropriate to the input question or not

and the other is characterness

knowing that the character is talking whether the answer is appropriate to the input question or not

so

i'll

describe the methods for comparison we have five

we have two baselines

two proposed methods and one upper bound

the first baseline is called aiml

and it uses general-purpose three hundred k hand-crafted rules written in aiml

artificial intelligence markup language for response generation

and personal pronouns and sentence-end expressions

are modified by rules to match those of the characters

so as you know this is a massive amount of

hand-crafted rules that we have been developing and we are using them

for response generation in this kind of setting

and baseline two is called lucene

and it outputs the answer of the highest ranking pair

retrieved by lucene which uses bm25 by using the input

question as a query

and this is the proposed method one it is called prop

without exdb the extended database so it is the proposed method without the extended question

answer pairs

and all the weights in the scoring function are set

to one

for this proposed method

and the proposed method two is called prop

this is the proposed method itself and all the weights are set to

one as well

and the upper bound

is called gold and it is the gold responses

provided by online users for the test questions

then we compare these five

and this shows the results

for the five methods for both murai and ayase

and as you can see the proposed methods are much better than the baselines

for murai the proposed methods significantly outperform the baselines

and this holds whether the proposed method uses the extended database or not

for ayase

the proposed method outperforms one of the baselines which is aiml

and also the proposed method is better than prop without the extended database on naturalness

the results are quite good and this is the

upper bound but we are getting close to gold which is the

gold data

i'll show you some of the examples that are more interesting so for example this

is for murai what did you have

for lunch today

for lunch today and then we tend i have it's a compressed by for it

is good at the g

and it had a very high naturalness score but it is not very

much like murai

and the proposed method just returned ramen

it was short but it was just like himself

and for ayase

you are so cute was the question and

we had two

responses like thank you it's very embarrassing and thank you from the proposed methods and they

had much higher scores

so the aiml rules may produce natural sentences

but their characterness is not necessarily high

and short answers just like ramen and thank you

can lead to high scores showing that the content of the utterances

is very important for characterness

so to summarize

we successfully verified the effectiveness of role play-based question answering

by using real users

and we successfully created chatbots using the collected question answer pairs

and as future work

we want to improve the quality

of the proposed method and also we want to try additional types of characters

as targets for role play-based question answering

actually

questions

so actually this is a kind of a

how to say people can compare the different answers and that's an important part

of the system

people can just actually there's a kind of like button here

people can just press this button then

you know you can see which was the much better utterance so

it was kind of you know it's not a competition but this is a kind of incentive

for comparing them

yes a they are completely isolated

no it was just this amounts to

so we just wanted to make sure that

we are not cheating so that that's not that the point

and we could have had

users

put in their own questions and then evaluate the responses but since we had a

dataset we wanted to do kind of us as kind of a class wasn't survey

so we can do that so we what how

so we

have an agreement with the streaming service and they have

the rights to the characters

so we have the right to put our website on their fan channels

and all of the rights have been cleared

any other questions

okay so let's thank the speaker again