Speech Transcript - Characterizing the Response Space of Questions: a Corpus Study for English and Polish

and the last talk of the session will be on characterizing the response space of questions

a corpus study for english and polish

relevance

in some sense

or in the sense of conversational coherence

which is exemplified here in this example one

where

the ones on the first line are kind of relevant answers

is that chair new yes it's a louis fourteen replica new

and the second line has

the ones that are not appropriate

so this notion is the cornerstone of theories of dialogue the same way that say

grammaticality is to syntax

and you could argue that basically what the turing test is about is exactly relating

to relevance and whether that's a good test for

you know when we manage to simulate human intelligence

so i'm gonna restrict attention to a corpus study of relevance relating to queries

possible responses to queries

and

a bit surprisingly perhaps

even if you restrict it in this way there have been actually very few

comprehensive attempts to characterize it

there are some references in the paper and also in some earlier work that we have done

that i'll talk about in a moment

so there is some

early

discussion of this in the language acquisition literature

and some discussion of this in the conversation analysis

literature

which is primarily there to show that there's a difference between

three classes answers non-answers and non-responses

and

so in this study they looked at ten different languages and

showed interesting distributions and

we also note in the paper there is work on korean that shows quite different distributions

between say the results obtained on english and the results on

korean but mainly about this

basically about these three classes answers non-answers and

non-responses

so today i'm gonna talk about starting with a taxonomy that we developed for just characterising

query responses that is question responses to queries

something that's sporty characteristic to certain race

and

we will then mention a basic hypothesis that we used to scale up

to the general case

talk a bit about the annotation scheme and the results

and very briefly

talk about how one might model relevance and what the theoretical implications then are

so the starting point of this work is the typology that

paweł łupkowski and myself developed in some work that was published

in two thousand sixteen in the journal of language modelling

and this is a wide-coverage taxonomy for question-question sequences which was tested on

the bnc childes

the bee corpus and the amex corpus

and there was also formal modelling of the resulting classes in the framework of

kos and ttr

this study consisted of about fifteen hundred slightly less than fifteen hundred query

response pairs

and what emerged was seven classes of questions what are called elegy classes for

all not all corinne sponsor but for the two people those one

in this study

so we have clarification requests

things like hamlet as a response to what's hamlet about

dependent questions

so these are things like does anybody want to buy an amstrad are you giving it

away

where you can do the inference that one question depends on the other whether anybody

wants to buy an amstrad depends on whether you're gonna give it away

a class called motiv which is questions about underlying motivation what's the matter why

a fourth class

where the response is changing the topic

what was your answer what was yours

a fifth class questions to do with where you're trying to understand

which way you're supposed to answer you know what makes black coffee is

which country

and the sixth one is questions with a presupposed answer where the

question response is somehow

indirectly indicating an answer to the first question

and the seventh is cases where

the

response ignores the initial question but still addresses the same situation so things like

do you wanna go down and have a look at that now while there's

workmen there and the response is why haven't they finished yet which

is about the workmen so it's still about the same situation but it's not

at all responding to the question

so those were the seven classes we found that are needed to

characterize

question responses to questions which is about twenty percent of all at least at

the time we found it was about twenty percent of all

responses to questions

and the main hypothesis for this study

is

that responses drawn from or concerning these classes of questions

plus direct and indirect answers

are going to exhaust the response space of a query

okay

so basically

you get the following kind of scheme

so a response to a question can either be an answer and here you have

two subclasses direct answers and indirect answers

and ultimately the paper also discusses that these actually need some extra subclasses

within them

and if it's not an answer then it can either be a question

response like we've

already discussed with these seven classes

or it could be a non-answer so it can be

a kind of an acknowledgement

two classes that i'll give an example of in a second the i don't know class

and the difficult to provide a response class

and then declarative responses that are about

these issues that already arose in the question kind of

response

so the i don't know is this kind of

not uncommon kind of response and equally the difficult to provide

an answer case and acknowledgements of course you're all very familiar with those guys
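
The response scheme just described can be written down as a small data structure. This is a minimal sketch only: the label names below are illustrative assumptions, not the labels used in the paper's annotation manual, and the sketch does not separate the question form and the declarative form of a class.

```python
from enum import Enum


class ResponseClass(Enum):
    # Answers (label names are hypothetical, chosen for readability)
    DIRECT_ANSWER = "direct answer"
    INDIRECT_ANSWER = "indirect answer"
    # The seven question-response classes from the earlier taxonomy
    CLARIFICATION_REQUEST = "clarification request"
    DEPENDENT_QUESTION = "dependent question"
    MOTIVATION = "question about underlying motivation"
    CHANGE_TOPIC = "changing the topic"
    FORM = "question about the way of answering"
    PRESUPPOSED_ANSWER = "question with a presupposed answer"
    IGNORE = "ignores the question, same situation"
    # Other non-answers mentioned in the talk
    ACKNOWLEDGEMENT = "acknowledgement"
    DONT_KNOW = "i don't know"
    DIFFICULT_TO_PROVIDE = "difficult to provide a response"
    OTHER = "other"


ANSWER_CLASSES = {ResponseClass.DIRECT_ANSWER, ResponseClass.INDIRECT_ANSWER}

QUESTION_RESPONSE_CLASSES = {
    ResponseClass.CLARIFICATION_REQUEST,
    ResponseClass.DEPENDENT_QUESTION,
    ResponseClass.MOTIVATION,
    ResponseClass.CHANGE_TOPIC,
    ResponseClass.FORM,
    ResponseClass.PRESUPPOSED_ANSWER,
    ResponseClass.IGNORE,
}


def top_level(label: ResponseClass) -> str:
    """Map a fine-grained label to a coarse group used only in this sketch."""
    if label in ANSWER_CLASSES:
        return "answer"
    if label in QUESTION_RESPONSE_CLASSES:
        return "issue-related response"
    if label is ResponseClass.OTHER:
        return "other"
    return "non-answer"
```

Coverage of the taxonomy on a corpus is then simply the share of annotated responses whose label is not `OTHER`.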

so the data for english comes from the bnc

the bee corpus and the map task corpus

so as you

most of you are probably familiar with these corpora the bnc is a

ta p honestly conversations

bee contains tutorial dialogues from electronics courses

and map task consists of dialogues recorded for a direction providing task

so we took about five hundred pairs from the bnc two hundred and fifty from

bee and about

slightly less than five hundred from map task

and

basically

the way this was done was a random selection of turn units ending with

a question mark

where we also eliminated tag questions and turns with missing text
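
The selection procedure described here can be sketched as follows. This is an illustration under stated assumptions: each turn is a dictionary with a `text` field, and the `tag_question` and `missing_text` flags are hypothetical fields standing in for however the corpus marks those cases; the actual selection scripts are not described in the talk.

```python
import random


def sample_question_turns(turns, k, seed=0):
    """Randomly pick up to k turns that end with a question mark.

    Turns flagged as tag questions or as having missing text are dropped
    first. The flag names are assumptions made for this sketch.
    """
    candidates = [
        t for t in turns
        if t["text"].strip().endswith("?")
        and not t.get("tag_question", False)
        and not t.get("missing_text", False)
    ]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(candidates, min(k, len(candidates)))
```

Each sampled turn would then be paired with the turn that follows it to form a query and response pair for annotation.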

the polish data was taken from the spokes corpus which is basically the spoken part

of the polish national corpus

and that corpus consists of about two hundred and fifty thousand utterances

and for this we chose about two hundred pairs

okay so the basic results

are that for english

the other class is less than three percent so we have

more or less close to ninety seven percent coverage with this taxonomy

perhaps not hugely surprising

the most frequent class of responses in all three corpora in english

is of course direct answers

in the bnc the next biggest class is clarification requests

in bee the next biggest class is indirect answers in map task the second biggest is

you know actually

ignore the case where you respond with another utterance which is about the same situation

but it's not responding to the question

so you can already see that there is a fair amount of variability across corpora

for polish the two most frequent classes of response are answers so direct ones

and indirect ones

and then the next two frequent classes are the i don't know class

and the ignore class

so these are roughly the results and obviously it'll be a bit hard for you

this is all in the paper so if you are interested in the results in

detail you can see it there but you can see at the top

you have of course most of the mass is taken by the

direct answers

but with

the task oriented corpora getting much more direct answers than

something like the bnc the open corpora like bnc

and spokes

and then you can see

that there's a fair amount of variability

across corpora for the different kinds of classes telling you that you know

there's no chance of getting a good characterisation of

this problem just by looking at one corpus

and as we found in the question study there is quite a large

variability across corpora in terms of these kinds of distributions so the nature of the

corpus really

again not very surprisingly influences very much the kind of distributions you get

as far as reliability goes

so i'll just speak about the english part for reasons of

time but the polish is discussed in table two so we did an inter

annotator study we had two main annotators who are also

authors of the paper

and are graduate students in linguistics

l two speakers of english and they underwent several training sessions with me

and

both annotated around five hundred pairs and from this we extracted five hundred commonly annotated

pairs

and

we got a kappa of about point six five and a krippendorff alpha of

about point six six
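
Agreement between two annotators is commonly summarised with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The following is a minimal sketch, assuming that Cohen's kappa is the variant meant here and that each annotator's labels are given as a list in the same item order; the label strings in the usage note are made up.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For example, `cohen_kappa(["DA", "DA", "IA", "IA"], ["DA", "IA", "IA", "IA"])` has observed agreement 0.75 and chance agreement 0.5, giving a kappa of 0.5.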

there were

ninety four cases where annotation disagreements occurred

the main disagreements concerned direct answers versus indirect answers so that's

about a third of the disagreements

ignore versus

change the topic acknowledgement versus direct answer dependent question versus direct answer

and acknowledgement this is the

so direct indirect disagreements mostly occurred with why questions how questions and what is x

doing questions

and these are cases where answers are by and large sentential

and for which there has not been significant or comprehensive theoretical literature on how to characterize answerhood

so just to give a couple of examples

so we have here case with the why question why deep tan'll to know that

well as the new guy

so the annotators disagreed whether it was a direct or indirect answer and eventually it was resolved

to indirect

and here is another example

this is a four to one again to why question i thought very nice is

it no it isn't what is why isn't it 'cause it isn't

and this with again to go clean direct on statistical model

and eventually it was resolved to an indirect answer since it indirectly indicated there is actually no

reason

okay so this is just to give you a sort of flavour

for

the nature of the kind of disagreements and

ultimately

the fact that probably

this is a kind of task

where

a more sophisticated notion of annotation which doesn't necessarily

lead to a resolution but leads to actual different kinds of judgements having to be maintained

is probably needed

okay so the final thing i'll just mention is

the sort of formal analysis

that is needed in order to

describe this problem formally

so in our original paper we provided rules within the kos formalism

that

characterize

the coherence of

these seven classes of questions the kind of question responses to questions

and to the extent that

what the study has shown is that basically

the class of

responses

is basically answers

so

direct and indirect answers plus

things that address these basic issues

then we already have essentially a complete characterization of the response space

which is again potentially in an implementable form in the sense that

the kos formalism is a sort of information

state type formalism so it's actually giving you

potential for implementing a kind of

dialogue manager

so just to make a few comments in that respect the most basic

notion of answerhood we might say is

something that has been called simple answerhood

so if you think of what a question is mathematically it's essentially

some kind of lambda abstract

where for polar questions it's an abstraction over an empty set

of variables and for wh questions over a set of one or more

variables

then

simple answers are of course for polar questions just the two polar opposites

and

for

wh questions they are the instantiations and their negations
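
The notion of simple answerhood described here can be illustrated with a toy encoding. This is a sketch under assumptions, not the formalisation used in the paper: a question is modelled as an abstract over zero or more variables, propositions are plain strings, and the finite `domain` of individuals is a hypothetical stand-in for the real range of the variables.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Question:
    variables: Tuple[str, ...]  # () for a polar question
    body: Callable[..., str]    # maps variable values to a proposition


def simple_answers(q: Question, domain: Sequence[str]) -> List[str]:
    """Every instantiation of the abstract, each followed by its negation."""
    answers = []
    # With zero variables, product() yields one empty tuple, so a polar
    # question gets exactly its two polar opposites.
    for values in product(domain, repeat=len(q.variables)):
        prop = q.body(*values)
        answers.append(prop)
        answers.append(f"not({prop})")
    return answers
```

A polar question such as `Question((), lambda: "rain")` yields the proposition and its negation, while a unary wh question such as `Question(("x",), lambda x: f"left({x})")` yields one such pair per individual in the domain. Conditional, modalised, or quantified answers fall outside this set, which is the limitation the talk goes on to discuss.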

and this is actually a simple notion of answerhood that in corpora has pretty

good coverage as

we know this is of course a pretty direct

way of talking about slot filling

but that the ultimate notion of on subword which a goal here about nist had

encode about a similar in the real lecture

has to be

actually ultimately

if you want really wide coverage it has to include things that go beyond

simple answerhood so it has to accommodate conditional

weakly modalised and quantificational answers

so this addresses some of these kind of questions that the silicone is been asking

what all these poor people who are just a filling slots

and so

that was so

again i don't have time here to say

how you can formally deal with these they are discussed in the literature on questions

but at the same time even though there has been discussion of how to

accommodate these kinds of

answers

so that's also direct answers

we're still lacking a comprehensive empirically based experimentally tested account for

a variety of wh words okay so all of this literature is based

on a very small number of examples just for a small number of wh

words

and of course an additional notion that questions need is some notion of exhaustiveness

which has to be pragmatically parametrised

and

whether a response is exhaustive or not

can determine whether a response is accepted as the one required for a query so this leads to

what i mentioned before that we need a finer grained subdivision of

the answer categories

and therefore on the basis of aboutness and some notion of exhaustiveness

one can define question dependence

and that's the basis for instance for the kind of rules that you can

give the dialogue manager like if a question is under discussion respond with an utterance which

is q specific in other words it either provides an answer or a dependent

question response so that's an example of the kind of way of

characterizing the coherence of

various classes of responses
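
A rule of this kind can be sketched as a toy response selector. This is only an illustration under assumptions: the dictionaries `known_answers` and `dependencies` are hypothetical stand-ins for the information state, questions are plain strings, and the final fallback is an addition of this sketch rather than part of the rule stated in the talk.

```python
def respond(question, known_answers, dependencies):
    """Pick a response specific to the question under discussion.

    Either answer the question, or ask a question it depends on
    whose answer is not yet known.
    """
    if question in known_answers:
        return ("answer", known_answers[question])
    for dependent in dependencies.get(question, []):
        if dependent not in known_answers:
            return ("dependent_question", dependent)
    # Fallback chosen for this sketch only.
    return ("dont_know", None)
```

With `dependencies = {"does anybody want to buy an amstrad": ["are you giving it away"]}` and no known answers, the selector returns the dependent question, mirroring the example given earlier in the talk.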

the final class i'll mention which is another very big class and has again

fairly important

implications for the kind of information that you need in your

representations is clarification requests

so again there's been quite a lot of work on

that going back to work by matthew purver and myself

where we showed how to account for the main classes of clarification requests

using rules that enable clarification questions to be relevant given an utterance

so the basic idea

without going into any details is that it involves accommodating into the context certain kinds

of clarification questions

with rules of this basic format

so that the input is so much as you given so much as you want

something state you would the constituent of this is actions on that application

then you can accommodate any of these kind of a this class of questions what

a mean by you one what would today is that you one or a kind

of a confirmation kind of questions

but note that to do this you could not do this just on the basis of

having

the content of the question as input you need the whole sign

that's associated with an utterance

okay

conclusions so i presented here an initial study for

what as far as we can see is the first detailed formal and empirical characterisation of

the response space of queries

and there are

a lot of things that need to be done

so one thing is cross question type comparison so as i said the

question-response pairs that we looked at

were selected randomly and obviously it's interesting to consider the distribution of responses relative to fixed types

of questions

so different sorts of wh questions polar questions and so on and again we can be

fairly sure that there'll be different distributions for different types of questions

we need to apply machine learning to acquire the response classification scheme of course so

there's been some work on the learnability of nonsentential utterances

so that's a

subclass of the kind of responses that exist

so that gives hope for the learnability of some of these classes

we anticipate that some of these classes will be pretty difficult to learn for instance

the ones that are

heavily based on inference like indirect answers and motiv or change the topic

obviously

as everybody here is interested ultimately in implementation so we'd like to test

these in a dialogue system with a fairly sophisticated dialogue manager

for instance of the godis class

so there's been some initial experiments and work by arrive at allen and gotten but

and of course here we gave you a

bit of work on english and polish

which already shows you some differences

so a significant challenge we think is to see how you

test this classification with languages that

lack

speech corpora which

about ninety five percent ninety percent of languages don't have

so we we're starting doing some work on this respectively we go and we and

just by using online games

online games the proposed

we have a few minutes for questions

hi first of all thank you for your talk this is really interesting

so the question that i had was that if you could go back to slide

seven please

so your example here for the changing of the topic it seems to me

that this is not exactly a changing of the topic because you're staying on the

same topic they asked you what your answer was and you asked what theirs was that's

the same general topic but rather more of an indirect refusal to answer

so i was wondering it seems like changing the topic is always an indirect refusal

to answer and would you consider a refusal to answer as part of your ontology

so i mean this is a kind of indirect basically the thing that you might say is

an implicature of providing this kind of response is that you know you're trying to

i mean you're certainly not addressing this issue right

so

in our original work we actually commented that when

you provide this kind of response

then

maybe for processing reasons the most common way to do this is by asking

a question which is kind of unifiable with the original one with a

more general one you know we should talk about what or whatever is well though

this is you know

so this works for many cases but

i mean the more general thing is just to provide i mean you

can you know you can sort of do this kind of change of topic

in a way that can be less smooth of course but you know by

throwing in something that is quite different and these things also happen so

this is the smoothest way of doing it just from a processing point of

view

but it's not have a well that's gonna be

in this way

so the basic dynamics

for this coherence have to ultimately allow you

as a consequence potentially

to get one of these questions eliminated

so that works in the in the setup did you did you see any instances

of like direct refusals to answer a question in your corpora

where somebody asks a question and somebody just says i don't want to answer that

or i refuse to answer i mean they're not very common

not very common but and how would your scheme like what class would you put

that in well that's the

so that's here so these are the declarative ones that are about the issue that

is the underlying issue of changing the topic i see thank you

we have time for another question

thanks i wanted to follow up on your future work about relating to

types of questions

and i guess you have and only about but i wonder how much of your

differences in the corpora might be due to different distribution of questions that are in

those corpora versus distribution of types of answers to those types of questions

then also maybe could you quickly comment on how you define question 'cause i think

you said there's just a question mark in the corpus so you're

you can probably one reason you're not including declarative for direct a no so basically

we in terms of a pharmaceutical questions we just

doing this kind of family

so i mean that's kind of building on the fact that

you know the transcriber has decided this is a question

and so that basically means that it's typically going to be wh questions or polar

questions

which could be either you know they could also be declaratives

that have a question mark at the end but they usually have the same basic

function as well as a sort of draw people what

so you're set your i mean i guess because since we have done this we

don't know

and we i think it's all the cnn interesting question also exactly you know

i again i'm not aware of what the street address this all that is look

at you know what the difference we have actually in a as the c l

paper that we had a two thousand seven of oracle financial mapping and me we

actually did have some tables of the different distributions of different kinds of wh questions

both

clauses and lexical ones so there is some work on this actually but you know

that was just for one of hope that i think of some b and c

so it's

forward

let's thank the speaker one more time

Characterizing the Response Space of Questions: a Corpus Study for English and Polish

Oral Session 6: Evaluation and Data

Jonathan Ginzburg, Zulipiye Yusupujiang, Chuyuan Li, Kexin Ren and Paweł Łupkowski