Speech Transcript - Cogent: A Generic Dialogue System Shell Based on a Collaborative Problem Solving Model

so i'm presenting this on behalf of a whole team of people from my

agency choh man and ian are here james is not

and this is gonna be a little bit different because we're gonna have no

neural networks knock or run with of the pause and no f scores

no numbers

so it's gonna be a little different

so here's the problem that

we are addressing

so the current state of the art in dialogue systems actually a couple of you here and

others have had a similar slide

what we're doing mostly is very simple parsing based on keywords phrases and so on

regular expressions and so on

very simple dialogue models based on either finite state machines or frame systems with slot

filling and so on

engineered for a specific application

and there's

thousands of applications for these

but

every single dialog system is developed for that specific application in some cases

you see things that are

multi-domain but essentially those are sort of separate dialogue systems that kind of work together

with a single interface

but importantly there is no transfer between these domains there is no generic

capability in these systems to transfer from

one domain to another

and as far as the kind of interactions that these other systems allow

there's

no effective verification or corrections the kind of dialogue that they allow is actually very

limited

so here's our position

dialogue is an activity that can be and should be modeled independently of the

application domain

we need deep understanding of language to effectively and robustly handle a broad range of

user utterances because the same

intention can be expressed in so many different ways

added

most of these

finite state based and with simple parsing hubris of data that are sitting in a

day just common

unless somebody's willing to spend years just

encoding lots of regular expressions i suppose

and we also think that the community needs frameworks to facilitate the development

of these complex mixed-initiative systems with very sophisticated back-end reasoning and i think there's

a dearth of such tools

we see for example in parsing with the stanford tools or nltk or

other various tools

people adopted them and they started using them and they got better outcomes of that

but in dialogue maybe we don't have sophisticated enough tools

tools that allow people to develop such systems

as you saw in the title our model is

based on collaborative problem solving so what is collaborative problem solving

well when people collaborate what do they do they deliberate they jointly develop solutions they

identify and resolve errors and problems they keep track of the progress as the

task is going on

they jointly perform actions and of course they can negotiate roles

and they learn from one another

and all these things are done through communication right it's not necessarily by language communication

could be gestures it could be other kinds of communication but it is by communication

so we need to

our central thesis is that essentially all or at least most of the human machine

language based communication can be modeled effectively

as collaborative problem solving

what does the collaborative problem solving model entail

so what we mean by this is that we need to model the

shared intentional space between the two agents some people actually have

said something about

multiple agents sort of

multi

agent dialogue here we just limit ourselves to two but

even with multiple the same principles apply

so what is this intentional space what kind of objects are we dealing with

these are partial solutions

and understanding common ground session that strange

and all this shared understanding

arises from communication we need to communicate and agree on things and so on

one agent can't

create a collaborative goal or a solution they have to pursue something together

obviously a selection japan like to go without

the other person

so this is a picture taken from a paper by james allen and a couple

of my other colleagues from

two thousand two

the model places the sort of tasks in this model in

four different areas communicative interaction collaborative problem solving problem solving individual problem

solving actually

and of course my interest in this talk is just about the

collaborative problem solving here

which really can look at the object in there really reflect the problem solving actually

the same kind of thing

except that their properties

the central thesis that we had back in two thousand two and

it wasn't just us other people had the same idea is that at that level

you can reason in a domain independent way and represent things in a domain independent

way

but this has never really been done properly and we also didn't do it

we had a large prototype but we never really did it so here today i'm announcing

that we now did that

and

this

this architecture would be familiar to all of you it doesn't look very different from

other things that we have seen so far

so we have natural language understanding there's a lexicon and an ontology

the dialogue management which is really the collaborative problem solving agent that we have

it is in the centre

there's the backend problem solving or as we call it here

a behavioral agent there's generation so this doesn't look very different from other systems

the parts that are in colour are the components of cogent

cogent is a domain independent shell right so by itself you're not gonna

have a dialogue system just by having that but you can have a

dialogue system by adding to that

the behavioral agent which is domain specific and not to mention language generation and of course

for generation you could also have some higher level domain independent generation components but

we don't have it

so a lot of people can do sort of in domain generation

and

so i'm just gonna talk a little bit about the components there

so for natural language understanding the workhorse of everything that we did for the last

twenty some years is the trips parser

it's a deep

parser that produces a very rich representation of the meaning of every

sentence it has a very rich principled ontology it has a very large lexicon

some ten thousand maybe more

are hand built lexical entries we extend it by learning from wordnet but

the rest we derive automatically so for verbs for example we derive automatically the roles that

they have from definitions

and so on

and

i'm not gonna go into too many details but it is available online

and you can actually check it there's a web service for the basic

parser and for a number of variations of the parser as well

the output

positions

i don't see that

data

so i don't think this is actually visible but

so this is the

web interface i just put in a sentence earlier something that came up earlier i

need a hotel in the centre of calibration

and that's

what a parse looks like and you can see that

everything so there's a speech act at all

every single word represented here has a type in the ontology

so for hotel it's accommodation for need it's want

centre is a geographic region

even with the british spelling there it got that right

and if you look for example at the next one i prefer very nice hotels

then you can see that prefer is also want just like need which is something

that you probably want too

and you can see how adjectives have

very interesting types here the space here is basically a value on a scale of

expressiveness as it for and so on

so you get very rich representation

well

there are additional things here dealing with reference resolution ellipsis processing ontology mapping

i'm not gonna talk too much about this

one thing to note here is that there's conventional speech act identification so sometimes

you can ask a question by making essentially an assertion or you can

make an assertion by asking a question or make a request by asking a question

so there's a conventional mapping between the surface speech act and the intended speech act but

you just really

so now to the cps agent

so a

essentially the output of all these natural language analyses is fed into the

collaborative problem solving agent and what it does is it provides a domain independent model

of communication adaptable to new domains

on one side it does

what really could be called just intention recognition

so there's a communicative act coming in from the user utterance you want to understand what the

intention of the user is and we call that a cps act

and obviously on the other side i'm not going to spend much time on

that

if the system itself wants to communicate to the user it will do that by

actually creating a collaborative problem solving act which then gets sent to the generation component

and eventually will get turned into language

so this agent does that and essentially maintains the collaborative state

which

all these acts together essentially drive the conversational structure so that's why it is

a dialogue model

and again i'm going to repeat myself here but this is premised on the idea that there

is a domain independent semantics of language that supports

reasoning about intentions

so there but

there is a tension here between the desire for domain independent processing and the need for

very domain specific processing so

understanding the intention of the user is almost always impossible to do in just a domain

independent way so the way we deal with this problem is that essentially the collaborative

problem solving agent's understanding of the user intention is a hypothesis

and then this is handed over to the behavioral agent which does sort of grounding of

all objects and is actually trying to figure out does this make any sense in

this particular state of the task does this make sense and if so then that

gets

committed as shared if it's a goal then the system can commit to it as a

shared goal but if not there can be clarification and so on going on

so the way this is done is based on an evaluate commit

cycle

so the collaborative problem solving agent will figure out a collaborative problem solving act which

explains the user utterance

it will send an evaluate to the behavioral agent

and if the behavioral agent agrees it will send back an acceptable and only then

we have a commit to the goal as shared
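
The exchange just described can be sketched in a few lines of Python. This is only an illustration of the idea, not Cogent's actual code or message format: the names CPSAct, CollaborativeState, evaluate_commit and hotel_agent, and the verdict strings, are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class CPSAct:
    """A hypothesised collaborative problem solving act (illustrative only)."""
    kind: str     # e.g. "ADOPT"
    content: str  # e.g. the goal being proposed


@dataclass
class CollaborativeState:
    """Hypothetical store for what both parties have committed to."""
    shared_goals: List[str] = field(default_factory=list)


# The behavioral agent is modelled here as a function returning a verdict string.
Evaluator = Callable[[CPSAct, CollaborativeState], str]


def evaluate_commit(act: CPSAct, state: CollaborativeState, evaluate: Evaluator) -> str:
    """Treat the act as a hypothesis and commit only if the evaluator accepts it."""
    verdict = evaluate(act, state)
    if verdict == "ACCEPTABLE":
        state.shared_goals.append(act.content)
        return "COMMITTED"
    # Anything else leaves the shared state untouched; a fuller system could
    # report a reason or propose an alternative at this point.
    return "NOT-COMMITTED"


def hotel_agent(act: CPSAct, state: CollaborativeState) -> str:
    """Toy domain-specific evaluator that only knows about finding hotels."""
    return "ACCEPTABLE" if act.content == "find-hotel" else "UNACCEPTABLE"


state = CollaborativeState()
first = evaluate_commit(CPSAct("ADOPT", "find-hotel"), state, hotel_agent)
second = evaluate_commit(CPSAct("ADOPT", "compose-music"), state, hotel_agent)
```

The point of the design is that the domain independent side never decides on its own whether a goal makes sense; it only records a commitment after the domain specific side has accepted the hypothesis.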

and this is the same way that we're dealing with requests proposals and

questions as well

if the ba

doesn't

like

the evaluation there are several different ways it can deal

with this one is to just send a rejection actually i think this should be unacceptable

but anyway

but

if the ba would like to do this it can actually give a reason

it's a horizontal we don't have enough box for corporate law

it is also possible to propose alternative way and together that for a to the

resulting

i'm gonna skip on aspect is just models

so in the paper there is a very detailed description of the various collaborative

problem solving acts

so i'm not gonna go into the details so there's a number of them that have

to do with goals so we can adopt select or defer a goal if

you don't wanna deal with it right now you can completely abandon the goal or

you can release it which means that it's completed

satisfactorily or not

and there's a bunch of acts to support knowledge you can make an assertion that

once it is committed means that the agent now believes whatever

the human user intends for it to believe

and there's questions ask if and ask wh just two types of

questions

you can see here a number of examples

quite complicated examples these are actual examples from a system

including something like doesn't amount of sorely

at the conditional you

at a one that if we increase the amount of whatever the some other proteins

all

or i wh with choices of the gt wagner propose which are regulated by a

reinstall

so this is all the little and there's a number of acts related to the

problem solving status so again acceptable and unacceptable are essentially interpretation acts where

the ba says i like that or i don't like it goals can be

rejected

there can be failures of execution answers to questions and execution status which can

be either

done at the very end but it can also be used to

just report progress i'm still working on this
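
As a rough summary of the families of acts mentioned in this part of the talk, here is a small Python sketch. The act names follow the transcript as best it can be read; the enum layout, the STATUS_AFTER table and the apply_goal_act helper are assumptions for illustration, and the paper remains the reference for the real definitions.

```python
from enum import Enum
from typing import Dict


class GoalAct(Enum):
    ADOPT = "adopt"
    SELECT = "select"
    DEFER = "defer"
    ABANDON = "abandon"
    RELEASE = "release"


class KnowledgeAct(Enum):
    ASSERTION = "assertion"
    ASK_IF = "ask-if"
    ASK_WH = "ask-wh"


class StatusAct(Enum):
    ACCEPTABLE = "acceptable"
    UNACCEPTABLE = "unacceptable"
    FAILURE = "failure"
    ANSWER = "answer"
    EXECUTION_STATUS = "execution-status"


# Assumed bookkeeping: the status a goal is left in after each goal act.
STATUS_AFTER: Dict[GoalAct, str] = {
    GoalAct.ADOPT: "adopted",
    GoalAct.SELECT: "active",
    GoalAct.DEFER: "deferred",
    GoalAct.ABANDON: "abandoned",
    GoalAct.RELEASE: "released",
}


def apply_goal_act(goals: Dict[str, str], act: GoalAct, goal_id: str) -> Dict[str, str]:
    """Return a new goal table with goal_id updated according to act."""
    if act is not GoalAct.ADOPT and goal_id not in goals:
        raise KeyError(f"goal {goal_id!r} has not been adopted")
    updated = dict(goals)
    updated[goal_id] = STATUS_AFTER[act]
    return updated


goals = apply_goal_act({}, GoalAct.ADOPT, "explain-regulation")
goals = apply_goal_act(goals, GoalAct.DEFER, "explain-regulation")
```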

okay so the question is

what does it mean to add a behavioral agent to actually have a dialogue system based

on cogent

you can think of the cps acts as establishing a sort of a protocol once you implement the

protocol and ensure that the obligations that these things create

are satisfied

then after that there's nothing else to do essentially there's no requirement for how the

behavioral agent represents things internally

whether it's a planning system or a very simple database lookup

what kind of complexity it has

how many sub agents are out there as long as there's a single

interface a single overarching ba everything should be fine
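
One way to picture the single interface idea is an abstract class that the shell depends on, with any backend behind it. The sketch below is hypothetical: BehavioralAgent, LookupAgent, handle, the method names and the "ASK-WH" string are invented for this example and are not Cogent's API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class BehavioralAgent(ABC):
    """Hypothetical single interface that the dialogue shell talks to."""

    @abstractmethod
    def evaluate(self, act_kind: str, content: str) -> str:
        """Return "ACCEPTABLE" or "UNACCEPTABLE" for a proposed act."""

    @abstractmethod
    def answer(self, content: str) -> Optional[str]:
        """Return an answer for an accepted question, or None."""


class LookupAgent(BehavioralAgent):
    """Simplest possible backend: a dictionary lookup."""

    def __init__(self, table: Dict[str, str]) -> None:
        self._table = dict(table)

    def evaluate(self, act_kind: str, content: str) -> str:
        known = act_kind == "ASK-WH" and content in self._table
        return "ACCEPTABLE" if known else "UNACCEPTABLE"

    def answer(self, content: str) -> Optional[str]:
        return self._table.get(content)


def handle(agent: BehavioralAgent, act_kind: str, content: str) -> str:
    """Shell-side logic that relies only on the interface, not the internals."""
    if agent.evaluate(act_kind, content) != "ACCEPTABLE":
        return "cannot handle that"
    return agent.answer(content) or "no answer"


agent = LookupAgent({"hotel-in-centre": "Grand Hotel"})
reply = handle(agent, "ASK-WH", "hotel-in-centre")
```

A planner or a collection of sub agents could replace LookupAgent without any change to handle, as long as it honours the same two methods.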

with it has a models alone

there are some related ways of affecting how the natural language understanding works

but is somewhat so you really want to use this and actually

change how the natural language understanding works because it's not good enough you ask the

did you never i'm not reliable

so we have a number of already implemented cogent based systems in very different domains

very different interaction styles

so biocuration

that is essentially an assistant a biologist's assistant and a bunch of systems that have

to do with the blocks world

more or less

and some others that are sort of music composition visual storytelling that's creating

scenarios for making movies essentially with animated characters so very different domains very different

vocabulary very different interaction styles

so i'm not gonna go too much into any of these systems

but one of the reviewers wanted to see the biocuration system

and i couldn't put too much into the paper because it wasn't published and it

still isn't really

but i'm gonna show you a little video of the system and

so these are all systems except for the one that ian presented the other

day all these systems were not developed by us people took cogent and they developed them on

their own

so let's look at a bit of a dialogue

providing you understand looks like logical systems like

was there

one is going to be sensor

but the trees are a little bit

the rule machine i don't want the one here

sorry but

alright so here we would have sort of a the dialogue history then is a

idea a system by averages

what you from an implementation and what you what the goal here i want to

find out how you be shown in the

b equal to these two genes

and there's just outline i think it's probably best work

so i'm so what is the goal here i want to find an explanation so

it's a very interesting type of goal of how this happens

and the way the system knows how to provide an answer to that is to

build a model of the molecular interactions

and then try to find out

one that you are maybe we which is kind of the source

useless is g the joan i in this particular cells

i'm gonna you go your

so the user then asks how does your maybe if we regulate pi okay now

why did they know you can see here about the p eight we hate you

because they're biologists obviously this is not a system for novices

and what the system does there's actually a huge array of

biology specific agents

including ones that go look up pathways in different databases

there's one that actually reads papers and can extract information from there

so it finds a molecular path between these two

genes and it creates a network that the user can use as a source

of information

so i'm gonna speed up because i know my ties are already right it is

okay

so a and creates a so

i'm just gonna lexical and only because it is below

so now the user creates with the system a very specific model

of this

the system actually based on what it sees it can suggest additional information based on

what it knows

and the user can look at it and say well okay that's

good enough or actually i know something even more specific than that

and the system comes back you can see here

but

to actually explain

the original question that the user asked

and there's more it can actually take this and create a dynamic model out of it you

can ask questions for example is the amount of whatever protein high and you can

see all kinds of useful information so i'll stop here

for

plan recognition we actually don't

in the in the cps agent

we don't actually use plan recognition right now i know that i

more me

running when you

understanding dialog

for now we don't at high

i think essentially one way of answering this question

is why were we successful with this where we weren't before and we had done

more work before it's because of the way we split

what can be done in the domain independent way from what can be done in

a domain dependent way

so a lot of the time as i said in this evaluate commit cycle

we basically just throw things over the fence and say well you figure it out

so most of the situational context and there is not a model of user

modelling in this thing but all of this would actually reside right now

in the ba obviously you want at the cps level to

have some of it

to be able to do some more reasoning but right now we don't

we don't offer generation all the teams that have worked on this have

essentially created template based generation on their own and so we don't

provide

shortcuts

would be very difficult

well we started with similar goals right as collagen there are

actually some of these older papers dealing more with that question about the differences

there are some limitations in the collagen model there are some really good features in the

collagen model

so i think we're going in the same direction but kind of

tackle things a little bit differently but actually i just learned recently that

the chart for each and others that have put together idea i toolkit

moving in the same direction

although as far as i understand i haven't seen it in practice that their there's

is more task oriented kind of like reading floor

so you know what you know way they can move their expectations as the kind

of reduce their expectations

so i don't know if you saw on the last slide there was a

link you can actually download it i recommend you use

at least the parser you can actually do much better than what most people do

and if you want to use the whole system will be

Cogent: A Generic Dialogue System Shell Based on a Collaborative Problem Solving Model

Oral Session 5: State Tracking

Lucian Galescu, Choh Man Teng, James Allen, Ian Perera