Hi. Today I'm going to introduce a new statistical dialog state tracking framework, called task lineage, for handling flexible interaction. This is my outline: after giving a brief introduction, I will get to some challenges we want to address in this work, and the approaches we used to solve those problems. Then I'm going to show some experimental results on benchmark test datasets, and then I'll conclude my talk. If time permits, I will go deeper into some technical details on task frame parsing.
So we have seen a lot of recent advances in statistical dialog state tracking, and one notable finding is that many algorithms have been shown to be effective, as the results from the shared tasks show. So we now have systems that are really robust to noise and errors, like ASR errors and style variation, but they are usually limited to session-based, simple-task, simple-corpus dialogues. Given the surge of interest in conversational agents and the enormous range of use cases, it seems necessary to extend the previous methods to handle multiple tasks with complex goals in long interactions.
So let me talk about the set of challenges we want to address here, and our approaches to solving them. The first challenge is complex goals. What I mean by a complex goal is any combination of positive and negative constraints. In a restaurant finding domain, for example, you can say "Italian or French, but not Thai", that kind of thing. The approach we take is very straightforward: we just do constraint-level belief tracking rather than slot-level tracking.
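To make the idea concrete, here is a minimal sketch of what a constraint-level representation might look like. All class and function names are illustrative assumptions, not the paper's actual implementation: the point is only that the unit of tracking is a signed (slot, value) constraint rather than a single slot value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    slot: str        # e.g. "food"
    value: str       # e.g. "italian"
    positive: bool   # True = wanted, False = excluded

def satisfies(entity, constraints):
    """Check a candidate entity against a complex goal like
    'italian or french, but not thai': positive constraints are
    OR-ed, negative constraints exclude (illustrative semantics)."""
    wanted = {(c.slot, c.value) for c in constraints if c.positive}
    banned = {(c.slot, c.value) for c in constraints if not c.positive}
    pairs = set(entity.items())
    if pairs & banned:
        return False
    return not wanted or bool(pairs & wanted)

# the talk's example goal: italian or french, but not thai
goal = [Constraint("food", "italian", True),
        Constraint("food", "french", True),
        Constraint("food", "thai", False)]
```

With this representation, each constraint can carry its own belief score, which is what "constraint-level belief tracking" would operate over.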
The second challenge is to handle complex input from distributed SLUs. Complex input includes not only complex goals but also multiple tasks at the same time, so we introduce a new concept, called task frame parsing, to address this challenge. To scale a conversational agent platform, we usually adopt a distributed architecture. In this architecture we have numerous service providers with their own SLU and other components, and we also provide a range of common components like SLUs and tasks. When a user input comes in, it is dispatched to all these components, and each component returns its own interpretation. So it's up to the platform to weave the possibly competing semantic interpretations into a coherent semantic interpretation.
For example, when the user says "connections from SoHo to Midtown at 1 pm, Italian restaurant near Times Square, and a friendly coffee shop", the transit domain will detect "SoHo" as the from slot, "Midtown" as the to slot, "1 pm" as the time, and so on; and similarly for the local domain as well. As you can see, there are competing slots. What we want to get is a list of coherent task frame parses, like these two examples. The first parse identifies the first three spans as a transit task frame, and it also has two more local task frames; it's probably right, so it gets a high score, like 0.8. The second one is less likely: it only has two local task frames, so it gets a lower score, like 0.2. We call this process task frame parsing.
To do this, we use beam search using MCMC with simulated annealing. There are many algorithms we could use here; we chose this method because it allows us to integrate hard constraints with probabilistic reasoning very easily. One example of a hard constraint would be mutual exclusiveness: dialog act items with the same span cannot be used at the same time; that's the kind of constraint we mean. For the probabilistic reasoning, we use a globally normalized log-linear model like this, so we can get a confidence score at the end. There are numerous features you can use, so please refer to our paper for more details.
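The flavor of this search can be sketched as a toy simulated-annealing sampler over assignments of dialog act items to task frames, with hard constraints enforced by rejecting invalid proposals. Everything below (the scorer, the frame names, the cooling schedule) is an illustrative stand-in for the paper's actual model, not a reproduction of it:

```python
import math
import random

def anneal_parse(items, frames, score, valid, steps=2000, t0=2.0):
    """Simulated-annealing search over assignments of dialog act
    items to task frames. `score(assignment)` is a stand-in for the
    log-linear model; `valid(assignment)` encodes hard constraints
    (e.g. same-span items being mutually exclusive)."""
    rng = random.Random(0)
    assign = {it: frames[0] for it in items}     # assumed-valid start
    assert valid(assign)
    cur_s = score(assign)
    best, best_s = dict(assign), cur_s
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-6  # linear cooling
        proposal = dict(assign)
        proposal[rng.choice(items)] = rng.choice(frames)  # move one
        if not valid(proposal):                  # reject hard violations
            continue
        s = score(proposal)
        # Metropolis rule: take improvements, sometimes accept worse
        if s >= cur_s or rng.random() < math.exp((s - cur_s) / temp):
            assign, cur_s = proposal, s
            if s > best_s:
                best, best_s = dict(assign), s
    return best, best_s
```

Rejecting invalid proposals outright is what makes it easy to combine hard constraints with probabilistic scoring, which is the property the talk highlights.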
The third challenge is flexible task management, and to handle it we introduce another new concept, called task lineage. So suppose this situation: a user starts a conversation with two tasks, weather information and restaurant finding. Then she continues the conversation with transportation and ticket booking tasks, without completing the first two. Then she leaves, attends some meeting, comes back, and tries to resume the restaurant-related tasks. Then she finishes the restaurant booking, moves to the transportation and ticket booking again, and completes them.

If you use traditional stack-based task management, you might have several problems. First, you might not be able to handle multiple tasks at the same time. The other problem is information loss. When you come to turn three, if the system thought the first restaurant finding was complete, then the relevant information is already gone, so you just have to restart the restaurant task at turn three. On the contrary, if the system kept it as incomplete, then the system can resume the restaurant-related tasks without losing the relevant information from the past; but then the system has to pop the transportation and ticket booking off the stack to resume the restaurant booking task, so the relevant information for those popped tasks will be gone when you get to turn four. Either way, you might suffer from information loss.
To handle this problem, we came up with the notion of task lineage. There is no restriction on the number of task states in a task lineage, so you can form as many task lineages as you want, and you update the task lineages at each turn. I mean, whenever a new turn comes, you just add a new task state and retrieve the relevant information from the task states in the past. So at turn two, transportation and ticket booking can get some information from restaurant finding, if those task states are related. At turn three, you can resume the restaurant finding, even after a long time, and you can retrieve the relevant information from the task state at turn one without any problem; and similarly for turn four. So basically we never remove or abandon any information from the past: you can always retrieve relevant information from the past, and the current task states give you the idea of the current focus.
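The lineage idea above can be sketched as an append-only structure, one task state per turn, with retrieval over everything that came before. Class and field names here are assumptions made for the illustration:

```python
from dataclasses import dataclass, field

@dataclass
class TaskState:
    turn: int
    task: str        # e.g. "restaurant_finding"
    beliefs: dict    # slot/constraint -> confidence

@dataclass
class TaskLineage:
    states: list = field(default_factory=list)

    def add_turn(self, state):
        """Nothing is ever removed: each turn appends a new state."""
        self.states.append(state)

    def retrieve(self, task, related):
        """Fetch the latest belief estimates from past states of the
        same or a related task, no matter how long ago they occurred."""
        merged = {}
        for st in self.states:            # oldest -> newest
            if st.task == task or related(st.task, task):
                merged.update(st.beliefs)  # newer estimates win
        return merged
```

Because states are only ever appended, resuming the restaurant task many turns later is just a retrieval over the lineage, which is exactly the property the stack-based approach loses.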
So how do we do the context matching? We construct a context stash. It's very simple: given the task lineage, we set a time window, then construct a belief set by collecting all the latest belief estimates within the time window, and then construct a machine act set and a user act set by collecting all machine acts and task frame parses in the time window. Then, given the context stash, and based on the current machine act and the current task frame parse, you try to select which information you want to use to update the current belief. It's nothing but a bunch of binary classifications, so we use logistic regression. There are a bunch of features for this task; you can refer to my paper for them.
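A rough illustration of the context stash and the per-item fetch decision follows. The weights and feature names are made up for the example (the paper's feature set is much richer), and the stash fields are assumptions about its shape:

```python
import math

def context_stash(lineage, now, window):
    """Collect the latest belief estimates, machine acts, and task
    frame parses that fall inside the time window (simplified)."""
    recent = [s for s in lineage if now - s["time"] <= window]
    return {
        "beliefs": [s["belief"] for s in recent],
        "machine_acts": [a for s in recent for a in s["machine_acts"]],
        "user_acts": [a for s in recent for a in s["user_acts"]],
    }

def fetch_probability(weights, features):
    """One binary decision per candidate item: logistic regression
    over hand-crafted features (weights here are invented)."""
    z = sum(weights.get(name, 0.0) * val for name, val in features.items())
    return 1.0 / (1.0 + math.exp(-z))

weights = {"same_task": 2.0, "recency": 1.5, "bias": -1.0}
features = {"same_task": 1.0, "recency": 0.8, "bias": 1.0}
p = fetch_probability(weights, features)   # fetch this item if p > 0.5
```

Each candidate piece of past information gets its own independent fetch decision, which is why the talk describes it as "a bunch of binary classifications".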
The fourth challenge is about task disambiguation: there can always be some ambiguity in task detection. To handle this problem, we use an n-best list of task lineages. Suppose the user says "I wanna go Thai"; this could be interpreted as restaurant finding or as travel, so we have two task lineages here. Then, when the user clarifies her real intention, for example saying "I meant I want to travel to Thailand", the second task lineage will get a higher score, because it's more coherent. In this way we mitigate the task ambiguity.
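The rescoring behavior can be sketched as follows. The coherence function here is a toy stand-in for the real feature-based scoring; the hypothesis names and scores are invented for the example:

```python
def rerank(nbest, coherence, new_act):
    """Rescore an n-best list of (lineage, score) pairs with a new
    user act. `coherence(lineage, new_act)` says how well the
    clarification fits each hypothesis (illustrative stand-in)."""
    rescored = [(lin, score * coherence(lin, new_act))
                for lin, score in nbest]
    total = sum(s for _, s in rescored) or 1.0
    return sorted(((lin, s / total) for lin, s in rescored),
                  key=lambda x: -x[1])

# "I wanna go Thai" -> two hypotheses; the clarification flips them.
nbest = [("restaurant_finding", 0.6), ("travel", 0.4)]
coherence = lambda lin, act: 0.9 if ("travel" in act and lin == "travel") else 0.1
ranked = rerank(nbest, coherence, "i meant i want to travel to thailand")
```

Keeping both hypotheses alive until the clarification arrives is what lets the initially lower-ranked lineage win once it becomes the more coherent one.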
So the overall dialog state tracking procedure consists of three steps. First, we do task frame parsing: given a set of possibly competing semantic frames from the distributed SLUs, we generate coherent task frame parses. Then, given the task frame parses, we try to retrieve the relevant information from the past task states in the lineage, and this happens for each lineage. Finally, we use the retrieved information and the input information to update the task state at this turn.

How do we do the task state update? Actually, this is one of the most trivial tasks in this framework, because we can adopt any method developed so far for dialog state tracking: the task state update is the counterpart of dialog state tracking in the conventional setting. So you can enjoy a wide range of different algorithms, like discriminative models, generative models, or rules tuned on the data, and you can control the balance between how much you rely on prior belief estimates and on raw observations.
To keep the analysis simple, we actually just adopted a generative rule-based update method from prior work, and we use this algorithm for belief tracking for each slot-value pair. These rules just update the current belief by aggregating negative and positive confidence scores.
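The aggregation of positive and negative confidence can be sketched like this. It is a minimal sketch in the spirit of the rule-based trackers used as DSTC baselines; the exact rules in the paper may differ:

```python
def rule_update(prior, pos, neg):
    """Rule-based belief update for one slot-value pair: raise the
    belief with positive confidence, then discount it by negative
    confidence (illustrative, not the paper's exact rules)."""
    b = prior + (1.0 - prior) * pos   # accumulate positive evidence
    b = b * (1.0 - neg)               # discount by negative evidence
    return b

b = 0.0
b = rule_update(b, pos=0.6, neg=0.0)  # user asserts the value
b = rule_update(b, pos=0.0, neg=0.3)  # later, weak contradicting evidence
```

The update keeps beliefs in [0, 1] and needs no training, which is why it is a convenient stand-in while the rest of the framework is being evaluated.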
So let's move on to evaluation. We used DSTC2 to evaluate our algorithm; it's based on the restaurant finding domain. One interesting characteristic of this dataset is its relatively frequent user goal changes, so if our method works well on this test dataset, then the context fetch must be retaining the relevant information across the goal changes.

Let's look at the results. Our method actually shows the best performance so far on accuracy, and this tells you the importance of context handling in the dialog state tracking problem. And we got this performance without using any ensemble methods, like system combination over neural networks and decision trees; it's just the rule-based update for the task state update.
We also wanted to evaluate our system on more complex interactions, but unfortunately there is no suitable dataset available out there, so we had to simulate some datasets. We took the DSTC3 data as our base corpus, because it contains multiple tasks: restaurant finding, coffee shop finding, and pub finding. We simulated three datasets with different, representative settings. The first one has no complex user goals and no multiple tasks, so we just use the DSTC3 data itself. For the second setting, we simulated complex user goals but no multiple tasks. For the third dataset, we have both complex user goals and multiple tasks. These numbers are the statistics for these corpora.
So let's look at the results. We compare our system with the baseline system in DSTC. If we look at the joint goal accuracy, the performance of the baseline system drops very sharply across the three settings, from 0.57 to 0.31 and then 0.02, while our system drops much more gently, from 0.59 to around 0.4 and then 0.3. Keeping in mind that the task gets exponentially harder with respect to the complexity, this gentle reduction is a big win.

We further evaluated our system with oracle results: TL-DST-OP uses oracle parses, and TL-DST-O uses both oracle parses and oracle context fetches. You can see the improved results from using the oracle information, which indicates that there's still some room for future improvement.
Then let me conclude my talk. We have proposed a new statistical dialog state tracking framework, called task lineage, to orchestrate multiple tasks with complex goals across multiple domains in continuous interaction. As a proof of concept, we demonstrated good performance on a common benchmark test dataset and on plausibly simulated dialogue corpora. Some interesting future directions include the use of more sophisticated machine learning models, like GBDT, random forests, or recurrent neural networks; I'm pretty sure you can push the performance much higher than the performance shown here just by using those techniques for the task state update. I'm also interested in extending this framework with weakly supervised learning to reduce the labeling cost, and in seeing some potential impact on other dialogue system components by providing a more comprehensive state representation like task lineages.
Okay, I have about one minute left, so, basically, task frame parsing works like this. Given this input, let's say there are two domains, and they generate two different interpretations, like the top two and the bottom two. We identify all possible candidate task frames for each dialog act item, and we have a special task frame, called N/A, to accommodate all the unnecessary information. Then the task is to get the right assignment from each dialog act item to the right task frame. The parsing algorithm starts with some configuration that is somehow valid, then it moves one assignment at a time, and, according to the constraints and the scores, it eventually reaches a proper configuration with a high score.
I think this is the end of my presentation. Thank you.

Okay. Right, sure.
Actually, it's done through some feature functions. As we extend the task lineage, we keep the timestamps, and the feature functions that match the context use the timestamp as one of the features. So as a piece of context gets further and further away from the current timestamp, it has less and less chance of being fetched. Okay.
Okay, so actually that involves another notion, I guess: long-term memory. This work is more about short-term interaction management, and another module should be responsible for long-term memory management, so you would have a more perpetual memory there. It would also be useful for providing features to disambiguate, or to boost some evidence, that kind of thing. So you would need another memory structure beyond just this short-term dynamic structure.
Pardon? Of course. I missed the initial part of the question, so: when you have multiple interpretations in play, where do you start? That's more about policy. Given a lot of ambiguity, or high entropy in your state representation, you can train some smart policy that decides whether it's better to ask for a confirmation at this point, or to just assume something, or to try to retrieve the user's habits from long-term memory. All of these are determined by your policy, so it's kind of another module that takes care of such things.
Another question? I'm repeating the question: I think the question is whether I did classification for each constraint for the complex goals, right? Okay, so what he asks is whether we can use classification to predict the user's intention, that is, predict the intent with a classifier, right? Okay, so I think that's actually possible in this framework, but I didn't use that kind of classifier alone. Actually, classification is a necessary part, I guess, for scalability, because if we considered all possible interpretations from all possible SLUs and other components, then the complexity would explode. So I would do some classification-based filtering as a preprocessing step, and then do the parsing to construct parses that can contain multiple tasks in one utterance. With classification alone it's a little bit difficult to handle multiple tasks, and you also need a structured answer.
Okay, sure. Absolutely. Okay, let me repeat the question: do I have to use the context to interpret the user's utterances? Actually, I'm using just the corpus, so I didn't have to use the context to understand the user utterances. But there has been a lot of research trying to use the context to interpret intentions, so there's no reason not to use it. Alright, okay, thanks again.