Hi. Today I'm going to introduce a new statistical dialog state tracking framework, called task lineage, for handling flexible interaction. This is my outline: after giving a brief introduction, I will get to some challenges we want to address in this work, and the approaches we used to solve those problems. Then I'm going to show some experimental results on benchmark test datasets, and then I'll conclude my talk. If time permits, I will go deeper into some technical details on task frame parsing.
So we have seen a lot of recent advances in statistical dialog state tracking, and one notable finding is that many algorithms have been shown to be effective, as the results from the shared tasks show. So we now have systems that are really robust to noise and errors, like ASR errors and style variation, but they are usually limited to session-based, simple-task, simple-corpus dialogues. Given the surge of interest in conversational agents and the enormous range of use cases, it seems necessary to extend the previous methods to handle multiple tasks with complex goals in long interactions.
So let me talk about the set of challenges we want to address here, and our approaches to solving them. The first challenge is complex goals. What I mean by a complex goal is any combination of positive and negative constraints. In a restaurant finding domain, for example, you can say "Italian or French, but not Thai", that kind of thing. The approach we take is very straightforward: we just do constraint-level belief tracking rather than slot-level tracking.
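To make the idea concrete, here is a minimal sketch of what a constraint-level representation might look like. All class and function names are illustrative assumptions, not the paper's actual implementation: the point is only that the unit of tracking is a signed (slot, value) constraint rather than a single slot value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    slot: str        # e.g. "food"
    value: str       # e.g. "italian"
    positive: bool   # True = wanted, False = excluded

def satisfies(entity, constraints):
    """Check a candidate entity against a complex goal like
    'italian or french, but not thai': positive constraints are
    OR-ed, negative constraints exclude (illustrative semantics)."""
    wanted = {(c.slot, c.value) for c in constraints if c.positive}
    banned = {(c.slot, c.value) for c in constraints if not c.positive}
    pairs = set(entity.items())
    if pairs & banned:
        return False
    return not wanted or bool(pairs & wanted)

# the talk's example goal: italian or french, but not thai
goal = [Constraint("food", "italian", True),
        Constraint("food", "french", True),
        Constraint("food", "thai", False)]
```

With this representation, each constraint can carry its own belief score, which is what "constraint-level belief tracking" would operate over.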
The second challenge is to handle complex input from distributed SLUs. Complex input includes not only complex goals but also multiple tasks at the same time, so we introduce a new concept, called task frame parsing, to address this challenge. To scale a conversational agent platform, we usually adopt a distributed architecture. In this architecture we have numerous service providers with their own SLU and other components, and we also provide a range of common components like SLUs and tasks. When a user input comes in, it is dispatched to all these components, and each component returns its own interpretation. So it's up to the platform to weave the possibly competing semantic interpretations into a coherent semantic interpretation.
For example, when the user says "connections from SoHo to Midtown at 1 pm, Italian restaurant near Times Square, and a friendly coffee shop", the transit domain will detect "SoHo" as the from slot, "Midtown" as the to slot, "1 pm" as the time, and so on; and similarly for the local domain as well. As you can see, there are competing slots. What we want to get is a list of coherent task frame parses, like these two examples. The first parse identifies the first three spans as a transit task frame, and it also has two more local task frames; it's probably right, so it gets a high score, like 0.8. The second one is less likely: it only has two local task frames, so it gets a lower score, like 0.2. We call this process task frame parsing.
To do this, we use beam search using MCMC with simulated annealing. There are many algorithms we could use here; we chose this method because it allows us to integrate hard constraints with probabilistic reasoning very easily. One example of a hard constraint would be mutual exclusiveness: dialog act items with the same span cannot be used at the same time; that's the kind of constraint we mean. For the probabilistic reasoning, we use a globally normalized log-linear model like this, so we can get a confidence score at the end. There are numerous features you can use, so please refer to our paper for more details.
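The flavor of this search can be sketched as a toy simulated-annealing sampler over assignments of dialog act items to task frames, with hard constraints enforced by rejecting invalid proposals. Everything below (the scorer, the frame names, the cooling schedule) is an illustrative stand-in for the paper's actual model, not a reproduction of it:

```python
import math
import random

def anneal_parse(items, frames, score, valid, steps=2000, t0=2.0):
    """Simulated-annealing search over assignments of dialog act
    items to task frames. `score(assignment)` is a stand-in for the
    log-linear model; `valid(assignment)` encodes hard constraints
    (e.g. same-span items being mutually exclusive)."""
    rng = random.Random(0)
    assign = {it: frames[0] for it in items}     # assumed-valid start
    assert valid(assign)
    cur_s = score(assign)
    best, best_s = dict(assign), cur_s
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-6  # linear cooling
        proposal = dict(assign)
        proposal[rng.choice(items)] = rng.choice(frames)  # move one
        if not valid(proposal):                  # reject hard violations
            continue
        s = score(proposal)
        # Metropolis rule: take improvements, sometimes accept worse
        if s >= cur_s or rng.random() < math.exp((s - cur_s) / temp):
            assign, cur_s = proposal, s
            if s > best_s:
                best, best_s = dict(assign), s
    return best, best_s
```

Rejecting invalid proposals outright is what makes it easy to combine hard constraints with probabilistic scoring, which is the property the talk highlights.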
The third challenge is flexible task management, and to handle it we introduce another new concept, called task lineage. So suppose this situation: a user starts a conversation with two tasks, weather information and restaurant finding. Then she continues the conversation with transportation and ticket booking tasks, without completing the first two. Then she leaves, attends some meeting, comes back, and tries to resume the restaurant-related tasks. Then she finishes the restaurant booking, moves to the transportation and ticket booking again, and completes them.

If you use traditional stack-based task management, you might have several problems. First, you might not be able to handle multiple tasks at the same time. The other problem is information loss. When you come to turn three, if the system thought the first restaurant finding was complete, then the relevant information is already gone, so you just have to restart the restaurant task at turn three. On the contrary, if the system kept it as incomplete, then the system can resume the restaurant-related tasks without losing the relevant information from the past; but then the system has to pop the transportation and ticket booking off the stack to resume the restaurant booking task, so the relevant information for those popped tasks will be gone when you get to turn four. Either way, you might suffer from information loss.
To handle this problem, we came up with the notion of task lineage. There is no restriction on the number of task states in a task lineage, so you can form as many task lineages as you want, and you update the task lineages at each turn. I mean, whenever a new turn comes, you just add a new task state and retrieve the relevant information from the task states in the past. So at turn two, transportation and ticket booking can get some information from restaurant finding, if those task states are related. At turn three, you can resume the restaurant finding, even after a long time, and you can retrieve the relevant information from the task state at turn one without any problem; and similarly for turn four. So basically we never remove or abandon any information from the past: you can always retrieve relevant information from the past, and the current task states give you the idea of the current focus.
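The lineage idea above can be sketched as an append-only structure, one task state per turn, with retrieval over everything that came before. Class and field names here are assumptions made for the illustration:

```python
from dataclasses import dataclass, field

@dataclass
class TaskState:
    turn: int
    task: str        # e.g. "restaurant_finding"
    beliefs: dict    # slot/constraint -> confidence

@dataclass
class TaskLineage:
    states: list = field(default_factory=list)

    def add_turn(self, state):
        """Nothing is ever removed: each turn appends a new state."""
        self.states.append(state)

    def retrieve(self, task, related):
        """Fetch the latest belief estimates from past states of the
        same or a related task, no matter how long ago they occurred."""
        merged = {}
        for st in self.states:            # oldest -> newest
            if st.task == task or related(st.task, task):
                merged.update(st.beliefs)  # newer estimates win
        return merged
```

Because states are only ever appended, resuming the restaurant task many turns later is just a retrieval over the lineage, which is exactly the property the stack-based approach loses.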
So how do we do the context matching? We construct a context stash. It's very simple: given the task lineage, we set a time window, then construct a belief set by collecting all the latest belief estimates within the time window, and then construct a machine act set and a user act set by collecting all machine acts and task frame parses in the time window. Then, given the context stash, and based on the current machine act and the current task frame parse, you try to select which information you want to use to update the current belief. It's nothing but a bunch of binary classifications, so we use logistic regression. There are a bunch of features for this task; you can refer to my paper for them.
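A rough illustration of the context stash and the per-item fetch decision follows. The weights and feature names are made up for the example (the paper's feature set is much richer), and the stash fields are assumptions about its shape:

```python
import math

def context_stash(lineage, now, window):
    """Collect the latest belief estimates, machine acts, and task
    frame parses that fall inside the time window (simplified)."""
    recent = [s for s in lineage if now - s["time"] <= window]
    return {
        "beliefs": [s["belief"] for s in recent],
        "machine_acts": [a for s in recent for a in s["machine_acts"]],
        "user_acts": [a for s in recent for a in s["user_acts"]],
    }

def fetch_probability(weights, features):
    """One binary decision per candidate item: logistic regression
    over hand-crafted features (weights here are invented)."""
    z = sum(weights.get(name, 0.0) * val for name, val in features.items())
    return 1.0 / (1.0 + math.exp(-z))

weights = {"same_task": 2.0, "recency": 1.5, "bias": -1.0}
features = {"same_task": 1.0, "recency": 0.8, "bias": 1.0}
p = fetch_probability(weights, features)   # fetch this item if p > 0.5
```

Each candidate piece of past information gets its own independent fetch decision, which is why the talk describes it as "a bunch of binary classifications".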
The fourth challenge is about task disambiguation: there can always be some ambiguity in task detection. To handle this problem, we use an n-best list of task lineages. Suppose the user says "I wanna go Thai"; this could be interpreted as restaurant finding or as travel, so we have two task lineages here. Then, when the user clarifies her real intention, for example saying "I meant I want to travel to Thailand", the second task lineage will get a higher score, because it's more coherent. In this way we mitigate the task ambiguity.
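The rescoring behavior can be sketched as follows. The coherence function here is a toy stand-in for the real feature-based scoring; the hypothesis names and scores are invented for the example:

```python
def rerank(nbest, coherence, new_act):
    """Rescore an n-best list of (lineage, score) pairs with a new
    user act. `coherence(lineage, new_act)` says how well the
    clarification fits each hypothesis (illustrative stand-in)."""
    rescored = [(lin, score * coherence(lin, new_act))
                for lin, score in nbest]
    total = sum(s for _, s in rescored) or 1.0
    return sorted(((lin, s / total) for lin, s in rescored),
                  key=lambda x: -x[1])

# "I wanna go Thai" -> two hypotheses; the clarification flips them.
nbest = [("restaurant_finding", 0.6), ("travel", 0.4)]
coherence = lambda lin, act: 0.9 if ("travel" in act and lin == "travel") else 0.1
ranked = rerank(nbest, coherence, "i meant i want to travel to thailand")
```

Keeping both hypotheses alive until the clarification arrives is what lets the initially lower-ranked lineage win once it becomes the more coherent one.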
So the overall dialog state tracking procedure consists of three steps. First, we do task frame parsing: given a set of possibly competing semantic frames from the distributed SLUs, we generate coherent task frame parses. Then, given the task frame parses, we try to retrieve the relevant information from the past task states in the lineage, and this happens for each lineage. Finally, we use the retrieved information and the input information to update the task state at this turn.

How do we do the task state update? Actually, this is one of the most trivial tasks in this framework, because we can adopt any method developed so far for dialog state tracking: the task state update is the counterpart of dialog state tracking in the conventional setting. So you can enjoy a wide range of different algorithms, like discriminative models, generative models, or rules tuned on the data, and you can control the balance between how much you rely on prior belief estimates and on raw observations.
To keep the analysis simple, we actually just adopted a generative rule-based update method from prior work, and we use this algorithm for belief tracking for each slot-value pair. These rules just update the current belief by aggregating negative and positive confidence scores.
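The aggregation of positive and negative confidence can be sketched like this. It is a minimal sketch in the spirit of the rule-based trackers used as DSTC baselines; the exact rules in the paper may differ:

```python
def rule_update(prior, pos, neg):
    """Rule-based belief update for one slot-value pair: raise the
    belief with positive confidence, then discount it by negative
    confidence (illustrative, not the paper's exact rules)."""
    b = prior + (1.0 - prior) * pos   # accumulate positive evidence
    b = b * (1.0 - neg)               # discount by negative evidence
    return b

b = 0.0
b = rule_update(b, pos=0.6, neg=0.0)  # user asserts the value
b = rule_update(b, pos=0.0, neg=0.3)  # later, weak contradicting evidence
```

The update keeps beliefs in [0, 1] and needs no training, which is why it is a convenient stand-in while the rest of the framework is being evaluated.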
So let's move on to evaluation. We used DSTC2 to evaluate our algorithm; it's based on the restaurant finding domain. One interesting characteristic of this dataset is its relatively frequent user goal changes, so if our method works well on this test dataset, then the context fetch must be retaining the relevant information across the goal changes.

Let's look at the results. Our method actually shows the best performance so far on accuracy, and this tells you the importance of context handling in the dialog state tracking problem. And we got this performance without using any ensemble methods, like system combination over neural networks and decision trees; it's just the rule-based update for the task state update.
We also wanted to evaluate our system on more complex interactions, but unfortunately there is no suitable dataset available out there, so we had to simulate some datasets. We took the DSTC3 data as our base corpus, because it contains multiple tasks: restaurant finding, coffee shop finding, and pub finding. We simulated three datasets with different, representative settings. The first one has no complex user goals and no multiple tasks, so we just use the DSTC3 data itself. For the second setting, we simulated complex user goals but no multiple tasks. For the third dataset, we have both complex user goals and multiple tasks. These numbers are the statistics for these corpora.
So let's look at the results. We compare our system with the baseline system in DSTC. If we look at the joint goal accuracy, the performance of the baseline system drops very sharply across the three settings, from 0.57 to 0.31 and then 0.02, while our system drops much more gently, from 0.59 to around 0.4 and then 0.3. Keeping in mind that the task gets exponentially harder with respect to the complexity, this gentle reduction is a big win.

We further evaluated our system with oracle results: TL-DST-OP uses oracle parses, and TL-DST-O uses both oracle parses and oracle context fetches. You can see the improved results from using the oracle information, which indicates that there's still some room for future improvement.
Then let me conclude my talk. We have proposed a new statistical dialog state tracking framework, called task lineage, to orchestrate multiple tasks with complex goals across multiple domains in continuous interaction. As a proof of concept, we demonstrated good performance on a common benchmark test dataset and on plausibly simulated dialogue corpora. Some interesting future directions include the use of more sophisticated machine learning models, like GBDT, random forests, or recurrent neural networks; I'm pretty sure you can push the performance much higher than the performance shown here just by using those techniques for the task state update. I'm also interested in extending this framework with weakly supervised learning to reduce the labeling cost, and in seeing some potential impact on other dialogue system components by providing a more comprehensive state representation like task lineages.
Okay, I have about one minute left, so, basically, task frame parsing works like this. Given this input, let's say there are two domains, and they generate two different interpretations, like the top two and the bottom two. We identify all possible candidate task frames for each dialog act item, and we have a special task frame, called N/A, to accommodate all the unnecessary information. Then the task is to get the right assignment from each dialog act item to the right task frame. The parsing algorithm starts with some configuration that is somehow valid, then it moves one assignment at a time, and, according to the constraints and the scores, it eventually reaches a proper configuration with a high score.
I think this is the end of my presentation. Thank you.

Okay. Right, sure.
Actually, it's done through some feature functions. As we extend the task lineage, we keep the timestamps, and the feature functions that match the context use the timestamp as one of the features. So as a piece of context gets further and further away from the current timestamp, it has less and less chance of being fetched. Okay.
Okay, so actually that involves another notion, I guess: long-term memory. This work is more about short-term interaction management, and another module should be responsible for long-term memory management, so you would have a more perpetual memory there. It would also be useful for providing features to disambiguate, or to boost some evidence, that kind of thing. So you would need another memory structure beyond just this short-term dynamic structure.
Pardon? Of course. I missed the initial part of the question, so: when you have multiple interpretations in play, where do you start? That's more about policy. Given a lot of ambiguity, or high entropy in your state representation, you can train some smart policy that decides whether it's better to ask for a confirmation at this point, or to just assume something, or to try to retrieve the user's habits from long-term memory. All of these are determined by your policy, so it's kind of another module that takes care of such things.
Another question? I'm repeating the question: I think the question is whether I did classification for each constraint for the complex goals, right? Okay, so what he asks is whether we can use classification to predict the user's intention, that is, predict the intent with a classifier, right? Okay, so I think that's actually possible in this framework, but I didn't use that kind of classifier alone. Actually, classification is a necessary part, I guess, for scalability, because if we considered all possible interpretations from all possible SLUs and other components, then the complexity would explode. So I would do some classification-based filtering as a preprocessing step, and then do the parsing to construct parses that can contain multiple tasks in one utterance. With classification alone it's a little bit difficult to handle multiple tasks, and you also need a structured answer.
Okay, sure. Absolutely. Okay, let me repeat the question: do I have to use the context to interpret the user's utterances? Actually, I'm using just the corpus, so I didn't have to use the context to understand the user utterances. But there has been a lot of research trying to use the context to interpret intentions, so there's no reason not to use it. Alright, okay, thanks again.