hi everyone i'm layla i'm from maluuba and i'm here to present
a dataset i collected and annotated with my colleagues at maluuba
hannes is actually here with me if you want to talk to him
so
the motivation behind this dataset is that there is a need
for dialogue systems to be able to handle complex interactions
one motivation comes from studies in e-commerce and there is a paper by month
later in twenty eleven
where they show that users come to an e-commerce website sometimes with
a very well defined goal
in mind but sometimes they just come to shop around or they don't really know
what they want and just want to look at options
there is also
some interest in the dialogue community and most notably there was a paper last
year at sigdial by sungjin lee and amanda stent
i think it was the best paper
last year
it's called task lineages dialog state tracking for flexible interaction
and in this paper they try to move beyond the traditional
linear slot filling paradigm and try to handle more complex
conversations where you have different user goals and possibly across domains
so for this work they actually didn't have a proper dataset to
test their method because there wasn't anything available
so they
modified an existing dataset and so we decided to actually try to collect data
and promote this kind of work for future dialogue systems
so we collected one thousand three hundred and sixty nine human-human interactions in the travel
domain
we also propose a new task frame tracking and the dataset is fully annotated and
publicly available at this url
so when i talk about linear slot filling what i mean is something like this
this is actually a dialogue from the dataset
and here the user basically gives some constraints he wants to go
somewhere from columbus he doesn't really know where
then the wizard who is the agent who plays the role of the dialogue system
proposes two options vancouver or toronto then the user gives a bit more information
about his constraints
and then the user asks
for information about the offers from the wizard
and at the end the user books
one of the proposed trips
so here the user goal never really changes during the dialogue he is just drilling
down on some options
and by nonlinear slot filling i mean something like this dialogue which is also from
our dataset it was a bit too long to put entirely on the slide
so i just cut the interesting part
so here this is a representation of the different options that the user
can you see the mouse okay
so on the left
this is a representation of the different options and goals that the user might
have during the dialogue
so by nonlinear slot filling what i mean is that at the beginning the user
is talking about going to toronto
and then he explores other options the ones in green
but at the end of the dialogue he actually decides to go back to that
toronto trip
so in this case
the user goal changes during the dialogue but the user also goes from one
goal to the other and if we want to be able to actually book the
toronto package for this user we need to remember it
so let's dive into the details of the dataset first the domain so it's
a travel domain we had travel packages with a round trip flight and a hotel
this is an example of a package so you have the hotel
the flights with their times and dates
and for the hotel we had the category which is the number of stars
we also had guest ratings on a scale
of one to ten and amenities and vicinity so
on the graphs
the first one is
a bit too small to read
but it's vicinity so in the vicinity of the hotel you have something like shopping malls museums
universities airports et cetera so that's
the distribution
and on the
bottom graph we had the number of amenities per hotel so the amenities could
be breakfast wifi
whether the hotel has a spa those kind of things
and for most hotels we have more than one amenity so that
the users
had some ground
some material to compare the hotels against each other
and we had two hundred and sixty eight hotels and one hundred and nine cities in
total
so for this dataset we hired
twelve participants to collect the entire data
over twenty days so our data collection lasted
twenty days four of the participants
did the entire data collection and the other ones were hired for just one week
and each dialogue was performed via a chat on slack
so we had a bot that was pairing up two users
and then they were able to chat so when the user was paired
with a wizard he would get a task
and we generated those tasks based on templates like this one
so basically we tell the user his goal
and to generate those tasks from the templates we just replace the placeholders for
the different entities with values that we randomly drew from the database
and
to vary the tasks
we actually
had a probability of success for each template
so for this template we would say
it has a probability of
point five to succeed
so that means that when we actually query the database with the entities
fifty
percent of the time it will return results and fifty percent of the time
it won't return results
and when it won't return results we would either
tell the user to close the dialogue
or we would give him some alternative like if nothing matches your constraints then try
increasing your budget by twelve hundred
dollars
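The template filling and the per-template probability of success described above could look roughly like the sketch below. Everything here is an assumption made for illustration: the toy database, the template wording, the field names, and the way an unsatisfiable task is produced (by setting the budget below the cheapest matching package). The talk does not say how the real generator forced a query to fail.

```python
import random

# Toy stand-in for the package database; all names and values are made up.
PACKAGES = [
    {"origin": "Columbus", "destination": "Vancouver", "price": 1800},
    {"origin": "Columbus", "destination": "Toronto", "price": 1200},
    {"origin": "Montreal", "destination": "Baltimore", "price": 900},
]

# Hypothetical template with placeholders for the entities.
TEMPLATE = "You want to go from {origin} to {destination} with a budget of {budget} dollars."

# Hypothetical fallback instructions given when the search is meant to fail.
FALLBACKS = [
    "If nothing matches your constraints, close the dialogue.",
    "If nothing matches your constraints, try increasing your budget by 1200 dollars.",
]


def search(packages, origin, destination, budget):
    """Return every package that satisfies the three constraints."""
    return [
        p for p in packages
        if p["origin"] == origin
        and p["destination"] == destination
        and p["price"] <= budget
    ]


def generate_task(template, packages, p_success, rng):
    """Fill a template with values drawn from the database.

    With probability p_success the constraints can be satisfied;
    otherwise the budget is set too low and a fallback is appended.
    """
    seed = rng.choice(packages)
    should_succeed = rng.random() < p_success
    same_route = search(packages, seed["origin"], seed["destination"], float("inf"))
    cheapest = min(p["price"] for p in same_route)
    budget = cheapest + 100 if should_succeed else cheapest - 100
    text = template.format(
        origin=seed["origin"], destination=seed["destination"], budget=budget
    )
    if not should_succeed:
        text += " " + rng.choice(FALLBACKS)
    return {
        "text": text,
        "origin": seed["origin"],
        "destination": seed["destination"],
        "budget": budget,
        "should_succeed": should_succeed,
    }


if __name__ == "__main__":
    task = generate_task(TEMPLATE, PACKAGES, 0.5, random.Random(0))
    print(task["text"])
```

With a probability of 0.5, roughly half of the generated tasks describe constraints that the database can satisfy, and the other half carry one of the fallback instructions.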
so as i said we only had twelve participants and we collected a bit more
than a thousand dialogues
so to keep it interesting for them
we told them to play roles and try to vary the way they
speak to the wizard and to encourage this a bit more we also
wrote some fun
templates like this one so that was at the time when pokemon go was very
popular so we told them to pretend that they are a pokemon hunter and they really
wanna go to this city because there is a very rare pokemon there and that
they should find a good package to do that
so
to keep it interesting we created such templates and we added them kind of
throughout the data collection so that they would have different tasks and
they would stay engaged in the data collection
we also gave some instructions to the users to make sure that we collected dialogues
that we could use so we told them to not use too much uncommon
slang but also to use some so that you know it stays a bit realistic
so we told them to make their dialogues personal
and
we also told them to feel free to end the conversation at any time because
we wanted them to feel like they're real users
and for that we also created some templates that would
encourage this so one of the templates was
you're a pop star you're an absolute diva and you won't accept anything under five
stars
so sometimes you know they would really act like a diva and just close the
dialogue and leave so that was interesting for us to have different cases the
successful dialogues and the dialogues where the user would just leave
we also told them to try to spell things correctly to keep it not too complicated
and we told them to
try to determine what they can get for their money so that they would really
explore the options compare the hotels and
try to figure out what's in the database
so on the wizard side so the agent
playing the role of the dialogue system at the beginning of each dialogue they got
a link to a search interface that looked like this
so on the left
you have all the searchable fields and on the right you have the results
and for each search the wizard would always get up to ten results so from
zero to ten
and you can also see
the little tabs on top
so basically what we did is that
every time the user would change a constraint so here it's
for the city baltimore
if the user would then say okay what about toronto then we create
this search in a new tab so that if the user wants to go back to
baltimore
the wizard can do it easily and wouldn't have to repeat the search over again
and we also gave instructions to the wizards
those were quite
critical for us to be able to have a dataset where we can actually try
to imitate the wizard behaviour
so we told them to be polite and not jump
in on the role played by the user
claim that a mistake
and this other point also relates to that we told them your knowledge of
the world is limited to the database because we don't want the wizard
to start talking about pokemon
or things that we don't want a dialogue system to do so we just
told them
you know that the user is gonna play a role and be kind of funny
but try to just
talk like a dialogue system basically
we also told them to try to spell things correctly for
nlg
and now the second point we told them to vary the way they answer
the user and we told them that sometimes
they can try to say something that is a bit impromptu so imagine if you're
having a dialogue and in the middle of it the wizard would say hello
it doesn't make sense
and we did that because
we have a lot of experience in training dialogue systems with reinforcement learning and the
problem with that is that if you only have
positive examples then you don't know
what a mistake looks like so something that you shouldn't do at some point of
the dialogue and that makes it a bit hard
and as a way to
measure
how much of that
there was in the dataset we asked
the user to rate the dialogue at the end of each dialogue
and we told them to base the rating only on the wizard behaviour so if
they didn't get any results because there wasn't any result in the database
but the wizard was helpful then we told them to give a maximum score
so we had scores on a scale of one to five and those are available
with the dataset
and as we can see most of
them have
the maximal score of five but some have
lower scores because the wizard was not completely cooperative and took actions that were not
very helpful
then other statistics of the corpus this is the proportion of dialogues
per dialogue length so number of turns in a dialogue as you can see
the
bulk of the dataset is around
fifteen turns and the average is fifteen turns per dialogue so even though we have
only one thousand three hundred sixty nine dialogues we have about twenty thousand turns in
total
then
this is the distribution of dialogue act types in the dataset so we had about
twenty dialogue act types
and the number of dialogue acts per turn so during one turn because it's human
dialogues
there was more than one dialogue act per turn very often as you can see
about three percent of the time
there is more than one dialogue act type per turn
so
now the frames so what's a frame
so as i said what we really want to do is
remember everything that the user has
told us during the dialogue so that we can
get back to one option if the user decides to book that option in the
end
so we took inspiration from state tracking and the definition of a state in the
dialog state tracking challenge in this challenge they define the state by the user constraints
and the user requests so everything that the user asks if he asks for
the price or for
the name of the hotel that's a request
and we also added things that we
saw in the dataset and that we needed
one is user binary questions so those are questions where you have
so
a request is like the user is asking for the price
a binary question is when the user asks is the price
two thousand dollars for instance so that's a yes no answer
and we also had comparison requests
where the user asks
to compare something between two hotels he can ask if hotel
a is cheaper than hotel b for instance
and so those are examples of frames and how they're related so those two
hotels are children of the
the bowl
frame
as you can see
and something new in our dataset is that
frames can be created by users but also by wizards so every time the
wizard makes a proposition for a hotel we create a frame because we want to remember
it in case the user wants to book this hotel
so we
made up a few rules for frame creation after analysing the dataset and seeing what
makes sense
and for frame creation
we create a new frame every time the user changes a value so here at
the beginning the user wants to go to atlantis so that's one frame
and then on this other utterance the user asks to go to neverland and
so the destination city has changed and we create a new separate frame with this value
for the destination city
he actually changes more entities here but we just need to have one entity
change to create a new frame
and so that's one type of frame creation but we also create a new frame
when the wizard makes a proposition for a hotel and we put in this frame all
the properties of the hotel
so that gives you the frequencies of those behaviours
in the dataset
as for changing frames
as you can see it's all user controlled
because we want
the wizard to really be an assistant
so the dialogue system should really be an assistant and propose things but then the
user controls what we're talking about the user controls the topic
of the dialogue so only the user has the power to change the frame
that we're talking about
and so that happens
we switch to a new frame when the user proposes a new value so if he changes the
destination city then we automatically switch to that new frame
if the user decides to consider an option a hotel and asks for more information about
this option then we also switch to the frame corresponding to
that option
and we can also switch to an earlier frame if the user says for instance
at the end of the dialogue actually okay let's go back to the toronto package then we
switch to the frame corresponding to the toronto package
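The frame creation and frame switching rules described above can be sketched as a small tracker. This is a minimal illustration and not the annotation tooling used for the dataset. The class and method names are hypothetical, and two choices are assumptions: a new user frame copies the other constraints of the current frame, and the parent of a frame is whichever frame was active when it was created.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    frame_id: int
    values: Dict[str, str] = field(default_factory=dict)
    parent: Optional[int] = None      # assumed: frame active at creation time
    created_by: str = "user"          # "user" or "wizard"


class FrameTracker:
    def __init__(self) -> None:
        self.frames: List[Frame] = []
        self.active: Optional[int] = None

    def _new_frame(self, values: Dict[str, str], created_by: str) -> Frame:
        frame = Frame(len(self.frames) + 1, dict(values), self.active, created_by)
        self.frames.append(frame)
        return frame

    def user_inform(self, slot: str, value: str) -> int:
        """User gives a value; a changed value creates and activates a new frame."""
        if self.active is None:
            self.active = self._new_frame({slot: value}, "user").frame_id
            return self.active
        current = self.frames[self.active - 1]
        if slot in current.values and current.values[slot] != value:
            # One changed entity is enough to create a new frame.
            new_values = dict(current.values)
            new_values[slot] = value
            self.active = self._new_frame(new_values, "user").frame_id
        else:
            current.values[slot] = value
        return self.active

    def wizard_offer(self, hotel: Dict[str, str]) -> int:
        """Wizard proposes a hotel; a frame is created but the active frame stays."""
        return self._new_frame(hotel, "wizard").frame_id

    def user_switch(self, frame_id: int) -> None:
        """Only the user changes the active frame, e.g. to consider an offer
        or to go back to an earlier package."""
        if not 1 <= frame_id <= len(self.frames):
            raise ValueError(f"unknown frame {frame_id}")
        self.active = frame_id


if __name__ == "__main__":
    tracker = FrameTracker()
    tracker.user_inform("dst_city", "Atlantis")
    offer_id = tracker.wizard_offer({"name": "Hotel A", "dst_city": "Atlantis"})
    tracker.user_inform("dst_city", "Neverland")
    tracker.user_switch(offer_id)
    print(tracker.active, [f.values for f in tracker.frames])
```

Keeping the wizard offers as separate frames without activating them is what makes it possible to go back to an earlier option later in the dialogue.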
we also have annotations for dialogue acts and slots
so for the dialogue acts
we have general purpose functions so kind of typical dialogue acts inform offer compare
we also have a dialogue act specific to frame tracking which is switch frame
that's the case when the user switches to another frame
then for the slots we have all the fields in the database we also
have specific slots describing specific aspects of the dialogue
one is intent so the intent of the user is to book for instance
action is the counterpart on the wizard side so when the wizard books a
hotel we annotate it as action equals book
and count is when the wizard gives the number of hotels in the database corresponding
to the user constraints so sometimes the wizard will say i have three hotels
in baltimore for instance and
we would annotate it with count equals three
and then we have specific
slot types
to record
the creation and modification of frames
so we actually
automatically annotated the frames and the content of the frames based on those slots
so those slots are id for each new frame we give it a new
id
reference so every time the user references a past frame
and read and write
so i'm gonna go faster here
so that's an example of how we used read and write
for read
basically we're in frame five here so the active frame is
frame five
but the wizard talks about
values that were provided in frame four so we read those values from frame four and
we put them in frame five
and for write it's on the last utterance
the wizard provides new information
about a frame that we already talked about before so we write this information
in the previous frame frame four
even though the currently active frame is
frame number six i guess it's a bit
complicated like that but
it's basically a way to keep track of all the values and then
dynamically
populate the content of the frames
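As a rough illustration of the read and write slots described above, the two operations can be written as plain dictionary updates. The function names, the slot names and the frame contents below are invented for the example; only the frame numbers follow the talk.

```python
from typing import Dict, Iterable

Frames = Dict[int, Dict[str, str]]


def read_values(frames: Frames, active_id: int, source_id: int, slots: Iterable[str]) -> None:
    """read: copy values that were provided in an earlier frame into the active frame."""
    for slot in slots:
        frames[active_id][slot] = frames[source_id][slot]


def write_values(frames: Frames, target_id: int, new_values: Dict[str, str]) -> None:
    """write: add new information to a frame that is not the active one."""
    frames[target_id].update(new_values)


if __name__ == "__main__":
    frames: Frames = {
        4: {"name": "Hotel A", "price": "2000"},
        5: {"dst_city": "Toronto"},
        6: {"dst_city": "Baltimore"},
    }
    # Active frame is 5, the wizard mentions a value first given in frame 4.
    read_values(frames, active_id=5, source_id=4, slots=["price"])
    # Active frame is 6, the wizard adds information about frame 4.
    write_values(frames, target_id=4, new_values={"wifi": "true"})
    print(frames)
```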
so here are some statistics of frame changes in the dataset
the average number of frames
created per dialogue is six point seven
and the average number of frame switches is three point
fifty eight and we have a lot of variability between the dialogues
as you can see here
so we observed the behaviour that we wanted to observe
we also had five experts annotating the dataset and we
evaluated how well they agreed on the annotation
and we got reasonable agreement
so we propose baselines for this dataset one is an nlu baseline that was
just to see kind of how hard the nlu task was
we adapted a model from arnold and colleagues published in twenty sixteen
and we predict dialogue act types and slots
and slot values and we get about eighty percent accuracy so
it's already pretty good but there is room for improvement
so for frame tracking we propose the following task
so if you want to create a dialogue system that's gonna be able to
keep
in memory all the frames talked about during the dialogue you'll have
to create the frames dynamically throughout the dialogue but we decided to take the
first step
of having a simpler task
so you know all the frames created so far you have the new user
utterance
and the nlu annotation for this user utterance so you know the dialogue acts and
the slot types
and the task consists of for each
slot
finding the frame that it references so here for instance
the destination city slot references frame number one
budget equals cheaper actually creates a new frame
and flexible equals true refers to the current frame
we proposed a rule based baseline that was very simple
we just observed some behaviour in the dataset and so we proposed a
very simple baseline so basically if the user informs a new value we create
a new frame
we switch to a previous frame if we find the values that the user
is talking about in one of the previous frames
and basically
very simple rules like those for
switching frames
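A rule based predictor in the spirit of the baseline described above might look like the following sketch. It is not the authors' implementation: the function name, the `NEW_FRAME` marker, the order in which the rules are tried and the tie-breaking (most recent previous frame first) are all assumptions. Given the frames created so far, the active frame and the annotated slots of the new user utterance, it returns one frame reference per slot.

```python
from typing import Dict, Union

NEW_FRAME = "new"  # hypothetical marker meaning "this slot value creates a new frame"


def predict_frame_references(
    frames: Dict[int, Dict[str, str]],
    active_id: int,
    act: str,
    slot_values: Dict[str, str],
) -> Dict[str, Union[int, str]]:
    """Assign each slot of the user utterance to a frame with simple rules."""
    active = frames[active_id]
    previous_ids = sorted((i for i in frames if i != active_id), reverse=True)
    predictions: Dict[str, Union[int, str]] = {}
    for slot, value in slot_values.items():
        # Rule 1: the value is already in the active frame.
        if active.get(slot) == value:
            predictions[slot] = active_id
            continue
        # Rule 2: the value is found in a previous frame, so switch to it.
        match = next((i for i in previous_ids if frames[i].get(slot) == value), None)
        if match is not None:
            predictions[slot] = match
        # Rule 3: the user informs a value that differs from the active frame.
        elif act == "inform" and slot in active:
            predictions[slot] = NEW_FRAME
        # Default: the slot refers to the current frame.
        else:
            predictions[slot] = active_id
    return predictions


if __name__ == "__main__":
    frames = {
        1: {"dst_city": "Baltimore", "budget": "2000"},
        2: {"dst_city": "Toronto", "budget": "2000"},
    }
    utterance_slots = {"dst_city": "Baltimore", "budget": "cheaper", "flexible": "true"}
    print(predict_frame_references(frames, 2, "inform", utterance_slots))
```

Exact string matching on values is the obvious weak point of rules like these, since a user who refers to an earlier frame without repeating one of its values gives the rules nothing to match on.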
and so the performance was bad because rules are not enough to do this
task
we kind of broke it down based on
different cases in the dataset so
for frame switching
if the user provides a slot so if he says let's go back to the
toronto package
then we get about forty five percent performance
if the user refers to a previous frame but without specifying a specific slot
then it's harder because it's harder to understand what the user is talking about
after the wizard proposes a hotel so after an offer
most of the time the user will ask for more information about this hotel so
very often we would switch to that frame so that's easier also to predict
and it's easier than when there is no offer so we get a lower performance
there
and for frame creation we can predict that no frame is created but it's harder
to predict when a frame is created
and as follow-up work
we had a paper with a better model that
outperforms the baseline by a lot
we presented it at a workshop at acl very recently
and so to conclude this is a new human-human dataset to study complex state tracking
we have turn level annotation of dialogue acts slots and frames we also propose a
new task which is frame tracking and some baselines
thanks for your attention
we have a few minutes for questions
fixed would talk could utilize the language variability
but it's a few but anyway
over one thousand dialogues actually the user actually filled or increasing the
so by just eyeballing we didn't really
compute anything but by just looking at the dialogues they really
get into it they play the roles and they change their language sometimes it goes
from very polite to more
like young speaking so there's a lot of variability thanks to that
possible to combinations so it is to monitor from you to see would be
to generate will fall
so it's of combinations able to do something over it sorry if
but only from you
to work well
so that's
that's something we decided not to deal with we actually asked them to always talk
about one thing at a time
but with the true for example the system
is it should have seen from small words
huh
we would have would have all right
to thank you for interesting to before i just quickly you and the u
three point but the among the you can use you pixels detailed results tools to
promote collagen dreams
so we recorded all those searches and the results of the searches
that's an idea that we had we haven't really tried to see if
it's really reliable
because
not everything was searchable in the database so that's probably hard and
we're actually collecting more dialogues right now to make it bigger
and now we're gonna make all the fields in the database searchable
so that we can record all those searches and then do something like that
just one more question
all right let's thank the speaker again