good afternoon
i am casey kennington
currently at boise state university but this is work that i did
while i was at bielefeld university along with david schlangen
and i'm gonna give my two cents on
a continuation i guess on yesterday's discussion on personal assistants
'cause we're gonna tell you a little bit about a personal assistant that we've
been working on
and if you don't know what a personal assistant is you're in the wrong conference
you've heard of them you've used them and they're great i mean they're useful
and we dialogue people aren't the only ones using them laypeople are using them
quite often quite regularly
but
when these laypeople use these
systems
these dialogue systems essentially these personal assistants they do weird things with them and they
complain about a myriad of things
and so today i want to talk about a few of those things and maybe make
an attempt at addressing a couple of them
one thing is that they kind of have a difficulty signalling affordances someone showed a book
yesterday on things you can do with siri
why does it need a book
that needs to be signalled somehow and a lot
of these show speech recognition output and sometimes it's great perfect
but you know well
that speech recognition even if it is perfect does not mean you know it understood
there's something else that needs to happen here
they don't know that it understood until it finally does something and comes back and the results
are
maybe what they wanted maybe not
another thing is the user has to
express their intent in one go
they have to say the whole thing and wait for it to get back to them
and then they can continue
sort of like this again with the system
looking into that a little bit more if you consider a
personal assistant on a continuum at one extreme you have these
personal assistants that don't even really want to talk to you
they
just predict it's apparently easier to predict your life than it is to predict
what you're trying to say and so google now is trying to do this and this
is useful
on the other side of the continuum you have the full turn
personal assistant that is expecting you to
give an entire intent and then it
does all its understanding and gives you some kind of response maybe there's something
in the middle that would be a little bit nicer
sub-turn a little bit to the left here so
i say call mom and there's some sort of feedback that it understood me
and i know that it understood me and then i can
say on speaker phone and okay good
and we can move this maybe even a little bit more to the left
and say something like call and it offers your mom
on speaker phone
yes
exactly that's what i meant to say
so there's a little bit of prediction it's not trying to predict your entire life it's
allowing you to give at least part of the intent but it's doing some prediction
there so we can maybe make our dialogue systems fit somewhere on this continuum that's useful
for any particular user
we want to look at this a little bit
really quick related work some inspiration joyce chai's work on misalignment and signalling understanding and
others' work
on backchannels staffan larsson's
work on guis which we kind of are gonna do here and then of course
the luis project
we take inspiration from all of these
for some reason none of these people are here
but we're gonna do something using all this all of these as a sort of
inspiration so we're gonna signal ongoing understanding
using a gui
assuming here of course that people have a way to display a gui so this might
not work on something like the amazon
echo but most people have their phones with them and can use the personal assistant
with the display
and with this really the backchannels don't overlap speech so if they're talking and it's
updating and showing them its understanding then it's not gonna have any problems importantly it works
incrementally
that is word for word i'll explain that in a moment a little bit more and
it works with
minimal or no training data
the rest of the talk is as follows i'm gonna explain our system
and the components of it and then
see if that system is worth its salt
well first the system
at first blush it looks like any other dialogue system you've ever seen there's speech there's
nlu there's dialogue management there's some way to convey the
response to the user
usually with natural language generation but in this case a gui
the speech recognition i'm not gonna go into too much it's
google asr we have it modularised here nicely to give us incremental
results so word by word it's coming back to us and we take that incremental
output from the asr and give it to our nlu
and our nlu is working in lockstep with that so it takes a word
and we're gonna use the simple incremental update model which we introduced at
sigdial in two thousand thirteen
and without getting technical you can look at the paper if you like
equations and things like that what you get is you give it a word
and it's going to produce a distribution over slots
and that can be given to the dm the dialogue manager is gonna use
that somehow
with this little provision when someone utters a word
asr gives us a word
that is the same as or similar to
a value that could fill a candidate slot
then that's gonna get more credit and this is how we are able to make
the system work with little or no training data and then build up from there
that's the nlu
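The word-by-word update described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the simple incremental update model from the paper: the `update` function, the `smoothing` constant, the candidate values, and the use of string similarity from Python's `difflib` are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher


def update(belief, word, smoothing=0.1):
    """Re-weight every candidate value by how similar it is to the new word.

    `belief` maps candidate slot values to probabilities. The smoothing
    constant (an assumed value) keeps unrelated words from zeroing anything out.
    """
    scored = {
        value: prob * (smoothing + SequenceMatcher(None, word.lower(), value.lower()).ratio())
        for value, prob in belief.items()
    }
    total = sum(scored.values())
    return {value: score / total for value, score in scored.items()}


# Hypothetical candidate values for a "cuisine" slot, starting from a uniform prior.
belief = {"thai": 1 / 3, "italian": 1 / 3, "greek": 1 / 3}

# One update per word, as the words arrive from incremental speech recognition.
for word in ["i", "want", "thai", "food"]:
    belief = update(belief, word)

best = max(belief, key=belief.get)
```

Because a word that matches a candidate value gets the most credit, a list of possible values is enough to get started, which is in the spirit of working with little or no training data.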
but the dialogue manager is taking these
word for word the nlu is giving these slot
distributions to the dialogue manager and the dialogue manager has to do something with that
though
in fact it's making one of four
fairly simple decisions one is
i get a slot i look at its confidence value and what i can do is i
can wait
if the confidence value is low just sort of ignore it
a particular value isn't enough to make the slot the one that i
want
or i can select something
if it's above some confidence threshold then the slot is good let's fill it with this
value
the two others here are we're close to that threshold
but not quite there so let's make a clarification request and somehow display that in the gui
and then of course they have to be able to confirm that request
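The four decisions just described (wait, select, request a clarification, confirm) can be sketched as a small rule. This is a minimal sketch, not the actual dialogue manager policy: the two threshold values, the `Action` names, and the `decide` function with its parameters are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"        # confidence too low, keep listening
    SELECT = "select"    # confident enough, fill the slot
    REQUEST = "request"  # close to the threshold, ask "did you say X?"
    CONFIRM = "confirm"  # the user answered the clarification with a yes


SELECT_THRESHOLD = 0.8   # assumed value, not from the talk
REQUEST_THRESHOLD = 0.5  # assumed value, not from the talk


def decide(confidence, pending_request=False, user_confirmed=False):
    """Pick one action for the top slot value, called once per incoming word."""
    if pending_request:
        # A clarification is on screen: either it gets confirmed or we keep waiting.
        return Action.CONFIRM if user_confirmed else Action.WAIT
    if confidence >= SELECT_THRESHOLD:
        return Action.SELECT
    if confidence >= REQUEST_THRESHOLD:
        return Action.REQUEST
    return Action.WAIT


decisions = [decide(0.2), decide(0.6), decide(0.9)]
```

Here `decisions` walks through a low, a borderline, and a high confidence value, and `decide(0.6, pending_request=True, user_confirmed=True)` covers the confirmation case.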
i want to point out here that it is here between the nlu
and the dialogue manager
where the endpointing is done we're not doing endpointing with speech recognition that's
just always on
and it's here
so they can stop and pause and think and go do something it'll wait for
them to finish so they can do things in instalments so it's sort of semantically
driven endpointing
and we use opendial
for this it's sort of rule based at the moment but the provisions are
there now for
reinforcement learning and learning on-line to improve the system as people interact with it
now
the dialogue manager decides which slots are to be filled and it says gui here's
the decision i've made please convey this information to the user
and the gui you'll notice right off the bat we
obviously aren't ui designers
but here the idea is that you turn the system on and
this comes up it's in javascript
and it just looks like a right branching tree and really that's all it is
but right here you can already see what the affordances are oh we can
do these five things that's nice
i don't have to guess i don't have to play with it and figure out what
it knows and what it doesn't know
and so i look at this thing and say well you know i am kinda
hungry and it will go then into the food domain and sort of open up
the tree a little
if you're hungry then
you know what you want and where
you're gonna eat it
and i can say you know i'm in the mood for some thai food and at
that point it'll
go to the top here and
show this node in red with a question mark for this clarification state did you say thai
and this to me is
intuitive in that it
is trying to understand me and all i have to do is say yes or i
mean thai and that would
basically fill that slot which
conveyed visually means that it just collapses that part of the tree and shows it like this
so here's a frame that is filled
and it's shown visually like this
that's our system
cool right
now we did some experiments to see if that system was everything we
hoped it would be and we put some people in front of it
so
we want to test a couple of things about this system so we're gonna break
it up into basically four different
settings
we want to test
we want to see if our incremental system is better than or more useful i
suppose than the traditional one
so we're gonna let them play with it first and give them a trial
phase here's our system here are some tasks to do go and get used to the
interface and then we're gonna
start on the very right side of the continuum where they're
doing this
traditional
turn taking full intent
personal assistant
so it endpoints
as usual
kind of like the traditional personal assistant
so then we
move on the continuum a little bit to the left and
now it's incremental now we're doing sub-turns
and you can
do things in instalments
and then we have phase three where we're moving that
a little bit more to the left on the continuum
now it's going to adapt to you a little bit and try to predict and
fill some of these slots for you
to expand on that a little bit phase one acted like a standard personal assistant with silence based
endpointing before the gui would even show and the asr was shown like it
is in your standard personal assistant
phase two is the incremental phase so they did phase one for four minutes
and then they began phase two phase two did not display asr it's just the gui and
it was always there always showing always updating
and the endpointing as i mentioned was done semantically
after phase two ended there was a questionnaire and we just asked them you know
what do you think about
these different systems so there were ten questions and we asked them you know
whether they preferred the first system the second system either or both
and phase three started this was the adaptive phase
which is basically the same as phase two with adaptation and the way it works is a
very simple way
if they did a task
basically filled a slot or frame
and they
did that same thing again it will remember it and start to
just immediately ask a clarification so instead of saying i
want the thai food they would say i'm hungry and then it would ask thai food and then
they'd just have to say yes and it would fill the slots for them
and then after three times it would just fill in the frame entirely for them
and i'll show an example of that in a video in a moment
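The adaptation rule just described can be sketched as a small memory of completed frames. This is a hedged illustration rather than the system's implementation: the `FrameMemory` class, its method names, and the reading that a clarification is offered from the first repetition onward are assumptions; only the fill-after-three-times behaviour comes from the talk.

```python
from collections import Counter


class FrameMemory:
    """Remembers completed frames and suggests one when a domain is entered again."""

    def __init__(self, fill_after=3):
        self.fill_after = fill_after  # fill the whole frame once it was seen this often
        self.counts = Counter()       # (domain, sorted slot items) -> times completed

    def record(self, domain, frame):
        """Store a frame the user has just completed."""
        self.counts[(domain, tuple(sorted(frame.items())))] += 1

    def suggest(self, domain):
        """Return ("none" | "clarify" | "fill", frame) for the most frequent frame."""
        seen = {key: n for key, n in self.counts.items() if key[0] == domain}
        if not seen:
            return "none", {}
        key = max(seen, key=seen.get)
        action = "fill" if seen[key] >= self.fill_after else "clarify"
        return action, dict(key[1])


memory = FrameMemory()
thai = {"cuisine": "thai", "location": "nearby"}  # hypothetical slot names

first = memory.suggest("food")    # nothing remembered yet
memory.record("food", thai)
second = memory.suggest("food")   # seen once: offer it as a clarification
memory.record("food", thai)
memory.record("food", thai)
fourth = memory.suggest("food")   # seen three times: fill the frame outright
```

With this sketch, saying only that you are hungry would be enough to trigger the clarification the second time around, and the frame would be filled without asking after the third completion.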
and then after phase three we had another questionnaire that compared phases two and three
so here's that video
so this is in german i'm doing this
so if you speak german i apologise for my accent and so anyway
i'm saying something like this i'm hungry i want to eat something around here
maybe thai food
and it does a clarification so i say exactly
and then i repeat this several times to show you the adaptability of this
this isn't something you would do you're not gonna take your personal assistant and repeat
yourself five times
it's gonna give us a lot
but just to show the functionality of this
stress
are
i
so
it's filter not just one more kiss
we are hungry and now it's also
i feel like
and i don't see that same thing i am hungry
so
and then the last time i said calmly
if someone else
i'm
pretty easy to predict yes but this is common
people want to use these personal assistants to do the same thing
over and over again
my brother here's an anecdote my brother every day twice a day opens up his
iphone and says siri
or google voice show me the traffic
every day
he says it just like that and it gets the response he wants and people do
this and it could probably just pop up and show him the traffic
where am i here
so we got fourteen participants to come and sit down with our system so we
sat them down at a table there is
a screen that showed the task that they were to do i'll explain that in a moment
and then there is a tablet that was turned on its side it
shows the gui and the gui was as i showed you and
it's javascript so it was in a web browser basically
and then there's a keyboard where they push a button
to signal that the task was complete so the tasks were like
this there are five possible tasks call reminder
find a restaurant leave a message or find a route between two cities
and the tasks were shown as icons and the task items were randomly chosen a randomly chosen task
and randomly chosen slots that we want them to convey to the system and then
there is a fifty percent chance later that the task would be repeated
here's an example
they'd be sitting down playing with the system and then something
like this would pop up on the screen and they'd say call
peter
and the system would then
do its magic then show
it in the gui and once they
recognised that it understood then they would push a button and a new task would pop up
and they were charged with doing as many of these tasks as possible
the reason we wanted to do this
and not just let them play with it is because the tasks
help us
collect some objective measures as well if we tell them we want them to do
as many tasks as possible in the four minutes they have to interact with
each setting of the system then we can learn a little bit more about how
productively they work
so here's the other tasks they would see stuff like this
so we have the twenty most common german names you know the most populous
cities in germany bielefeld it turns out is among them
and you know everything else so there's quite a few possibilities that
could be said here
but again
we didn't train this at all we just sort of typed these in and got a
list of stuff and threw it into the system imported it and that was the end
of it and it worked
here are some results from the questionnaire we can conclude
the following based on some significance tests that they generally liked the gui
they found it intuitive to use and easy and understandable
and that was our main focus our main goal
the gui allowed mistakes to be taken care of locally and they did this a lot
if a slot was filled with the wrong thing they
would immediately try to fix it
they didn't always just push a button and move on to the next task or
there is a keyword they could say to restart from the beginning they
generally tried to fix it right there and were able to do it
most of the time
and they didn't generally notice the difference between phase two and phase three the incremental and
adaptive phase they didn't really know there's
something adapting but for those who did notice which was about half of them they
noticed it was phase three they didn't get it wrong and there's a listing of all
the questions and there's more in the results section of the paper on
this
these are some things we want to highlight from that
so
the objective results these tell an interesting story so we
just counted the number of tasks they were able to do in the different
settings
and once they get to incremental and adaptive they're able to do quite a few more tasks
at least they thought the tasks were complete
and the next row is frame accuracy so when all the slots in
the frame were the same as the ones that we wanted them to convey in the
task that we showed
and the adaptive one
does quite well because part of the time the slots are already filled
for them
so score one for google now
i guess trying to predict your life is actually maybe easier than learning how to
understand language
the other two tell a more interesting story we get f-score which is
basically maybe the entire frame wasn't correct but this gives an idea of
the correctness of the slots of the frame maybe one or two of the slots were correct
one wasn't
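The difference between the two measures can be made concrete with a short sketch. The talk does not say how the f-score was averaged, so the micro-averaged slot F1 below is an assumption, and the function names and example frames are hypothetical.

```python
def frame_accuracy(pairs):
    """Share of tasks where the filled frame equals the target frame exactly."""
    return sum(1 for hyp, ref in pairs if hyp == ref) / len(pairs)


def slot_f_score(pairs):
    """Micro-averaged F1 over (slot, value) pairs, giving credit for partly correct frames."""
    tp = fp = fn = 0
    for hyp, ref in pairs:
        hyp_slots, ref_slots = set(hyp.items()), set(ref.items())
        tp += len(hyp_slots & ref_slots)   # slots filled with the intended value
        fp += len(hyp_slots - ref_slots)   # slots filled with something else
        fn += len(ref_slots - hyp_slots)   # intended slots that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Each pair is (frame the system filled, frame the task asked for).
pairs = [
    ({"cuisine": "thai", "location": "nearby"}, {"cuisine": "thai", "location": "nearby"}),
    ({"name": "peter", "device": "mobile"}, {"name": "peter", "device": "speaker"}),
]

accuracy = frame_accuracy(pairs)  # 0.5: only the first frame is fully correct
f_score = slot_f_score(pairs)     # 0.75: three of the four slots are correct
```

The second frame counts as a complete miss for frame accuracy, while the f-score still credits the one slot that was right.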
and
in this case incremental is lower and then look at the time the time is about
the same across all and this tells us that the gui was
intuitive enough that in the
phase where they are just playing with it in the trial phase
they learned enough about it and experienced enough that they aren't just getting used to it
over time
and
both these rows tell kind of that story
so it helps them be a little bit more productive especially in the
adaptive
setting
so they're kinda nice results not the most stellar thing this thing isn't you
know going to be in everyone's phone next month
but
like i said we didn't use any training data and it was fairly robust
some discussion here
our incremental personal assistant or ipa i suppose allows users to fix mistakes
easier and sooner allows the users to interpret the state of the system's understanding
and under the adaptive settings it allows users to be more productive you get more
tasks done in this kind of setting where we're driving them to do tasks
like this
and it endpoints based on semantics not based on silence
which is a nice thing
future work
online learning is the obvious thing we have a system with no training data let's interact
with it and it should start to learn and do things better
and the mechanisms are there sium the nlu model we have the dialogue manager we
have all have provisions for this we just need some kind of a supervision signal
which we have if the frame is filled and gets sent on they're happy with that
we can give feedback now to say those utterances led to this and that should
help the nlu and hopefully the dialogue manager work better
same for adaptivity
and user modelling and adaptability
could be improved
also web based authoring luis does this a lot of other systems do this
right now it's not too bad you can author a json file and it'll import it there's
tools for that and it's actually fairly quick and easy but web based authoring might
be nice and then of course we need to scale up to
larger domains the gui is the bottleneck here and it's sort of a two edged sword you
wanna show your stuff but also be able to handle lots and lots of general
things so
that is it thank you
note that focus on
if the
right
right like a like i said we're not ui
experts bring us to if you're right it's gives call i guess on but what
we have right now is sort of a max of seven or eight nodes
that is just sort of dot the thing you have to do there is there
and what gets shown what are the top seven that you will show and if
there's something that's not english on there then you're doing something wrong
so there's more user modelling that happens in that regard what gets shown on the
gui
better nlu would help with that
better user modelling would help with that
good question
research future stuff
i q
right that i'm not that the future work i mean the way we don't the
provisions are there also in this you can click on it
clicking doesn't do anything but the idea is kind of like staffan larsson's
gui where you can talk about the gui itself and navigate and
say no i don't want any of those go down a little bit start
right there are some exactly
exactly so you can flip through it put stuff and you can add something if
it's not there that would be nice to and i guess but right and system
in as becomes intent that you can use in the future the gui should be
able to help with that
okay
right so it
right so the question or comment was on the semantic endpointing bit of it
something to look at i
don't have an answer
definitely something worth considering
right
agree
no not
i want to be really clear on that in the trial phase maybe they've
done all the adapting they're done adapting but the system is so rudimentary and simple
and the gui is that it doesn't do much you know there's only
a couple of things that it does they learn about it very quickly
that's why that time didn't really change
you know the average time
per task
so they weren't just
getting used to it over time because they were already used to it before they even
started the first phase that's kind of the takeaway i got from the objective
scores
that's something we were concerned with that's why we designed it this way
i knew someone would ask that question i'm glad somebody did exactly
because of the way we wanted to do the comparisons we wanted to do
the subjective comparisons and we wanted to do some objective scores and this was a
debate we had and we ended up doing it this way with the hope that
if we designed it the right way
they'd get used to it right at the beginning and we wouldn't have those effects and the numbers
can show that
i'm glad you asked that