welcome to the special session so actually the time has passed so let's start
now so we proposed this special session entitled future directions of dialogue based
intelligent personal assistants so i'm yoichi matsuyama from carnegie mellon
and alex papangelis is from toshiba research
so for this meeting we have one hour i'm sorry we have one and a half hours
from now
i sorry seven
yes so today
it's the era of the personal assistant so many tech giants
have released dialogue-based personal assistants including amazon google and microsoft and many others as a
front end of their whole services so it's a big deal so now dialogue
system research is
very crucial for that kind of service so we all dialogue
system researchers
are rock stars in society in this era i believe so that's why
we proposed this special session to discuss our next future direction
or vision
so
let's get started
so this is today's agenda so we're gonna quickly give an introduction to establish
the common ground and then the panel discussion
so we have four notable panelists from academia and industry so let us introduce them
maybe later and then q and a
so this is actually a very flexible kind of discussion so we are happy to
get your questions and yelling from the audience so we are happy to have your
opinions anytime
so
by the way so what's a personal assistant now so you know what it is
so there is siri or cortana from microsoft and facebook and also
amazon
so maybe we can call them personal
assistants of different kinds but
assuming they are personal assistants so anyway wikipedia says that a personal assistant is
like this so it's an agent that can perform tasks or services for an individual so
basically that's it so in this session we're gonna define personal assistants something like
this just simply personalised task management plus spoken dialogue capability
so
that's our
definition of personal assistants as a common ground in this session so that's
it
so
so let's look back a little bit at
the past
so
i think the current personal assistant has two major streams one is
task management and the other is spoken dialogue research i think i'm not the right person to
describe that history
but on the other side there is personalised task management so
the origin one of the origins would be apple's knowledge navigator so that
was actually a very visionary vision so
actually apparently siri just follows that vision
i believe also in two thousand three darpa announced the pal project which is
a very big project so i'm gonna describe it a little more
so this is the knowledge navigator so
some of you most of you may already know this but
this is actually a video so a very visionary video even now it is very
interesting so we don't have much time
okay
this research cheating one just checking
a short circuits last year postech second extension was translated this mary his i am
i right this is sufficient
is that in i started lots of twelve o'clock
you need to take there are actually on schedule and
you have a lecture at four fifteen on deforestation in the amazon rainforest
let me see the lecture notes from last semester
no that's not enough
i need to review more recent literature pull up all the new articles i haven't read yet journal
articles only
fine your friend jill gilbert has published an article about deforestation in the
amazon and its effects on rainfall in the sub-sahara
it also covers drought's effect on food production in africa
and increasing imports of food
context you like this one but sorry i'm sorry should increase for secure features it's
so please go to youtube there is the
famous video so even now siri for example cannot be at that kind of quality
so meanwhile darpa
announced their big project called pal perceptive assistant
that learns
and then darpa awarded it to sri and cmu so actually
pal is a common general architecture and calo at sri and radar
at cmu were
each instances of the pal architecture
so and then this was the pal slash calo architecture so the main focus was learning
so learning from the user
and calo had a lot of capabilities in terms of task management
so for example one of the most dialogue-related
capabilities was the meeting assistant i think so it has
for example dialogue act detection also summarisation and so on
so
the other kinds of
verincation there
so and then radar radar is i think if i
understand this piece correctly not so much a dialogue
system but it was an email management and auto scheduling task agent
so and then siri this was the earliest release
of siri
so it was the very first agent that had
spoken dialogue and could also do task management but
as you can see the conversational dialogue management
interface used to have very small capabilities so you know
that's the past and
i'll add a few comments about where we are now just to maybe start
the conversation
like many people say we are
going through the spring of artificial intelligence
and we are also transitioning to a new era of intelligent personal assistants
there are many factors that have led to this
and
this can mean
assistants that maybe we can build long-term relationships with whether it is one day
a year or a lifetime
and that these assistants could be social
so many factors contribute to this but i think there are two main ones mostly
the advancements in hardware we have cheap and powerful hardware and we have
a lot of pervasive smart devices whether they are smartphones rings bracelets whatever
so this creates a lot of big
and very useful data so we have
powerful machines and big data that we can make use of in our work
this enables us to tackle problems that were previously
a little harder to solve
and this is evident from the availability of the tools that we have now on
the web
whether they are open source or not
so tools like nucleus are for speech recognition or bot frameworks or
many other things
how can we combine all of these things into an intelligent personal assistant and
one of the ways we can think about it and of course this is open
to debate
is that we can simply split the assistant into the cognitive functions that it has and into
the communication channels
so the assistant
needs to be able to reason about the world about the knowledge about everything it needs
to be able to communicate with the user
so it needs human computer interaction
and it also needs a lot of interfaces to communicate with devices whether that is a
light or a smartphone
a car wearables anything
i either
so maybe i'd like to mention as was also mentioned earlier
that an agent needs to handle multiple complex tasks maybe sometimes
other characteristics are like in the video we showed before
we need seamless and context aware
understanding and generation we need maybe the
ability to incorporate new knowledge into what the agent knows and so on
and there are other sorts of challenges like for example communicating with all of these devices
maybe it is not a very interesting thing research wise for you know new students
but it's a very big problem when we try it in practice
and what i wanted to mention
was that
the agent needs to be able to handle an evolving relationship with the user like
i mentioned before maybe it lasts for a day or five days or maybe for a
lifetime
it needs to be able to reason about the world
and do this sort of
tracking of the context over time like i say here so events change things change in the
world and we need to be able to refer to the past to the future
to the present
and these are all points for
discussion so what is the future of the personal assistant
we have here
four topics for discussion just to start our conversation
what is the current state
of personal assistants in research and in industry
what are some big technical bottlenecks that absolutely need to be solved before we can
get the next generation of personal assistants
how can we handle big data
in terms of collecting the data what kind of resources do we need
how do we manage privacy issues how do we you know learn all of these things
from data
how do we process
store and manage all of these things and then we have
a topic about the future your vision of the future of personal assistants
what it can and what it cannot be
and so we have four notable panelists
i will keep the introductions short in the interest of time
first we have professor steve young who is a professor of information engineering
in the information engineering division of the university of cambridge
he has a long track record of research on spoken dialogue
particularly speech synthesis recognition dialogue management among other things
he is the recipient of a number of awards for his contributions
and just to name a few
the ieee signal processing society technical achievement award
the medal for scientific achievement from isca
among other things
the james flanagan speech and audio processing award and in the interest of time he has really won many other
awards best paper awards
on proficient german
we have thomas schaaf
who is a senior speech scientist at amazon
where he has worked on several voice enabled products of amazon
and he is leading a group of researchers currently working on models for i
think the alexa service and he also held a position in language technologies
at carnegie mellon
he has a lot of experience in speech recognition and translation and in the past he has worked for
multimodal technologies and toshiba research
we have professor jeffrey bigham who is an associate professor in the human-computer interaction institute at carnegie
mellon
with a long track record in crowdsourcing
and crowd powered systems used for natural language applications
prior to joining carnegie mellon he was an assistant professor at the university of rochester
he has received a number of awards
as well as being named one of
mit technology review's thirty five innovators under thirty five
and we have zhuoran wang
who is a cofounder and ceo of trio dot ai
a startup company that develops conversational interfaces
he has many years of natural language processing and spoken dialogue systems research and development
experience
he formerly worked for baidu toshiba research europe heriot-watt
and university college london from which he got his phd
so next we will ask our panelists to introduce themselves and give a little bit of
their position talks
after that we will use the seed questions and whenever you want you can raise your
hand and
ask different questions
so
shall i start okay
so we have the first question which i
should have written down
current state and bottlenecks i don't really think there is very much to
say about that
you may disagree but i think broadly speaking
we have enough in place that we know how to build all the different bits
of a system
from speech input through understanding response generation
interface to the backend
i mean when people use siri or whatever now we pick up the edge cases and
we laugh at you know siri's failure to do this and so on
but if you actually focus on what these systems can do
and you compare it with what they could do five years ago it's actually pretty remarkable progress
in my view
it's mostly engineering
and i think that over the next few years it will mostly be engineering
that makes these systems ever more capable with broader coverage and makes them make fewer stupid
mistakes
within each individual sort of subtopic it's clear we can all do better
and i think there is no shortage of things for people interested in
research to focus on but fundamentally my view is that there isn't a huge kind
of missing piece that no one knows how to do such that until we solve that
we can't build a virtual personal assistant not one like in the film which i
haven't actually seen by the way well i started watching it and i
think i fell asleep after twenty minutes
people told me that i should watch it but i never actually managed it but we're not
gonna get to that stage any time soon but i think there's
a long way we can go and it's mostly engineering
i think one of the big problems that perhaps this kind of community has
is the data problem
there is
i've worked in spoken dialogue for a long time
mechanical turk
in some sense revolutionised what we could do
because
once mechanical turk became widely available we could build a system we could
deploy it and we could pay people to use it
and we could get several thousand dialogues if you like and as the system is
running at run time we can measure performance
but apple now processes about a hundred million
or so siri conversations a week
and i'm sure google
and amazon and facebook are handling similar quantities now the kind of machine learning
you can do with that kind of data throughput is really very different to
what anyone in academia can even contemplate doing
so
i think one of the issues is really how academia and industry if you like
work together
so that the academics can actually focus on the real datasets
where the real information is on the real data flow and find ways
to work
and i mean that leads me onto what is in my view one of the
biggest
questions about taking these systems forwards and that is the privacy issue it's something
that different companies have a different take on
but i think the public at the moment are pretty much asleep
on this issue
many people don't know what the privacy
issues are when you sign up to use siri for example you scroll through
this stuff no one reads
well almost no one reads you click the button and you agree and most people
have no idea what it is they agreed to
apple has a really rather strict privacy
protocol
and actually its researchers
don't get to see private information and i can't speak for the other companies
but it seems to me that
these are issues which need dealing with and i think they need more transparency
maybe even with some legislation but without transparency and some clear rules that sort
of everyone's working to i think we're going to come unstuck because there is
a danger that something happens which is not good
and there is a backlash and then these systems become
i don't know
degraded and people don't normally use them for reasons that they are just fearful
of the lack of privacy but there is also some very interesting research that you can do
so one of the things that we work on at apple is differential privacy
the basic idea is if you have a client and you want to collect data
from them what you can do is you collect the data on the
device and then you add noise to it you add sufficient
noise that you can't actually any longer identify the person or
any of the private content but when you take that data and you
aggregate it with the similarly noisy data from a hundred million devices you effectively filter
the noise and get the statistics you are looking for
without ever seeing the private information that was on any individual's device
i think there is very interesting research starting to emerge along those lines and maybe
it's a route to being able to make this kind of information more
accessible to academics
if there are the right channels for doing this which can claim to protect the individual's privacy but
actually allow the data to be more widely used i don't know
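The idea described above (add noise on the device, then recover statistics only in aggregate) can be illustrated with randomized response, a textbook local differential privacy mechanism. This is only a sketch under stated assumptions: it is not the mechanism used by Apple or any other company, and the function names, the privacy parameter `epsilon`, and the simulated population are all hypothetical.

```python
import math
import random


def randomized_response(true_bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), otherwise flip it."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if rng.random() < p_keep else 1 - true_bit


def estimate_rate(noisy_bits, epsilon):
    """Debias the average of the noisy reports to estimate the true fraction of 1s."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(noisy_bits) / len(noisy_bits)
    # E[observed] = rate * p_keep + (1 - rate) * (1 - p_keep); solve for rate.
    return (observed - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)


def simulate(num_devices=200_000, true_rate=0.3, epsilon=1.0, seed=0):
    """Simulate devices that each hold one private bit and send only a noisy copy."""
    rng = random.Random(seed)
    true_bits = [1 if rng.random() < true_rate else 0 for _ in range(num_devices)]
    noisy_bits = [randomized_response(bit, epsilon, rng) for bit in true_bits]
    actual = sum(true_bits) / num_devices
    return actual, estimate_rate(noisy_bits, epsilon)


if __name__ == "__main__":
    actual, estimated = simulate()
    print(f"actual rate {actual:.4f}, estimate from noisy reports {estimated:.4f}")
```

Any single noisy report says little about the device it came from, while the debiased average over many devices tracks the true rate closely. A smaller `epsilon` gives stronger privacy but needs more devices for the same accuracy.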
the final thing i just wanted to say really briefly is on the
vision thing
i think it's a really big deal for the companies that are doing this
because
first of all i think we will move to a situation where
individuals have one personal assistant and they use that personal assistant for everything
why would you want to switch if this one personal assistant knows everything about you
your whole history your timeline what you like what you don't like what you
did a week ago a year ago and that kind of information would be a
real service to that person
now that's going to be a stickier
thing than anything that you might have with anything that's gone before so facebook
wants everyone to always be talking to facebook
and so they try to make facebook as sticky as they can and so on and
everyone will try to do the same thing but once the virtual personal assistant really
gets going then you will have your own one and it'll be very difficult very
hard to think about switching to anyone else's
so if people really start using court on their in rows of this time to
look you know larry data from three or too much
obvious tracking which can we are working for conducted a larry but anyway so
whatever it is siri cortana alexa
if people get really attached to these they won't
leave and then the money will start to flow in due course
so who owns the personal assistant is gonna be a very big deal in the
future
thank you
that with
like word
or not
the with some or all
is to actually greater this topic like
and compared to other processes not exactly hours of the
not a one-to-one it really cool you a bit about that rollment feature vectors goal
of money
that is the different for your processes
nevertheless so alexa voice service has the goal to bring alexa everywhere so
amazon is
also offering
it as a service
so you can integrate it into
all different sorts of hardware so that you might actually
have
alexa
it's just all over to like your foot or
in your research
and about the visual the us to
right a space where you actually always have access to one of the person looks
as a system
the one of the bottleneck
there i think it is likely that all of what you have a big
number of devices to what it's looked at all
that you want to you interact with the okay so that many different devices
and you where what is the one can do and
to enable is kind of six q
i've to scale
capabilities of
what the system is able to do
so one way to do that "'cause" i think that
that's right up to
enabling us to develop skills looks a little bit
i think what has something similar to that
of things about which allows
enabling
a number all a pattern it
telling the functionality of the system
to make life of the word
see
comparable
and accuracy problem rather
coupled a perspective
is the most important thing right so you have to create value for the
people having a personal assistant is one thing but it actually has to do something useful that adds
more or
with respect to data
privacy is really a
topic also so
we have this companion app where you can go through and you can see exactly
what alexa heard
and request
for an utterance to be removed
so it's very important to keep the trust of the
customer so i don't know
using the other companies but not on
it's below
all of the utterances
having so many different devices actually or planning to support so many different devices us
a kind of representation
for building statistical model is one of
problem
we have to write an article so you don't wanna send every time you have
you like your
data vector notation
for
from a doctor
what i
feasible
but it's redundant and a wasteful
so
wrong
from the perspective of scalable annotation scheme i think
i think that is
what role
all
otherwise
like i think scale actually and we understand what we want to do or what
to
as i can listen to what
the customer service
and actually
no
like to pose if a customer service
that's what color and they tell us what's wrong
and
but isn't
one way a novel way
is quoted a system for a lot of local
people discuss to go
for make records the true
what
what do cool
this respect to dialogue a cue at lexical walls one
there are several
so it is not likely to the pride
i
the i think dialogue this will definitely something that a lexicon who are right now
it's really want like from a machine so i wanna but on the light can
spectral light
if it doesn't really realise
recognise which like you want to come up cool so many of them will have
a for conversation with you but it's really very
task-oriented in that so
a hurting
a longer conversation to collect at the moment is this
what we are not
and
i think there was a question about how to commercialise
systems i think you really have to create trust in the customer
that
the system will work well so if someone tries it and it doesn't work
they won't go for it again
you lost a personal to we can with some criterion
and
models one for
what is
actually i think it's sometimes better
doing something small well than to overpromise
and
one of the technical bottlenecks that i currently see is related to machine learning if
you get more data
you do not always converge to the same local minimum
which functions
and
some people get a better experience but for some people it breaks right
so you have to have a way to make sure that actually not
too much regression happens for large crowds of people or actually sometimes things
in general are great a very with all
and
from a platform point of view you need a mechanism to make sure
you can fix those issues so it's kind of an engineering problem also
to design a system
that allows you to
grow the system
in production and make it more
maintainable
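The concern raised above, that retraining on more data can improve things on average while breaking behaviour some users relied on, can be made concrete with a simple regression check between two model versions. This is a minimal sketch under stated assumptions: the function name, the utterance ids, and the intent labels are all hypothetical, and it does not describe how any production assistant is actually tested.

```python
def find_regressions(gold, old_predictions, new_predictions):
    """Compare two model versions on a fixed labelled set.

    Returns the ids that the old model got right and the new model gets wrong,
    together with the overall accuracy of each version.
    """
    if not gold:
        raise ValueError("gold set must not be empty")
    regressed = []
    old_correct = 0
    new_correct = 0
    for utt_id, label in gold.items():
        old_ok = old_predictions.get(utt_id) == label
        new_ok = new_predictions.get(utt_id) == label
        old_correct += int(old_ok)
        new_correct += int(new_ok)
        if old_ok and not new_ok:
            regressed.append(utt_id)
    total = len(gold)
    return {
        "regressed": sorted(regressed),
        "old_accuracy": old_correct / total,
        "new_accuracy": new_correct / total,
    }


# Hypothetical example: overall accuracy goes up, but one utterance breaks.
gold = {"u1": "lights_on", "u2": "play_music", "u3": "set_timer", "u4": "weather"}
old = {"u1": "lights_on", "u2": "weather", "u3": "weather", "u4": "weather"}
new = {"u1": "play_music", "u2": "play_music", "u3": "set_timer", "u4": "weather"}
report = find_regressions(gold, old, new)
```

Tracking the `regressed` list separately from the aggregate accuracy is the point of the sketch: a release gate could block a model whose average improves but whose regressions exceed some agreed budget.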
why have slight of
i is a remember
well i'm jeff bigham at carnegie mellon and i'm in the hcii and lti
institutes there and so i'm here because my group over the past couple of years
has been developing crowd powered dialogue systems which i'll introduce you to
and over the past couple of years we've also been working to automate them so i'll kind
of explain what that means
so there were some questions as was mentioned and so you know where are we
well as we all know people are actually using
these systems now they are talking to their devices which is pretty exciting right
many of you raised your hands about alexa i talk to my watch and
i'm not always just talking to myself
clearly most people i've interacted with have a few specific functions that they use those
devices for that they've learned somehow those devices are pretty good at and so
to kind of illustrate this point i was at the local library the other
one open actual value your go
and i found this book
and this book is called
talking to siri it's a great book
it will tell you all of the things that you can talk to siri about
and the recently reasonably large but which is pretty impressive at the bottom recorder and
may happen shriek utterance but and it update now maybe it about inspect now
but i think that that's where we're at we're at the point where
we have systems that can reliably do a few functions actually you know a fair
number of functions and we're teaching people how
to access those functions
and so what we've been trying to do is put out a system a crowd powered
system that explores what we might be able to do if our systems could
be as robust as a human system and so our system is called chorus which
we developed it's a crowd powered system and the way it works is that people
talk to it via google hangouts so they can talk to it with
speech recognition or type to it those messages go to a crowd that we
recruit on demand within a
minute or so we get a group of workers who then suggest responses
and other workers vote on whether they think those responses are good and if they are good
we forward them back to
the user and if the user wants they can reply and some of the same
workers and maybe new workers who have joined because the others have left will then respond
and they can have a dialogue in this fashion we have explored how we might
maintain consistency over time so there's a memory space over here where the crowd workers can
kind of note facts they learned about the user as they have a conversation with
him or her so maybe i've learned that you are allergic to shellfish so
the next time you ask me for a restaurant recommendation i should not recommend a
seafood restaurant
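The propose, vote, and forward loop described above can be sketched in a few lines. This is only an illustration of the general idea, not the actual Chorus implementation: the class name, the vote threshold, and the rule that a proposal counts as the proposer's own vote are all assumptions made for the example.

```python
from collections import defaultdict


class ResponsePool:
    """Collect candidate responses from crowd workers and forward the agreed ones."""

    def __init__(self, vote_threshold=2):
        self.vote_threshold = vote_threshold
        self.votes = defaultdict(set)  # candidate text -> ids of supporting workers
        self.forwarded = []            # responses already sent to the user, in order

    def propose(self, worker_id, text):
        """A worker suggests a response; this counts as that worker's own vote."""
        return self.vote(worker_id, text)

    def vote(self, worker_id, text):
        """Record a vote and return the text if it has just reached the threshold."""
        self.votes[text].add(worker_id)  # a set, so repeat votes are not double counted
        reached = len(self.votes[text]) >= self.vote_threshold
        if reached and text not in self.forwarded:
            self.forwarded.append(text)
            return text
        return None


pool = ResponsePool(vote_threshold=2)
first = pool.propose("worker_1", "how about the thai place on main street")
second = pool.vote("worker_2", "how about the thai place on main street")
```

Here `first` is `None` because one supporter is not enough, and `second` is the response text because the second worker's vote reaches the threshold, at which point the message would be sent back to the user.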
chorus can do all kinds of different things and because it's people maybe it isn't so surprising that chorus is
pretty good at responding
it can give travel advice or some idea of how to make spaghetti
and all kinds of crazy things
and you too can ask it things
by going to talking to the crowd dot org
and i encourage you to try it i wouldn't say it's perfect right
we're doing a lot of things in the backend to try to improve the responses but
i think it'll be
surprising and even though you know it's people i think you'll be surprised at
the breadth and robustness of the responses
so what right so that might be what you're thinking so what
this is just people talking to people of course that works that's the
most obvious thing in the world okay and you're mostly right i mean there are
some challenges when we bring in workers on the fly and they are only doing this for a
short time and have never done it before
and if you've worked with mechanical turk you might be surprised that we can get
people quickly and that they do more or less the thing they're supposed to do
and you might be surprised that the quality of answers is as good as it is but again so
what well one reason why we might care about this is that
by deploying a system that we wish we could automate
we might learn what we don't know about how people actually want to interact with
a system like this we get a lot of insight i think into that by
deploying a system something that you don't necessarily get in artificial scenarios
we can do data driven improvement right so we are actually collecting a bunch of data we'll
release it as we go and it's real data from real people asking
you know questions where you know for the first question or two they are just kind
of curious but eventually they ask because they actually wanna
know the answer i think maybe the more interesting one though is that we are
thinking about hybrid workflows that combine automation with people so two examples of
things that we worked on just a taste to give you a sense of you
know where this is going the first is a system called guardian this is a crowd
powered dialogue system
for web apis and so the gist is we use a mostly non-expert crowd
of mechanical turk workers to convert an api on programmableweb into a little
dialogue system which then the crowd helps to run so they do the slot filling
they do transitioning of states and
formulating of responses
and what's kind of neat about that is that we're collecting data at different
levels right so we're actually having
the crowd provide data not at the level of you gave me a message and i provided a
response which may be difficult to learn from but at the lower level where
you're trying to start running the dialogue system with the crowd
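One way to picture collecting data at the lower level is to treat the parameters of a web API as slots and to log every crowd decision as a separate labelled step. The sketch below is an assumption-laden illustration, not the Guardian system itself: the class name, slot names, state names, and log format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ApiDialogue:
    """Dialogue state for one web API whose parameters are treated as slots."""

    required_slots: tuple
    filled: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # one entry per crowd decision

    def fill(self, slot, value, worker_id):
        """Record that a crowd worker extracted a slot value from the conversation."""
        if slot not in self.required_slots:
            raise KeyError(f"unknown slot: {slot}")
        self.filled[slot] = value
        self.log.append(
            {"step": "slot_fill", "slot": slot, "value": value, "worker": worker_id}
        )

    def missing(self):
        """Slots that still need a value before the API can be called."""
        return [slot for slot in self.required_slots if slot not in self.filled]

    def state(self):
        """Very small state machine: keep asking until every slot is filled."""
        return "ready_to_call_api" if not self.missing() else "collecting_slots"


# Hypothetical weather API with two parameters.
dialogue = ApiDialogue(required_slots=("city", "date"))
dialogue.fill("city", "pittsburgh", worker_id="worker_1")
dialogue.fill("date", "tomorrow", worker_id="worker_2")
```

Each entry in `dialogue.log` is a small supervised example (which slot, which value), which is the kind of fine-grained data that is easier to learn from than a whole message and response pair.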
what we've then done is to try to push this away from just information queries into
actions the idea is to have the crowd start to work with the user in a
dialogue to create rules that their phone can then run i don't know how many
people have used something called ifttt if this then that
okay so it's basically a way that you can set up if then rules for
things like you know i was
i was late for a meeting this morning the crowd might ask why were you late
well it snowed last night and it took a while to clear off my car
and so from that they can work with you to say well if this is what
i had access to via this api and it says it snowed overnight then
i might alter your alarm to be a little bit earlier so you
wake up in time
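The snowed overnight example above amounts to a small if-then rule over sensor or API readings. A minimal sketch of such a rule follows. The `Rule` class, the `snowed_overnight` reading, and the twenty minute shift are hypothetical choices for illustration and do not reflect the actual IFTTT service or the speaker's system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class Rule:
    """An if-then rule: when the condition holds on the readings, adjust the alarm."""

    description: str
    condition: Callable[[dict], bool]
    action: Callable[[datetime], datetime]


def apply_rules(rules, readings, alarm_time):
    """Apply every rule whose condition is true, in order, and return the new alarm."""
    for rule in rules:
        if rule.condition(readings):
            alarm_time = rule.action(alarm_time)
    return alarm_time


# Hypothetical rule created together with the user in conversation.
snow_rule = Rule(
    description="if it snowed overnight, wake up 20 minutes earlier",
    condition=lambda readings: readings.get("snowed_overnight", False),
    action=lambda alarm: alarm - timedelta(minutes=20),
)

usual_alarm = datetime(2016, 1, 15, 7, 0)
snowy_alarm = apply_rules([snow_rule], {"snowed_overnight": True}, usual_alarm)
clear_alarm = apply_rules([snow_rule], {"snowed_overnight": False}, usual_alarm)
```

Keeping the condition and the action as separate callables means the crowd only has to settle on two small pieces, which trigger and which effect, and the phone can then evaluate the rule without further human input.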
right so it turns out that this idea of using people along with your automated system
is not actually new right so most software companies
and many startups have efforts in this space and sometimes they are very
exploded so we already have is but in mention
so that is creativity vc this is one of a call centres what have you
know their crowd their workers through a obviously can and you can guarantee more about
confidentiality in other things well there's another example is likely to be your artwork and
things like that i think what's really interesting about this is that
we don't have to just rely on automation right any more and we're not relying on just automation
right so whether it's a call centre like this or it's apple engineering you know
more and more templates that can respond to the specific functions that it knows it can
support we are actually relying on people or amazon building out this skills feature
so that
its crowd of developers can build more skills into alexa so we are kind of
relying on this and so here i'm just saying well maybe we can even
push this further right so we push it out so that this can happen on
the fly with a completely non-expert crowd and so your dialogue system with a little
bit of human input
might be able to do whatever you want the first time you ask
thank you
okay
my name is zhuoran wang and i'm very honoured to be invited
to this panel and to have the opportunity to exchange ideas with the three
senior pioneers in this area
and i'm speaking on behalf of a newly established startup in
china that i cofounded named trio ai so please allow me to
briefly introduce what we are doing at trio
so what we are working on right now is a chinese conversation platform
for creating conversational interfaces like chatbots
so i mean chinese is special because
i see a lot of advanced technology has been developed for english and a
lot of major european languages but chinese is a language that has a higher
complexity is the independence facts
so there would be a lot more challenges for example
with knowledge graphs companies like google and microsoft have
put a lot of effort into building knowledge graphs and they are
really ready to use but in
chinese i mean the knowledge graphs that can be used
are much less reliable so there's a lot of noise so that's you
know one of
the bottlenecks or difficulties we face in chinese and also for example
right now we're mining the web to find different ways of saying things and
for example of a little to say were you in chinese we see like or
force all the different
expressions
that's really disasters still
and is also what we are trying to do for chinese it's a week optimize
the technologies that people had developed that all other languages maybe you know so we
construct right and reshape that to adapt which i need for example to avoid a
big noise the knowledge graph we try to
mining the liability in particular domains and try to quite a great you know
a relatively smaller ontologies or smaller knowledge allows for you each the lion the those
resolve the ambiguity you know higher labeled you wouldn't there are also using the homes
using the information
kind of
then i'll try to use tools to solve the
actual ambiguities and also we are going you know customised solutions for
in to have can unite of things all warfare characters in like computer games i
and animations it's gone seems away unique optimizer the system for our oak lines of
python other companies that require these a conversational interfaces
so
we provide open-domain chitchat style systems like xiaoice by microsoft and
we do task oriented dialogue systems as well and we also provide
hybrid system solutions similar to the
which was is then we by the way i mean i always mention
these products because they are leading chinese systems in this area and
also because
another cofounder of our company is actually the product creator of one of the
systems and we week only the power to the otter project at our previous employer
so
the experience we gained from the previous systems is carried over to our new
company
and the
now we think you know about the evolution of the system and we
believe that the
the future virtual assistant in addition to being more
capable
is to be more human like i mean it should have distinguishable personalities and
emotions
that's
that's it is it is gain from the you know the previous quite a long
so we see in the previous products weeded out
so for example we see the you know
the chitchat queries actually take a significant proportion of the
query log in the previous system and this is even more you
know
more common than the task oriented queries
so i mean this is
maybe partially due to the subculture of users in china
but the reality is actually that people are looking for companionship more than actually solving the
task using this virtual assistant
and also we think the virtual assistant would be
actually proactive
it should try to find the right timings will be the there always died off
you know passed to be very simple
and is
so that's what we are committed to do at real and where we are
we are working on it to make it happen
so in order to do that we also have a lot of
challenges to solve for example how to customise the personality of a virtual assistant that's a
very difficult task
and
requires a lot of engineering work and sometimes a lot of dirty work
but i mean
there will always be solutions so we try some different you know technologies to
rewrite and also reshape sentences to make the language style done to be
resting caleb also you know all
you know human can be you will only in this kind of task like all
can be
you know community and i can actually ask the user real users to come to
be able to these curves are
also
sorry
also we
also for asr everybody we can okay yes prior is used sorry
sorry
like join you
i
privacy is very important because
what we do is
well i believe everybody is doing that very carefully and
in addition to issues of the user's privacy
we also face like the political issues you know
religious issues and you know there's nothing we can avoid
we just have to deal with it you know with
strong classifiers plus we use strict you know dictionary thresholding
to
you know to solve that right so but this is also very
important because we are building the chitchat system we
have thought about chitchat technologies a lot
because
we believe that you know conversation is a process that generates information and
in chitchat or maybe motivated by chitchat a piece of information can be generated here
in a conversation and this piece of information can be redistributed in
another conversation
and maybe get commented on in another one and so on if we can
design this cycle
the system can then be self sufficient
so that way i mean we don't need to actually
improve the system anymore and it itself can
build the knowledge itself but with this ability privacy becomes a
very crucial part because if you disclose information that the user told
your system to another person that's really horrible it can ruin the whole
product or company so we are like i don't know i mean i don't have
a better solution at this point and
would like to discuss that and hear ideas on all this
so
that's my predictions all and before share you
a star with the more you're that is
sense
okay thank you very much so now we realise that we have thirty minutes or so
maybe we can
we can extend a little bit but
so thank you for the introductions so it looks like it sounds like so we
are like professor steve young said so we are in the engineering
phase now so then we don't need more modeling or
it's done or i don't know and so given that so maybe each panelist
pointed out that there is a privacy issue
to
to make a real
more realistic
service
so
i want to ask what you think so let's go back to
so
so question one and maybe two what's the bottleneck we are facing
so
in terms of technology what's our biggest thing
i want to ask again
at the
so apart from the privacy issue so
what's the technical problem we're facing
what do you think
i think we have a bunch of silos and i'd love to see them
put together right into something really useful right
well as i said i think you know we have all the bits none
of them are perfect and all of them can be improved we have been
a bit slow putting together systems that work
if you use modularity and if we can seamlessly switch
to put in a plug for maxine over there so maxine is building a system at cmu which is
actually an integration of different dialogue systems from research groups from around the world
if there are enough of those and it's a bit like the chorus system
if the user's talking to this system and it appears to have huge coverage
expertise in many different areas
the fact that it's actually modular and multiple different systems
is completely you know the users can't see this they're oblivious to it obviously
if you start switching topics the way humans can do
within the conversation it might fall apart but what we can do by building modular
systems and scaling them
you know i think there's a long way to go
i'm not aware of a specific thing that is
stopping us building these systems but maybe we should ask the audience
yes please
well i think that sometimes something i think
well the question is if it's an engineering problem for the research
community as i said none of the components we have are perfect i
mean
and so as we go you know we have the dialog state tracking
challenge there's lots of
there's lots of things one could do to improve slu and so on
but i think the real challenge is actually how we make the data available
so that academics can actually work on serious datasets
and not
the something frank tori datasets you know of a thousand a few thousand dialogues
it's interesting stuff but that's not what you know microsoft or apple or
amazon have they have datasets that are several orders of magnitude bigger
and it would be really great if we could help the academic community to
actually be able to work on something that is
really very large now whether dialport or something like that can generate a similarly
large dataset
that we can work on as a research community that would be great but i think
that's probably the major challenge
well i think the answer to all these questions is unless you have a
system which
real users are motivated to use
then it's very difficult to get the data in large quantities so the reason that
google and apple and so on have so much data
is that people actually are motivated to use siri and google now and
alexa
and now one of the things that we don't know maybe this is the
you know my view on it is the degree to which the
algorithms we develop are generic and the extent to which we can move them from
one application to another without having large amounts of data now you know
sirikit has only just come out where we're actually
inviting developers to essentially attach third party apps to use siri as a front-end
alexa has a similar ecosystem the way these things will work is in fact
that certainly for initial deployment if you have a company which specialises in dialogue
software to interact with patients
setting aside all the very real
you know ethical issues that maybe those applications have then the
algorithms the models we build may well be generic enough to bootstrap a reasonable working
system and then the more data you collect the better it gets
so i think to some extent this will evolve in time and you'll have better
tools to explore some of those issues that you're so
i agree and the topic of this session is virtual personal assistants
and the way they were defined at the beginning i think that you're pretty much on
the edge of o
that's all that's the v i justification
no that's the vision actually i don't know that that's true you know i think
what apple doesn't want you to do is to be locked into
alexa such that when you buy something
you're gonna use alexa for the advice to buy it and as the mechanism
to buy it also i'm sure that's what jeff bezos is thinking about is
well you'll do not a in how he's going to make sure that amazon is
the channel for buying it
and it does work really well on alexa right now
i
well i don't know going work i
and
cell i would like to raise and grouping users here i think are very annotated
and that little change
an elephant you that had training case you grad asking theory things that from when
he missed seeing ninety year old whenever i got addressed like first lock average and
i found in for a year we went round of entity that's you get and
here we extract lexical where satellite that scares the crap data is
every time and that is wonderful they don't you know interactively and in that we
relevant having to our home or our parking or slightly altering the trajectory of bare
it's any cell i'd like to have what energy in a research why their wedding
picnic excited about
to do research on an outright
regression so i always
e value and to that but also have a possible children and they are able
to talk to lex actually the older one love to talk to him and the
younger one which is always understood it runs are set him
i don't know how many people were also inspired by the young lady's
primer of the diamond age but i think that's pretty fascinating obviously there are privacy
and confidentiality concerns but you know the children are the future and they
will be the ones using these devices and
i think we should be listening to them and finding ways that they can shape the
direction of these
devices because they'll be the ones living with them
so maybe what's your onto split
so if you have any bad experiences the as you all children
developed have it's all and things that you from this talking to say every that
you're i really wish they haven't
the injury
along that line there was an article i don't remember if it was about
google or alexa where they discuss the behavior of kids that they don't use polite
words like
please can you do this right so because the machine
doesn't require that it would have to techniques or something like that the parents got
very upset about that because it changes
the behavior of the kid okay so i'm not sure how to really fix
that because but every the parent in the floor to the
like that's perfectly and hear something that i think we can data and i think
it's the iceberg
so i have an echo on the counter most of the time we just use it like
played of adl and listen to that purple ham sound you know don't have a
copy anymore from forty years ago whenever
i
one thing i have noticed in the kitchen recently one thing that i
do all the time is set a timer i don't have to
do that and you say you know alexa you know set a timer
for five minutes or ten minutes or whatever and the timer goes off and my natural
reaction is actually to say thanks
since she's opened the channel
right given in this task she's opened the channel she comes back she finishes the
timer and i say thanks and the timer just keeps going
so then i say
alexa thanks
and the timer just keeps going i think thanks should mean turn off the
timer there it's not that fast so what you have there is this crossover between
social dialogue behaviour like a greeting or thanks and this task oriented behaviour
and i think we have no idea
how to get pragmatics in that situation to integrate those two things and i
think it's it is unlikely that i at and i spoken it in trying to
explore the and it and here's another one and these are kind of maybe it's
just engineering and i'm really next step
and i hate came in aston lex's our and on the top
and i stand that real ran on the topic
i never seen a tuple are so kinda separate that i don't understand your query
i said it's right in the context you know she she's right she's in this
stage is no its not achieve any kind of a state right
i think i hear some or i might pay you have to do next that
this was a nice to christ's h cisco i mean and then she said saddam
stay you know in nineteen fifty seven elvis presley made his first you know whatever
right so i eight o'clock in the money i think and i
i was i hate and make it more accent and she tells me exactly the
same thing every time so i think we just have no
we don't really know how to integrate this kind of social mad with the
task oriented behaviour that's mightily
and
the that the right thing is probably you love the wifi but that i was
a response from the device itself so it can give you an answer about politeness
that we found out when we did the first spoken dialogue challenge
and i guess by now we're allowed to say that there were three systems there
was at&t there's cmu and there's cambridge and they all served the community of greater
pittsburgh to answer the phone and we found out that when people spoke to
the cambridge system they were much more polite
i the dataset a cable we but we think it has something to do with the
accent
absolutely awesome
i do not so i
so i have i have a question from a user point of view on all
this i think there are a certain number of tipping points okay those of us
who remember when the internet was first used
we were all using it wondering when the general public is going to use this stuff
and the general public used it when aol made an interface
that was super easy to use
right now there was another tipping point for is i and with the far-field
microphone array in cattle and i use it in you know you walk out of
the shower and say what's the weather and then you know what to get out
of the closet
i think that is a huge thing and i do not know how
vision
is going to follow that unless you have cameras in all parts of your house
how vision's gonna look around the corner because the mic can hear around the corner
and so i think there are still some other tipping points
for the user where the user suddenly says
gee i can use this and i'm gonna buy this as two hundred and some
people approximately have done with
i do so far
so what do you think the tipping points are
but you don't sound maxine who or what
i didn't get the vision that
actual computer vision or your vision you have used
i think that when the expectation goes from you think that i
know what it can do to me just assuming that it can do whatever i say
that will be a huge tipping point
i mean also what we see in china is you know
people looking for companionship more than the actual task oriented queries so
i mean i combined to give social be like a you like of all of
these all than the lunch each accent but also we have a response you we
idea of your require is and you can just it's of the their own goals
very smoothly you can just i tried to be the exact what are what you
take it as a front no you know applied or whatever so then even by
the we i we try to combine this chitchat with the task oriented
also make the
virtual assistant and then we see a significant
part of the you know people are looking for you know the chitchat instead
of only completing the task
so i think that actually at some level what's attracting people is
the human like character of this device and also you know how to
solve the you know the this also offline the use of you the system but
it also brings problems
i think it's not a technical secret the way of building the open-domain chitchat
we mine you know conversations from social media from the internet and we got
like you know billions of human conversations and we design you know features
with there will be done of features to
you know to score candidates to see how well a reply fits the
query and retrieve the most suitable replies
most suitable replies and randomly choose one of them to reply but the problem
is
when you do this it is really difficult to control what the system is
going to say
most of the time it says you know proper things but
eventually it may say something bad or something you know you did
not expect it to say so that's currently an issue to be
solved i mean from my point of view and
i mean you know what we see is that you know
generative systems like neural network based conversations could provide a solution in
some qualities you know it's a
reach way to model yourself the already spike to a reference so
i don't know yet i mean that's
we are exploring that direction
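
[editor's illustration] The retrieval-style chitchat described in this turn (mine query and reply pairs, score candidate replies against the incoming query, then pick at random among the best few) can be sketched roughly as below. This is only a toy sketch under stated assumptions, not the speaker's system: the four-pair `CORPUS`, the bag-of-words cosine score, and the names `top_replies` and `reply` are all hypothetical stand-ins for the mined conversations and hand-designed features mentioned in the talk.

```python
import random
from collections import Counter
from math import sqrt

# Hypothetical stand-in for conversations mined from social media:
# each entry is a (query, reply) pair.
CORPUS = [
    ("how are you", "i am fine thanks"),
    ("how are you today", "pretty good and you"),
    ("what is the weather", "it looks sunny"),
    ("tell me a joke", "why did the chicken cross the road"),
]

def bag(text):
    """Bag-of-words counts; a toy substitute for real scoring features."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_replies(query, corpus=CORPUS, k=2):
    """Return the replies whose stored queries best match the input query."""
    q = bag(query)
    ranked = sorted(corpus, key=lambda pair: cosine(q, bag(pair[0])), reverse=True)
    return [reply for _, reply in ranked[:k]]

def reply(query, rng=random, k=2):
    """Pick one of the top-k candidate replies at random."""
    return rng.choice(top_replies(query, k=k))
```

Because the reply is drawn from whatever the corpus contains, nothing in this sketch constrains what the system ends up saying, which is the control problem raised in the turn above.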
well i think it's a mistake maxine just to think about things
by associating alexa with the thing that lin puts on her counter so
all you need is a microphone you need the channel and what you want is
that same voice
with the same knowledge about you to be accessible in as many different contexts
as possible so when you get a new car and you have the same
you know you ask questions you want that to access the same system when you're
in the home wherever you are whether you're using your phone or your watch
you're talking to a loudspeaker you're talking to a television you want it to go through to
the same place and it knows the same things
in the same so that you don't have to learn different protocols different
you just want the same interface now
i'm still not sure about vision you mean cameras but you know in some
circumstances there will be more inputs and that is a big thing that's not
really been done very well so far integrating gestures what you can see around
you into these systems
but primarily
the personal assistant is detached from hardware
it is just that you know it's maybe in the cloud or maybe running
on your personal ecosystem
but it's yours and it belongs to you and it's accessible wherever you need
to access it
i mean that's one way this could go but it seems kind of odd to think
of this embodied agents that are with me at all times but that's very different
than my lived experience now right like when i'm at home
i interact with people who know me but know me in a different way in a different
context than at work or when i travel right
i
i'm not i'm not sure i'm not sure what people want but it is not
clear to me that it's the same agent everywhere
but so much more powerful if it is the same agent the same that
knows the same things
so the lower than maybe not everybody knows but i'd like a little on the
for like that
also available via we
so would like you have any of the a and a also like the on
the cycle of
well
what i did what the system you want a lexical
and hear the topic
and i can see the shopping list
i get a one problem the other five
and i could also ask for the same shopping list so it
it's the same information
so in that sense if it would be my car or somewhere else
i've got the same shopping list
so following up on those last two points
i think that this issue of personal assistant and what it really means so is
it is it something that only
knows about you and you only know about that and it doesn't interact with other
people at all
or is it more of something that is an assistant for you in a social
interaction if we think about human assistants or executive assistants and so on yes they
report to one person maybe as you know
a priority but they have to interact with a lot of people and the issue
about your wife's shopping list and whether you should have access to that
i think that is both the big scientific rather than engineering question and that
as far as we've come really is just to say we can have
locks and prevent certain people from accessing devices or certain functions but i don't think
we've got more sophisticated
in terms of saying how they would interact differently except that everybody has their own
personalisation
and you know i may want other people to access some of my information from
my personal assistant but not everything and how should that work i'm very curious about what's
in
alexa now for managing groups even how do you
how do you deal with people fighting about which music they want to
but what kind of answers are people bringing about in terms of multiple users
or interacting with multiple people
and
one or more assistants
with respect to multiple users the device is assigned to the owner but obviously
it is a family device so it's in the living room everybody can be
or if my wife wants to put something on the shopping list
sure that of what it disrupting that because everybody can
access it in the family so it's like the whiteboard right so
but are one of your a virtual want to respond plus the right so to
speak
so one or one point development but it works system was true words presumed
but system should
not respond to complete remote room impulse hmms
random no marketers store parking number system how to understand
but the observations the ones used or talking participants after a while useful realtor machine
that syllables
for stuff are just so do this actually works a lot better
so that maybe we're going through this transitional true people have been not require a
very
proper way to address this time
once the models as culturally establish some researchers who grew where
personal monocle with a rich close to the room response would really precludes ago
room and removed from you
remember system at a machines were introduced
the room but remember to o
behind one person or something wrong understand talking to the machine trying to read through
the correct then
this is not gonna happen to the
the cultural norms what we do this leads to each other we watched
something simple true but you know to some prior to the throat a double point
one from which rooms
basically what is the remote to do
i think it's more dimension room specific query
the some form of words has to do acquisition of norwegian structure
this is because we were able preschooler we do with the room with the parlance
pixel value some criminals
the reason we will use the language going to be homesick how exactly those with
map onto a actions the three but the main the bark number four through four
that's a little work on them
remote for worst possible sorts porcelain rooms room
would be most of these machines could produce the problem the room release brute
will come
work
and remove you have any thoughts i agree okay i
for syllables but the numbers knowledge so removing so
familiar google knowledge order to
from these works
but open remote chance to come from somewhere remote will have to remove solve a
problem
i keep being misquoted so to be clear i didn't say that all we
have is engineering to solve these things i think what i said was we now
know that with engineering we can build
systems which are going to be
significantly more capable than they are today
but that doesn't mean to say
a lot of the things that have been mentioned here will remain problems that need
solving
i'm just saying i think we can scale them to be far more capable than they
are today with just engineering
they still won't be able to do what a human can
and just a quick comment i was gonna say to lin well maybe now
you realise it doesn't recognise thanks you might just say stop all the time at are
alike so
well that might become sounds such that share your children probably won't erase your children
will be figured out likely to just size l
no i
i
i think you like it slots think it
covering the cost of alright if he wants to say that way you should be
able to do
so
i think lin is exactly right i think we only know the tip of the
iceberg in terms of how to integrate pragmatics with all this wonderful technology which
i agree it's amazing and wonderful and every time i use siri even though
it doesn't do very much for me it's still amazing to me if i remember
way back when it just wasn't possible
but it really is all about both sides i think are mentioning really interesting things
about identity and audience design partner specific processing and that for me
is what we only know the tip of the iceberg about and so
i'll talk about that a bit more tomorrow but for instance you know my story is
i have all of these devices in my hotel room and somehow i bumped siri
by mistake
and units and might have my own in male and female voice started saying in
almost units and that there is that the right for one and then you know
how many whatever and so you know really there's this notion of having the same
character everywhere we go is an interesting idea and you are trying to go for
a coherent identity
but really it's something that is still a real problem we don't know how to control
in context like all kinds of things can go wrong just is trained and
that i mean you have certain expectations of a partner
you know thirty years ago at chi demanding brenda laurel how to handle
that i was sitting in the middle of them and they were fighting
back and forth about
agents everywhere or agents are evil and immoral okay and so that was the point
thirty years ago as a psychologist it's an empirical
question it'll work some of the time it won't work at other times
and so that was what we find that thirty years ago there is no doesn't
at least at this conference if no one voice and abilities to be alone was
extension item in around to
to throw water on our parade that
you know they're probably people out there that way then sell
i think it's really important to think about the social things and you're right in
terms of restaurant things in
certain little functions i can do with mine but
you know the big picture the ambition is wonderful but we're so far from there
then we talked a lot about children
and you know how children want to interact with systems and of course children will
adapt
to the language like we're talking about they figure out that well you don't really
just in the way
what i always used as motivation
for a vision in dialogue systems is my father
who's currently years old and
back when i was at nuance in about the two thousand
four two thousand five time frame i was really brought over or maybe little rebels
part of our how many hope you system and i
had my father use it and he completely destroyed it you know
this is because he's not gonna adapt to the technology
and he said you know this is kind of nice you know exactly
what the do when children systems are really useful really useful was back in the
forties
as a what you mean because like unigram some of the coda
and when we had a problem
let's say the refrigerator's breaking you know making a sound he'd pick up the
phone
and louise would answer and she's the telephone operator she knew everybody in town
little town
he'd say louise you know my refrigerator's making this clicking sound you know and she
would you which i do infeasible kind of rigidity about the project here we know
bob he services refrigerators and let me let me just connect you
bob's he's always over at joe's diner at those times let me i mean can
tell whether i and somebody might then still am is there is two thousand four
two thousand five and
and i thought you know that there's wisdom there which is that
we shouldn't have people adapt to the technology we so is no we're
gonna go build louise and you know that was the motivation
for louise which then became cortana but
this kind of using technology to meet humans where they want to be naturally
rather than forcing humans to adapt
it's funny because right here there are two threads simultaneously and there's this tension
or you can have a nice debate which is we want to make technology more
human like in each and he said
and we want to make people talk like the machines because that'll be easier for
the machines to understand and we kind of have to decide if we're going to make
machines more human like
we can wait years till the humans change or we can integrate pragmatics
and other aspects of natural human conversation into what we teach our machines
my question is related to this i don't know very much about this and
one thing that was mentioned many times is that having emotions and personality
is an important part of an assistant or a dialogue system and
so i'm just wondering what is the current state
of interaction between research on personal assistants and affective computing at this point
in terms of recognizing emotion from the user and from you know prosody of the
speech and other aspects and also in terms of generation
of utterances which contain emotions
and ask questions so we actually or
no i don't number in that's really we actually not
research all models i we i mean at this stage at the start off button
will allow them from the email so for like you motion
emotion recognition all you motion generation actually be we actually use you a sheep the
lottery
good performance comes fortunately in there are task if you want to recognise it anymore
so all you want to you know reply i was because it mostly you don't
actually those that for every
replies
so you just keep your procedure you use a battery or
i mean you will double recall you just keep your is here tonight in
by doing it in that way we can achieve like you know ninety five
percent accuracy or something like that and we also learn a lot from the research community
like doing the generative model for a chatbot
we actually a truly in a channel walter using sound you know