Okay. Thank you all. First, an apology: Alan Black couldn't join us. I'm Phil Cohen from Monash University. Could you all introduce yourselves?
Hi everyone, I'm Vikram Ramanarayanan; you can call me Vikram, don't worry about my last name. I work at Educational Testing Service in research and development, where I work on multimodal dialogue systems for language learning and assessment.
And I'm from Google AI, working on conversational AI, but also on multimodal work, vision and language, and on efficient machine learning: basically what you can do under compute and memory constraints.
I'm Gabriel Skantze. I am a professor here at KTH, but also co-founder and chief scientist at Furhat Robotics, a spinoff company from KTH that is developing social robots.
Great. Alright, so I proposed a variety of questions that I hope will cause people to start thinking, both about the field and about their own research, and to try to understand where this field is going. (Can I make the text a little bit bigger, so people in the back can read everything? Okay, that should do it.)
So, the thought was: I hope we will get to talk about all of these, because they are all interesting topics. The whole idea is to put everybody on the spot, in one sense, to understand what it is we are doing here and why we are doing it. Are we working on the problems we are working on simply because there is a corpus there? It is easy to work on a corpus that exists, rather than either create your own or actually work on the hard problems instead of the problems that happen to exist in this corpus. So the first question is: are we working on the right problems?
We will also want to talk about multimodal, multiparty dialogue. I want to push the conversation into a somewhat more open space. There are a few people here in the room who have thought about that, but not a lot. Then there are the architectures we are building, which tend to be either pipelined or not pipelined, and we should talk about why we would want each of those.
The next topic is: why does my system have to learn to talk all over again? Why can't conversation, speech acts and whatnot, be something domain-independent? That is related to the pipeline question. Then the explainability question: GDPR is an interesting issue here, but if I say to a dialogue system, "Why did you say that?", I would like to get a reasonable answer out. How do we get there? Then a very important problem: what are the important problems? What would you tell your graduate students is the most important thing to work on next? And the last question: think about the negative side of everything we are doing. Can your technology, my technology, their technologies, be used for bad interactions, for robocalls that are now interactive?
So, lots of topics to talk about. We can start with the first one, and then I will sit down and shut up. I imagine there is a lot of work here on slot-filling systems. Your system asks you what time you want to meet, and you say, "the earliest time available." Or you say, "What's the earliest time available?", the system says "six p.m.", and you say, "too early." So the system says "seven p.m.", and you say, "okay." Notice the user didn't fill the slot; the two of them together filled the slot. That is mixed-initiative collaboration, et cetera; there are lots of issues there having to do with collaboration. Are we only working on slot filling because the corpus is there?
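That exchange can be sketched as a small constraint-narrowing loop. This is purely illustrative code, not from any system discussed on the panel; the class and method names are my own. The point is that "too early" fills nothing by itself: it prunes the candidate set the system proposes from, so user and system jointly converge on a value.

```python
class TimeSlot:
    """A slot filled collaboratively: user feedback narrows the candidates."""

    def __init__(self, candidates):
        self.candidates = sorted(candidates)  # available meeting times (24h)
        self.value = None

    def propose(self):
        # System offers the earliest time still compatible with feedback.
        return self.candidates[0] if self.candidates else None

    def too_early(self, proposed):
        # "Too early" supplies no value; it prunes this time and earlier ones.
        self.candidates = [t for t in self.candidates if t > proposed]

    def accept(self, proposed):
        self.value = proposed


slot = TimeSlot([18, 19, 20])   # 6pm, 7pm, 8pm
offer = slot.propose()          # system: "six p.m."
slot.too_early(offer)           # user: "too early"
offer = slot.propose()          # system: "seven p.m."
slot.accept(offer)              # user: "okay" -> slot filled jointly
```

Neither party filled the slot alone: the user's rejections and the system's proposals together determine the final value.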
The short answer, I would say: everybody can be comfortable with benchmarks. You can keep a track record, you have leaderboards, you can show numbers. It is the dataset and the metrics. More than the dataset, it is that it is easy to evaluate: we can measure a system's accuracy on this one metric because we know the actual values, the true values, and can compute precision and recall and so on.
But I also think it cannot just be a slot-filling system, nor the other extreme, where you go all the way to logic and say it has to be a fully constrained system. It has to be something in between, and we have to be flexible enough to adapt: to go from slot filling to actually understanding which slot attributes or values can be changed, morphed into something else depending on constraints, temporal constraints for example. The downside of going completely constrained is that there is no way we can ever program all that logic. And even if you allow an automatically learned system to infer that from a corpus, there are so many possible ways to infer it. Take your example: if the user says "earlier," how many earliest times should I offer? Seven p.m.? Six fifty-nine? Six fifty-eight? Will learning just work on something like that? Not necessarily. Which is why I said it has to be something in between, where you can program things, and it is okay to use some heuristics, say, "I'm looking at thirty-second blocks, or one-minute blocks, or thirty-minute blocks," and then gradually extend that and open it up to learning something more nuanced.
I guess it depends on what you want to do, whether you want a constrained system as opposed to an intelligent system. Nothing is really wrong with building slot-filling systems: you just give them a bunch of data that you clean really well. But intelligence is something else. This is not a knock on either of those two things, because in some cases we do want slot-filling systems and would be happy with slot-filling systems; that is what we want there. In other cases, though, we might want to go beyond that, to something that respects some kind of planning, some kind of higher abstraction, if you want to go that route. It really depends on what we are talking about.
Just to build on that: this is of course related to the corpora that are out there, but also to the practical systems people are building, which are often these search-for-a-restaurant kinds of things where you have the slots. I think it would be interesting to open up and look at completely different types of dialogue domains, and I can give one example from an actual practical problem. We are developing an application where the robot performs job interviews, and the robot might ask the user, "Tell me about a challenge in a previous job that you managed to solve." The answer to that question is not modeled very well with a set of slots; it is quite hard even to come up with what that slot structure would look like. That kind of modeling will also be needed when we open up to more applications than the ones we have now. What I think would be very interesting to address, and which is perhaps not easy to translate into a logical form or an SQL query, is that something else is needed there: there is some kind of narrative coming from the user that you need to represent. So it would definitely be interesting, but to do it you have to consider other domains, I think.
What did you think about the first talk this morning, relative to semantic parsing versus slot filling?

It was a very interesting talk, and obviously, if you have that kind of query, you need more complex semantic representations and so on.
We have the kinds of queries we have because of the corpora we've collected, and not at random: the corpora exist because we defined them that way. Versus, if you actually went to a travel agent and had a conversation, one might find it would perhaps be a bit more open-ended in the way you interact.

Maybe. But even then, it is still the user querying something and getting some information out of the system. Sometimes it is the other way around: the system is asking the user questions.
In fact, the original work on task-oriented dialogue, Barbara Grosz's work in 1974 on the structure of task-oriented dialogs, was the other way around: the system is telling the user how to do something, trying to get the user to do something. There are of course plenty of examples of that, like the classic one of helping someone change a tire.
I will just add one more thing. When we talk about intelligence, quite often we assume there is this one inflection point where suddenly the machines are going to learn how to reason and understand everything. One nugget I want to mention: whatever form we use, logical forms or anything else, the important part, since you mentioned collaboration, is whether the output is understandable by the other side. The system may not generate perfectly proper language, but is it understandable by the human on the other side, and does it let them get to a better state? And toward that end: we are not going to see a system trained on the travel domain suddenly doing something amazing in a completely different domain, but we should start paying attention to this, because with everything being machine learning, users judge how well a system does across multiple domains. So start generalizing: think about the generalizability aspect when you propose models, and about abstraction. That ties into the third and fourth questions.
Okay.
So, there is obviously a lot of work on end-to-end trained systems, where you are training the dialogue management in addition to the language processing, and some of the slot-filling systems are doing exactly that. Which means your dialogue engine is basically stuck with that domain. Now you get a whole bunch of new kinds of domains, and suddenly my dialogue system doesn't know how to talk anymore: it doesn't know how to perform a request or understand a request; maybe there are new kinds of speech acts coming in. We saw this morning, in the semantic parsing work, that they are trying to deal with a huge amount of variation; there is a lot of variability in language. But I submit there is much less variability in what happens to people's goals in the course of a dialogue. In general: you achieve them, or you fail and try again, you augment what you are trying to do, you replace what you are trying to do, et cetera. My suspicion is that it is a relatively small state machine. So why not pull those two apart? Figure out that part, through machine learning or any other method, and then deal with all the variability in the language in a pipelined fashion, versus training it all at once.
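That "relatively small state machine" over goals might look something like the following sketch. The states and transition names here are my own illustration of the idea, not a proposed standard or anyone's deployed design:

```python
# Goal life-cycle as a small state machine: pursue, succeed, fail,
# retry, augment, or replace. This is the domain-independent part that,
# on this argument, need not be relearned for every new domain.
TRANSITIONS = {
    ("adopted", "pursue"): "active",
    ("active", "succeed"): "achieved",
    ("active", "fail"): "failed",
    ("failed", "retry"): "active",     # try the same goal again
    ("failed", "augment"): "active",   # same goal, extra subgoals
    ("failed", "replace"): "adopted",  # a new goal takes its place
    ("achieved", "drop"): "done",
}

def step(state, event):
    # Unknown events leave the goal state unchanged.
    return TRANSITIONS.get((state, event), state)

state = "adopted"
for event in ["pursue", "fail", "retry", "succeed", "drop"]:
    state = step(state, event)   # ends in "done"
```

The language variability would then live in whatever maps utterances onto events like `fail` or `replace`, whether learned or hand-built, while the goal dynamics stay fixed across domains.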
I agree; it seems reasonable to separate these things. The motivation for end-to-end learning is that you don't have to have any knowledge about the representations in between, but then you have to have a lot of data. If you have the data, you don't need to know so much; if you don't have a lot of data, that's the problem.
Let me make one argument for and one against. The standard counterargument about end-to-end learning systems is that they claim we can get rid of all the pipelined components and just learn from the input to the final output. In some settings, I would argue, you might actually have more data for that than for the individual components. Take speech-to-text: the phonetic annotations and the other intermediate annotations at different levels of the system are scarce, but you might have just the speech signal plus the transcribed text or some response, and that can be much easier to obtain. In those settings, end-to-end systems, given enough data, have shown real improvements in recent years; that is not just deep-learning hype, the recognition error rate goes down. Now, do you have to do end-to-end learning in every scenario? No. An end-to-end system does not solve the error-propagation problem by itself, and you might actually create more issues: you don't know how to debug the system, and there are too many hyperparameters to deal with. In some settings that is a worse problem than just finding data and doing the input and output annotations. So it depends on the use case: whether you have to improve individual parts of the system, or transfer them to a different domain, or feed other systems that need an intermediate output rather than just the final one. For example, it can be argued that syntax is not necessary for every task or domain. When was the last time you saw a part-of-speech tagging paper, or even a parsing paper for that matter? The percentage of papers at ACL or EMNLP on parsing has gone down dramatically. That does not mean it is unimportant; whether it is important depends on what you are trying to do. If you are using dependency parses to reason over structures and substructures, they are useful. On the other hand, as a mere precursor to an end-to-end neural machine translation system, it is arguable that parsing is not necessary, at least for the automated metrics we are talking about. Again, that does not mean end-to-end models solve those problems; it depends so much on what you are trying to use the system for.
In some sense it is a balance. To take a specific example of what we are doing: we are building language-learning modules, specific goal-oriented, task-oriented systems for specific skills, like assessing fluency of pronunciation, or grammar, or specific aspects of conversation. So how do you go about it? This is the "how" question you raised earlier: how do I build these generalizable systems, or how do I use the same pipeline across these different but seemingly similar tasks that probe different things? You start out with something built largely from expert knowledge, because it is a limited domain and you don't have much data anyway. Then you start collecting data, through Wizard-of-Oz or some kind of crowdsourcing or other methods, and ultimately you have enough data that you can build a more hybrid kind of system, which could be end-to-end but also informed by the knowledge you started with. That is one way to look at it: different points along this hybridization spectrum, combinations of data-driven and knowledge-driven approaches, have implications for how you pipeline the system and how you train it.
Well, I certainly don't disagree with you. But some of the techniques are not going to be particularly appropriate for certain types of tasks. For instance, attending to a knowledge base versus computing an actual complex query: those two things can be very different, once you get to the use of probabilities, comparatives, and things like that. It is not obvious to me that attention will solve that.
I guess that is related to the first question: you will address the kinds of dialogues that you can solve with this method, and the other ones you will not address.

Right, and that is the risk of where this research is going: we just keep drilling into the problems we started with, and we never expand our goals.
So, talking about expanding the goals: I want to have you talk about multimodal dialogue, where I have not just speech but other modalities, coordinated in interesting ways, and about multiparty dialogue. Take any of your favorite smart speakers and stick it in a home environment, with a family, and have the family hold a conversation among themselves and with that device, and have it track the conversation among multiple people. "What time do you want to meet?" "Mary wants to meet at three o'clock." Mary says, "No, I don't." So what does the system do? What is it representing about what happened in that dialogue? Do we have any representation at all; what is the belief state? In all these papers we have seen, is there any notion of belief actually going on? There is a huge amount to break open once you start with the multiparty setting. And then there is the physical situation: actually having a robot, or a device like an Alexa that is physically situated, with a camera on it, and I am sure they have that, so it can see what is going on in the room, see who is talking and who they are talking to. What do you have to track out of all of that so the system is actually helpful to a family, rather than just to a bunch of individual conversations? This is a whole lot bigger space than what we have been dealing with. How are we going to deal with the multimodal, multiparty situation?
So, this is exactly the kind of dialogue that we are trying to model, where you have multiple people, for example ordering together. One problem there, as you say, is the belief state. Typically you think of it as what the user wants up to this point, or what has been agreed up to this point. But if you have two people, there might of course be two different states. If two people are ordering and one says, "I would like a burger," and the other one says, "The same for me, but not with onions," referring to the first order, then you have to keep track of what the two different persons want. And sometimes in dialogue you cannot just represent it as individual goals; it is common ground: "we want to do this," "we would like to do this." So maybe you should have three different representations: one for what we want, one for what I want, and one for what the other one wants. The goal is to come to a consensus, but along the way the parties can want different things, so it can be a mix, and you can refer to what the other person has said. And then of course there is the question: if the two people are talking to each other, to what extent is the system listening to that? It probably has to form part of the "we."
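Those separate representations could be sketched roughly like this. It is a hypothetical data structure, with illustrative field names; a real tracker would also model confidence, reference resolution, and who heard what:

```python
from dataclasses import dataclass, field

@dataclass
class MultipartyBeliefState:
    """One partial state per speaker, plus what the group has agreed on."""
    individual: dict = field(default_factory=dict)  # speaker -> slot dict
    joint: dict = field(default_factory=dict)       # the consensus so far

    def heard(self, speaker, slots):
        # Attribute an utterance's content to one specific participant.
        self.individual.setdefault(speaker, {}).update(slots)

    def agreed(self, slots):
        # Promote content into the shared "we" state once all accept it.
        self.joint.update(slots)


state = MultipartyBeliefState()
state.heard("A", {"dish": "burger"})
# "The same for me, but not with onions": resolved against A's order.
state.heard("B", {**state.individual["A"], "onions": False})
state.agreed({"venue": "the burger place"})  # settled earlier in the dialogue
```

The point of the structure is that B's "the same, but without onions" updates only B's state, by copying and amending A's, while the joint state holds only what everyone, system included, has actually agreed to.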
Right. If it is all of us together trying to solve this problem, what we are going to order, where we are going to go out, whatever, then the system has to be part of this collective, and you have to have what we used to call, in the old days, a joint intention: what we are trying to do together. How would you guys think about this multi-user problem?

I guess the other thing to add to the mix is the multimodality of things.
For instance, when you have audio and video, which one do you weight first, and how do you choose the priority? And of course there are ambiguous situations all the time. What we have found, and we have been looking largely at the education context for this kind of thing, teacher training for instance, is this: take a teacher interacting with a whole class of student avatars. If the teacher addresses one student, how do you know which student it is? Suppose they say, "You, over there, that was great," but I am pointing in a different direction. Who does the system attend to? Does it attend to my speech, or to my gesture? There is always that kind of ambiguity.
To put a positive spin on it: at this stage we can certainly do belief tracking; it is not yet at the level we want, but I believe we have systems very close, technologically, to the point where we can do joint inference over video, audio, and textual signals, disentangle the different entities speaking at the same time, and do that at scale. You could do that, but then how do you bring in prior knowledge about the users? That is the second point, and let me give you a different scenario. Imagine it is not just collaborative, where you can attribute each utterance to a specific entity: what if it is a parent and a child? Whose preference do you take into account? The child says, "Play Cartoon Network, and loop it for twenty-four hours." Should Alexa do that? Who decides? Obviously there is a preference hierarchy here; the parents have to win. These are very tricky situations, and it might not be as simple as some general-purpose model that says: here are the entities, here is one model for two people interacting with a joint intent. It might have to be customizable per household, per set of people, and all of this varies across different sets of people put together; the relationships between them have to be factored in as well. So I agree these are challenging, mixed problems. But one simple point: we don't have to learn everything. Everybody assumes that with machine learning we have to learn everything, but you can just ask the user for a preference. You could ask the person, "Can you tell me your preference?", or let them enter it manually in an app or whatever. Sometimes just one bit is enough to bootstrap the system, or at least to lock in a bunch of variables that would otherwise have caused a lot of confusion downstream. So there is still hope, but it has to be this interactive mode, not a system silently observing a bunch of things, learning, and then suddenly starting to do the right thing at some point in time.
Alright, I'll move on. We finish at what time? Six, about? Okay, and I think we want to leave time for audience participation, so I will try to move along to some of the other questions.
The next one I had in mind was explainability. Okay, so we have all these lovely machine-learning systems. You ask any of them, "Why did you say that?", and what do you get? Nothing. Now, the system could make up a reason why it said that, but you actually want the "why I said it" to be causally connected to what it actually did. So what kind of architectures can you imagine that will give us explainability in the general case?
First, the question is: do you, as a user, really need to be able to ask that? Are users actually interested in why the system recommended something? If it is a dialogue system, I definitely want to know, but then the question is how to get the answer. We talked about a restaurant you wanted me to go to; you gave me a recommendation, I said okay, and then: "Why did you suggest that one?" The thing is, if the system is learned, truly end-to-end, then you have to build the dialogue around that: wherever you are building your dialogue, you would have to train it on explanation dialogues, and you might not have that data.
Well, that is part of the point. Just to offer a counterpoint on whether users really care: in education this is really important, and this is true for mental health and other domains as well. If you tell a person, "You have depression, with seventy-five percent probability," they probably want to know why, how you reached that conclusion. The same with assessment: if you tell someone, "Your fluency score is nine out of ten," or "four out of ten," then why is it four? What went wrong, and what do I need to improve? In those kinds of cases it is really important. Having said that, there is an increasing body of work in the ML literature, especially around end-to-end and similar deep-learning models, that really looks at interpretability using a variety of techniques. That has been relatively unexplored in the dialogue community, and I think we should really take it up; this is one of the things I would point to when we get to the question of what you would ask your graduate students, the next generation, to work on. There are several families of techniques: techniques that probe deep neural networks to figure out which inputs are most salient in leading to a classification; techniques that visualize neurons; techniques that visualize memory units. That is model interpretability, and then there is feature interpretability.

But do you believe that will actually get translated into a comprehensible explanation for an actual end user? You wanted to say something?
I was just going to say: my point is that just because we say a network is explainable doesn't mean much; it depends on who is looking at it. If it says, "activation number 436 is firing, and that is causing the probability of the positive class to go up by x," then to the ML engineer or scientist that is great: okay, now I can go fix it, or do something about it. But what is probably more interesting, at least for NLP and dialogue, is whether there are higher-level abstractions, and they don't even have to be fully human-comprehensible, such that we can find, let's say, alignments: these sets of examples basically lead to the same set of outcomes. At a higher level, that could be at the phrase level or at the semantic level. Obviously, building an explainable system could then become as hard as building the system itself, so the two have to go hand in hand: the modeling work and all the other work on applications. The vision community, if you like, has advanced further in this respect than the NLP community: not just probing networks and looking at activations, but even learned approaches where you backpropagate through the network and look at input regions, learning in an online fashion which regions, which colors, which types of object patterns, triangles and so on, are triggering certain types of behavior, and interpreting that back in a discrete fashion, like a color map or a certain arrangement of objects. I would like to see more of that in the NLP community. The most interesting work I have seen in the recent past is of the probing type, where you have these black-box networks and other methods try to tell you where they are going to fail and where they are going to fit. And you would be very surprised: in some state-of-the-art systems, you change one word in the input utterance and suddenly the probability flips. There is a lot of work on these perturbation and alignment styles of method looking at such things. So I think explainability and interpretability go hand in hand.
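A minimal occlusion-style probe of the kind described, drop each input word and watch the score move, can be sketched as follows. The classifier here is a toy stand-in I made up for illustration; a real probe would query the actual model:

```python
def toy_score(words):
    # Stand-in classifier: the prediction hinges on a single cue word,
    # mimicking the one-word brittleness described above.
    return 1.0 if "refund" in words else 0.1

def word_saliency(words, score_fn):
    """Saliency of each word = score drop when that word is removed."""
    base = score_fn(words)
    return {w: base - score_fn([x for x in words if x != w]) for w in words}

saliency = word_saliency(["i", "want", "a", "refund"], toy_score)
# "refund" carries almost the whole prediction; the rest barely matter.
```

The same loop, run against a real model, is exactly the kind of probe that exposes a system whose prediction flips when one word changes.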
For a real consumer, you need to explain it in terms they can understand; it is not just probing a neuron. I think we actually need to combine these methods, and there are many people in the room who have worked on this problem in the past. Certainly the learned systems need to figure out how they are going to do this, because if you don't, the Europeans are going to make you.
Just one point: the good news, though, is that if you look at the number of papers on this topic over just the last two years, it is a very encouraging sign. It used to be that nobody wanted to talk about explainability: "I just built a system, it is state-of-the-art on x, y, z." Now I think for grad students it is a very interesting and exciting field to be part of.

Okay, so that is the next question: what is the most important thing people ought to be working on? I have my own list; what have you got?
To start with, I think it is very important that people work on different things, so that we have a lot of different approaches we can compare, rather than everyone doing similar things. I also think the intersection between dialogue, speech, multimodality and so on matters, because these are still separate fields. Look at the Google Duplex demo, for example, which got a lot of attention because it sounds really human-like. If you look at it on the dialogue-pragmatics level, if you make a transcript of it, it is not a very sophisticated dialogue model, but the execution is great. We don't know if that was a cherry-picked example, but as presented it sounds fantastic. So being able to actually execute the dialogue with that kind of turn-taking and that kind of conversational speech synthesis, using a model of the dialogue, is something underexplored in both the speech and the dialogue communities.
explain ability is
super important
would say that
i mean this sounds like there's so many factors associated or like multiple areas associated
with this building more system so that we can make the system's less brutal the
number of ways to achieve this rate and
that's a very important topic and you can deduct a number of ways from the
ml community from like in injecting more structured knowledge one of the things that all
these things lead to in my been in is like
not just for generation but all the other aspects of dialog really research problems
what are the min viable sort of nuggets of knowledge that we have to encoding
the rain or the system after encoders that it can learn to generate well i
can then do recognise do the slots in turn spell it can be transferred to
a new domain so
is that like what is the equal and of a knowledge graph right i mean
for like different dialogue systems i mean that we can actually sort of we can
all agree on so i think if we come up with like some sort of
a shared representation of that i mean which is interpretable to at least to some
extent then i believe
you know we can actually make even more for the progress right of course it's
a hard problem right i mean and dialogue is like one of the hardest problems
in and that's language as well so
It's not just fact lookup. What I'm talking about is: what are the things to know about, say, the travel domain? It doesn't have to cover a hundred percent; even if twenty percent of the knowledge can be encoded in the concept space and the relationships between concepts, then for a new domain I might just need access to a very small amount of training data, or learn a little bit more by mapping into existing concepts, or by augmenting an existing concept database.
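The transfer idea described here can be made concrete with a toy sketch. Everything below, the concept names, the relations, and the overlap measure, is invented for illustration; the point is only that a new-domain concept can be anchored to the closest existing concept so that only the unshared relations need fresh training data.

```python
# Toy sketch of anchoring a new domain to a shared concept space.
# All concept names and relations are invented for illustration.

def jaccard(a, b):
    """Overlap between two relation sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Existing concept graph: concept -> the concepts it relates to.
concept_graph = {
    "restaurant": {"food", "cuisine", "price", "location"},
    "hotel":      {"price", "location", "booking", "stars"},
    "taxi":       {"location", "time", "booking"},
}

def map_new_concept(new_relations, graph):
    """Attach a new-domain concept to the nearest existing one, so only
    the unshared relations need new training data."""
    return max(graph, key=lambda c: jaccard(new_relations, graph[c]))

# A hypothetical "hospital" domain shares most of its relations with "hotel".
anchor = map_new_concept({"price", "location", "booking", "department"},
                         concept_graph)
```

Here the hypothetical hospital concept lands on "hotel", so a system could reuse the price, location, and booking machinery and only learn the "department" relation from new data.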
So I think that's a super interesting direction, and it could be multimodal as well. It's not just about language; it's about what visual concepts I need to keep in mind, the taxonomy of how objects relate to each other. If I see a chair in front of a table, I know what the positional relationship between the different things is, all this spatial coherence. So what are the minimum viable sets of relationships and concepts that we need in order to build better dialogues?
So, since Gabriel and you have already covered a bunch of things, I'll say something complementary and add to it, because I think these are really interesting problems, and they were on my list anyway.
I would just add: working on low-resource problems. This is in terms of languages, domains, and even the kinds of datasets that we train on. This is nothing new, everyone here knows about it: we all over-train on the restaurant datasets, the Cambridge datasets, for a good reason of course, because they are publicly available.
That's one thing, but apart from the plan to get more datasets, which is obviously one of the things we want to do, can we look into how we do better with limited data? There is work already going on, perhaps it needs to be more intense; there's a lot of work on zero-shot and one-shot learning. But we should try to look at better ways of adaptation, better ways of working on new domains with limited resources, given the existing resources, perhaps using techniques from machine translation or from some of the other sister fields that we might not think of immediately. For instance, this is starting to come up a lot more: trying to use data which is kind of unconventional for dialogue but might be useful for bootstrapping in these low-resource settings. That might also be something very interesting and useful to look at.
And especially for underserved domains. Coming back to my domains of medicine and education: these are not necessarily the "how may I help you" or booking kinds of domains, but they are the ones where you have a lot less data, and they would still be very useful to work on.
One thing: we have very large knowledge structures, maybe a global domain structure. And if a domain is just that known structure, then you already know how to have a conversation: you know what the objects are, you know what the actions are, you know what the verbs are, you know what their preconditions and effects are. Why do you need any more? Why do we need anything beyond just a change in the knowledge? I don't need big corpora, because I've already learned how; I've got a huge vocabulary and all these vectors. So why not just change the knowledge base? Who needs a universal corpus? Just give me a knowledge base and I'll do cancer diagnosis, or I'll do architecture, or whatever; take an arbitrary domain. Wouldn't that just be great? For each of those domains you need the right knowledge base, and I think everybody would appreciate that precision.
okay
But even if the knowledge base is, let's say, huge and static, the reasoning over it keeps changing, right? The same knowledge you might interpret differently sometime later than you would right now. It could be because our methods are not sophisticated enough, or because some new information pops up: the facts are the same, but the way you look at them changes over time. I can't give you a concrete example for this, but I don't think these problems are going to go away anytime soon. If anything, look at machine translation, even the low-resource setting: the problem has existed for several decades, and through a number of advances, like unsupervised machine translation, we are now starting to see actually scalable systems working in this setting. I think the field as a whole, NLP, ML, computer vision, has this tendency: we focus on the solvable, immediate, big problems, then we simplify, and then we extend to the zero-shot setting, extend to the few-shot setting. But it's not as if we are starting from scratch: all the stuff we learned on ImageNet, the convolutions, is still among the single most useful blocks that get transferred over. And for language, I would argue that over the last five years attention has been the common ingredient: there are a thousand variants of these networks, but there are specific concepts that get transferred onto new problems when you build models now. Hopefully these will also transfer as we start looking at new problems or extensions.
Well, conceivably we should be thinking more about grand-challenge problems, not just the usual Alexa challenge but larger ones that you can get governments to support. But governments are now going to start asking us this last question, which is: so you built this wonderful technology, and now I'm getting phone calls, interactive phone calls, that are trying to get me to do stuff, either buy stuff or, in the worst case, commit suicide, or a variety of other activities. And they are doing this, and they understand language pretty well, and they are good enough to cause some people to be convinced that they are dealing with a person. Even as far back as ELIZA there were people convinced about the humanness of the thing, and these systems are, who knows, far beyond that. In letting these things loose, how do we start to address that question? We have seen what happened in computer vision, where people were not really paying that much attention, and the technology certainly is being misused. How do we prevent our technology from misuse? Obviously it's our problem. Suggestions? And then we'll turn it over to the floor; we'll have enough time for twenty minutes of questions. Well, only ten minutes.
So, obviously you can do regulation: bots always have to say that they are a bot. But that will possibly not stop people from doing it. And with adversarial networks and generation, if the need is there, you are going to have deep fakes in language processing and dialogue processing wherever they are successful. It might also come to a stage where I don't pick up phone calls myself anymore; instead my bot picks them up in order to check whether the caller is a bot or a spam call. And then they would be talking to each other, the caller's bot trying to convince my bot that it is human. I don't know whether that will actually happen, but it could: I don't have to take the call myself, a dialogue system takes the call for me, which might be nice even if it's a human calling, like having a secretary. And that could also be annoying in another way, because the technology might not work so well to start with: your spouse calls and gets screened out, and it might cause friction in relationships and so on. So these are other problems as well.
So I think with every technology there are both sides, right? Take the example you gave of bots talking to other bots. The generation, at least for some of these things, is already super good; it doesn't need fully natural language, it just needs the right keywords or trigger words. Now imagine your bot has access to your credit card, the other bot says the item is in stock and then closes the order, some eighteen-hundred-dollar thing, and it doesn't ask for a confirmation because the payment info is already stored. So there are dark sides on both ends. But one thing I would say is: we cannot just work on the research of improving the dialogue systems, the recognition, the machine learning, and ignore this, or only reactively go back and look at the problem because of GDPR or something. This is also opening up new research in other fields. The bots are always going to get better; it's like spam, you have to have multiple ways to deal with it, and that defensive research also has to be state-of-the-art in how it deals with attacks. There are methods now which actually take the adversarial attack and flip it to improve the robustness of the system, basically using the same kind of adversarial technique but in reverse, following the gradient in the other direction during training time.
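The flipped-adversarial idea the panelist describes is, in spirit, adversarial training: craft the perturbation that most increases the loss and train on it with the correct label. A minimal sketch, assuming a toy two-feature logistic regression and an FGSM-style perturbation; the data, epsilon, and learning rate are all made up for illustration.

```python
import math
import random

# Toy sketch of adversarial training (FGSM-style): at each step we also
# train on the input nudged in the direction that most increases the loss.
# The data, epsilon, and learning rate are invented for illustration.

random.seed(0)
data = [([x1, x2], 1 if x1 + x2 > 0 else 0)
        for x1, x2 in ((random.uniform(-1, 1), random.uniform(-1, 1))
                       for _ in range(200))]

w, b = [0.0, 0.0], 0.0
lr, eps = 0.5, 0.1

def predict(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(x, y):
    global b
    p = predict(x)                      # log-loss gradient factor: (p - y)
    for i in range(2):
        w[i] -= lr * (p - y) * x[i]
    b -= lr * (p - y)

for _ in range(20):
    for x, y in data:
        sgd_step(x, y)                  # train on the clean example
        p = predict(x)
        gx = [(p - y) * w[i] for i in range(2)]   # dLoss / dInput
        adv = [x[i] + eps * (1 if gx[i] > 0 else -1) for i in range(2)]
        sgd_step(adv, y)                # train on the perturbed example too

acc = sum((predict(x) > 0.5) == (y == 1) for x, y in data) / len(data)
```

Training on the perturbed copies pushes the decision boundary away from the data, which is exactly the robustness-by-reversal trick being described.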
That's one way to look at it. In commercial systems, should we make the monetary value, or the number of tries these bots get, increasingly more costly? Many of these calls are generated thousands of times a day; if there were a meaningful cost attached, these operations would not exist, or they would change strategy. So there are different ways of looking at these problems, like the cost-effectiveness of attack versus defense. One thing is: I don't think it's going to go away, and it's not as if we solve it once and the problem is fixed; it's a continually changing problem. One example: when we released some of these systems, like Smart Reply and so on, people don't know that we had to wait longer in order to build systems that detect sensitive content in the messages, because you don't want any of these smart systems to say something stupid; you'd rather say nothing than try to be clever and suggest a bad response. And that is a continually evolving problem: it's cultural, it depends on the language, there are so many different aspects to it. So it's a very hard problem, but I think research also has to look into these aspects. Going back to the first question of what kinds of problems to work on: we have plenty of problems that are opened up by the advances we made in the last ten years, so it's also opening up new areas for research. It's a constantly evolving challenge.
Okay, at this point let's open it up. We've got a mike. We've got a question over there.
So I just want to follow up on the explainability discussion. I think one useful nugget from watching the demo video this morning is that the users in that skit didn't trust the agent, or weren't sure about it. It makes you think that trust is also very important, beyond explainability. And I was wondering, more specifically, whether the panel thinks that symbolic representations are necessary for modeling that sort of explainability, what that structure would mean for the connectionist models that we see today, and what the role of hybrid approaches is.
Well, I think you can have both, really. It occurs to me that you could use an AI planning system to train a neural system. Then you've got a very fast-executing neural system, and the planner can explore a much bigger space than people can. And when you ask it "why did you say that?", you go back and rerun the planning system, where essentially it is going to re-derive and figure out why it said that. Because the two are causally connected, you could imagine the planner actually producing the representation that you encode to train the neural system. That would be my answer to the question.
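A toy sketch of this planner-to-neural distillation idea follows. The household-robot domain, states, and rules are all invented, and a lookup table stands in for the trained neural executor; the point is only that the fast path answers instantly while the "why?" question re-runs the causally connected planner.

```python
# Sketch of the proposal above: distill a slow symbolic planner into a
# fast policy, and answer "why did you say that?" by re-running the
# planner. Domain, rules, and states are invented for illustration;
# a lookup table stands in for the trained neural executor.

def planner(state):
    """Slow decision-maker: returns (action, reasoning trace)."""
    if state["alarm"]:
        return "call_for_help", ["alarm is on, so the safety goal dominates"]
    if state["battery"] < 20:
        return "recharge", ["battery < 20, a precondition of every task fails"]
    return "continue_task", ["no blocking condition, so pursue the current goal"]

# "Training": run the planner over many states, keep only its actions.
states = [{"alarm": a, "battery": bat}
          for a in (True, False) for bat in (10, 50, 90)]
fast_policy = {tuple(sorted(s.items())): planner(s)[0] for s in states}

def act(state):
    """Fast path: no planning at inference time."""
    return fast_policy[tuple(sorted(state.items()))]

def explain(state):
    """Slow path, used only when the user asks why."""
    action, trace = planner(state)
    return "I chose %s because: %s" % (action, "; ".join(trace))

s = {"alarm": False, "battery": 10}
chosen = act(s)
reason = explain(s)
```

Because the fast policy was produced by the planner, replaying the planner on the same state yields a faithful explanation of the fast system's choice, which is the causal connection the panelist is pointing at.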
okay
So I think one more aspect of trust is: do users trust the devices, or the technology itself? One interesting area that is growing fast right now, or is going to be of increasing importance, is privacy-preserving AI. The question is at what level the data stays on the device, what is shared with the cloud, who can access it, and whether I can trust the veracity of the information that comes back. All of these are interesting aspects, in addition to the symbolic angle on explainability that you mentioned. I think this is going to be even more important in the coming years, because the phone is where you are most of the time these days, and that is not going to change; if anything it's only going to get worse. And you are interacting with these voice systems at probably an exponential rate. Well, they can be irritating sometimes, which makes people react like this. So I think that's also an interesting and very useful aspect of trust. And then there's an elevated version of that: regulations like GDPR imposing and making sure there are third-party sources which can verify this information, so that it is not just one central entity that you have to believe for everything.
More questions?
I wanted to make a comment and then ask a question. The first one is that I can only agree with the point about narrowing: not being open to out-of-domain work, multimodality, explainability. We can already see that narrowing happening in the machine-learning domain, and it is made worse by the fact that we don't have large datasets. Personally, in my projects, I can't wait for the day a deep learning architecture is able to jump easily from restaurants to understanding the conversations that patients engage in when describing their symptoms. I'm not sure exactly what the solution is, but I see a narrowing there, and, as you said, a narrowing onto this one kind of task. Second, I wanted to bring to your attention a very interesting paper, I thought, from ACL, nothing to do with dialogue per se: it gives an accounting of the energy consumption and carbon footprint of training large deep learning models, and the numbers shocked me. I think that is also something we may want to take into account when we train and retrain these machine learning models, given how much compute they use.
I think I can comment on the second point first, because I think it is probably going to be one of the most significant issues that comes up, not just for dialogue but for anything touching ML, in the next five years: how we use compute. There's a general tendency to just keep increasing the compute in the cloud, as if you can keep using as much as you want and will always get access to more TPU resources. That is not going to stay true. What you will see is training with more resources, but also building more and more models; look at some of the public statements about each new generation needing ten times more compute power. In my group we are actually looking a lot at on-device and efficient machine learning, and there used to be a concern that these methods, with a lower footprint or, say, a hundred times smaller memory, necessarily have to sacrifice quality. But at least for recognition, classification, sequence labeling, et cetera, and even for speech recognition earlier this year, we are seeing performance for these efficient models almost on par with, if not better than, the state of the art. So there is no reason to say "I need all these resources to train the model"; there are much better ways to do it. That requires real, introspective research into the optimizations, the architecture choices, et cetera. It's hard; it's not just making a black box.
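One concrete instance of the efficiency methods alluded to here is post-training weight quantization. A minimal sketch with invented numbers, showing that 8-bit storage bounds the per-weight reconstruction error by half a quantization step.

```python
import random

# Minimal sketch of post-training 8-bit quantization of a weight vector.
# The weights are random stand-ins for a trained model's parameters.

random.seed(1)
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]

def quantize(ws, bits=8):
    """Map floats onto integers in [-(2**(bits-1)-1), 2**(bits-1)-1]
    using a single shared scale factor."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in ws) / qmax
    return [round(w / scale) for w in ws], scale

def dequantize(q, scale):
    return [v * scale for v in q]

q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# Rounding to the nearest level bounds the error by half a step, while
# the 8-bit integers take 4x less space than 32-bit floats.
```

Real systems add per-channel scales, quantization-aware training, and so on, but even this crude version illustrates why smaller models need not mean much worse models.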
There are some black boxes in there, but it's a very important problem.
And going back to the first point about narrowing: I think it is true, but I wonder if it is really specific to deep learning. I'm sure this has happened in previous technology waves as well: there is some spike in a technology, everybody gravitates to it, and over time that changes. I would see the rise of deep learning and the power of these networks as, at its core, something everybody knows to be a very good function approximator. I would rather use a state-of-the-art model as one of those black-box components, for language modeling of utterances for example, than have to think about and tweak which model to use there; I can then focus on the domain problem and the high-level system rather than on which utterance-generation mechanism I should use. It's hard, because using it well still requires understanding what goes on inside and how it interacts with the rest of the components. But it is easier to access these open-source models these days than it was before, so I think there is a silver lining: more people have access to state-of-the-art models right now, and they can use them in areas, and in very creative ways, that were not possible before.
Over in the back.
Hello, and thank you for the discussion. I have a suggestion for the social-impact discussion.
What do you think we could do about informing end users about the dangers of these technologies? Do you think it might be feasible, at some point, to actually build bots that help people recognize logical fallacies, or marketing strategies, and all these things? What can we do in terms of educating end users?
You mean how to build defensive bots?
Not exactly; not a bot that directly defends the end user, but a bot that teaches the end user about logical fallacies, about marketing strategies, about the fact that there are bots around that try to manipulate you.
Can we get this to the politicians?
I don't know about logical fallacies as the starting point. We are quite a small community compared to the entire population, and nobody reaches the politicians. Okay, there is just one thing that really gets to them: the robocalls. They are starting to care about deep fakes now in the US Congress, after those Congress people were misidentified as criminals against some FBI mugshot database. That suddenly made them care. So now they care.
Suggestions?
I agree, you could definitely have that. This is actually another application of this area of dialogue systems that is understudied: systems for training. For example, a system to train you for a job interview: the system would interview you and you would see what it is like beforehand. That is one training scenario, but you could do training in a lot of different domains, for example someone trying to sell something to you, where you train on how to recognize what they are really trying to do, and so on. These kinds of training scenarios using dialogue systems, I think that is a huge opportunity.
Well, I like your idea of the defensive system, because a lot of the systems out there, all the ads that are being pushed at you, are exactly the kinds of things that are going to come at you in lots of modalities, auditory ones soon. Your defensive system could take care of that for you: "no, you don't pass, thanks very much"; it is on the defense: "you are going to have to talk to me first; you don't get to pass along here until I know what it is you are trying to push", and so on. I realize that may not be in the interest of commerce, but it may well be in the interest of the rest of the people, who would like to be helped by these bots rather than attacked by them. So I think those are great suggestions.
More?
Okay, one more comment and question, David, just before dinner.
I was also discussing earlier the idea of well-trained systems versus intelligent systems; it kind of ties into my question and into what you said earlier, that maybe the neural-plus-symbolic approach would be best. So why do you think more people aren't working on this kind of approach now?
I can't say why people aren't working on it, but just to the point of what we could be looking at: this is something we probably want to look into more deliberately. As opposed to just running behind the latest thing, and again, I'm not saying this is always what happens, there is this temptation: you see a new dataset that is out there, that is easy to publish on and easy to get, for instance with a leaderboard and a clear state of the art, so it is very easy to just plug in the latest models. And yes, we should probably do that too, as long as the problem is well motivated. But that temptation apart, it would be good to look at other aspects of the problem that are not just statically plug-and-play. I think that's the way to go.
Last question.
I believe I get the last question today, then.
It's related to what you were saying earlier: maybe there is a false dichotomy between pipelining and the alternatives. On that slide, I think the real issue is more about modularity, which doesn't necessarily imply sequential processing; you can have a modular system where there is influence in both directions, which makes it a different point. But the set of goals you are listing may be fairly limited and enumerable for simple task execution; when one is in dialogue with other people in real situations, we are usually not just completing a single task. The same pieces of language are useful for one purpose and also useful for finding out other things at the same time, like the relationship with the person you are talking to. So neither of these extremes is really getting at that. A travel agent is probably a constrained problem in some ways, but it is not just a separable problem. A simple example: the speech act, say whether an utterance is an inform or a question, is not separable from its propositional content; it is more like a functional transformation over it. And think about speaker identification as well.
Okay. Well, thank you all for coming, and I think we have dinner next.