Okay. Thank you all. First, an apology: Alan Black couldn't join us. I'm Phil Cohen from Monash University. Could you all introduce yourselves?
Hi everyone, I'm Vikram Ramanarayanan; you can call me Vikram, don't worry about my last name. I work at Educational Testing Service in research and development, where I work on multimodal dialogue systems for language learning and assessment.
And I'm from Google AI, working on conversational AI, but also on multimodal work, vision and language, and on efficient machine learning: basically what you can do under compute and memory constraints.
I'm Gabriel Skantze. I am a professor here at KTH, but also co-founder and chief scientist at Furhat Robotics, a spinoff company from KTH that is developing social robots.
Great. Alright, so I proposed a variety of questions that I hope will cause people to start thinking, both about the field and about their own research, and to try to understand where this field is going. (Can I make the text a little bit bigger, so people in the back can read everything? Okay, that should do it.)
So, the thought was: I hope we will get to talk about all of these, because they are all interesting topics. The whole idea is to put everybody on the spot, in one sense, to understand what it is we are doing here and why we are doing it. Are we working on the problems we are working on simply because there is a corpus there? It is easy to work on a corpus that exists, rather than either create your own or actually work on the hard problems instead of the problems that happen to exist in this corpus. So the first question is: are we working on the right problems?
We will also want to talk about multimodal, multiparty dialogue. I want to push the conversation into a somewhat more open space. There are a few people here in the room who have thought about that, but not a lot. Then there are the architectures we are building, which tend to be either pipelined or not pipelined, and we should talk about why we would want each of those.
The next topic is: why does my system have to learn to talk all over again? Why can't conversation, speech acts and whatnot, be something domain-independent? That is related to the pipeline question. Then the explainability question: GDPR is an interesting issue here, but if I say to a dialogue system, "Why did you say that?", I would like to get a reasonable answer out. How do we get there? Then a very important problem: what are the important problems? What would you tell your graduate students is the most important thing to work on next? And the last question: think about the negative side of everything we are doing. Can your technology, my technology, their technologies, be used for bad interactions, for robocalls that are now interactive?
So, lots of topics to talk about. We can start with the first one, and then I will sit down and shut up. I imagine there is a lot of work here on slot-filling systems. Your system asks you what time you want to meet, and you say, "the earliest time available." Or you say, "What's the earliest time available?", the system says "six p.m.", and you say, "too early." So the system says "seven p.m.", and you say, "okay." Notice the user didn't fill the slot; the two of them together filled the slot. That is mixed-initiative collaboration, et cetera; there are lots of issues there having to do with collaboration. Are we only working on slot filling because the corpus is there?
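That exchange can be sketched as a small constraint-narrowing loop. This is purely illustrative code, not from any system discussed on the panel; the class and method names are my own. The point is that "too early" fills nothing by itself: it prunes the candidate set the system proposes from, so user and system jointly converge on a value.

```python
class TimeSlot:
    """A slot filled collaboratively: user feedback narrows the candidates."""

    def __init__(self, candidates):
        self.candidates = sorted(candidates)  # available meeting times (24h)
        self.value = None

    def propose(self):
        # System offers the earliest time still compatible with feedback.
        return self.candidates[0] if self.candidates else None

    def too_early(self, proposed):
        # "Too early" supplies no value; it prunes this time and earlier ones.
        self.candidates = [t for t in self.candidates if t > proposed]

    def accept(self, proposed):
        self.value = proposed


slot = TimeSlot([18, 19, 20])   # 6pm, 7pm, 8pm
offer = slot.propose()          # system: "six p.m."
slot.too_early(offer)           # user: "too early"
offer = slot.propose()          # system: "seven p.m."
slot.accept(offer)              # user: "okay" -> slot filled jointly
```

Neither party filled the slot alone: the user's rejections and the system's proposals together determine the final value.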
The short answer, I would say: everybody can be comfortable with benchmarks. You can keep a track record, you have leaderboards, you can show numbers. It is the dataset and the metrics. More than the dataset, it is that it is easy to evaluate: we can measure a system's accuracy on this one metric because we know the actual values, the true values, and can compute precision and recall and so on.
But I also think it cannot just be a slot-filling system, nor the other extreme, where you go all the way to logic and say it has to be a fully constrained system. It has to be something in between, and we have to be flexible enough to adapt: to go from slot filling to actually understanding which slot attributes or values can be changed, morphed into something else depending on constraints, temporal constraints for example. The downside of going completely constrained is that there is no way we can ever program all that logic. And even if you allow an automatically learned system to infer that from a corpus, there are so many possible ways to infer it. Take your example: if the user says "earlier," how many earliest times should I offer? Seven p.m.? Six fifty-nine? Six fifty-eight? Will learning just work on something like that? Not necessarily. Which is why I said it has to be something in between, where you can program things, and it is okay to use some heuristics, say, "I'm looking at thirty-second blocks, or one-minute blocks, or thirty-minute blocks," and then gradually extend that and open it up to learning something more nuanced.
I guess it depends on what you want to do, whether you want a constrained system as opposed to an intelligent system. Nothing is really wrong with building slot-filling systems: you just give them a bunch of data that you clean really well. But intelligence is something else. This is not a knock on either of those two things, because in some cases we do want slot-filling systems and would be happy with slot-filling systems; that is what we want there. In other cases, though, we might want to go beyond that, to something that respects some kind of planning, some kind of higher abstraction, if you want to go that route. It really depends on what we are talking about.
Just to build on that: this is of course related to the corpora that are out there, but also to the practical systems people are building, which are often these search-for-a-restaurant kinds of things where you have the slots. I think it would be interesting to open up and look at completely different types of dialogue domains, and I can give one example from an actual practical problem. We are developing an application where the robot performs job interviews, and the robot might ask the user, "Tell me about a challenge in a previous job that you managed to solve." The answer to that question is not modeled very well with a set of slots; it is quite hard even to come up with what that slot structure would look like. That kind of modeling will also be needed when we open up to more applications than the ones we have now. What I think would be very interesting to address, and which is perhaps not easy to translate into a logical form or an SQL query, is that something else is needed there: there is some kind of narrative coming from the user that you need to represent. So it would definitely be interesting, but to do it you have to consider other domains, I think.
What did you think about the first talk this morning, relative to semantic parsing versus slot filling?

It was a very interesting talk, and obviously, if you have that kind of query, you need more complex semantic representations and so on.
We have the kinds of queries we have because of the corpora we've collected, and not at random: the corpora exist because we defined them that way. Versus, if you actually went to a travel agent and had a conversation, one might find it would perhaps be a bit more open-ended in the way you interact.

Maybe. But even then, it is still the user querying something and getting some information out of the system. Sometimes it is the other way around: the system is asking the user questions.
In fact, the original work on task-oriented dialogue, Barbara Grosz's work in 1974 on the structure of task-oriented dialogs, was the other way around: the system is telling the user how to do something, trying to get the user to do something. There are of course plenty of examples of that, like the classic one of helping someone change a tire.
I will just add one more thing. When we talk about intelligence, quite often we assume there is this one inflection point where suddenly the machines are going to learn how to reason and understand everything. One nugget I want to mention: whatever form we use, logical forms or anything else, the important part, since you mentioned collaboration, is whether the output is understandable by the other side. The system may not generate perfectly proper language, but is it understandable by the human on the other side, and does it let them get to a better state? And toward that end: we are not going to see a system trained on the travel domain suddenly doing something amazing in a completely different domain, but we should start paying attention to this, because with everything being machine learning, users judge how well a system does across multiple domains. So start generalizing: think about the generalizability aspect when you propose models, and about abstraction. That ties into the third and fourth questions.
Okay.
So, there is obviously a lot of work on end-to-end trained systems, where you are training the dialogue management in addition to the language processing, and some of the slot-filling systems are doing exactly that. Which means your dialogue engine is basically stuck with that domain. Now you get a whole bunch of new kinds of domains, and suddenly my dialogue system doesn't know how to talk anymore: it doesn't know how to perform a request or understand a request; maybe there are new kinds of speech acts coming in. We saw this morning, in the semantic parsing work, that they are trying to deal with a huge amount of variation; there is a lot of variability in language. But I submit there is much less variability in what happens to people's goals in the course of a dialogue. In general: you achieve them, or you fail and try again, you augment what you are trying to do, you replace what you are trying to do, et cetera. My suspicion is that it is a relatively small state machine. So why not pull those two apart? Figure out that part, through machine learning or any other method, and then deal with all the variability in the language in a pipelined fashion, versus training it all at once.
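That "relatively small state machine" over goals might look something like the following sketch. The states and transition names here are my own illustration of the idea, not a proposed standard or anyone's deployed design:

```python
# Goal life-cycle as a small state machine: pursue, succeed, fail,
# retry, augment, or replace. This is the domain-independent part that,
# on this argument, need not be relearned for every new domain.
TRANSITIONS = {
    ("adopted", "pursue"): "active",
    ("active", "succeed"): "achieved",
    ("active", "fail"): "failed",
    ("failed", "retry"): "active",     # try the same goal again
    ("failed", "augment"): "active",   # same goal, extra subgoals
    ("failed", "replace"): "adopted",  # a new goal takes its place
    ("achieved", "drop"): "done",
}

def step(state, event):
    # Unknown events leave the goal state unchanged.
    return TRANSITIONS.get((state, event), state)

state = "adopted"
for event in ["pursue", "fail", "retry", "succeed", "drop"]:
    state = step(state, event)   # ends in "done"
```

The language variability would then live in whatever maps utterances onto events like `fail` or `replace`, whether learned or hand-built, while the goal dynamics stay fixed across domains.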
I agree; it seems reasonable to separate these things. The motivation for end-to-end learning is that you don't have to have any knowledge about the representations in between, but then you have to have a lot of data. If you have the data, you don't need to know so much; if you don't have a lot of data, that's the problem.
Let me make one argument for and one against. The standard counterargument about end-to-end learning systems is that they claim we can get rid of all the pipelined components and just learn from the input to the final output. In some settings, I would argue, you might actually have more data for that than for the individual components. Take speech-to-text: the phonetic annotations and the other intermediate annotations at different levels of the system are scarce, but you might have just the speech signal plus the transcribed text or some response, and that can be much easier to obtain. In those settings, end-to-end systems, given enough data, have shown real improvements in recent years; that is not just deep-learning hype, the recognition error rate goes down. Now, do you have to do end-to-end learning in every scenario? No. An end-to-end system does not solve the error-propagation problem by itself, and you might actually create more issues: you don't know how to debug the system, and there are too many hyperparameters to deal with. In some settings that is a worse problem than just finding data and doing the input and output annotations. So it depends on the use case: whether you have to improve individual parts of the system, or transfer them to a different domain, or feed other systems that need an intermediate output rather than just the final one. For example, it can be argued that syntax is not necessary for every task or domain. When was the last time you saw a part-of-speech tagging paper, or even a parsing paper for that matter? The percentage of papers at ACL or EMNLP on parsing has gone down dramatically. That does not mean it is unimportant; whether it is important depends on what you are trying to do. If you are using dependency parses to reason over structures and substructures, they are useful. On the other hand, as a mere precursor to an end-to-end neural machine translation system, it is arguable that parsing is not necessary, at least for the automated metrics we are talking about. Again, that does not mean end-to-end models solve those problems; it depends so much on what you are trying to use the system for.
In some sense it is a balance. To take a specific example of what we are doing: we are building language-learning modules, specific goal-oriented, task-oriented systems for specific skills, like assessing fluency of pronunciation, or grammar, or specific aspects of conversation. So how do you go about it? This is the "how" question you raised earlier: how do I build these generalizable systems, or how do I use the same pipeline across these different but seemingly similar tasks that probe different things? You start out with something built largely from expert knowledge, because it is a limited domain and you don't have much data anyway. Then you start collecting data, through Wizard-of-Oz or some kind of crowdsourcing or other methods, and ultimately you have enough data that you can build a more hybrid kind of system, which could be end-to-end but also informed by the knowledge you started with. That is one way to look at it: different points along this hybridization spectrum, combinations of data-driven and knowledge-driven approaches, have implications for how you pipeline the system and how you train it.
Well, I certainly don't disagree with you. But some of the techniques are not going to be particularly appropriate for certain types of tasks. For instance, attending to a knowledge base versus computing an actual complex query: those two things can be very different, once you get to the use of probabilities, comparatives, and things like that. It is not obvious to me that attention will solve that.
I guess that is related to the first question: you will address the kinds of dialogues that you can solve with this method, and the other ones you will not address.

Right, and that is the risk of where this research is going: we just keep drilling into the problems we started with, and we never expand our goals.
So, talking about expanding the goals: I want to have you talk about multimodal dialogue, where I have not just speech but other modalities, coordinated in interesting ways, and about multiparty dialogue. Take any of your favorite smart speakers and stick it in a home environment, with a family, and have the family hold a conversation among themselves and with that device, and have it track the conversation among multiple people. "What time do you want to meet?" "Mary wants to meet at three o'clock." Mary says, "No, I don't." So what does the system do? What is it representing about what happened in that dialogue? Do we have any representation at all; what is the belief state? In all these papers we have seen, is there any notion of belief actually going on? There is a huge amount to break open once you start with the multiparty setting. And then there is the physical situation: actually having a robot, or a device like an Alexa that is physically situated, with a camera on it, and I am sure they have that, so it can see what is going on in the room, see who is talking and who they are talking to. What do you have to track out of all of that so the system is actually helpful to a family, rather than just to a bunch of individual conversations? This is a whole lot bigger space than what we have been dealing with. How are we going to deal with the multimodal, multiparty situation?
So, this is exactly the kind of dialogue that we are trying to model, where you have multiple people, for example ordering together. One problem there, as you say, is the belief state. Typically you think of it as what the user wants up to this point, or what has been agreed up to this point. But if you have two people, there might of course be two different states. If two people are ordering and one says, "I would like a burger," and the other one says, "The same for me, but not with onions," referring to the first order, then you have to keep track of what the two different persons want. And sometimes in dialogue you cannot just represent it as individual goals; it is common ground: "we want to do this," "we would like to do this." So maybe you should have three different representations: one for what we want, one for what I want, and one for what the other one wants. The goal is to come to a consensus, but along the way the parties can want different things, so it can be a mix, and you can refer to what the other person has said. And then of course there is the question: if the two people are talking to each other, to what extent is the system listening to that? It probably has to form part of the "we."
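Those separate representations could be sketched roughly like this. It is a hypothetical data structure, with illustrative field names; a real tracker would also model confidence, reference resolution, and who heard what:

```python
from dataclasses import dataclass, field

@dataclass
class MultipartyBeliefState:
    """One partial state per speaker, plus what the group has agreed on."""
    individual: dict = field(default_factory=dict)  # speaker -> slot dict
    joint: dict = field(default_factory=dict)       # the consensus so far

    def heard(self, speaker, slots):
        # Attribute an utterance's content to one specific participant.
        self.individual.setdefault(speaker, {}).update(slots)

    def agreed(self, slots):
        # Promote content into the shared "we" state once all accept it.
        self.joint.update(slots)


state = MultipartyBeliefState()
state.heard("A", {"dish": "burger"})
# "The same for me, but not with onions": resolved against A's order.
state.heard("B", {**state.individual["A"], "onions": False})
state.agreed({"venue": "the burger place"})  # settled earlier in the dialogue
```

The point of the structure is that B's "the same, but without onions" updates only B's state, by copying and amending A's, while the joint state holds only what everyone, system included, has actually agreed to.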
Right. If it is all of us together trying to solve this problem, what we are going to order, where we are going to go out, whatever, then the system has to be part of this collective, and you have to have what we used to call, in the old days, a joint intention: what we are trying to do together. How would you guys think about this multi-user problem?

I guess the other thing to add to the mix is the multimodality of things.
For instance, when you have audio and video, which one do you weight first, and how do you choose the priority? And of course there are ambiguous situations all the time. What we have found, and we have been looking largely at the education context for this kind of thing, teacher training for instance, is this: take a teacher interacting with a whole class of student avatars. If the teacher addresses one student, how do you know which student it is? Suppose they say, "You, over there, that was great," but I am pointing in a different direction. Who does the system attend to? Does it attend to my speech, or to my gesture? There is always that kind of ambiguity.
To put a positive spin on it: at this stage we can certainly do belief tracking; it is not yet at the level we want, but I believe we have systems very close, technologically, to the point where we can do joint inference over video, audio, and textual signals, disentangle the different entities speaking at the same time, and do that at scale. You could do that, but then how do you bring in prior knowledge about the users? That is the second point, and let me give you a different scenario. Imagine it is not just collaborative, where you can attribute each utterance to a specific entity: what if it is a parent and a child? Whose preference do you take into account? The child says, "Play Cartoon Network, and loop it for twenty-four hours." Should Alexa do that? Who decides? Obviously there is a preference hierarchy here; the parents have to win. These are very tricky situations, and it might not be as simple as some general-purpose model that says: here are the entities, here is one model for two people interacting with a joint intent. It might have to be customizable per household, per set of people, and all of this varies across different sets of people put together; the relationships between them have to be factored in as well. So I agree these are challenging, mixed problems. But one simple point: we don't have to learn everything. Everybody assumes that with machine learning we have to learn everything, but you can just ask the user for a preference. You could ask the person, "Can you tell me your preference?", or let them enter it manually in an app or whatever. Sometimes just one bit is enough to bootstrap the system, or at least to lock in a bunch of variables that would otherwise have caused a lot of confusion downstream. So there is still hope, but it has to be this interactive mode, not a system silently observing a bunch of things, learning, and then suddenly starting to do the right thing at some point in time.
Alright, I'll move on. We finish at what time? Six, about? Okay, and I think we want to leave time for audience participation, so I will try to move along to some of the other questions.
The next one I had in mind was explainability. Okay, so we have all these lovely machine-learning systems. You ask any of them, "Why did you say that?", and what do you get? Nothing. Now, the system could make up a reason why it said that, but you actually want the "why I said it" to be causally connected to what it actually did. So what kind of architectures can you imagine that will give us explainability in the general case?
First, the question is: do you, as a user, really need to be able to ask that? Are users actually interested in why the system recommended something? If it is a dialogue system, I definitely want to know, but then the question is how to get the answer. We talked about a restaurant you wanted me to go to; you gave me a recommendation, I said okay, and then: "Why did you suggest that one?" The thing is, if the system is learned, truly end-to-end, then you have to build the dialogue around that: wherever you are building your dialogue, you would have to train it on explanation dialogues, and you might not have that data.
Well, that is part of the point. Just to offer a counterpoint on whether users really care: in education this is really important, and this is true for mental health and other domains as well. If you tell a person, "You have depression, with seventy-five percent probability," they probably want to know why, how you reached that conclusion. The same with assessment: if you tell someone, "Your fluency score is nine out of ten," or "four out of ten," then why is it four? What went wrong, and what do I need to improve? In those kinds of cases it is really important. Having said that, there is an increasing body of work in the ML literature, especially around end-to-end and similar deep-learning models, that really looks at interpretability using a variety of techniques. That has been relatively unexplored in the dialogue community, and I think we should really take it up; this is one of the things I would point to when we get to the question of what you would ask your graduate students, the next generation, to work on. There are several families of techniques: techniques that probe deep neural networks to figure out which inputs are most salient in leading to a classification; techniques that visualize neurons; techniques that visualize memory units. That is model interpretability, and then there is feature interpretability.

But do you believe that will actually get translated into a comprehensible explanation for an actual end user? You wanted to say something?
I was just going to say: my point is that just because we say a network is explainable doesn't mean much; it depends on who is looking at it. If it says, "activation number 436 is firing, and that is causing the probability of the positive class to go up by x," then to the ML engineer or scientist that is great: okay, now I can go fix it, or do something about it. But what is probably more interesting, at least for NLP and dialogue, is whether there are higher-level abstractions, and they don't even have to be fully human-comprehensible, such that we can find, let's say, alignments: these sets of examples basically lead to the same set of outcomes. At a higher level, that could be at the phrase level or at the semantic level. Obviously, building an explainable system could then become as hard as building the system itself, so the two have to go hand in hand: the modeling work and all the other work on applications. The vision community, if you like, has advanced further in this respect than the NLP community: not just probing networks and looking at activations, but even learned approaches where you backpropagate through the network and look at input regions, learning in an online fashion which regions, which colors, which types of object patterns, triangles and so on, are triggering certain types of behavior, and interpreting that back in a discrete fashion, like a color map or a certain arrangement of objects. I would like to see more of that in the NLP community. The most interesting work I have seen in the recent past is of the probing type, where you have these black-box networks and other methods try to tell you where they are going to fail and where they are going to fit. And you would be very surprised: in some state-of-the-art systems, you change one word in the input utterance and suddenly the probability flips. There is a lot of work on these perturbation and alignment styles of method looking at such things. So I think explainability and interpretability go hand in hand.
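A minimal occlusion-style probe of the kind described, drop each input word and watch the score move, can be sketched as follows. The classifier here is a toy stand-in I made up for illustration; a real probe would query the actual model:

```python
def toy_score(words):
    # Stand-in classifier: the prediction hinges on a single cue word,
    # mimicking the one-word brittleness described above.
    return 1.0 if "refund" in words else 0.1

def word_saliency(words, score_fn):
    """Saliency of each word = score drop when that word is removed."""
    base = score_fn(words)
    return {w: base - score_fn([x for x in words if x != w]) for w in words}

saliency = word_saliency(["i", "want", "a", "refund"], toy_score)
# "refund" carries almost the whole prediction; the rest barely matter.
```

The same loop, run against a real model, is exactly the kind of probe that exposes a system whose prediction flips when one word changes.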
For a real consumer, you need to explain it in terms they can understand; it is not just probing a neuron. I think we actually need to combine these methods, and there are many people in the room who have worked on this problem in the past. Certainly the learned systems need to figure out how they are going to do this, because if you don't, the Europeans are going to make you.
Just one point: the good news, though, is that if you look at the number of papers on this topic over just the last two years, it is a very encouraging sign. It used to be that nobody wanted to talk about explainability: "I just built a system, it is state-of-the-art on x, y, z." Now I think for grad students it is a very interesting and exciting field to be part of.

Okay, so that is the next question: what is the most important thing people ought to be working on? I have my own list; what have you got?
To start with, I think it is very important that people work on different things, so that we have a lot of different approaches we can compare, rather than everyone doing similar things. I also think the intersection between dialogue, speech, multimodality and so on matters, because these are still separate fields. Look at the Google Duplex demo, for example, which got a lot of attention because it sounds really human-like. If you look at it on the dialogue-pragmatics level, if you make a transcript of it, it is not a very sophisticated dialogue model, but the execution is great. We don't know if that was a cherry-picked example, but as presented it sounds fantastic. So being able to actually execute the dialogue with that kind of turn-taking and that kind of conversational speech synthesis, using a model of the dialogue, is something underexplored in both the speech and the dialogue communities.
explain ability is
super important
would say that
i mean this sounds like there's so many factors associated or like multiple areas associated
with this building more system so that we can make the system's less brutal the
number of ways to achieve this rate and
that's a very important topic and you can deduct a number of ways from the
ml community from like in injecting more structured knowledge one of the things that all
these things lead to in my been in is like
not just for generation but all the other aspects of dialog really research problems
what are the min viable sort of nuggets of knowledge that we have to encoding
the rain or the system after encoders that it can learn to generate well i
can then do recognise do the slots in turn spell it can be transferred to
a new domain so
is that like what is the equal and of a knowledge graph right i mean
for like different dialogue systems i mean that we can actually sort of we can
all agree on so i think if we come up with like some sort of
a shared representation of that i mean which is interpretable to at least to some
extent then i believe
you know we can actually make even more for the progress right of course it's
a hard problem right i mean and dialogue is like one of the hardest problems
in and that's language as well so
It's not just fact lookup. What I'm talking about is: what are the things to know about, say, the travel domain? It doesn't have to cover a hundred percent; even if twenty percent of the knowledge can be encoded in the concept space and the relationships between concepts, then for a new domain I might just need access to a very small amount of training data, or learn a little bit more by mapping into existing concepts, or by augmenting an existing concept database.
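The transfer idea described here can be made concrete with a toy sketch. Everything below, the concept names, the relations, and the overlap measure, is invented for illustration; the point is only that a new-domain concept can be anchored to the closest existing concept so that only the unshared relations need fresh training data.

```python
# Toy sketch of anchoring a new domain to a shared concept space.
# All concept names and relations are invented for illustration.

def jaccard(a, b):
    """Overlap between two relation sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Existing concept graph: concept -> the concepts it relates to.
concept_graph = {
    "restaurant": {"food", "cuisine", "price", "location"},
    "hotel":      {"price", "location", "booking", "stars"},
    "taxi":       {"location", "time", "booking"},
}

def map_new_concept(new_relations, graph):
    """Attach a new-domain concept to the nearest existing one, so only
    the unshared relations need new training data."""
    return max(graph, key=lambda c: jaccard(new_relations, graph[c]))

# A hypothetical "hospital" domain shares most of its relations with "hotel".
anchor = map_new_concept({"price", "location", "booking", "department"},
                         concept_graph)
```

Here the hypothetical hospital concept lands on "hotel", so a system could reuse the price, location, and booking machinery and only learn the "department" relation from new data.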
So I think that's a super interesting direction, and it could be multimodal as well. It's not just about language; it's about what visual concepts I need to keep in mind, the taxonomy of how objects relate to each other. If I see a chair in front of a table, I know what the positional relationship between the different things is, all this spatial coherence. So what are the minimum viable sets of relationships and concepts that we need in order to build better dialogues?
So, since Gabriel and you have already covered a bunch of things, I'll say something complementary and add to it, because I think these are really interesting problems, and they were on my list anyway.
I would just add: working on low-resource problems. This is in terms of languages, domains, and even the kinds of datasets that we train on. This is nothing new, everyone here knows about it: we all over-train on the restaurant datasets, the Cambridge datasets, for a good reason of course, because they are publicly available.
That's one thing, but apart from the plan to get more datasets, which is obviously one of the things we want to do, can we look into how we do better with limited data? There is work already going on, perhaps it needs to be more intense; there's a lot of work on zero-shot and one-shot learning. But we should try to look at better ways of adaptation, better ways of working on new domains with limited resources, given the existing resources, perhaps using techniques from machine translation or from some of the other sister fields that we might not think of immediately. For instance, this is starting to come up a lot more: trying to use data which is kind of unconventional for dialogue but might be useful for bootstrapping in these low-resource settings. That might also be something very interesting and useful to look at.
And especially for underserved domains. Coming back to my domains of medicine and education: these are not necessarily the "how may I help you" or booking kinds of domains, but they are the ones where you have a lot less data, and they would still be very useful to work on.
One thing: we have very large knowledge structures, maybe a global domain structure. And if a domain is just that known structure, then you already know how to have a conversation: you know what the objects are, you know what the actions are, you know what the verbs are, you know what their preconditions and effects are. Why do you need any more? Why do we need anything beyond just a change in the knowledge? I don't need big corpora, because I've already learned how; I've got a huge vocabulary and all these vectors. So why not just change the knowledge base? Who needs a universal corpus? Just give me a knowledge base and I'll do cancer diagnosis, or I'll do architecture, or whatever; take an arbitrary domain. Wouldn't that just be great? For each of those domains you need the right knowledge base, and I think everybody would appreciate that precision.
okay
But even if the knowledge base is, let's say, huge and static, the reasoning over it keeps changing, right? The same knowledge you might interpret differently sometime later than you would right now. It could be because our methods are not sophisticated enough, or because some new information pops up: the facts are the same, but the way you look at them changes over time. I can't give you a concrete example for this, but I don't think these problems are going to go away anytime soon. If anything, look at machine translation, even the low-resource setting: the problem has existed for several decades, and through a number of advances, like unsupervised machine translation, we are now starting to see actually scalable systems working in this setting. I think the field as a whole, NLP, ML, computer vision, has this tendency: we focus on the solvable, immediate, big problems, then we simplify, and then we extend to the zero-shot setting, extend to the few-shot setting. But it's not as if we are starting from scratch: all the stuff we learned on ImageNet, the convolutions, is still among the single most useful blocks that get transferred over. And for language, I would argue that over the last five years attention has been the common ingredient: there are a thousand variants of these networks, but there are specific concepts that get transferred onto new problems when you build models now. Hopefully these will also transfer as we start looking at new problems or extensions.
Well, conceivably we should be thinking more about grand-challenge problems, not just the usual Alexa challenge but larger ones that you can get governments to support. But governments are now going to start asking us this last question, which is: so you built this wonderful technology, and now I'm getting phone calls, interactive phone calls, that are trying to get me to do stuff, either buy stuff or, in the worst case, commit suicide, or a variety of other activities. And they are doing this, and they understand language pretty well, and they are good enough to cause some people to be convinced that they are dealing with a person. Even as far back as ELIZA there were people convinced about the humanness of the thing, and these systems are, who knows, far beyond that. In letting these things loose, how do we start to address that question? We have seen what happened in computer vision, where people were not really paying that much attention, and the technology certainly is being misused. How do we prevent our technology from misuse? Obviously it's our problem. Suggestions? And then we'll turn it over to the floor; we'll have enough time for twenty minutes of questions. Well, only ten minutes.
So, obviously you can do regulation: bots always have to say that they are a bot. But that will possibly not stop people from doing it. And with adversarial networks and generation, if the need is there, you are going to have deep fakes in language processing and dialogue processing wherever they are successful. It might also come to a stage where I don't pick up phone calls myself anymore; instead my bot picks them up in order to check whether the caller is a bot or a spam call. And then they would be talking to each other, the caller's bot trying to convince my bot that it is human. I don't know whether that will actually happen, but it could: I don't have to take the call myself, a dialogue system takes the call for me, which might be nice even if it's a human calling, like having a secretary. And that could also be annoying in another way, because the technology might not work so well to start with: your spouse calls and gets screened out, and it might cause friction in relationships and so on. So these are other problems as well.
So I think with every technology there are both sides, right? Take the example you gave of bots talking to other bots. The generation, at least for some of these things, is already super good; it doesn't need fully natural language, it just needs the right keywords or trigger words. Now imagine your bot has access to your credit card, the other bot says the item is in stock and then closes the order, some eighteen-hundred-dollar thing, and it doesn't ask for a confirmation because the payment info is already stored. So there are dark sides on both ends. But one thing I would say is: we cannot just work on the research of improving the dialogue systems, the recognition, the machine learning, and ignore this, or only reactively go back and look at the problem because of GDPR or something. This is also opening up new research in other fields. The bots are always going to get better; it's like spam, you have to have multiple ways to deal with it, and that defensive research also has to be state-of-the-art in how it deals with attacks. There are methods now which actually take the adversarial attack and flip it to improve the robustness of the system, basically using the same kind of adversarial technique but in reverse, following the gradient in the other direction during training time.
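The flipped-adversarial idea the panelist describes is, in spirit, adversarial training: craft the perturbation that most increases the loss and train on it with the correct label. A minimal sketch, assuming a toy two-feature logistic regression and an FGSM-style perturbation; the data, epsilon, and learning rate are all made up for illustration.

```python
import math
import random

# Toy sketch of adversarial training (FGSM-style): at each step we also
# train on the input nudged in the direction that most increases the loss.
# The data, epsilon, and learning rate are invented for illustration.

random.seed(0)
data = [([x1, x2], 1 if x1 + x2 > 0 else 0)
        for x1, x2 in ((random.uniform(-1, 1), random.uniform(-1, 1))
                       for _ in range(200))]

w, b = [0.0, 0.0], 0.0
lr, eps = 0.5, 0.1

def predict(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(x, y):
    global b
    p = predict(x)                      # log-loss gradient factor: (p - y)
    for i in range(2):
        w[i] -= lr * (p - y) * x[i]
    b -= lr * (p - y)

for _ in range(20):
    for x, y in data:
        sgd_step(x, y)                  # train on the clean example
        p = predict(x)
        gx = [(p - y) * w[i] for i in range(2)]   # dLoss / dInput
        adv = [x[i] + eps * (1 if gx[i] > 0 else -1) for i in range(2)]
        sgd_step(adv, y)                # train on the perturbed example too

acc = sum((predict(x) > 0.5) == (y == 1) for x, y in data) / len(data)
```

Training on the perturbed copies pushes the decision boundary away from the data, which is exactly the robustness-by-reversal trick being described.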
That's one way to look at it. In commercial systems, should we make the monetary value, or the number of tries these bots get, increasingly more costly? Many of these calls are generated thousands of times a day; if there were a meaningful cost attached, these operations would not exist, or they would change strategy. So there are different ways of looking at these problems, like the cost-effectiveness of attack versus defense. One thing is: I don't think it's going to go away, and it's not as if we solve it once and the problem is fixed; it's a continually changing problem. One example: when we released some of these systems, like Smart Reply and so on, people don't know that we had to wait longer in order to build systems that detect sensitive content in the messages, because you don't want any of these smart systems to say something stupid; you'd rather say nothing than try to be clever and suggest a bad response. And that is a continually evolving problem: it's cultural, it depends on the language, there are so many different aspects to it. So it's a very hard problem, but I think research also has to look into these aspects. Going back to the first question of what kinds of problems to work on: we have plenty of problems that are opened up by the advances we made in the last ten years, so it's also opening up new areas for research. It's a constantly evolving challenge.
Okay, at this point let's open it up. We've got a mike. We've got a question over there.
So I just want to follow up on the explainability discussion. I think one useful nugget from watching the demo video this morning is that the users in that skit didn't trust the agent, or weren't sure about it. It makes you think that trust is also very important, beyond explainability. And I was wondering, more specifically, whether the panel thinks that symbolic representations are necessary for modeling that sort of explainability, what that structure would mean for the connectionist models that we see today, and what the role of hybrid approaches is.
Well, I think you can have both, really. It occurs to me that you could use an AI planning system to train a neural system. Then you've got a very fast-executing neural system, and the planner can explore a much bigger space than people can. And when you ask it "why did you say that?", you go back and rerun the planning system, where essentially it is going to re-derive and figure out why it said that. Because the two are causally connected, you could imagine the planner actually producing the representation that you encode to train the neural system. That would be my answer to the question.
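A toy sketch of this planner-to-neural distillation idea follows. The household-robot domain, states, and rules are all invented, and a lookup table stands in for the trained neural executor; the point is only that the fast path answers instantly while the "why?" question re-runs the causally connected planner.

```python
# Sketch of the proposal above: distill a slow symbolic planner into a
# fast policy, and answer "why did you say that?" by re-running the
# planner. Domain, rules, and states are invented for illustration;
# a lookup table stands in for the trained neural executor.

def planner(state):
    """Slow decision-maker: returns (action, reasoning trace)."""
    if state["alarm"]:
        return "call_for_help", ["alarm is on, so the safety goal dominates"]
    if state["battery"] < 20:
        return "recharge", ["battery < 20, a precondition of every task fails"]
    return "continue_task", ["no blocking condition, so pursue the current goal"]

# "Training": run the planner over many states, keep only its actions.
states = [{"alarm": a, "battery": bat}
          for a in (True, False) for bat in (10, 50, 90)]
fast_policy = {tuple(sorted(s.items())): planner(s)[0] for s in states}

def act(state):
    """Fast path: no planning at inference time."""
    return fast_policy[tuple(sorted(state.items()))]

def explain(state):
    """Slow path, used only when the user asks why."""
    action, trace = planner(state)
    return "I chose %s because: %s" % (action, "; ".join(trace))

s = {"alarm": False, "battery": 10}
chosen = act(s)
reason = explain(s)
```

Because the fast policy was produced by the planner, replaying the planner on the same state yields a faithful explanation of the fast system's choice, which is the causal connection the panelist is pointing at.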
okay
So I think one more aspect of trust is: do users trust the devices, or the technology itself? One interesting area that is growing fast right now, or is going to be of increasing importance, is privacy-preserving AI. The question is at what level the data stays on the device, what is shared with the cloud, who can access it, and whether I can trust the veracity of the information that comes back. All of these are interesting aspects, in addition to the symbolic angle on explainability that you mentioned. I think this is going to be even more important in the coming years, because the phone is where you are most of the time these days, and that is not going to change; if anything it's only going to get worse. And you are interacting with these voice systems at probably an exponential rate. Well, they can be irritating sometimes, which makes people react like this. So I think that's also an interesting and very useful aspect of trust. And then there's an elevated version of that: regulations like GDPR imposing and making sure there are third-party sources which can verify this information, so that it is not just one central entity that you have to believe for everything.
More questions?
I wanted to make a comment and then ask a question. The first one is that I can only agree with the point about narrowing: not being open to out-of-domain work, multimodality, explainability. We can already see that narrowing happening in the machine-learning domain, and it is made worse by the fact that we don't have large datasets. Personally, in my projects, I can't wait for the day a deep learning architecture is able to jump easily from restaurants to understanding the conversations that patients engage in when describing their symptoms. I'm not sure exactly what the solution is, but I see a narrowing there, and, as you said, a narrowing onto this one kind of task. Second, I wanted to bring to your attention a very interesting paper, I thought, from ACL, nothing to do with dialogue per se: it gives an accounting of the energy consumption and carbon footprint of training large deep learning models, and the numbers shocked me. I think that is also something we may want to take into account when we train and retrain these machine learning models, given how much compute they use.
I think I can comment on the second point first, because I think it is probably going to be one of the most significant issues that comes up, not just for dialogue but for anything touching ML, in the next five years: how we use compute. There's a general tendency to just keep increasing the compute in the cloud, as if you can keep using as much as you want and will always get access to more TPU resources. That is not going to stay true. What you will see is training with more resources, but also building more and more models; look at some of the public statements about each new generation needing ten times more compute power. In my group we are actually looking a lot at on-device and efficient machine learning, and there used to be a concern that these methods, with a lower footprint or, say, a hundred times smaller memory, necessarily have to sacrifice quality. But at least for recognition, classification, sequence labeling, et cetera, and even for speech recognition earlier this year, we are seeing performance for these efficient models almost on par with, if not better than, the state of the art. So there is no reason to say "I need all these resources to train the model"; there are much better ways to do it. That requires real, introspective research into the optimizations, the architecture choices, et cetera. It's hard; it's not just making a black box.
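One concrete instance of the efficiency methods alluded to here is post-training weight quantization. A minimal sketch with invented numbers, showing that 8-bit storage bounds the per-weight reconstruction error by half a quantization step.

```python
import random

# Minimal sketch of post-training 8-bit quantization of a weight vector.
# The weights are random stand-ins for a trained model's parameters.

random.seed(1)
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]

def quantize(ws, bits=8):
    """Map floats onto integers in [-(2**(bits-1)-1), 2**(bits-1)-1]
    using a single shared scale factor."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in ws) / qmax
    return [round(w / scale) for w in ws], scale

def dequantize(q, scale):
    return [v * scale for v in q]

q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# Rounding to the nearest level bounds the error by half a step, while
# the 8-bit integers take 4x less space than 32-bit floats.
```

Real systems add per-channel scales, quantization-aware training, and so on, but even this crude version illustrates why smaller models need not mean much worse models.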
There are some black boxes in there, but it's a very important problem.
And going back to the first point about narrowing: I think it is true, but I wonder if it is really specific to deep learning. I'm sure this has happened in previous technology waves as well: there is some spike in a technology, everybody gravitates to it, and over time that changes. I would see the rise of deep learning and the power of these networks as, at its core, something everybody knows to be a very good function approximator. I would rather use a state-of-the-art model as one of those black-box components, for language modeling of utterances for example, than have to think about and tweak which model to use there; I can then focus on the domain problem and the high-level system rather than on which utterance-generation mechanism I should use. It's hard, because using it well still requires understanding what goes on inside and how it interacts with the rest of the components. But it is easier to access these open-source models these days than it was before, so I think there is a silver lining: more people have access to state-of-the-art models right now, and they can use them in areas, and in very creative ways, that were not possible before.
Over in the back.
Hello, and thank you for the discussion. I have a suggestion for the social-impact discussion.
What do you think we could do about informing end users about the dangers of these technologies? Do you think it might be feasible, at some point, to actually build bots that help people recognize logical fallacies, or marketing strategies, and all these things? What can we do in terms of educating end users?
You mean how to build defensive bots?
Not exactly; not a bot that directly defends the end user, but a bot that teaches the end user about logical fallacies, about marketing strategies, about the fact that there are bots around that try to manipulate you.
Can we get this to the politicians?
I don't know about logical fallacies as the starting point. We are quite a small community compared to the entire population, and nobody reaches the politicians. Okay, there is just one thing that really gets to them: the robocalls. They are starting to care about deep fakes now in the US Congress, after those Congress people were misidentified as criminals against some FBI mugshot database. That suddenly made them care. So now they care.
Suggestions?
I agree, you could definitely have that. This is actually another application of this area of dialogue systems that is understudied: systems for training. For example, a system to train you for a job interview: the system would interview you and you would see what it is like beforehand. That is one training scenario, but you could do training in a lot of different domains, for example someone trying to sell something to you, where you train on how to recognize what they are really trying to do, and so on. These kinds of training scenarios using dialogue systems, I think that is a huge opportunity.
Well, I like your idea of the defensive system, because a lot of the systems out there, all the ads that are being pushed at you, are exactly the kinds of things that are going to come at you in lots of modalities, auditory ones soon. Your defensive system could take care of that for you: "no, you don't pass, thanks very much"; it is on the defense: "you are going to have to talk to me first; you don't get to pass along here until I know what it is you are trying to push", and so on. I realize that may not be in the interest of commerce, but it may well be in the interest of the rest of the people, who would like to be helped by these bots rather than attacked by them. So I think those are great suggestions.
More?
Okay, one more comment and question, David, just before dinner.
I was also discussing earlier the idea of well-trained systems versus intelligent systems; it kind of ties into my question and into what you said earlier, that maybe the neural-plus-symbolic approach would be best. So why do you think more people aren't working on this kind of approach now?
I can't say why people aren't working on it, but just to the point of what we could be looking at: this is something we probably want to look into more deliberately. As opposed to just running behind the latest thing, and again, I'm not saying this is always what happens, there is this temptation: you see a new dataset that is out there, that is easy to publish on and easy to get, for instance with a leaderboard and a clear state of the art, so it is very easy to just plug in the latest models. And yes, we should probably do that too, as long as the problem is well motivated. But that temptation apart, it would be good to look at other aspects of the problem that are not just statically plug-and-play. I think that's the way to go.
Last question.
I believe I get the last question today, then.
It's related to what you were saying earlier: maybe there is a false dichotomy between pipelining and the alternatives. On that slide, I think the real issue is more about modularity, which doesn't necessarily imply sequential processing; you can have a modular system where there is influence in both directions, which makes it a different point. But the set of goals you are listing may be fairly limited and enumerable for simple task execution; when one is in dialogue with other people in real situations, we are usually not just completing a single task. The same pieces of language are useful for one purpose and also useful for finding out other things at the same time, like the relationship with the person you are talking to. So neither of these extremes is really getting at that. A travel agent is probably a constrained problem in some ways, but it is not just a separable problem. A simple example: the speech act, say whether an utterance is an inform or a question, is not separable from its propositional content; it is more like a functional transformation over it. And think about speaker identification as well.
Okay. Well, thank you all for coming, and I think we have dinner next.