0:00:27 | Okay. |
---|
0:00:29 | Thank you all for staying this late. |
---|
0:00:32 | First, an apology: |
---|
0:00:34 | Alan Black couldn't join us. |
---|
0:00:37 | I'm Phil Cohen from Monash University. |
---|
0:00:40 | Why don't you all introduce yourselves. |
---|
0:00:43 | Hi everyone. |
---|
0:00:45 | You can just call me by my first name, since nobody can pronounce my last name. |
---|
0:00:50 | I work at |
---|
0:00:52 | Educational Testing Service, in Research and Development, |
---|
0:00:55 | where I work on |
---|
0:00:57 | multimodal dialogue systems for language learning and assessment. |
---|
0:01:02 | Hi, I'm from Google AI, where I work on conversational AI, |
---|
0:01:06 | but also |
---|
0:01:08 | on multimodal work — |
---|
0:01:10 | vision as well — and also on |
---|
0:01:12 | efficient machine learning, basically how you do |
---|
0:01:15 | modeling under compute and memory constraints. |
---|
0:01:19 | Gabriel Skantze — I'm a professor here at KTH, |
---|
0:01:26 | but also co-founder and chief scientist at Furhat Robotics, |
---|
0:01:30 | a spinoff company |
---|
0:01:32 | from KTH |
---|
0:01:33 | developing social robots. |
---|
0:01:36 | Great. |
---|
0:01:37 | Alright, so I proposed a variety of |
---|
0:01:41 | what I hope are |
---|
0:01:44 | questions that would cause people to start thinking both about the field and also about |
---|
0:01:50 | their own research, |
---|
0:01:51 | and to try to understand where this field is going. |
---|
0:01:57 | Can I make the text a little bit bigger, so everyone can read everything? |
---|
0:02:00 | I can do that. |
---|
0:02:03 | How about that — |
---|
0:02:05 | in the back? |
---|
0:02:07 | Okay. |
---|
0:02:10 | We'll go with that. |
---|
0:02:12 | So, |
---|
0:02:13 | the thought was — |
---|
0:02:16 | I hope we'll get to talk about all of these, because they're all interesting topics. |
---|
0:02:22 | The whole idea is to put everybody on the spot, |
---|
0:02:25 | in one sense: |
---|
0:02:27 | to understand what it is we're doing here — why are we doing what we're doing? |
---|
0:02:32 | Are we working on |
---|
0:02:36 | the problems we're working on simply because there's a corpus there? |
---|
0:02:42 | It's easy to work on a corpus that exists, rather than either create your own, or |
---|
0:02:47 | actually work on the hard problems rather than the problems that exist in that |
---|
0:02:51 | corpus. |
---|
0:02:53 | So the question is: are we working on the right problems? That's the first question. |
---|
0:02:59 | We'll also want to talk about multimodal, multiparty dialogue. I want to push the conversation into |
---|
0:03:06 | a somewhat more open space, |
---|
0:03:09 | where |
---|
0:03:10 | very few people — there are a few people here in the room who have thought about that, |
---|
0:03:13 | but not a lot of people. |
---|
0:03:17 | Then there are the |
---|
0:03:19 | architectures that we're building, which tend to be — you know, either they are |
---|
0:03:23 | pipelined or they're not pipelined — and we should talk about |
---|
0:03:27 | why it is we want to do each of those. |
---|
0:03:32 | The next topic is: why do I have to learn to talk all over again? |
---|
0:03:36 | Why can't conversation — speech |
---|
0:03:40 | acts and whatnot — be something that's domain independent? That's related to the pipeline |
---|
0:03:44 | question. |
---|
0:03:47 | The explainability question has to do with — well, GDPR is an |
---|
0:03:51 | interesting issue here, |
---|
0:03:53 | but if I ask the dialogue system, |
---|
0:03:56 | "why did you say that?", |
---|
0:03:58 | I'd like to get a reasonable answer out. |
---|
0:04:01 | So how do we get there? And the next — you know, a very important problem — |
---|
0:04:06 | is: |
---|
0:04:07 | what are the important problems? What would you tell your graduate students is the most |
---|
0:04:11 | important thing to work on next? |
---|
0:04:14 | Okay, and the last question is: |
---|
0:04:17 | think about |
---|
0:04:19 | the negative side of everything we're doing. |
---|
0:04:22 | Can your technology, or my technology, or their technologies be used for evil — for bad |
---|
0:04:29 | interactions, for robocalls that are interactive now? |
---|
0:04:33 | So, lots of topics to talk about. |
---|
0:04:37 | We can kind of start with the first one, |
---|
0:04:39 | and then I'll sit down and shut up. |
---|
0:04:43 | So I imagine there's a lot of work here on slot-filling systems. |
---|
0:04:47 | So your system asks you what time you want to meet, |
---|
0:04:51 | and you say, "the earliest time available." |
---|
0:04:55 | Or you say, "what's the earliest time available?", and the system says "six p.m.", |
---|
0:04:59 | and you say, "too early." |
---|
0:05:02 | So the system says "seven?", and you say "okay." |
---|
0:05:05 | Notice the user didn't fill the slot — the two of them together filled the slot. |
---|
0:05:10 | That's mixed initiative, collaboration, et cetera — there are lots of issues there having to do with collaboration. |
---|
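To make that exchange concrete, here is a minimal sketch (in Python) of what collaborative slot filling implies for the dialogue state: the slot is narrowed by constraints contributed by both parties, rather than filled by a single user utterance. All class and variable names here are hypothetical, purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class TimeSlot:
    """A slot filled collaboratively: each party adds constraints until one value survives."""
    candidates: list = field(default_factory=lambda: [f"{h}:00 pm" for h in range(5, 10)])
    constraints: list = field(default_factory=list)  # (source, description) pairs, kept for traceability

    def constrain(self, source, description, predicate):
        self.constraints.append((source, description))
        self.candidates = [t for t in self.candidates if predicate(t)]

    @property
    def value(self):
        return self.candidates[0] if len(self.candidates) == 1 else None

slot = TimeSlot()
# System: "the earliest available is 6 pm" -> User: "too early" (a rejection, not a value)
slot.constrain("user", "later than 6 pm", lambda t: t not in ("5:00 pm", "6:00 pm"))
# System: "7 pm?" -> User: "okay" (acceptance of the system's counter-proposal)
slot.constrain("system", "propose 7 pm", lambda t: t == "7:00 pm")
print(slot.value)  # "7:00 pm" -- neither party filled the slot alone
```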
0:05:18 | Are we only working on slot filling because the corpus is there? |
---|
0:05:24 | Who would like to start? |
---|
0:05:29 | We do, I guess — everybody can be tempted by the datasets that are out there. |
---|
0:05:32 | And I think that's what keeps us on this track. |
---|
0:05:38 | It's not |
---|
0:05:42 | just the dataset — it's the metrics even more than the dataset. It's easy to evaluate, and |
---|
0:05:46 | we can show a system's accuracy on this one metric because we know the actual |
---|
0:05:50 | values, the true values, and precision and recall are simple. |
---|
0:05:54 | But I also think that |
---|
0:05:56 | it can't just be a slot-filling system — or, at the other extreme, you know, you go |
---|
0:06:01 | all the way to logic and say it has to be a fully constrained |
---|
0:06:04 | system. I think it has to be something in between, and we have to be |
---|
0:06:07 | flexible enough to adapt — it could go from slot filling to actually understanding, okay, |
---|
0:06:12 | which slot |
---|
0:06:13 | attributes or values can actually be changed, morphed into something else, maybe |
---|
0:06:18 | depending on some constraint, for example temporal constraints. So the downside to going completely |
---|
0:06:24 | constrained is that there's no way we can ever program all that logic. |
---|
0:06:28 | Or even the fact that, if you allow an automatically learned |
---|
0:06:32 | system to, you know, infer that from a corpus, there are so many different possible ways |
---|
0:06:37 | to infer it. I mean, take the example you gave — if you say |
---|
0:06:40 | "too early", how many earliest times should I give you? Like seven p.m., |
---|
0:06:46 | six fifty-nine, six fifty-eight... would end-to-end learning work on something |
---|
0:06:51 | like that? Not necessarily, right? Which is why, like I said, it has to |
---|
0:06:56 | be something in between, where you can |
---|
0:06:59 | program it, and then it's okay to actually have some of these, you know, |
---|
0:07:02 | heuristics or something, where we say, okay, |
---|
0:07:04 | I'm looking at thirty-second blocks or one-minute blocks or thirty-minute blocks, |
---|
0:07:08 | and then we can actually gradually, |
---|
0:07:10 | you know, sort of extend that or open it up to learning something more nuanced. |
---|
0:07:17 | I guess it depends on |
---|
0:07:19 | what you want to do — whether you want a constrained system as opposed to an intelligent |
---|
0:07:23 | system. |
---|
0:07:24 | Machine learning is really good at coming up with slot-filling systems: you just give it a bunch |
---|
0:07:27 | of data, |
---|
0:07:29 | you clean it really well. But intelligence is something else. So |
---|
0:07:34 | I think this is not a knock on either of these two things, because |
---|
0:07:37 | in some cases we do want such systems — we'd be happy with them, and |
---|
0:07:41 | that's what we want to build. |
---|
0:07:43 | But in other cases we might want more than that, |
---|
0:07:46 | and |
---|
0:07:47 | we're not really close to that yet — there you want to get |
---|
0:07:50 | to something more, which respects some kind of planning, some kind of higher abstraction. So |
---|
0:07:58 | if you want to go that route — it really depends on what we're talking about. |
---|
0:08:01 | Just to build on that: |
---|
0:08:04 | I think this is of course related to the corpora that are out there, but |
---|
0:08:09 | also to, |
---|
0:08:10 | you know, the practical systems that people are building, which are often this kind of |
---|
0:08:15 | searching for a restaurant or something, where you have the slots. |
---|
0:08:19 | So I think it would be interesting to open up and look at |
---|
0:08:25 | completely different types of dialogue domains. I can give one example where there's an actual |
---|
0:08:30 | practical problem: at Furhat we are developing an |
---|
0:08:36 | application where the robot performs job interviews, |
---|
0:08:39 | and the robot might ask the user, "so, tell me about |
---|
0:08:46 | a previous situation at work where you had a challenge that you managed to solve." |
---|
0:08:51 | The answer to that question is not modeled very well with a set of slots — |
---|
0:08:56 | it's quite hard to even come up with what that |
---|
0:08:59 | slot structure would look like. So that kind of understanding will also be |
---|
0:09:06 | needed when we open up to more applications than the ones we have now, |
---|
0:09:10 | which I think would be very interesting to address. It's also perhaps not very easy |
---|
0:09:15 | to translate that into a logical form or an SQL query or something — |
---|
0:09:21 | something else is needed; there's some kind of narrative that is coming from |
---|
0:09:25 | the user that you need to represent somehow. |
---|
0:09:30 | So it would definitely be interesting to try — but to do that you have to |
---|
0:09:35 | consider other domains, I think. |
---|
0:09:38 | What — |
---|
0:09:40 | what did you think about |
---|
0:09:43 | the first talk this morning, relative to |
---|
0:09:46 | semantic parsing versus slot filling? |
---|
0:09:51 | It was a very interesting talk — and obviously, if you have that |
---|
0:09:58 | kind of query, you need more complex semantic representations and so on. |
---|
0:10:05 | But do we have those kinds of queries? Is that common, or is it, given the corpora we've collected, |
---|
0:10:11 | you know, somewhat arbitrary — because the corpus only exists because we defined it that way — |
---|
0:10:15 | versus, |
---|
0:10:16 | you know, you actually go to a travel agent and have a conversation with a travel agent, |
---|
0:10:21 | and what one would find perhaps might be a little bit more |
---|
0:10:24 | open-ended in the way you — |
---|
0:10:26 | maybe. |
---|
0:10:28 | But it's — |
---|
0:10:30 | it's still, perhaps, the user querying something, getting some information out of the system. |
---|
0:10:36 | Sometimes it's the other way around: the system is asking the user. |
---|
0:10:41 | Absolutely. |
---|
0:10:42 | In fact, the original |
---|
0:10:43 | task-oriented dialogue work |
---|
0:10:46 | was Barbara Grosz's thesis work in nineteen seventy-four on the structure of task- |
---|
0:10:50 | oriented dialogue, where it was the other way around: the system is telling the user how to do |
---|
0:10:53 | something, |
---|
0:10:54 | or trying to get the user to do something — of which of course there are plenty of |
---|
0:10:57 | examples, |
---|
0:11:00 | like helping you to |
---|
0:11:02 | change a tire. |
---|
0:11:04 | I'll just add one more thing: when we talk about this intelligence, quite |
---|
0:11:08 | often we sort of assume there's this one inflection point where suddenly the machines |
---|
0:11:13 | are going to learn how to reason and, you know, understand everything. I think one |
---|
0:11:18 | nugget I want to mention is that, |
---|
0:11:21 | whatever form — logical form or anything else — we're going to use, the important part |
---|
0:11:25 | is — you mentioned collaboration, right — the system |
---|
0:11:30 | may not even generate, like, proper stuff, right, but is it understandable by the human |
---|
0:11:33 | on the other side, and does it allow them to, you know, get to a |
---|
0:11:37 | better state? And towards that end, I think, |
---|
0:11:41 | we're not going to see, you know, one system trained on the travel domain suddenly |
---|
0:11:45 | doing something |
---|
0:11:46 | amazing in a completely different domain. But I think we should start paying attention to |
---|
0:11:50 | these, because everything is machine learned — measuring how well a system is doing across multiple |
---|
0:11:55 | domains, right? I mean, starting to generalise, |
---|
0:11:57 | and thinking about the generalizability aspect when you're proposing models as well, and also abstraction — |
---|
0:12:01 | so that ties into the third and the fourth questions. |
---|
0:12:06 | Okay, let me move on to the next one. |
---|
0:12:16 | Okay. |
---|
0:12:19 | So there's obviously a lot of interest in end-to-end trained systems, |
---|
0:12:25 | where you're training the dialogue system in addition to the language processing — |
---|
0:12:29 | and some of the slot-filling systems are doing exactly the same thing — |
---|
0:12:33 | which means your dialogue engine |
---|
0:12:36 | is basically stuck with that domain. |
---|
0:12:39 | And now you're going to get a whole bunch of new kinds of domains, and suddenly |
---|
0:12:44 | my dialogue system doesn't know how to talk anymore: |
---|
0:12:47 | it doesn't know how to perform a request or understand a request; maybe there are |
---|
0:12:51 | new kinds of speech acts |
---|
0:12:52 | that are coming in. |
---|
0:12:55 | We saw this morning — you know, in the semantic parsing work they're trying |
---|
0:12:58 | to deal with that huge amount of variation she mentioned. |
---|
0:13:02 | There is a lot of variability in language, |
---|
0:13:05 | but I submit there's much less variability |
---|
0:13:08 | in what happens to people's goals in the course of a dialogue. |
---|
0:13:12 | In general, you tend to achieve them: you achieve them, you fail, you try again, |
---|
0:13:18 | you augment what you're trying to do, you replace what you're trying to do, et |
---|
0:13:21 | cetera. Actually, my suspicion is it's a relatively small state machine. |
---|
0:13:27 | So why learn both of those together? Why can't I figure out one through machine learning |
---|
0:13:31 | or any other method, |
---|
0:13:33 | and then deal with all the variability in the language in a pipelined fashion, |
---|
0:13:42 | versus training it all at once? |
---|
0:13:45 | Well, I guess — I mean, |
---|
0:13:48 | I agree; it seems reasonable to separate these things like this. |
---|
0:13:52 | The motivation for end-to-end learning is that you wouldn't have to have any knowledge |
---|
0:13:56 | about these |
---|
0:13:59 | representations in between — but then you have to have a lot of data. So the |
---|
0:14:02 | trade-off is that you don't need to know so much, but if you don't have a lot |
---|
0:14:05 | of data, what happens is — |
---|
0:14:07 | well, that's the problem. |
---|
0:14:09 | I mean, you could go one way for one thing and the other way for the rest. |
---|
0:14:13 | Just to add a counterpoint: I think there is this view |
---|
0:14:19 | of end-to-end learning systems, right — end-to-end learning systems where we say that |
---|
0:14:22 | all these components in the pipelined fashion — we can just get rid of |
---|
0:14:26 | all of them and learn from the input and the final output. |
---|
0:14:30 | In some settings, I would argue that you might actually have more data |
---|
0:14:33 | for that than for the individual components, right? Like, for example, speech-to-text — rather than, you |
---|
0:14:39 | know, |
---|
0:14:40 | all these phonetic annotations and intermediate annotations at all the different levels, in the |
---|
0:14:45 | end you might actually have just the speech signal and, you know, the transcribed text |
---|
0:14:48 | or some response, |
---|
0:14:50 | and that might actually be easier to obtain. And in those settings, I would say the end-to- |
---|
0:14:55 | end systems, at least |
---|
0:14:57 | given enough data, have actually shown improvements in recent years — and this is not |
---|
0:15:01 | just deep learning; I mean, as the technology evolves we're going to see improvements in that — |
---|
0:15:06 | the recognition error goes down. Now, the question is, you don't |
---|
0:15:11 | have to do end-to-end learning in every scenario, right? There's also, okay, you |
---|
0:15:15 | know, |
---|
0:15:16 | every end-to-end learning system is not going to solve the error propagation problem, right? |
---|
0:15:20 | And you might actually be creating more issues, because now you don't know how to |
---|
0:15:23 | debug the system — there are too many hyperparameters you have to deal with — |
---|
0:15:27 | and that's actually a worse problem in some settings than, you know, just finding |
---|
0:15:31 | data and just doing the input and output annotations. So I think it depends on the |
---|
0:15:36 | use case. Like, |
---|
0:15:37 | if you have to improve the system, or if there are individual parts of the system |
---|
0:15:42 | that you need to transfer over to a different domain, or for |
---|
0:15:45 | other systems where you need that output — not just the last output but |
---|
0:15:50 | something intermediate. Like, for example, |
---|
0:15:52 | it can be argued syntax is not necessary for every NLP task or domain. |
---|
0:15:57 | When was the last time you actually saw a part-of-speech tagging paper in recent years, |
---|
0:16:01 | or even a parsing paper, for that matter? If you look at the |
---|
0:16:05 | percentage of such papers at ACL or EMNLP or NAACL, it's going down dramatically. |
---|
0:16:10 | But that doesn't mean it's not important — whether it's important depends on |
---|
0:16:14 | what you're trying to do with it. If you're using the dependency parses to do something, |
---|
0:16:18 | to do some reasoning over the structure, the substructures — it is useful, no |
---|
0:16:23 | doubt about it. On the other hand, if it's just a precursor to feed into |
---|
0:16:27 | an end-to-end machine translation system, |
---|
0:16:30 | it's arguable that it's not necessary, |
---|
0:16:33 | at least for the metrics that we're talking about — the automated metrics. |
---|
0:16:36 | Again, that does not mean we've solved those problems — we still have to solve them, |
---|
0:16:39 | whichever models we use. |
---|
0:16:41 | It all depends on what you're trying to use the system for. |
---|
0:16:47 | In some sense it's kind of a balance, right? So, |
---|
0:16:50 | typically, for example, what we are kind of — |
---|
0:16:53 | to take a specific example of what we're doing: |
---|
0:16:56 | when we're trying to build — |
---|
0:16:58 | we're really building language learning modules, building specific goal-oriented, task-oriented systems for specific skills, |
---|
0:17:04 | like listening, |
---|
0:17:06 | fluency of pronunciation, or grammar, or specific aspects of those. So, |
---|
0:17:10 | how do you go about it? And this is the whole "how" question that |
---|
0:17:14 | you raised earlier, which is about, |
---|
0:17:15 | you know, how do I build these generalisable systems, or how do I kind of, |
---|
0:17:19 | you know, |
---|
0:17:20 | use the same pipeline across these different — |
---|
0:17:24 | seemingly similar tasks, but they're |
---|
0:17:26 | probing — each probing different things. |
---|
0:17:28 | So you start out with something which, perhaps — because it's a limited domain you |
---|
0:17:33 | don't have much data anyway — |
---|
0:17:35 | is started more from expert knowledge, |
---|
0:17:37 | and then you start collecting data, |
---|
0:17:41 | through Wizard-of-Oz or crowdsourcing or some other method, |
---|
0:17:45 | and ultimately get more data so that you can build a more hybrid kind |
---|
0:17:49 | of system, |
---|
0:17:50 | which could either be end-to-end or also be informed by |
---|
0:17:55 | knowledge. So |
---|
0:17:57 | that's — |
---|
0:17:58 | that's one way to — |
---|
0:18:00 | I guess what's probably useful is to look at |
---|
0:18:03 | different points along this hybridization spectrum — a combination of data-driven and knowledge-driven approaches — |
---|
0:18:08 | which have implications for how you're pipelining the system and training it, of course. |
---|
0:18:13 | Well, I certainly don't disagree |
---|
0:18:17 | with you guys, |
---|
0:18:19 | but, you know, some of the techniques — |
---|
0:18:22 | for instance — are not going to be particularly |
---|
0:18:27 | appropriate for certain types of tasks. |
---|
0:18:29 | So, for instance, I think attending to a knowledge base versus |
---|
0:18:33 | computing an actual complex query — those two things can actually be very different. |
---|
0:18:41 | If you want to do a quantifier or a comparative and things like that, |
---|
0:18:43 | it's not obvious to me that attention will solve it. |
---|
0:18:47 | I guess that's related to the first question — that you will probably address the kinds |
---|
0:18:52 | of dialogues that you can solve with this |
---|
0:18:56 | method, and the other ones you will not address. |
---|
0:18:59 | So that's — so the risk of |
---|
0:19:03 | where this research is going is that we just keep drilling into the problems that we |
---|
0:19:08 | started with, and not expanding our goals. |
---|
0:19:12 | So, |
---|
0:19:13 | talking about expanding those goals, |
---|
0:19:15 | I want to talk about — |
---|
0:19:17 | or have you guys talk about — multimodal dialogue. So I've got |
---|
0:19:22 | not just |
---|
0:19:23 | the speech but other modalities, and they're coordinated in interesting ways. |
---|
0:19:29 | And about multiparty dialogue, |
---|
0:19:32 | which is — |
---|
0:19:34 | take |
---|
0:19:35 | any of your favourite smart speakers and stick it in a family — stick it in a home environment. |
---|
0:19:40 | Now have a conversation with your family |
---|
0:19:43 | and that device, |
---|
0:19:44 | and it has to track conversation amongst multiple people: "what time do you want to eat?", |
---|
0:19:49 | "Mary, do you want to eat at three o'clock?", and Mary says, "no, I don't." |
---|
0:19:53 | Okay, so what does the system do? |
---|
0:19:56 | What does it represent as to what happened in that dialogue? Can we do that? |
---|
0:20:03 | Do we have any representation of — well, what's the belief state |
---|
0:20:07 | that we've seen in all these — |
---|
0:20:10 | all these papers? Is there any notion of belief actually going on? |
---|
0:20:17 | So — |
---|
0:20:18 | the idea — I mean, there's a huge amount of things to break open once you |
---|
0:20:21 | start working within the multiparty setting. And there's the physical situation of actually |
---|
0:20:26 | having a robot — or, if you look at Alexa, it's physically situated; it's got a camera on |
---|
0:20:32 | it, at least I'm sure they have that, right? And it — |
---|
0:20:37 | it can see what's going on in the room. It can see who's talking, |
---|
0:20:40 | who's talking to whom, and so on, you know, if you allow it. |
---|
0:20:44 | What do you need to track out of all of that? How is it actually going to help the |
---|
0:20:47 | family, rather than just |
---|
0:20:49 | a bunch of individual conversations? |
---|
0:20:53 | So this is a whole — a way bigger space than what we've been dealing with. How are we |
---|
0:20:58 | going to — |
---|
0:20:59 | how are we really going to handle the multimodal, multiparty case? |
---|
0:21:03 | Yes — so this is exactly the kind of dialogue that we |
---|
0:21:08 | are trying to model with Furhat, for example, where you have multiple people. |
---|
0:21:12 | One problem there is, as you say, sort of — |
---|
0:21:16 | this — |
---|
0:21:19 | the belief state. Typically you think about it as: what does |
---|
0:21:24 | the user |
---|
0:21:26 | want, up to this point, or what have we agreed to up to this point? But if you have |
---|
0:21:29 | two people, there might of course be two different states. |
---|
0:21:33 | So if the two people are ordering and one says, |
---|
0:21:37 | "I would like a burger", and the other one says, "me too, |
---|
0:21:41 | but not with onions" or something, referring to that — then you have to keep track, |
---|
0:21:45 | of course, of what the two different persons want in that kind of dialogue. |
---|
0:21:49 | It's also — |
---|
0:21:51 | you can't just represent it as individual wants; it's common, like, "we want to do |
---|
0:21:55 | this, we would like to do" — exactly. So maybe you should have, like, three |
---|
0:21:59 | different representations: one is what we want, one is what |
---|
0:22:02 | I want, and one is what the other one wants. |
---|
0:22:05 | The goal is to come to a consensus — but, I mean, if you're |
---|
0:22:08 | ordering things, you could have different things and so on, so it could be a |
---|
0:22:12 | mix, of course. |
---|
0:22:14 | And then there's the fact that you can refer to what the other person is saying. |
---|
0:22:18 | But also, of course — if the two people are talking to each |
---|
0:22:21 | other, to what extent is the system listening to that? Which probably has to form |
---|
0:22:27 | part of the model as well — part of the "we". |
---|
0:22:30 | Right — it's all of us together trying to solve this |
---|
0:22:34 | problem: |
---|
0:22:35 | what's going to happen, what we're going to order, when we're going to go out, |
---|
0:22:39 | or whatever. |
---|
0:22:41 | And then the system has to be part of this collective, |
---|
0:22:47 | and you have to have what we used to call in the old days a joint intention — |
---|
0:22:51 | what we're trying to do together. |
---|
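As a rough sketch of the "three representations" idea raised above — what I want, what you want, and what we have agreed — here is one hypothetical way a multi-party belief state could be structured in Python. The class and method names are illustrative, not any existing toolkit's API, and unanimity is just one possible agreement criterion.

```python
from dataclasses import dataclass, field

@dataclass
class MultiPartyBeliefState:
    """Tracks each participant's wants alongside what the group has agreed on (the 'we' state)."""
    individual: dict = field(default_factory=dict)   # speaker -> {slot: value}
    joint: dict = field(default_factory=dict)        # slots the group has settled

    def update(self, speaker, slot, value):
        self.individual.setdefault(speaker, {})[slot] = value

    def propose_agreement(self, slot):
        # Promote a slot to the joint state only when every speaker's value matches.
        values = {beliefs.get(slot) for beliefs in self.individual.values()}
        if len(values) == 1 and None not in values:
            self.joint[slot] = values.pop()

state = MultiPartyBeliefState()
state.update("Mary", "meal_time", "4 pm")    # Mary: "not at three -- four o'clock"
state.update("John", "meal_time", "4 pm")    # John: "four works for me"
state.propose_agreement("meal_time")
print(state.joint)        # {'meal_time': '4 pm'} -- the shared "we" state
print(state.individual)   # per-speaker views are kept side by side with it
```

The point of the sketch is only that per-speaker views and the joint intention are tracked as separate, co-existing structures, which is what the panel's burger example calls for.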
0:22:53 | But how would you guys think about |
---|
0:22:57 | this problem, |
---|
0:22:58 | the multi-user problem? — I guess the other thing to add to the mix is the multi- |
---|
0:23:04 | modality of things, right? So, absolutely — for instance, |
---|
0:23:09 | when you have audio and video, |
---|
0:23:11 | which one do you attend to first, and how do you choose |
---|
0:23:15 | priority? |
---|
0:23:16 | And of course it's a noisy situation, and that's something |
---|
0:23:19 | that |
---|
0:23:20 | just — |
---|
0:23:22 | just messes things up, usually. |
---|
0:23:24 | And this is also what we found — so |
---|
0:23:28 | we've been looking largely at the education context for this kind of thing, teacher training or |
---|
0:23:31 | something, where you're looking at, |
---|
0:23:33 | for instance, a person interacting with, you know — |
---|
0:23:37 | a teacher interacting with |
---|
0:23:39 | a class of student avatars. |
---|
0:23:43 | So, |
---|
0:23:44 | you know, if the teacher addresses one student — how do you know, you know, |
---|
0:23:48 | is it that student, or is it one of the other students, or the whole class? |
---|
0:23:52 | Suppose they say, for instance, "you're doing great", but I'm |
---|
0:23:55 | pointing in that direction — so what does the system, you know, attend to? Does |
---|
0:24:01 | it attend to my speech, or does it attend to my gesture? |
---|
0:24:04 | And there's always that kind of |
---|
0:24:07 | ambiguity. |
---|
0:24:10 | So, |
---|
0:24:12 | to try to put a positive spin on that: I think we are at the stage where we can |
---|
0:24:17 | do belief tracking, for sure — maybe not at the level we want it |
---|
0:24:21 | to be at, but I believe we have developed systems — or are very close to having |
---|
0:24:27 | the technology — where we can actually do joint inference over video, audio, |
---|
0:24:32 | and textual signals, where we can actually disentangle, you know, between different entities all, you |
---|
0:24:40 | know, conversing at the same time, and we can do that at scale. |
---|
0:24:44 | You could do that, but then how do you |
---|
0:24:46 | relate it to prior knowledge of the individual users? And the second point — I mean, I'll give |
---|
0:24:51 | you a different scenario. |
---|
0:24:54 | Imagine it's not just, you know, collaborative, where you |
---|
0:24:58 | can actually attribute everything to a specific entity — what if it's a parent and a child? |
---|
0:25:02 | Now whose preference do you take into account? The child says, "play the Cartoon Network |
---|
0:25:07 | on a loop for twenty-four hours", right? For example — will Alexa do that, and |
---|
0:25:11 | for whom will it do it? Obviously there's a preference here — like, the parents have to sort |
---|
0:25:16 | of win. |
---|
0:25:17 | It's a very tricky situation, and it might not be as easy as having |
---|
0:25:21 | some sort of general-purpose model that says, you know, these are the entities, and |
---|
0:25:25 | there's one model for, okay, there are two people interacting and they have a |
---|
0:25:28 | joint intent, right? It might be customisable per household or, you know, per set of |
---|
0:25:33 | people, and these might all vary across different sets of people put together, |
---|
0:25:37 | and the relationships between them as well. So all these things have to be factored |
---|
0:25:41 | in, right? I'm adding to the challenge — it's a mix of problems. |
---|
0:25:45 | But — |
---|
0:25:46 | the simple thing is, we don't have to learn everything, right? I mean, |
---|
0:25:50 | everybody thinks, with machine learning, that we have to learn everything — but you can just ask the |
---|
0:25:54 | user for a preference. At some point you could just ask the people, "hey, you two, |
---|
0:25:58 | tell me what's your preference", or just manually enter it, like in an app or |
---|
0:26:02 | whatever it is, right? I mean, sometimes just one bit is enough to |
---|
0:26:06 | sort of bootstrap the system, or at least lock in a bunch of variables, right, which, you |
---|
0:26:11 | know, would have caused a lot of confusion downstream. |
---|
0:26:13 | So |
---|
0:26:15 | there's still hope — I mean, it |
---|
0:26:18 | has to be this interactive mode, not the system observing a bunch of things and |
---|
0:26:22 | learning and then suddenly starting to do the right thing at some point in time. |
---|
0:26:28 | Alright, I'll move on. |
---|
0:26:30 | We finish at what time? |
---|
0:26:33 | About six. |
---|
0:26:34 | And we — |
---|
0:26:35 | okay, and I think we want to leave a bit of time |
---|
0:26:38 | for audience participation, |
---|
0:26:40 | so I will try to move along to some of the other |
---|
0:26:44 | questions. |
---|
0:26:45 | And — |
---|
0:26:50 | the next one |
---|
0:26:52 | that I had in mind was explainability. |
---|
0:26:55 | Okay. So we have all these lovely machine learning systems. |
---|
0:26:59 | You ask any of them, "why did you say that?" — what do you get? |
---|
0:27:03 | Nothing. |
---|
0:27:05 | Okay. |
---|
0:27:07 | Now, the system could make up |
---|
0:27:10 | why it said that, but you actually want the "why it said it" to be causally connected to |
---|
0:27:14 | what it actually did. |
---|
0:27:17 | So what |
---|
0:27:19 | kind of architectures can you imagine |
---|
0:27:22 | that will give us |
---|
0:27:24 | explainability |
---|
0:27:27 | in the general case? |
---|
0:27:34 | Hmm — |
---|
0:27:38 | I mean, |
---|
0:27:39 | first, the question is, do you as a user really need to be able to |
---|
0:27:42 | ask that? I mean, are users interested in why the system — |
---|
0:27:46 | "why did you recommend that?" I think if it's a dialogue system, I definitely want to |
---|
0:27:50 | know. But then the question is, do you have to get the answer? Say |
---|
0:27:52 | we're talking about a restaurant you want me to go to — |
---|
0:27:55 | you give me recommendations and I say, okay, |
---|
0:27:58 | so in that case, |
---|
0:28:01 | "why did you suggest that?" |
---|
0:28:03 | And I think that's — of course, if it's learned, |
---|
0:28:12 | and |
---|
0:28:14 | end-to-end especially, then you have to build a dialogue |
---|
0:28:17 | around that. So wherever you're building your dialogue, you have to train the |
---|
0:28:22 | dialogue on explanation |
---|
0:28:24 | dialogues, |
---|
0:28:28 | and you might not have that data. |
---|
0:28:31 | Well, that's part of the point. |
---|
0:28:35 | Just to offer a counterpoint to "do users really care" — so, for instance, |
---|
0:28:39 | in education this is really important. And this is true for |
---|
0:28:45 | health as well — mental health and other such domains — and perhaps |
---|
0:28:50 | robotics as well. |
---|
0:28:51 | So if I, you know, tell a person, "you have depression with |
---|
0:28:57 | seventy-five percent probability", they probably want to know — |
---|
0:29:00 | they probably want to know why, or how you came to that conclusion. |
---|
0:29:04 | Or the same thing if you're telling someone, you know, |
---|
0:29:07 | "your fluency score is nine out of ten", or |
---|
0:29:11 | "four out of ten" — why is it four, and what do I need to |
---|
0:29:14 | improve? |
---|
0:29:15 | So in those kinds of cases it's really important. Having said that, I think there |
---|
0:29:20 | is an increasing body of work in the ML literature, especially for those interested |
---|
0:29:25 | in end-to-end models |
---|
0:29:26 | and, |
---|
0:29:28 | you know, similar deep learning models, that really looks at interpretability using a variety of techniques. |
---|
0:29:33 | And I think that has been relatively unexplored in the dialogue community, but |
---|
0:29:38 | I think we should really — |
---|
0:29:40 | this is one of those things I would really add — I think one of |
---|
0:29:44 | your questions touches on this a little bit: what would you ask your graduate students or the next generation to work on — |
---|
0:29:49 | exactly, and interpretability is one. There are several techniques: techniques that |
---|
0:29:55 | try to probe deep neural networks and figure out which inputs are the |
---|
0:29:59 | most salient — that, you know, lead to the classification; |
---|
0:30:03 | techniques that look at |
---|
0:30:05 | visualizing neurons; techniques that look at visualising memory units; |
---|
0:30:11 | and all the way up to — so this is in terms of model interpretability, but |
---|
0:30:14 | also in terms of feature interpretability. |
---|
0:30:18 | But do you believe that will actually get turned into a comprehensible |
---|
0:30:21 | explanation for an actual end user? |
---|
0:30:23 | Not at the moment — but, did you want to say something? |
---|
0:30:27 | I was just going to say — my point is about this: |
---|
0:30:30 | just because we say that a network is explainable doesn't mean — I mean, it depends on, |
---|
0:30:35 | you know, who is looking at it, right? I mean, if it says, okay, activation |
---|
0:30:38 | number four-three-six is firing and that's causing the positive class to go up |
---|
0:30:42 | by probability x, right — |
---|
0:30:45 | to the ML engineer or scientist who's actually tuning this model, great, okay, now I can go |
---|
0:30:49 | fix it or, you know, do something with it. But I think what's probably |
---|
0:30:53 | more interesting, at least for NLP and dialogue, would be: are |
---|
0:30:58 | there some high-level abstractions — they don't even have to be, you know, fully comprehensible |
---|
0:31:02 | in a human sense — that it can actually find in the, let's say, latent alignments, right, where |
---|
0:31:06 | these sets of examples are basically leading to the same sets of outcomes, |
---|
0:31:11 | right — I mean, at a higher level? So that higher level, right, |
---|
0:31:15 | could be at the phrase level, it could be at the semantic level — |
---|
0:31:18 | but obviously, going higher — I mean, |
---|
0:31:21 | building an explainable system would then become as hard as actually building the full system itself, right? |
---|
0:31:26 | So then — |
---|
0:31:28 | and this is why I think the field has to go hand in hand |
---|
0:31:30 | with, you know, the modeling work and also all the other work and applications. |
---|
0:31:34 | The vision community, if you like, has advanced further in this respect |
---|
0:31:40 | than the NLP community — not just probing networks and looking at activations, but even |
---|
0:31:45 | learned approaches where you actually backprop through the network and |
---|
0:31:48 | look at regions and, you know, sort of learn, in an online fashion, |
---|
0:31:51 | which regions — and which salient natural colours, et cetera — are triggering certain types of |
---|
0:31:56 | behaviours, and sort of interpret back from that in a discrete fashion — like, it's a colour |
---|
0:32:01 | map, or certain types of object patterns, or, you know, |
---|
0:32:05 | triangles, et cetera. |
---|
0:32:07 | I think we want to see more of that in the NLP community. The most interesting work |
---|
0:32:12 | that I've seen in the recent past is, you know, more of the probing type, |
---|
0:32:15 | where you have these black-box networks and other methods are actually trying to |
---|
0:32:20 | probe them — okay, where are they going to fail, when are they going to fail, right? And you'd |
---|
0:32:24 | be very surprised: |
---|
0:32:26 | in some of the state-of-the-art systems, you just change one word in the input utterance and |
---|
0:32:29 | suddenly it'll flip the probability. So there's a lot of work on these and other types of |
---|
0:32:33 | methods which are looking at these things. So I think explainability and interpretability go |
---|
0:32:37 | kind of hand in hand. |
---|
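A minimal sketch of the single-word perturbation probing just described: swap one input word at a time and record how far a black-box model's output probability moves. The `classify` callable stands in for whatever model is being probed — it is an assumption of this sketch, not a specific library API.

```python
def probe_word_sensitivity(classify, tokens, vocab):
    """For each position, report the largest probability swing caused by swapping in one other word.

    classify: callable mapping a list of tokens to P(positive class) -- the black box under test.
    tokens:   the input utterance, already tokenised.
    vocab:    candidate substitute words to try at each position.
    """
    base = classify(tokens)
    report = []
    for i, original in enumerate(tokens):
        swings = []
        for substitute in vocab:
            if substitute == original:
                continue
            perturbed = tokens[:i] + [substitute] + tokens[i + 1:]
            swings.append((abs(classify(perturbed) - base), substitute))
        if not swings:
            continue
        worst_swing, worst_word = max(swings)
        report.append((original, worst_word, worst_swing))
    # Positions with large swings are exactly the "change one word and the probability flips" cases.
    return sorted(report, key=lambda r: r[2], reverse=True)
```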
0:32:39 | For the end consumer, you need to explain it — |
---|
0:32:43 | it's not just |
---|
0:32:44 | probing a neuron. |
---|
0:32:48 | And so I think we actually need to combine methods — and there are |
---|
0:32:52 | many people in the room who've worked on this problem |
---|
0:32:56 | in the past. And I think that, certainly, |
---|
0:33:00 | the learned systems need to figure out how they're going to do this, because |
---|
0:33:06 | if they don't, the European regulators will make them. |
---|
0:33:13 | Just one point — I think the good news, though, is that if you |
---|
0:33:16 | look at the number of papers on this topic over just the last |
---|
0:33:19 | two years, it's a very encouraging sign, right? It used to |
---|
0:33:23 | be like, who wants to actually talk about explanations — I just built a |
---|
0:33:27 | system, it does state-of-the-art, you know, x, y, z. |
---|
0:33:30 | And now, for grad students, I think it's a very interesting and very |
---|
0:33:34 | exciting field to be part of. — Okay, so that's the next question: what's the most |
---|
0:33:37 | important thing people ought to be working on, right? |
---|
0:33:43 | I have my own ideas — |
---|
0:33:45 | what have you got? |
---|
0:33:47 | So, to start with, I think it's very important that |
---|
0:33:50 | people work on different things, |
---|
0:33:54 | so that we have a lot of different approaches that we can compare — not that everyone |
---|
0:33:59 | does similar things. |
---|
0:34:02 | I also think, sort of, the |
---|
0:34:05 | intersection between dialogue, |
---|
0:34:08 | speech and multimodality and so on, because these are still kind of separate fields. |
---|
0:34:14 | I mean, if you look at |
---|
0:34:16 | the Google Duplex demo, for example — that got a lot of attention, and people |
---|
0:34:22 | thought, wow, this sounds really human-like. |
---|
0:34:25 | But if you look at it on some |
---|
0:34:26 | dialogue- |
---|
0:34:28 | pragmatic level — if you make a transcript out of it — |
---|
0:34:31 | it's not a very sophisticated dialogue model; but the execution |
---|
0:34:36 | is great. We don't know if that was a cherry-picked example, but as |
---|
0:34:41 | it sounds, at least, it sounds fantastic. So being able to actually execute the dialogue |
---|
0:34:49 | in a way that has that kind of turn-taking and that kind of |
---|
0:34:53 | conversational speech synthesis and so on, |
---|
0:34:56 | using a model of the dialogue — I think that's something that is |
---|
0:35:01 | under-explored in both the speech and the dialogue communities. |
---|
0:35:08 | Explainability is |
---|
0:35:09 | super important, |
---|
0:35:11 | I would say. |
---|
0:35:12 | I mean, there are so many factors associated with this — multiple areas associated |
---|
0:35:16 | with building better systems so that we can make the systems less brittle. There are a |
---|
0:35:22 | number of ways to achieve this, right, and |
---|
0:35:25 | that's a very important topic, and you can adopt a number of approaches from the |
---|
0:35:28 | ML community, like injecting more structured knowledge. One of the things that all |
---|
0:35:33 | these things lead to, in my opinion, is — |
---|
0:35:37 | not just for generation but for all the other aspects of dialogue research problems — |
---|
0:35:42 | what are the minimum viable nuggets of knowledge that we have to encode in |
---|
0:35:47 | the system, or that the system has to encode, so that it can learn to generate well, |
---|
0:35:51 | can then do recognition, do the slots and intents well, and can be transferred to |
---|
0:35:55 | a new domain? So, |
---|
0:35:57 | what is the equivalent of a knowledge graph, right, I mean, |
---|
0:36:00 | for different dialogue systems — something that we can |
---|
0:36:03 | all agree on? So I think if we come up with some sort of |
---|
0:36:06 | shared representation of that, which is interpretable at least to some |
---|
0:36:09 | extent, then I believe |
---|
0:36:12 | we can actually make even further progress. Of course it's |
---|
0:36:15 | a hard problem, right — and dialogue is one of the hardest problems |
---|
0:36:19 | in natural language as well. So |
---|
0:36:21 | it's not just fact lookup that I'm talking about — it's, what are |
---|
0:36:25 | the things about, you know, the general world, right? It doesn't have |
---|
0:36:29 | to cover a hundred percent — even if, like, twenty percent of the knowledge can be encoded in |
---|
0:36:33 | the concept space and the relationships between concepts, such that, now, for a |
---|
0:36:37 | new domain I might just have to |
---|
0:36:40 | get access to a very small amount of training data, or learn a little |
---|
0:36:43 | bit more and sort of map it onto existing concepts, or augment the |
---|
0:36:47 | existing concept database. |
---|
0:36:49 | So |
---|
0:36:50 | I think that's |
---|
0:36:52 | a super interesting direction, and this could be multimodal as well — it's not just about |
---|
0:36:55 | language, it's about, |
---|
0:36:57 | what are the visual concepts I need to keep in mind, right? I mean, the |
---|
0:36:59 | taxonomy of how objects relate to each other — if I see a chair and a |
---|
0:37:04 | table, I know, you know, what the positional relationship is between |
---|
0:37:07 | different things, |
---|
0:37:08 | all this spatial coherence, all these sorts of things. And so, what are the |
---|
0:37:11 | minimum viable sets of relationships and concepts that we need in order to model |
---|
0:37:16 | better dialogues? |
---|
0:37:20 | So, |
---|
0:37:21 | since Gabriel and the others have already covered a bunch of things, I'll say something complementary |
---|
0:37:25 | to that, and add to it, because I think these are really interesting problems, and |
---|
0:37:28 | this was |
---|
0:37:30 | on my list anyway. |
---|
0:37:33 | I'd just add: |
---|
0:37:36 | working on low-resource problems. |
---|
0:37:38 | So, for instance, we already — we always — |
---|
0:37:40 | well, |
---|
0:37:41 | this is in terms of languages, domains, |
---|
0:37:44 | and even, you know, the kinds of datasets that we |
---|
0:37:49 | tend to overtrain on — and this is nothing new, everyone |
---|
0:37:52 | here knows about this — we all kind of overtrain on the restaurant |
---|
0:37:55 | datasets, the Cambridge datasets, for a good reason of course, because they're publicly available. |
---|
0:37:59 | But that's — |
---|
0:38:00 | that's one thing. But, |
---|
0:38:03 | you know, |
---|
0:38:04 | apart from planning to get more datasets — and that's obviously one of the things we |
---|
0:38:08 | want to do — |
---|
0:38:10 | can we look into how we mitigate that? |
---|
0:38:13 | There's work already going on, but perhaps it could be more intense — there's a lot of |
---|
0:38:17 | work on zero-shot and one-shot learning — |
---|
0:38:19 | trying to, you know, |
---|
0:38:21 | look at better ways of adaptation, better ways of working on new domains |
---|
0:38:28 | with limited resources, |
---|
0:38:30 | given the existing resources — perhaps using, |
---|
0:38:33 | you know, techniques from machine translation or |
---|
0:38:38 | some other — |
---|
0:38:40 | some of these other sister fields that |
---|
0:38:42 | we might not think of immediately. But, for instance, |
---|
0:38:45 | this is starting to come up a lot more: |
---|
0:38:47 | trying to use data which, you know, |
---|
0:38:51 | is kind of unconventional for dialogue but might be useful for bootstrapping in these kinds |
---|
0:38:55 | of low-resource settings — |
---|
0:38:56 | that might be |
---|
0:38:58 | also something very interesting and useful to look at. |
---|
0:39:01 | And especially for underserved domains — coming back to my domains, medicine and education: |
---|
0:39:08 | these are not necessarily the classic "how may I help you" or, you know, |
---|
0:39:12 | booking, or those kinds of |
---|
0:39:16 | domains, but I think — this is where you have a lot |
---|
0:39:19 | less data, but it's still |
---|
0:39:21 | very useful to kind of look at. |
---|
0:39:24 | One thing — we have very large knowledge structures, maybe global ontologies, that give structure to |
---|
0:39:30 | the domain, |
---|
0:39:32 | and |
---|
0:39:36 | then |
---|
0:39:37 | that's all you need: |
---|
0:39:40 | it's just the domain structure, and after that you already know how to have a |
---|
0:39:43 | conversation — you know what the objects are, you know what the actions are, you know what |
---|
0:39:47 | the verbs are, you know what their preconditions and effects are. Why do you need |
---|
0:39:54 | any more? |
---|
0:39:55 | I mean, dialogue is obviously more subtle than that — but is that unreasonable? That is, |
---|
0:40:00 | why do we need any more than just |
---|
0:40:03 | a change in knowledge? |
---|
0:40:07 | I don't need a big corpus, because I've already learned how to talk. |
---|
0:40:11 | I've got a huge vocabulary; I've got all these vectors. |
---|
0:40:14 | So |
---|
0:40:15 | why can't I just change the knowledge base? |
---|
0:40:19 | Then, you know — |
---|
0:40:21 | who needs anything more universal? Just give me a knowledge base: alright, I'm going to do |
---|
0:40:25 | cancer diagnosis, or I'm going to do |
---|
0:40:29 | architecture, or whatever — you know, take an arbitrary domain. |
---|
0:40:33 | That would be great — but for each of those domains you need that |
---|
0:40:36 | knowledge base, and — |
---|
0:40:38 | I think that's where the hard work is. |
---|
0:40:42 | Okay. |
---|
0:40:44 | But even if the knowledge base is, let's say, huge and static, reasoning over it |
---|
0:40:49 | will keep changing, right? I mean, the same knowledge you might interpret differently, you know, |
---|
0:40:55 | sometime later, versus what you're doing right now. It could be because our |
---|
0:40:59 | methods are not sophisticated enough, or |
---|
0:41:01 | basically some new information pops up — I mean, the facts stay the |
---|
0:41:05 | same, but the way you look at them changes over time, right? I |
---|
0:41:08 | mean — |
---|
0:41:10 | I can't give you a concrete example of this right now, but I think — |
---|
0:41:14 | I don't think these problems are going to go away anytime soon. If anything — machine |
---|
0:41:19 | translation, even in the low-resource setting: |
---|
0:41:21 | this has existed for several decades, right? I mean, a number of |
---|
0:41:25 | ideas, similar to what we see in unsupervised machine translation — now we're starting to |
---|
0:41:29 | see, okay, more systems — actually scalable systems — working in this setting. And I think |
---|
0:41:34 | the field — all of ML, all of computer vision — |
---|
0:41:38 | has this tendency to, okay, focus on the solvable, immediate, big-crunch |
---|
0:41:43 | problems, and then you try to simplify, and then, you know, extend to the |
---|
0:41:47 | zero-shot setting, extend to, you know, other settings. But it's not like we're starting |
---|
0:41:52 | from scratch — all the stuff we learned about, say, ImageNet — I mean, convolutions are still |
---|
0:41:56 | the single most useful blocks that you're transferring over. And for language, I |
---|
0:42:02 | would argue, over the last five years, |
---|
0:42:04 | attention seems to be a common ingredient — I get that it seems trendy, you can have |
---|
0:42:07 | a thousand variants of these networks, but there are specific concepts that get transferred onto new |
---|
0:42:13 | problems, right, as you now build models. So, |
---|
0:42:16 | hopefully these also will transfer, you know, as we start looking at new problems or |
---|
0:42:20 | extensions of them. |
---|
0:42:22 | Well, conceivably we should be thinking more about grand challenge problems — not just |
---|
0:42:26 | the usual Alexa challenge, but |
---|
0:42:31 | larger ones you can get governments to support. |
---|
0:42:34 | But, you know, governments are now going to start asking us this last question, |
---|
0:42:39 | which is: |
---|
0:42:42 | so, you built this wonderful technology, |
---|
0:42:46 | and now I'm getting phone calls that use it — interactive phone calls that are trying to |
---|
0:42:51 | get me to do stuff: |
---|
0:42:53 | either buy stuff, |
---|
0:42:55 | or, in the worst case, commit suicide, or, you know, a variety of activities. |
---|
0:43:01 | And these are bots doing this, |
---|
0:43:03 | and they understand language pretty well, |
---|
0:43:06 | and they are |
---|
0:43:09 | good enough to cause some people to be convinced |
---|
0:43:13 | that they're dealing with a person. |
---|
0:43:17 | Even as far back as ELIZA there were people convinced of |
---|
0:43:23 | the humanness of that bot — and these days, you know, who knows. Letting these |
---|
0:43:28 | things loose — |
---|
0:43:30 | how do we stop that? And I ask because, |
---|
0:43:32 | you know, we've seen what happened in computer vision, where people weren't |
---|
0:43:36 | really paying that much attention, |
---|
0:43:39 | and suddenly it's being misused. |
---|
0:43:43 | How do we prevent our technology from being misused? |
---|
0:43:48 | Obviously it's our problem. |
---|
0:43:52 | Suggestions? |
---|
0:43:53 | And then we'll turn it over to the floor for any — |
---|
0:43:55 | you know, we'll have enough time for twenty minutes of questions. |
---|
0:43:58 | It's only ten minutes. |
---|
0:44:00 | So, you know, obviously you can do regulations, so that |
---|
0:44:05 | bots always have to say that they are a bot, but |
---|
0:44:11 | that will not stop people from doing it, possibly. |
---|
0:44:18 | And adversarial networks, |
---|
0:44:21 | generative networks — you know, if these keep improving, you're going to have deepfakes in |
---|
0:44:26 | language processing and dialogue processing, if we're successful. |
---|
0:44:30 | And it might also come to a stage where I don't pick up phone |
---|
0:44:33 | calls myself anymore; instead my bot picks them up, in |
---|
0:44:38 | order to see whether it's a bot calling, |
---|
0:44:40 | and then they're talking to each other, while the caller's bot |
---|
0:44:43 | tries to convince my bot that it's human. |
---|
0:44:47 | I don't know, but that could actually happen — I mean, so I |
---|
0:44:50 | don't have to take my calls; the local system takes the call for me, |
---|
0:44:56 | which might be nice even if it's a human calling — like having a secretary. |
---|
0:45:01 | And that could also be annoying in another way, because the technology |
---|
0:45:05 | might not work so well to start with — so your spouse is calling, and |
---|
0:45:09 | guess what, |
---|
0:45:10 | your bot screens them out, and that might cause some friction, |
---|
0:45:16 | et cetera. |
---|
0:45:17 | So these are other problems as well. |
---|
0:45:22 | So I think with every technology, I guess, |
---|
0:45:24 | there are both sides, right? Like in this example you gave of bots talking to other bots — |
---|
0:45:30 | I mean, be aware: we think, no, they can't, but the generation models, |
---|
0:45:34 | at least for some of these things, are super good. They don't have |
---|
0:45:37 | to do natural language exactly — they just need to know the right keywords or trigger words. |
---|
0:45:41 | And now imagine if your bot has access to your credit card account, and |
---|
0:45:44 | the other bot says "add to cart" and then goes and orders, you know, |
---|
0:45:48 | like, eighteen hundred dollars of stuff, right? And |
---|
0:45:50 | it doesn't ask for a confirmation because the payment info is already in there. So I think |
---|
0:45:54 | there are both sides to these things, right? But one thing I |
---|
0:45:59 | would say is, |
---|
0:46:00 | we can't just work on the research of, you know, improving the dialogue |
---|
0:46:04 | systems, the recognition, the machine learning, and then ignore this, or only |
---|
0:46:09 | reactively, you know, go back because of GDPR or |
---|
0:46:13 | something and then look at this problem, right? This has also opened |
---|
0:46:16 | up new research in other fields, right? I mean — can we even detect that we're dealing with a bot? |
---|
0:46:20 | The bots are always going to get better — it's like spam, right? I mean, |
---|
0:46:24 | you know, there are multiple ways to deal with that, right? The |
---|
0:46:27 | research also has to be state-of-the-art in terms of how to |
---|
0:46:31 | deal with adversaries. So there are methods which now try to — |
---|
0:46:35 | I mean, |
---|
0:46:36 | take the adversarial attack and flip it, and try to improve the robustness of the system, |
---|
0:46:40 | basically using the same kind of adversarial technique but in a reverse way, where |
---|
0:46:43 | you push the gradient in the other direction during training time. |
---|
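A rough sketch of that "flip the adversarial technique around" idea, in the style of FGSM-like adversarial training on input embeddings with PyTorch. The model is assumed to be any differentiable classifier that accepts embeddings directly; the single-step perturbation and the epsilon value are illustrative choices for the sketch, not the panel's prescription.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, embeddings, labels, optimizer, epsilon=0.01):
    """One training step that perturbs the input embeddings in the loss-increasing direction
    (the attack), then trains on the perturbed input so the same attack stops working (the defense)."""
    embeddings = embeddings.clone().detach().requires_grad_(True)

    # Forward/backward pass on the clean input to get the gradient w.r.t. the embeddings.
    loss = F.cross_entropy(model(embeddings), labels)
    loss.backward()

    # FGSM-style perturbation: step in the sign of the gradient, i.e. the direction that hurts most.
    perturbed = (embeddings + epsilon * embeddings.grad.sign()).detach()

    # Train the model on the perturbed input to gain robustness against that perturbation.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(perturbed), labels)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```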
0:46:48 | That's one way to look at it. |
---|
0:46:51 | In commercial systems — should we make the, sort of, monetary cost, or the |
---|
0:46:55 | number of tries these bots get, sort of increasingly more challenging? Like, |
---|
0:47:00 | you know — many of these calls are generated, you know, thousands |
---|
0:47:03 | of times a day, and auto-generated, right? So if there's a non-trivial cost to |
---|
0:47:07 | that, |
---|
0:47:08 | many of these operations won't exist, right, or they will actually change their strategy. So there |
---|
0:47:12 | are different ways of looking at these problems — like the cost-effectiveness, the |
---|
0:47:15 | research. One thing is, |
---|
0:47:18 | I don't think it's going to go away, and I don't think that if we solve it |
---|
0:47:21 | now, you know, the problem is fixed going forward — it's |
---|
0:47:25 | a continually changing problem. One example is, when we released some |
---|
0:47:29 | of the systems, like, you know, Smart Reply et cetera — people don't know this, but we |
---|
0:47:33 | had to wait longer to actually build systems to |
---|
0:47:37 | detect sensitive content in the messages, because you don't want any of these smart systems to |
---|
0:47:42 | say something stupid — you'd rather not say anything than, you know, try to be smart |
---|
0:47:46 | and suggest responses. And that's a continually evolving problem, right? And it's cultural, it |
---|
0:47:52 | depends on the language — so many different aspects to it. |
---|
0:47:55 | So it's a very hard problem, but, I mean, I think research also |
---|
0:48:00 | has to look into these aspects and, sort of — |
---|
0:48:05 | going back to the PhD students and what kinds of problems to work on: I think we |
---|
0:48:08 | have plenty of problems that are uncovered by the advances we've made in the last |
---|
0:48:13 | ten years, right? It's opening up new areas for research as well. So |
---|
0:48:18 | it's a constantly evolving challenge. |
---|
0:48:21 | Okay, at this point we want to open it up. |
---|
0:48:23 | Okay, let's open it up. |
---|
0:48:26 | We've got a mic. |
---|
0:48:28 | We've got a question? |
---|
0:48:33 | Hi, Phil. |
---|
0:48:38 | So I just want to follow up on the explainability discussion. |
---|
0:48:41 | I think one useful nugget from watching that video this morning is |
---|
0:48:46 | that all the users in that skit didn't trust the assistant, or were not |
---|
0:48:51 | sure about it, |
---|
0:48:52 | and it may make you think that trust is also very important for explainability. |
---|
0:48:56 | And I was wondering, more specifically, |
---|
0:48:59 | whether the panel thinks that symbolic |
---|
0:49:02 | representations are necessary for |
---|
0:49:05 | modeling that sort of explainability, |
---|
0:49:07 | the structure for it, |
---|
0:49:08 | and what that would mean for the connectionist models that |
---|
0:49:12 | we see today and the neural approaches. |
---|
0:49:17 | Well, I think you can have both, |
---|
0:49:19 | really. |
---|
0:49:20 | It occurs to me |
---|
0:49:22 | that you ought to be able to |
---|
0:49:24 | train a |
---|
0:49:26 | neural system with an AI planning system, |
---|
0:49:30 | and then you've got a very fast executable neural system; the planning system can explore a much |
---|
0:49:34 | bigger space than people can. And then, when you ask "why |
---|
0:49:39 | did you say that?", you go back to the planning system, where essentially it's going |
---|
0:49:43 | to replay and figure out why it would have said that, |
---|
0:49:46 | right, because they are causally connected. You could imagine it |
---|
0:49:51 | actually producing the representation and code used to train it to do that. |
---|
0:49:56 | That would be my approach; |
---|
0:50:00 | that's how I would get the answers to the "why" questions. |
---|
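A rough sketch of that train-a-neural-system-from-a-planner idea, with a stub standing in for a real symbolic planner: the planner supervises a fast neural policy, and "why did you say that?" is answered by replaying the causally linked planner on the same state. Everything here (the 16-dimensional state, the decision rule, the proof string) is a placeholder.

```python
# Hedged sketch: distil a symbolic planner into a fast neural policy,
# and answer "why?" by replaying the planner on the same state.
import torch
import torch.nn as nn

def plan(state):
    """Stand-in for a symbolic planner: returns (action_id, explanation)."""
    action = int(state.sum().item() > 0)        # placeholder decision rule
    proof = f"goal G requires action {action}; its preconditions hold in this state"
    return action, proof

policy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters())
loss_fn = nn.CrossEntropyLoss()

def train_step(states):
    """Supervise the neural policy with actions chosen by the planner."""
    targets = torch.tensor([plan(s)[0] for s in states])
    loss = loss_fn(policy(torch.stack(states)), targets)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def act(state):                 # fast path at run time
    return policy(state).argmax().item()

def explain(state):             # slow path, only when the user asks "why?"
    _, proof = plan(state)
    return proof
```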
0:50:02 | okay |
---|
0:50:05 | So I think one more aspect about the trust is, I mean, |
---|
0:50:08 | do the users trust the devices, or the technology itself, right? And one |
---|
0:50:13 | interesting area that I think is growing fast right now, or is going to be of |
---|
0:50:19 | increasing importance, is privacy-preserving AI, and |
---|
0:50:22 | the notion is whether, you know, the data actually lives on the device, or |
---|
0:50:26 | what is shared, you know, to the cloud, who can access it, and |
---|
0:50:30 | whether you can trust the veracity of the information that's coming back. |
---|
0:50:34 | All these are interesting aspects, right, I mean, in addition to the symbolic |
---|
0:50:37 | and explainability things we mentioned. I think this is going to |
---|
0:50:41 | be even more important in the coming years, because, like, |
---|
0:50:45 | the phone is where you are most of the time these days, right, and that's not |
---|
0:50:48 | gonna change; if anything, it's only gonna get worse, right. And you're interacting |
---|
0:50:53 | with these voice systems at probably an exponential rate; if you have one of |
---|
0:50:58 | these at home and you have it unplugged, well, I |
---|
0:51:01 | don't know, it can be irritating sometimes, right, which makes people do |
---|
0:51:06 | this. |
---|
0:51:07 | So I think that's also an interesting and very useful aspect of trust. And then |
---|
0:51:14 | there's, like, an elevated version of that, like |
---|
0:51:17 | regulations, GDPR, imposing and making sure that |
---|
0:51:21 | there are third-party sources which can verify this information, right, and it's not |
---|
0:51:25 | just one central entity that, you know, you have to believe everything from, right, |
---|
0:51:29 | so. |
---|
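A toy sketch of the "data stays on the device" idea via federated averaging, with a linear model standing in for a real one: each phone computes an update from its own data, and only the weight delta is shared, never the raw utterances. NumPy-only and purely illustrative.

```python
# Hedged sketch: federated averaging with a toy linear model.
import numpy as np

def local_update(weights, device_data, lr=0.1):
    """Runs on the phone: raw data never leaves the device, only updated weights do."""
    X, y = device_data                        # local features and labels
    grad = X.T @ (X @ weights - y) / len(y)   # least-squares gradient
    return weights - lr * grad

def federated_round(global_weights, devices):
    """Runs on the server: averages the device updates without ever seeing the data."""
    updates = [local_update(global_weights.copy(), d) for d in devices]
    return np.mean(updates, axis=0)
```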
0:51:33 | More questions? |
---|
0:51:40 | So I wanted to make a comment and then a |
---|
0:51:45 | second comment. |
---|
0:51:46 | So the first one: I cannot argue with being open to |
---|
0:51:51 | out-of-domain work, multimodality, explainability; we've already discussed that. But in |
---|
0:52:01 | my domain, the learning domain, human learning rather than machine learning, what we need, and what does |
---|
0:52:08 | not yet exist, is large datasets; we don't have them, and personally, |
---|
0:52:14 | in my projects, I can't wait for, you know, a deep learning |
---|
0:52:20 | architecture to |
---|
0:52:21 | be able to jump from restaurants easily to being able to understand the conversations that |
---|
0:52:27 | patients are engaging in when describing their issues. So I'm not sure |
---|
0:52:33 | exactly what |
---|
0:52:35 | the solution is there, but I see a narrowing, actually, |
---|
0:52:44 | a narrowing in on those tasks. I also wanted to bring to |
---|
0:52:50 | your attention a very interesting paper, I thought, from ACL, nothing to do with |
---|
0:52:54 | what |
---|
0:52:55 | we were discussing about ethics and sharing and whatnot: there is an accounting in it of the |
---|
0:52:59 | carbon footprint and |
---|
0:53:01 | energy consumption of training one of these deep learning models, |
---|
0:53:07 | and I thought |
---|
0:53:09 | the numbers were striking; I wasn't sure how sustainable, you know, some of |
---|
0:53:15 | this technology is. I think that is also something that we may want to take into |
---|
0:53:20 | account when we |
---|
0:53:23 | train and retrain these machine learning |
---|
0:53:25 | models. |
---|
0:53:27 | Something to keep in mind, anyway. So, |
---|
0:53:31 | just a thought. |
---|
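That kind of accounting amounts to a simple back-of-the-envelope calculation; the sketch below is not the paper's numbers, and every figure in it is an assumed placeholder.

```python
# Hedged sketch: rough energy / CO2 accounting for a training run (all inputs assumed).
def training_footprint(num_gpus, gpu_watts, hours, kg_co2_per_kwh):
    kwh = num_gpus * gpu_watts * hours / 1000.0
    return kwh, kwh * kg_co2_per_kwh

# e.g. 8 GPUs at 300 W for 72 hours on a 0.4 kg-CO2/kWh grid (illustrative values only)
kwh, co2 = training_footprint(num_gpus=8, gpu_watts=300, hours=72, kg_co2_per_kwh=0.4)
print(f"{kwh:.0f} kWh, roughly {co2:.0f} kg CO2")
```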
0:53:34 | I think I can comment on that; let me take the last one first. I think the second |
---|
0:53:37 | point you made is probably gonna be one of the most significant areas that |
---|
0:53:44 | are gonna come up, not just for dialogue but for anything touching ML in the |
---|
0:53:48 | next five years: |
---|
0:53:49 | how we use compute. I mean, there's a general tendency to maybe just |
---|
0:53:52 | keep increasing the compute on the cloud, right, I mean, and to keep using |
---|
0:53:55 | as much as you want, and, like, you get access |
---|
0:53:58 | to more TPU resources if you ask. That's not gonna be true. I think what |
---|
0:54:02 | you will see is, |
---|
0:54:04 | we're training with more resources, but we're also building more models, and if you look |
---|
0:54:07 | at some of the, you know, statements going around, we're not suddenly gonna have |
---|
0:54:12 | ten times more compute power. And |
---|
0:54:14 | I think, especially in my group, we're actually looking a lot at on-device |
---|
0:54:19 | and also efficient machine learning, and |
---|
0:54:22 | there used to be a concern that all |
---|
0:54:24 | these methods, I mean, if they have a lower footprint, or, like, a hundredth of the memory, |
---|
0:54:29 | or |
---|
0:54:29 | whatever the factor is, we have to sacrifice quality. But I think, at |
---|
0:54:34 | least for recognition, classification, sequence labeling, et cetera, and even for speech recognition earlier |
---|
0:54:39 | this year, we are, you know, |
---|
0:54:41 | seeing performance for these efficient models almost on par, if not better than, the server-side |
---|
0:54:45 | models. So there's no reason to say "I need all these resources to |
---|
0:54:49 | train the model"; there are much better ways to do it, and that requires separate, |
---|
0:54:53 | you know, very introspective research that goes into the optimisations and architecture |
---|
0:54:58 | choices, et cetera. It's hard; it's not just making a black box. |
---|
0:55:01 | There are some black boxes in there, but it's a very important problem. |
---|
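One concrete flavour of that efficiency work is post-training quantization; a minimal sketch with PyTorch's dynamic quantization, using a throwaway classifier in place of a production model. Whether the quality really stays on par has to be measured per task, as the speaker notes, rather than assumed.

```python
# Hedged sketch: shrink a model's Linear layers to int8 with dynamic quantization.
import io
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))  # placeholder model

quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def serialized_mb(m):
    """Size of the saved state dict, as a crude footprint comparison."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"float32: {serialized_mb(model):.2f} MB, int8: {serialized_mb(quantized):.2f} MB")
```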
0:55:05 | And going to the first point, about narrowing, I think it is true, but I |
---|
0:55:09 | wonder if it's not just deep learning. I mean, I'm sure this has |
---|
0:55:12 | happened in, you know, previous technology waves as well, where suddenly |
---|
0:55:16 | there's some spike in technology and, you know, everybody gravitates towards it, and then |
---|
0:55:21 | over time that changes. And, like, |
---|
0:55:23 | I would see the rise of deep learning and the power of these |
---|
0:55:28 | networks as, I mean, at the core, you know, something everybody knows is |
---|
0:55:32 | a very good function approximator. So I would rather use a state-of-the-art model in one |
---|
0:55:38 | of those black-box components, like for language modeling or generating utterances, |
---|
0:55:42 | than having to think and tweak about, you know, what model to use |
---|
0:55:45 | here, right; I'd rather focus on the domain problem, or, like, |
---|
0:55:49 | focus on the high-level system, than on what utterance generation mechanism |
---|
0:55:53 | I should use. It's hard, though, because |
---|
0:55:56 | it requires, you know, also understanding what goes on inside, because of how that interacts with |
---|
0:56:00 | the rest of the components. But it's easier to access |
---|
0:56:04 | these components, open-sourced, these days, compared to what it was before. So there |
---|
0:56:09 | is, I think, a silver lining there: |
---|
0:56:11 | you know, more people have access to these state-of-the-art models right now, and they |
---|
0:56:15 | can use them in ways which, you know, they're using in very creative ways. |
---|
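A small sketch of treating a state-of-the-art language model as a swappable black-box component behind a narrow interface, here via the Hugging Face `transformers` pipeline; the model name, prompt format, and generation settings are illustrative choices, not a recommendation. The rest of the dialogue system only sees `respond()`, so the component can be swapped without touching the dialogue manager.

```python
# Hedged sketch: a pretrained LM as a black-box utterance generator behind a narrow API.
from transformers import pipeline

class UtteranceGenerator:
    def __init__(self, model_name="gpt2"):
        self.generator = pipeline("text-generation", model=model_name)

    def respond(self, dialogue_context: str) -> str:
        out = self.generator(dialogue_context, max_new_tokens=40, num_return_sequences=1)
        # The pipeline returns the prompt plus the continuation; keep only the continuation.
        return out[0]["generated_text"][len(dialogue_context):].strip()

# gen = UtteranceGenerator()
# print(gen.respond("User: I'd like to book a table for two tonight.\nSystem:"))
```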
0:56:20 | Over there, in the back. |
---|
0:56:25 | you on the smoothed from also |
---|
0:56:27 | And thank you for the discussion. I have |
---|
0:56:30 | a question, it's for the social impact |
---|
0:56:33 | discussion. |
---|
0:56:35 | What do you think we could do about informing end users |
---|
0:56:39 | about the dangers of these technologies? Like, |
---|
0:56:43 | do you think it is maybe feasible, at some point, |
---|
0:56:46 | to actually build bots that help people |
---|
0:56:49 | recognize logical fallacies |
---|
0:56:51 | or marketing strategies and all these things? What can we do, what can we do |
---|
0:56:57 | in terms of educating end users? |
---|
0:57:00 | You mean, how to get a defensive bot? |
---|
0:57:03 | No; or, as was pointed out, not directly a bot that defends |
---|
0:57:07 | the end user, but a bot that teaches the end user |
---|
0:57:12 | about |
---|
0:57:13 | logical fallacies, about marketing strategies, about the fact that there are bots around |
---|
0:57:19 | that try to manipulate you. |
---|
0:57:22 | Can we get this to the politicians? |
---|
0:57:25 | I don't know about logical fallacies in politics, I mean, |
---|
0:57:28 | it's, |
---|
0:57:29 | we are quite a small community compared to the |
---|
0:57:33 | entire population, and if nobody knows about it, the politicians won't. Okay, |
---|
0:57:38 | there's just one thing that really gets them: the robocalls. |
---|
0:57:42 | So this, |
---|
0:57:44 | I mean, they're starting to care about deepfakes; and now that, in the US Congress, |
---|
0:57:48 | all those congresspeople were |
---|
0:57:50 | misidentified as criminals against some FBI most-wanted database, |
---|
0:57:54 | they suddenly started to care. |
---|
0:57:56 | So, |
---|
0:57:59 | okay, |
---|
0:58:00 | so now they do, no, they care, |
---|
0:58:02 | as suggested. |
---|
0:58:04 | But I mean, I agree, you could definitely have that. This is |
---|
0:58:08 | actually |
---|
0:58:09 | one of the other applications of this area of dialogue systems that is understudied: |
---|
0:58:14 | systems for training, for example to train you to do a job interview. |
---|
0:58:20 | So the system would interview you, and you would |
---|
0:58:22 | see what it's like, and so on, I mean; |
---|
0:58:26 | it is a training scenario, but you could do training in |
---|
0:58:29 | a lot of different domains, |
---|
0:58:32 | or someone trying to sell something to you, and you train on how to understand what they're |
---|
0:58:39 | really trying to do, and so on. |
---|
0:58:41 | So these kinds of |
---|
0:58:43 | training scenarios, using dialogue systems for that, I think that's a huge opportunity. |
---|
0:58:47 | Well, I like your idea of the defensive system bot, because a lot of the |
---|
0:58:52 | systems, you know, all the ads that are being pushed, actually |
---|
0:58:57 | are, you know, the kind of things that are gonna come in lots of modalities, |
---|
0:59:02 | right; they'll be auditory soon. Your defensive system could take care of that for you and say, you |
---|
0:59:08 | know, "I'll pass on that one, thanks very much," |
---|
0:59:12 | you know, on the defence, |
---|
0:59:14 | the bot: |
---|
0:59:15 | "and you are gonna have to talk to me first. |
---|
0:59:20 | No, you don't get to, you don't get to pass along here |
---|
0:59:23 | unless I know what it is you're trying to push," and so on. |
---|
0:59:25 | So I realise that may not be in the interest |
---|
0:59:30 | of commerce, but it may be in the interest of the rest of the, |
---|
0:59:34 | the people, who, |
---|
0:59:35 | you know, would like to be helped by these bots rather than attacked by them. |
---|
0:59:40 | So I think that was a great suggestion. |
---|
0:59:47 | More questions? |
---|
0:59:50 | Well, I mean, this is more of a comment, but, you know, |
---|
0:59:53 | David, just before dinner, |
---|
0:59:56 | I think we can go ahead, so. |
---|
1:00:00 | I was also discussing earlier about the well-trained system versus the intelligent system. |
---|
1:00:07 | It kind of ties in with the earlier question, and with what you guys |
---|
1:00:11 | hinted at earlier, that maybe sort of that neural-plus-symbolic approach would be best. |
---|
1:00:16 | And |
---|
1:00:17 | so why do you think more people aren't working on this kind of approach now? |
---|
1:00:21 | I didn't say people aren't working on it, but |
---|
1:00:24 | I think, just to the point of |
---|
1:00:27 | what we should or could be looking at, this is something that, you know, we |
---|
1:00:31 | want to probably look into more deliberately, |
---|
1:00:33 | as opposed to, you know, just running behind, and again, I'm not saying this is |
---|
1:00:38 | happening, but |
---|
1:00:39 | there is the temptation to kind of, you know, see "this is a dataset which is |
---|
1:00:43 | out there, and it's easy to publish on, and it's easy to get, for |
---|
1:00:47 | instance, the state-of-the-art on this task," so it's very easy to |
---|
1:00:51 | kind of plug in the latest models right now. |
---|
1:00:54 | And so, yes, we should probably do that, but as long as the problem is |
---|
1:00:58 | well motivated. |
---|
1:01:00 | But, you know, |
---|
1:01:03 | that temptation apart, it would be good to kind of |
---|
1:01:05 | look at other aspects of the problem that are not just basically plug and play. |
---|
1:01:10 | I think that's my point. |
---|
1:01:12 | last question |
---|
1:01:13 | believe it today the tram |
---|
1:01:17 | This is related to, well, what is maybe a false dichotomy between pipelining and |
---|
1:01:24 | maybe other alternatives, but, |
---|
1:01:27 | I mean, in this slide, I think |
---|
1:01:30 | the real issue is more modularity, okay, where it doesn't necessarily imply sequential processing or not; |
---|
1:01:38 | it's a limited modularity where |
---|
1:01:42 | there is influence usually in both directions, which makes a point, or, |
---|
1:01:48 | but |
---|
1:01:50 | for this, the set of |
---|
1:01:54 | goals you're assuming may, maybe for simple task execution, be fairly limited and enumerable, but |
---|
1:02:01 | when one engages in dialogue with other people, |
---|
1:02:06 | in real situations, |
---|
1:02:08 | we're usually thinking about more than just completing a single task, so all the pieces of |
---|
1:02:13 | language aren't only for |
---|
1:02:16 | the task or for one |
---|
1:02:19 | goal; they are also useful for finding out, for instance, how much trust I'm |
---|
1:02:26 | placing in the other person, what they are giving, |
---|
1:02:31 | so, relations, |
---|
1:02:33 | and so the constraints from the first question apply here too. |
---|
1:02:40 | So |
---|
1:02:42 | neither of these extremes is really getting at that. |
---|
1:02:46 | That's, |
---|
1:02:47 | like, with a travel agent, you'll probably have |
---|
1:02:51 | a |
---|
1:02:53 | constrained problem in some ways, but it's not just words; that's a separate problem. |
---|
1:03:01 | Simple examples: |
---|
1:03:04 | if you think about, like, the |
---|
1:03:05 | speech act, like, is this, you know, an inform or a question, |
---|
1:03:10 | it's not separable from the propositional content; |
---|
1:03:16 | instead it's like a functional transformation. |
---|
1:03:18 | And at the NLG end, it lets you constrain what you can say, |
---|
1:03:28 | and, you know, the same when you think about speaker identification. |
---|
1:03:33 | okay |
---|
1:03:34 | Well, thank you all for coming, and I think we have a dinner next. |
---|