0:00:15 | It's a really long title for a six-minute talk.
0:00:18 | So we don't have to convince anybody that robots are increasingly present in
0:00:22 | human environments; I'll sort of open with that. One thing I want to address, though,
0:00:27 | is that the things we want to say to robots in those different environments are quite
0:00:30 | different.
0:00:31 | So the kinds of commands we give to a robot in a hospital don't really look
0:00:34 | anything like the kinds of commands we'd give to a robot in an office.
0:00:38 | On top of that, robots have different sensing and actuation capabilities: different microphones, cameras, and also
0:00:43 | different arms, or maybe no arms at all.
0:00:46 | So if we try to develop algorithms that work across a lot of platforms, we
0:00:50 | might end up in a situation where we have to define a lot of
0:00:52 | in-domain data per individual robot platform, per individual deployment scenario.
0:00:58 | So my work is focused instead on using dialog
0:01:01 | as a way to learn, with individual human interactors, the information the robot needs
0:01:06 | on the fly, in the particular environment it's deployed in.
0:01:10 | So I'm going to jump right into the video of that working, and that will sort of
0:01:14 | show what it does.
0:01:19 | In a second...
0:01:21 | no, it's not that; that was four.
0:01:24 | There we are. So
0:01:29 | I give it this one on the screen.
0:01:36 | Yes.
0:01:37 | It's never seen this word before.
0:01:39 | So it has to learn what this new word means on the fly, figure out what
0:01:43 | it refers to.
0:01:49 | It doesn't get a synonym for it, so it actually has to learn the new concept.
0:01:55 | So what it's going to do is ask me, for example, whether
0:02:05 | 'rattling' is a word describing a property.
0:02:09 | So there are these objects in the room with us, and it's going to ask me to show it things.
0:02:14 | So it can ask,
0:02:17 | it's a little slow here,
0:02:20 | with a robot system I appreciate how much that takes.
0:02:27 | So I show it which one,
0:02:29 | the...
0:02:36 | and now it asks me to show it one that I would not use the word to describe.
0:02:42 | So, negative examples.
0:02:46 | It has already played with these objects,
0:02:49 | so it has feature representations of them, and it's trying to figure out what the discriminative signal
0:02:53 | associated with the word 'rattling' is.
0:02:57 | It asks for more examples of 'rattling';
0:03:01 | two examples is not enough.
0:03:10 | So I show it this one,
0:03:13 | the...
0:03:15 | So with those three examples it starts building a pretty weak but reliable classifier.
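As a rough illustration of what the system is doing at this point, the sketch below fits a tiny binary classifier for a new word like 'rattling' from just a few labeled objects. The feature values, object names, and the use of scikit-learn's SVC are illustrative assumptions, not the actual implementation described in the talk.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical feature vectors for objects the robot has already explored,
# e.g. summary statistics of the audio heard while dropping each object.
object_features = {
    "paper_container": np.array([0.82, 0.61, 0.35]),
    "metal_can":       np.array([0.40, 0.22, 0.18]),
    "white_block":     np.array([0.05, 0.03, 0.02]),
}

# Labels gathered through dialog: the person pointed out positive and
# negative examples of "rattling".
labels = {"paper_container": 1, "metal_can": 1, "white_block": 0}

X = np.stack([object_features[name] for name in labels])
y = np.array([labels[name] for name in labels])

# A tiny SVM trained on only three examples: weak, but usable on the fly.
clf = SVC(kernel="linear").fit(X, y)

# Score a previously unseen object against the new concept.
new_object = np.array([[0.75, 0.50, 0.30]])
print("decision score for 'rattling':", clf.decision_function(new_object)[0])
```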
0:03:24 | So then it asks where; I tell it, hmm, 3.51-something.
0:03:33 | You have to trust me that that's the conference room.
0:03:39 | Something like 3.514, or 3.51...
0:03:47 | Yes.
0:03:50 | So now that it
0:03:51 | probably knows something about the word, it's going to go there and find the objects
0:03:58 | and pick which one best fits the description, 'rattling container,' for delivery.
0:04:04 | And again, all the objects used in this work it has played with beforehand.
0:04:09 | We don't show what that looks like here, but it's basically: pick
0:04:12 | them up, push them around, drop them, and so on.
0:04:15 | And for this word, what the model ends up learning is that picking up an object and
0:04:20 | dropping it
0:04:21 | makes a small sound, and that's the discriminative signal it ends up using.
0:04:24 | So of these three objects, there's like some white thing, a can, and a paper container, and
0:04:29 | it decides the paper container is going to be the rattling object for this example instruction.
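To make 'feature representation' and 'discriminative signal' a bit more concrete, here is a hedged sketch of turning the audio recorded during a drop behavior into a small feature vector. The waveform source, frame length, and choice of statistics are assumptions for illustration, not the pipeline actually used.

```python
import numpy as np

def drop_audio_features(waveform: np.ndarray, frame_len: int = 512) -> np.ndarray:
    """Summarize a mono audio clip recorded while dropping an object.

    Returns a few simple statistics; a rattling object should show more
    energy and variation after the impact than a quiet one.
    """
    frames = waveform[: len(waveform) // frame_len * frame_len]
    frames = frames.reshape(-1, frame_len)
    energy = np.sqrt((frames ** 2).mean(axis=1))                  # per-frame RMS energy
    zero_crossings = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)
    return np.array([energy.mean(), energy.max(), energy.std(),
                     zero_crossings.mean()])

# Example: a fake one-second clip at 16 kHz of low-level noise.
rng = np.random.default_rng(0)
clip = 0.01 * rng.standard_normal(16000)
print(drop_audio_features(clip))
```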
0:04:37 | It's calculating its path now; all of this is running on battery power.
0:04:46 | So we find a rattling container, and now let's go deliver it to Bob's office.
0:04:54 | I kind of regret speeding up this part of the video where it backs up, because it makes the
0:04:57 | cute little backup noise.
0:05:05 | Something about this:
0:05:08 | it knows all these names because of how we initialize the system, which I'll
0:05:13 | come back to in a second;
0:05:15 | that's how this is possible.
0:05:23 | So it does a little handoff there, as you can see.
0:05:30 | So we initialize the system, and I'll go over that very quickly: basically, we're
0:05:35 | going to
0:05:36 | have conversations with humans, ask these questions about local objects to which the available classifiers
0:05:41 | are applied, and learn words like 'rattling' and others.
0:05:45 | We're also going to strengthen our semantic parsing component
0:05:47 | by asking questions. So when the person first says 'go to the middle lab,' maybe we've
0:05:51 | never seen an adjectival construction like that, but we do know how to do it
0:05:54 | with a preposition. So it asks 'where should I go?' and the person says 'the lab in
0:05:57 | the middle.' We can now strengthen our semantic parser by adding a grammar rule
0:06:01 | that says, like,
0:06:02 | you can say 'the middle lab' for 'the lab in the middle,' and other adjectives work that
0:06:05 | way.
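Here is a toy sketch of the kind of lexicon update being described: after the person paraphrases 'the middle lab' as 'the lab in the middle,' the adjective gets an entry pointing at the same meaning. The real system uses a trained semantic parser; this dictionary version, including the category names, is only an illustrative assumption.

```python
# Toy lexicon: (surface form, syntactic category) -> semantic predicate.
lexicon = {
    ("lab", "noun"): "lab(x)",
    ("in the middle", "pp_modifier"): "middle(x)",
}

def add_paraphrase_entry(new_form, new_category, known_form, known_category):
    """After a clarification dialog, give an unseen surface form the same
    semantics as a form the parser already understands."""
    lexicon[(new_form, new_category)] = lexicon[(known_form, known_category)]

# "the middle lab" was not parseable, but "the lab in the middle" was, so the
# adjective "middle" gets the same meaning as the prepositional phrase.
add_paraphrase_entry("middle", "adjective", "in the middle", "pp_modifier")
print(lexicon[("middle", "adjective")])  # -> middle(x)
```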
0:06:07 | We test this on a bunch of tasks about object relocation, moving an item from one place
0:06:11 | to another.
0:06:12 | We quantitatively see improvement when we retrain both parsing and perception, and we have
0:06:17 | users rate the system as more usable for real-world tasks when we
0:06:21 | do both parsing and perception retraining like this.
0:06:25 | So I think I mostly have time for questions.
0:06:27 | You know that if you have a robot in
0:06:30 | any system, you have a lot more collaborators.
0:06:47 | Thank you.
0:06:49 | Very quick question.
0:06:52 | How does it know that dropping an object is a good proxy for shaking it to
0:06:57 | make that sound?
0:06:59 | That's a great question. So it uses those examples to try to figure out
0:07:03 | what works. What we actually do at the low level, because we have so few labeled examples
0:07:07 | per word, is build a bunch of tiny SVMs.
0:07:11 | So every SVM operates over a particular behavior and sensing context.
0:07:15 | So we have a feature space that says: this is what it's like when you
0:07:19 | listen to your microphone and you drop something; this is what it's like when you
0:07:22 | push down on something and listen to the motors in your arm.
0:07:25 | And then you can use cross-validation to estimate how reliable each one of those
0:07:31 | classifiers ends up being.
0:07:32 | So in this case, dropping something and listening to the audio was more reliable than looking
0:07:36 | at its color, so that's what you're now trusting as the classifier for 'rattle,'
0:07:40 | or for 'heavy' it's like picking something up and feeling the motors in the arm.
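A hedged sketch of the idea in this answer: one small classifier per (behavior, modality) context, weighted by how well it cross-validates on the handful of labeled objects. The contexts, feature values, and scikit-learn calls below are assumptions made for illustration, not the system's actual code.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Hypothetical labeled data for the word "rattling": one feature matrix per
# (behavior, modality) context, over the same handful of objects.
contexts = {
    ("drop", "audio"):   np.array([[0.8, 0.6], [0.7, 0.5], [0.1, 0.1], [0.05, 0.0]]),
    ("look", "color"):   np.array([[0.2, 0.9], [0.8, 0.1], [0.5, 0.5], [0.3, 0.7]]),
    ("lift", "haptics"): np.array([[0.4, 0.4], [0.5, 0.3], [0.4, 0.5], [0.5, 0.4]]),
}
labels = np.array([1, 1, 0, 0])  # the person's answers: rattling or not

reliability = {}
classifiers = {}
for context, X in contexts.items():
    clf = SVC(kernel="linear")
    # Leave-one-out accuracy estimates how trustworthy this context is.
    scores = cross_val_score(clf, X, labels, cv=LeaveOneOut())
    reliability[context] = scores.mean()
    classifiers[context] = clf.fit(X, labels)

print(reliability)  # drop+audio should score highest on this toy data

# Combine per-context decisions, weighted by estimated reliability,
# when scoring a new object (one feature vector per context).
new_object = {("drop", "audio"): [0.75, 0.55],
              ("look", "color"): [0.4, 0.6],
              ("lift", "haptics"): [0.45, 0.4]}
score = sum(reliability[c] * classifiers[c].decision_function([new_object[c]])[0]
            for c in contexts)
print("weighted 'rattling' score:", score)
```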
0:07:50 | This is focusing on objects, where the behaviors it performs are given, and what they mean. What about
0:07:56 | the robot itself? Maybe it doesn't know that doing this is relevant. Could that possibly be learned
0:08:01 | using this framework?
0:08:03 | Not using this framework. We have done some work on trying to figure out which
0:08:07 | behaviors are relevant using word embeddings, to guide this kind of exploration,
0:08:11 | but
0:08:12 | there's a whole space
0:08:13 | of trying to do this sort of learning from demonstration,
0:08:16 | for example where I pick up the object and shake it and say, like,
0:08:20 | 'this one rattles.'
0:08:21 | There's something to that: when a human watches another human do that, we know that
0:08:25 | the fact that I shook it is actually
0:08:28 | the discriminative signal.
0:08:30 | And I think there's something there for, like, lifting and lowering and shaking,
0:08:34 | in that we can see how someone else does the discrimination and not actually have
0:08:38 | to
0:08:39 | do this SVM estimation.
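On the word-embedding point from the start of this answer, here is a rough sketch of using embedding similarity to guess which behaviors are worth trying for a new word. The vectors below are placeholders rather than real embeddings, and the behavior names are assumptions for illustration.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings; in practice these would come from a pretrained
# word-embedding model such as GloVe or word2vec.
embeddings = {
    "rattling": np.array([0.9, 0.1, 0.3]),
    "shake":    np.array([0.8, 0.2, 0.4]),
    "drop":     np.array([0.7, 0.1, 0.5]),
    "look":     np.array([0.1, 0.9, 0.2]),
}

behaviors = ["shake", "drop", "look"]
ranked = sorted(behaviors,
                key=lambda b: cosine(embeddings["rattling"], embeddings[b]),
                reverse=True)
print(ranked)  # behaviors to try first when grounding "rattling"
```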