It's a really long title for a six-minute talk.
So I don't have to convince anybody that robots are increasingly present in human environments; I'll sort of open with that. One thing I want to address, though, is that the things we want to say to robots in those different environments are quite different.
The kinds of commands we give to a robot in a hospital don't really look anything like the kinds of commands we'd give to a robot in an office.
On top of that, robots have different sensing and actuation capabilities: different microphones, cameras, and arms, or maybe no arms at all.
So if we try to develop algorithms that work across a lot of platforms, we might end up in a situation where we have to gather a lot of in-domain data per individual robot platform, per individual deployment scenario.
My work focuses instead on using dialog with individual human interactors as a way to learn the information the robot needs on the fly, in the particular environment it's deployed in.
So I'm going to jump right into a video of that working, which will sort of show the whole sequence.
So it has to learn what these words mean on the fly. It doesn't get a synonym for 'rattling', so it actually has to learn the new concept. What it's going to do is ask me for examples of objects the word describes.
There are these objects in the room with us, and it can ask about them. So it asks me to show it one, and I show it which object I'd use the word for. I can also show it objects I would not use the word to describe — so, negative examples.
It has played with these objects beforehand, so it has feature representations of them, and it's trying to figure out what the discriminative signal associated with the word 'rattling' is.
Two examples is not enough, so it asks for another.
So with those three examples it starts building a pretty weak but reliable classifier. Then I give it the rest of the command — the rooms here are referred to by number, like 3514 — and you'll have to trust me that that's the conference room.
So now it's decided it probably knows enough: it's going to go there, examine the objects, and pick which one best fits the description 'rattling container' for delivery.
And again, all the objects used in this work have been played with beforehand. I won't show all of what that looks like, but it's basically: pick them up, push them around, drop them, and so on.
And for this word, the model has learned that picking up an object and dropping it makes a distinctive sound, and that's the discriminative signal it ends up using.
So among these three objects — something white, a can, and a paper container — it decides the paper container is going to be the rattling object. It plans its route and sets off; it's a little slow on battery power. So it finds a rattling container, and off it goes to deliver it to Bob's office.
I kind of regret speeding up this part of the video where it backs up, because it makes a cute little backup noise. Then it does a little handoff, and the delivery is done.
So, how do we initialize the system? I'll go over that very quickly. Basically, we have conversations with humans and ask these questions about the local objects so that perceptual classifiers can be acquired, learning words like 'rattling' and others.
We're also going to strengthen our semantic parsing component by asking questions. So when the person first says 'go to the middle lab', maybe we've never seen an adjectival construction like that, but we do know how to express it with a preposition. So it asks, 'Where should I go?', and the person rephrases: 'the lab in the middle'. We can now strengthen our semantic parser by adding a grammar rule that says you can say 'the middle lab' for 'the lab in the middle', and other adjectives work the same way.
We test this on a bunch of object-relocation tasks — moving an item from one place to another. We quantitatively see improvement when we retrain both parsing and perception, and users rate the system as more usable for real-world tasks when we do both parsing and perception retraining like this.
So I think I mostly have time for questions. And, you know, when you add a robot to a system, you get a lot more collaborators. Thank you.
Very quick question: how does it know that dropping an object is a good proxy for shaking it to make this sound?
That's a great question. It uses those examples to try to figure that out. What we actually do at the low level, because we have so few labeled examples per word, is build tiny SVMs. Every SVM operates over a particular behavior and sensing context. So we have a feature space that says: this is what it's like when you drop something and listen to your microphone; this is what it's like when you push down on something and listen to the motors in your arm. Then you can use cross-validation to estimate how reliable each one of those classifiers is going to be. In this case, dropping something and listening to the audio was more reliable than looking at its color, so that's the classifier you end up trusting for 'rattling'. For 'heavy', it's more like picking something up and feeling the motors in the arm.
This focuses on the objects, and the behaviors the robot performs are given to it. What about the robot itself — maybe it doesn't know that doing this is informative. Could that possibly be learned using this framework?
Not using this framework. We have done some work on trying to figure out which behaviors are relevant using word embeddings to guide that kind of exploration. But there's a whole space of trying to do this sort of learning from demonstration — for example, where I pick up the object and shake it and say, 'this one rattles'. There's something to the fact that when a human watches another human do that, we know that my shaking it is actually the discriminative signal. And I think there's something there for behaviors like lifting and lowering and shaking, in that we can see how someone else does the discrimination and not actually have to do this SVM estimation.