It's a really long title for a six-minute talk.
So I don't have to convince anybody that robots are increasingly present in human environments; I'll sort of open with that. One thing I want to address, though, is that the things we want to say to robots in those different environments are quite different.
The kinds of commands we give to a robot in a hospital don't really look anything like the kinds of commands we'd give to a robot in an office.
On top of that, robots have different sensing and actuation capabilities: different microphones, cameras, and arms, or maybe no arms at all.
So if we try to develop algorithms that work across a lot of platforms, we might end up in a situation where we have to gather a lot of in-domain data per individual robot platform, per individual deployment scenario.
My work focuses instead on using dialog with individual human interactors as a way to learn the information the robot needs on the fly, in the particular environment it's deployed in.
So I'm going to jump right into a video of that working, which will sort of show the whole sequence.
So it has to learn what these words mean on the fly. It doesn't get a synonym for 'rattling', so it actually has to learn the new concept. What it's going to do is ask me for examples of objects the word describes.
There are these objects in the room with us, and it can ask about them. So it asks me to show it one, and I show it which object I'd use the word for. I can also show it objects I would not use the word to describe — so, negative examples.
It has played with these objects beforehand, so it has feature representations of them, and it's trying to figure out what the discriminative signal associated with the word 'rattling' is.
Two examples is not enough, so it asks for another.
So with those three examples it starts building a pretty weak but reliable classifier. Then I give it the rest of the command — the rooms here are referred to by number, like 3514 — and you'll have to trust me that that's the conference room.
So now it's decided it probably knows enough: it's going to go there, examine the objects, and pick which one best fits the description 'rattling container' for delivery.
And again, all the objects used in this work have been played with beforehand. I won't show all of what that looks like, but it's basically: pick them up, push them around, drop them, and so on.
And for this word, the model has learned that picking up an object and dropping it makes a distinctive sound, and that's the discriminative signal it ends up using.
So among these three objects — something white, a can, and a paper container — it decides the paper container is going to be the rattling object. It plans its route and sets off; it's a little slow on battery power. So it finds a rattling container, and off it goes to deliver it to Bob's office.
I kind of regret speeding up this part of the video where it backs up, because it makes a cute little backup noise. Then it does a little handoff, and the delivery is done.
So, how do we initialize the system? I'll go over that very quickly. Basically, we have conversations with humans and ask these questions about the local objects so that perceptual classifiers can be acquired, learning words like 'rattling' and others.
We're also going to strengthen our semantic parsing component by asking questions. So when the person first says 'go to the middle lab', maybe we've never seen an adjectival construction like that, but we do know how to express it with a preposition. So it asks, 'Where should I go?', and the person rephrases: 'the lab in the middle'. We can now strengthen our semantic parser by adding a grammar rule that says you can say 'the middle lab' for 'the lab in the middle', and other adjectives work the same way.
We test this on a bunch of object-relocation tasks — moving an item from one place to another. We quantitatively see improvement when we retrain both parsing and perception, and users rate the system as more usable for real-world tasks when we do both parsing and perception retraining like this.
So I think I mostly have time for questions. And, you know, when you add a robot to a system, you get a lot more collaborators. Thank you.
Very quick question: how does it know that dropping an object is a good proxy for shaking it to make this sound?
That's a great question. It uses those examples to try to figure that out. What we actually do at the low level, because we have so few labeled examples per word, is build tiny SVMs. Every SVM operates over a particular behavior and sensing context. So we have a feature space that says: this is what it's like when you drop something and listen to your microphone; this is what it's like when you push down on something and listen to the motors in your arm. Then you can use cross-validation to estimate how reliable each one of those classifiers is going to be. In this case, dropping something and listening to the audio was more reliable than looking at its color, so that's the classifier you end up trusting for 'rattling'. For 'heavy', it's more like picking something up and feeling the motors in the arm.
This focuses on the objects, and the behaviors the robot performs are given to it. What about the robot itself — maybe it doesn't know that doing this is informative. Could that possibly be learned using this framework?
Not using this framework. We have done some work on trying to figure out which behaviors are relevant using word embeddings to guide that kind of exploration. But there's a whole space of trying to do this sort of learning from demonstration — for example, where I pick up the object and shake it and say, 'this one rattles'. There's something to the fact that when a human watches another human do that, we know that my shaking it is actually the discriminative signal. And I think there's something there for behaviors like lifting and lowering and shaking, in that we can see how someone else does the discrimination and not actually have to do this SVM estimation.