0:00:16 | Well, thank you. My name is Sergi Foix, and I will present my work, titled "Information-Gain View Planning for Free-Form Object Reconstruction with a 3D Time-of-Flight Camera". This work has been done in collaboration with the German Aerospace Center (DLR), with Simon Kriegel and Stefan Fuchs, and with my supervisors. |
---|
0:00:49 | The presentation starts with the motivation, what motivated us to do this work, and then I will pass to the main algorithm and how it works: how the internal representation works, how we compute the information gain, how the candidate viewpoint generation works, and what criterion we use in order to choose among those viewpoints. Then I will show my results and finally conclude. |
---|
0:01:23 | So the motivation is active view planning. Given an unknown scene, what we try to do is move our sensor in space in order to get more information and more data about the scene, in order to build a model. |
---|
0:01:47 | Our objective is to do this autonomously: to model an object in 3D. This object can be free-form; it doesn't have to be of any particular shape. |
---|
0:02:02 | One of our prerequisites is that we don't have any kind of prior information about the scene. There are different methods in the literature that use a 3D model or some kind of coarse prior in order to guide the modeling of the object. Our proposal is to use the information gain in order to decide which views we are going to use to build our model. |
---|
0:02:30 | So, our main algorithm looks like this, and it consists mainly of four steps. The first one is data acquisition: we use a 3D time-of-flight camera in order to get a point cloud from the image. Once we extract these images, the second step is to update some internal representations. |
---|
0:02:54 | The principal one is an occupancy grid, a multi-resolution occupancy grid where the data of the time-of-flight camera gets stored: not only the point cloud, but also the statistical uncertainty of those points. |
---|
0:03:10 | I will explain each of these steps in more detail later. After this first step, we have the data in the grid representation and also in a mesh representation. From this mesh we can compute the boundaries, in order to select some candidate views that follow the boundary of that mesh. |
---|
0:03:33 | Once we've got these views, the main part of this algorithm is to decide, among these views, which one we should choose in order to get more information about the model. That is the next-best-view decision maker, and it gets information from the mesh representation and from the occupancy grid, which is the one that holds all the uncertainty of the model. |
---|
0:04:00 | So now I will quickly show all the steps of the algorithm, and then I will go and explain each step in detail. This is the same first view that we saw before. We've got an initial pose; we can set it anywhere, the only prerequisite being that it is looking at the scene. Then we get a view, we update both representations, and then we simulate the candidate views in our occupancy grid in order to estimate the information gain each one is expected to provide. Once we have the one that provides the highest information gain, we move the robot to the chosen pose, and then we extract another point cloud from there. |
---|
0:04:50 | This is done repeatedly until the algorithm finishes and completes the model. At each iteration, from the mesh representation it extracts new candidate views that could be providing more information, then it computes again the information gain of those views, and we select one in order to continue modeling the object. |
---|
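As a rough illustration of the loop just described, here is a minimal Python sketch. It is not the authors' code: every helper name (`acquire_point_cloud`, `boundary_viewpoints`, `expected_information_gain`, `done`) is a hypothetical placeholder standing in for the corresponding step of the talk.

```python
# Minimal sketch (not the authors' implementation) of the next-best-view
# loop described in the talk. All helper names are hypothetical.

def reconstruct(initial_pose, grid, mesh, robot, done):
    pose = initial_pose                      # any pose that looks at the scene
    while True:
        cloud = robot.acquire_point_cloud(pose)      # 1. data acquisition
        grid.update(cloud, pose)                     # 2. update occupancy grid
        mesh.update(cloud, pose)                     #    ... and triangle mesh
        candidates = mesh.boundary_viewpoints()      # 3. candidate views
        if done(grid, mesh) or not candidates:
            break
        # 4. simulate each view in the grid and pick the highest expected gain
        pose = max(candidates,
                   key=lambda v: grid.expected_information_gain(v))
    return mesh
```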
0:05:15 | So the first step is data acquisition. As I already said, we used a time-of-flight camera; in this experiment we were using the Mesa Imaging SR4000. It has to be said that it has been calibrated and characterized before use: we did not only calibrate the intrinsic parameters, as is normally done, but we also calibrated the depth measurements, which are affected by amplitude and by all the kinds of errors that these cameras have. |
---|
0:05:46 | But even when we finish this calibration, one of the disadvantages of these cameras is that they still have noise in the depth measurements. So what we do is characterize that noise, so that each pixel has a covariance associated with it, depending on the depth it measures. So each pixel has a 3D covariance related to it. |
---|
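A minimal sketch of what such a per-pixel characterization could look like, assuming a noise model whose depth component grows with the measured range and dominates the lateral components. The coefficients are purely illustrative placeholders, not the values from the actual camera characterization.

```python
import numpy as np

# Sketch of per-pixel covariance assignment (hypothetical noise model;
# the real characterization comes from the camera calibration, not from
# these made-up coefficients).

def pixel_covariance(depth, ray_dir, sigma_z0=0.005, k=0.01, sigma_xy=0.002):
    """3x3 covariance for one pixel, expressed in the camera frame.

    depth    : measured range in metres
    ray_dir  : viewing ray of the pixel
    sigma_z0 : base depth noise (m); k: growth with depth (illustrative)
    """
    sigma_z = sigma_z0 + k * depth          # depth noise dominates and grows
    # covariance in a frame aligned with the ray (z along the ray)
    local = np.diag([sigma_xy**2, sigma_xy**2, sigma_z**2])
    # build a rotation whose third column is the viewing ray
    z = np.asarray(ray_dir, dtype=float)
    z /= np.linalg.norm(z)
    x = np.cross([0.0, 0.0, 1.0], z)
    if np.linalg.norm(x) < 1e-9:            # ray parallel to the z axis
        x = np.array([1.0, 0.0, 0.0])
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.column_stack([x, y, z])
    return R @ local @ R.T                  # rotate into the camera frame
```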
0:06:11 | For those who don't know about these time-of-flight cameras: they provide intensity images and depth images with a one-to-one pixel correspondence. They have low resolutions, like 176 by 144 pixels, but they provide data at up to approximately twenty-five frames per second, so they are fast enough for getting these views. |
---|
0:06:37 | So once we have this camera pointed at the scene, what we do is get a point cloud, and this point cloud gets integrated into an occupancy grid, a multi-resolution occupancy grid. |
---|
0:06:54 | This occupancy grid is first filled with nothing, and by "nothing" we understand an unknown area: it is just one box with a high uncertainty. Then, as we keep introducing point clouds into the occupancy grid, the voxels in space get updated with new measurements, and these new measurements modify the uncertainty inside all these boxes. |
---|
0:07:24 | So we've got an example of how the update works with two hypothetical measurements taken at ninety degrees from each other. First we have a box without any kind of information; then, before the update, we have the two measurements with their two covariances; and after updating the model by fusing the uncertainties, we get something like this. That is the typical formulation for the information update. |
---|
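The two-measurement example corresponds to standard Gaussian information fusion. Whether the paper uses exactly this rule is an assumption, but it reproduces the behaviour on the slide: two elongated covariances taken ninety degrees apart fuse into a much tighter one.

```python
import numpy as np

# Standard Gaussian fusion of two measurements of the same point
# (assumed here, not confirmed as the paper's exact update rule).

def fuse(mu1, cov1, mu2, cov2):
    info1, info2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    cov = np.linalg.inv(info1 + info2)            # fused covariance
    mu = cov @ (info1 @ mu1 + info2 @ mu2)        # fused mean
    return mu, cov

# two views, each precise laterally (0.001) but noisy in depth (0.01),
# taken at ninety degrees from each other
c1 = np.diag([0.001, 0.01])                       # depth along y
c2 = np.diag([0.01, 0.001])                       # depth along x
mu, cov = fuse(np.zeros(2), c1, np.zeros(2), c2)
print(np.diag(cov))                               # ~[0.0009, 0.0009]
```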
0:07:56 | This is not only for putting the uncertainty of each measurement inside the model and keeping it. What this produces is sensor directionality: each voxel stores the covariance with its direction. The covariance has a direction, and usually the depth value of the measurement has a higher uncertainty than the X and Y values, so each voxel stores the direction in which its measurements have been taken. |
---|
0:08:28 | The good thing is that this allows model refinement: at the end we will be able to choose which views give us more information, or reduce the uncertainty of certain areas the most. |
---|
0:08:46 | Once we update this representation, what we do is create a mesh in order to get more candidate views, to check what information gain they would provide. This candidate viewpoint generation is based on the method of Kriegel et al., presented in 2011. |
---|
0:09:09 | What it does is: it builds a triangle mesh, it detects the boundaries of this mesh given certain parameters, like the length of the boundary or the deviation of the curvature of the boundary, then it separates them, and then it grows a region inside the mesh in order to fit a quadratic patch to each boundary region. |
---|
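A minimal sketch of the quadratic-patch step under the usual least-squares formulation: fit z = ax² + bxy + cy² + dx + ey + f to the region points in a local frame, and read the patch normal from the surface gradient. The details of Kriegel et al.'s method are simplified here; this only illustrates the fitting idea.

```python
import numpy as np

# Least-squares fit of a quadratic patch z(x, y) to boundary-region
# points given in a local frame (illustrative simplification).

def fit_quadratic_patch(pts):
    """pts: (N, 3) array of region points; returns (a, b, c, d, e, f)."""
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    A = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs

def patch_normal(coeffs, x, y):
    """Unit normal of the fitted surface at (x, y), from its gradient."""
    a, b, c, d, e, _ = coeffs
    n = np.array([-(2 * a * x + b * y + d), -(b * x + 2 * c * y + e), 1.0])
    return n / np.linalg.norm(n)
```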
0:09:37 | Alright. So each patch is fitted next to the previous scan in order to ensure some overlap between the two views, and then a new view is extracted from each patch. After this, what we do is simulate these new views in the occupancy grid and then compute the information gain; how this is done is shown in the next slide. |
---|
0:10:02 | So what we do, now that we've got these views extracted from the viewpoint planner, is come back to the occupancy grid and simulate those views as if we were actually sensing: we do ray tracing in order to see which areas our readings would hit and what the information gain of those readings would be. For each point of the simulated point cloud we store the covariance, using the same procedure that we use with real data, and then we compute the information gain based on this formulation. |
---|
0:10:43 | What it does is just a summation of the logarithms of the traces of the matrices that contain all the updated covariances. |
---|
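A sketch of the view simulation and gain evaluation, reusing the hypothetical `pixel_covariance` and `fuse` helpers from above and assuming a placeholder `grid.cast_ray` method. The gain term is written as the log-trace reduction per hit voxel, which is one plausible reading of "a summation of logarithms of traces of updated covariances", not necessarily the paper's exact expression.

```python
import numpy as np

# Sketch of simulating a candidate view in the occupancy grid.
# grid.cast_ray, view.rays, and the voxel fields are placeholders.

def expected_information_gain(grid, view, fuse, pixel_covariance):
    gain = 0.0
    for ray in view.rays():                        # one ray per pixel
        voxel = grid.cast_ray(view.origin, ray)    # first surface voxel hit
        if voxel is None:
            continue                               # ray leaves the grid
        depth = np.linalg.norm(voxel.center - view.origin)
        meas_cov = pixel_covariance(depth, ray)    # simulated measurement noise
        _, post = fuse(voxel.mean, voxel.cov, voxel.center, meas_cov)
        # log-trace reduction of the voxel covariance after the virtual update
        gain += np.log(np.trace(voxel.cov)) - np.log(np.trace(post))
    return gain
```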
0:10:57 | By doing this repeatedly, at the end we manage to get our results. These are the results we obtained: we tested on three statues with different free-form shapes. |
---|
0:11:11 | As you can see, we get quite nice models of them. You can see some areas that have not been filled in or modeled, but that is due to the setup: the statues were on top of a little chair, and the robot could not access certain parts. |
---|
0:11:34 | And you can see that some of them are not very refined, but that's mainly because of the resolution of the camera; it doesn't have more resolution. |
---|
0:11:44 | So, to conclude: I presented this new 3D information-gain method for viewpoint selection. |
---|
0:11:51 | Thanks to its internal representation, its simplicity allows model refinement. What we would like to do in the future is to define at which resolution we would like to have the model, or in which parts of the model we would like to have more resolution, in order to try to get a better model. We could even decide that if a region has a lot of curvature it is an interesting place, so we would be able to get more refinement in that area. |
---|
0:12:26 | That's it, thank you. |
---|
0:12:44 | [Inaudible question from the audience.] |
---|
0:13:02 | No, not like a thousand views, but I cannot guarantee it within a certain number of views. |
---|
0:13:11 | By construction it will definitely conclude, because it will always fit a patch wherever we have a boundary, and at some point it will close the object. |
---|
0:13:27 | But in this case we had to close it manually, because we had restricted the workspace at the bottom, since the robot could not go down, so I cannot show it here; in simulation we could do everything. |
---|
0:13:41 | So I cannot assure a number of views, but I can assure that it will be close to a minimum, because by construction it is building the model incrementally. |
---|
0:13:55 | Sorry? [Inaudible question.] |
---|
0:14:09 | Uh, yeah, actually yes. |
---|
0:14:20 | So, there is a distance: the camera has been calibrated at around thirty centimetres, so you cannot move far away from the object. You always stay within the distance you calibrated for, because these cameras are quite sensitive to that. And what we assume is that the overlap has to be at least twenty percent of the field of view of the camera, and then the view follows the angle of the fitted patch surface. |
---|
0:15:08 | Sorry? [Inaudible question.] |
---|
0:15:12 | Yeah, well, it's the ones that fit. |
---|
0:15:23 | No. No. |
---|
0:15:33 | By construction: in order to refine the model, you would be getting new views from different places following the same structure. |
---|
0:16:01 | The structure will be nicest when you take a reading in an orthonormal way; then you reduce your covariance as much as possible. But beyond that I will not be able to get better precision. This is the best refinement that I can get, unless the camera is calibrated better in order to get better views. |
---|
0:16:33 | Yeah, well, yes, that would be worth considering. |
---|
0:16:53 | So, what do we actually do? Okay, it is the method of Stefan Fuchs. What it does is calibrate the depth: these cameras have an error that depends on distance. For each distance they have an offset, and this offset follows a sinusoidal function, so you can model it and correct for it. The whole calibration process uses a normal pattern, like the one we use for intrinsic calibration, but a huge one. |
---|
0:17:35 | Then we usually use different grey scales in the pattern, because at different amplitudes the camera reacts differently, and we get different amplitudes depending on the integration time that we choose. So all these parameters have to be chosen; in this experiment it was chosen for thirty centimetres, and you calibrate the camera for that, for a range around this distance. |
---|
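A minimal sketch of a distance-dependent ("wiggling") depth correction of the kind described: the offset oscillates with the measured distance, so it can be modelled by a sinusoid fitted during calibration. The coefficients here are placeholders, not values from the actual calibration.

```python
import numpy as np

# Sketch of a sinusoidal distance-offset correction for raw ToF depths.
# amp, period, phase, and bias are illustrative placeholders that would
# be estimated during the depth calibration described in the talk.

def correct_depth(d_measured, amp=0.01, period=0.5, phase=0.0, bias=0.0):
    """Remove a sinusoidal distance offset from a raw depth (metres)."""
    offset = bias + amp * np.sin(2.0 * np.pi * d_measured / period + phase)
    return d_measured - offset
```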
0:18:01 | And then with this pattern what we do is compute all these functions that minimize the error by reprojecting a plane, in the usual optical way: you first get the intrinsic parameters, then you place the plane in space, and then you measure the depth that you get. |
---|
0:18:30 | I don't know if I got it right. |
---|
0:18:38 | Yeah. |
---|