0:00:14 | good morning everyone welcome to the second day of SRE two thousand eleven |
---|
0:00:19 | i hope you're enjoying it as much as i am |
---|
0:00:21 | it's my pleasure to introduce professor david forsyth, from the university of illinois at urbana-champaign |
---|
0:00:28 | and came there from exhibit |
---|
0:00:31 | i'm going to skip some of the bio here, but he has probably more than a hundred and thirty papers |
---|
0:00:37 | he's very active in the IEEE community as well |
---|
0:00:40 | he was a program co-chair for IEEE CVPR twice, in two thousand and two thousand |
---|
0:00:46 | and one |
---|
0:00:46 | and he was the general co-chair for CVPR in two thousand six |
---|
0:00:51 | he is also active in the SIGGRAPH community |
---|
0:00:55 | he received the |
---|
0:00:56 | IEEE technical achievement award |
---|
0:00:59 | and |
---|
0:01:01 | became an IEEE fellow |
---|
0:01:04 | he has a textbook |
---|
0:01:06 | yeah |
---|
0:01:09 | well i don't |
---|
0:01:12 | couple years ago |
---|
0:01:13 | cypress |
---|
0:01:16 | yeah |
---|
0:01:20 | thank you for those kind words |
---|
0:01:22 | so i was a little bit |
---|
0:01:27 | maybe we could |
---|
0:01:32 | nine out |
---|
0:01:33 | i |
---|
0:01:35 | being a vision |
---|
0:01:36 | talking |
---|
0:01:37 | speech |
---|
0:01:39 | can i |
---|
0:01:40 | identify the next |
---|
0:01:43 | and i just |
---|
0:01:45 | right |
---|
0:01:46 | it's in |
---|
0:01:47 | well |
---|
0:01:52 | i'm gonna talk probably about it |
---|
0:01:56 | a lot of cali |
---|
0:02:00 | yeah |
---|
0:02:02 | my colleagues |
---|
0:02:04 | them are a little from |
---|
0:02:08 | and correspond to X |
---|
0:02:11 | one form or not |
---|
0:02:12 | ollie hardy and dress i mean so that the teaching |
---|
0:02:19 | yeah |
---|
0:02:20 | oh |
---|
0:02:25 | oh |
---|
0:02:33 | and reconstruction is essentially: you make a model |
---|
0:02:37 | from pictures or video or other kinds of data |
---|
0:02:41 | and the recognition |
---|
0:02:43 | i think of recognition as being |
---|
0:02:44 | what is |
---|
0:02:46 | oh |
---|
0:02:47 | and it's |
---|
0:02:48 | oh |
---|
0:02:49 | it's gone |
---|
0:02:50 | from being |
---|
0:02:50 | you know |
---|
0:02:53 | the occupation of a small number of people |
---|
0:02:55 | to a very successful field |
---|
0:02:58 | i don't |
---|
0:02:59 | and the massive applications. we have a standard problem of an academic field, which is whenever something really works and generates |
---|
0:03:07 | money |
---|
0:03:07 | we say that's not really what we do and ignore it |
---|
0:03:10 | but there are a whole bunch of those things that have spun off |
---|
0:03:13 | and we'll see what some of those are |
---|
0:03:16 | i'm not gonna talk very much about reconstruction but i want to mention the state of the art |
---|
0:03:23 | the state of the art is: if you have multiple cameras |
---|
0:03:26 | you can get |
---|
0:03:27 | really quite astonishing results at huge geometric scale |
---|
0:03:31 | so if you walk around, for example, a quadrangle with lots of big buildings |
---|
0:03:37 | waving a video camera at those buildings, you can reconstruct the geometry to two centimetres or less |
---|
0:03:45 | and there are reconstructions of whole cities |
---|
0:03:47 | that have been prepared using these methods; the error gets a little bit bigger and it's very largely automatic |
---|
0:03:54 | and furthermore you can take |
---|
0:03:58 | a bunch of scattered images; that's slightly harder |
---|
0:04:05 | and so on, but that can be done |
---|
0:04:08 | if you have a single picture |
---|
0:04:11 | it's much more difficult to reconstruct |
---|
0:04:13 | but you can make some progress; actually, in the recognition stuff i'm gonna talk about you'll see some of this |
---|
0:04:19 | so some of the things that tell you about the shape of the world might include the symmetry of objects |
---|
0:04:24 | in the world, the stylised shapes; so later on we're gonna pretend that every room is a box |
---|
0:04:29 | and that turned out to be a very useful assumption |
---|
0:04:32 | and contour information, texture information, and shading |
---|
0:04:36 | can all tell us something about shape |
---|
0:04:38 | i'm gonna show you a reconstruction; maybe it's about seven years old now |
---|
0:04:42 | but it gives you some sense of the state of the art |
---|
0:04:44 | the state of the art is like this, but bigger |
---|
0:04:46 | here's a movie of an architectural thing |
---|
0:04:50 | somewhere out there in the world, and it's being videoed from a bunch of directions |
---|
0:04:56 | and from all this you can reconstruct an enormous number of points lying on the object |
---|
0:05:02 | and where all of those cameras that viewed it were |
---|
0:05:05 | i haven't rendered all the cameras |
---|
0:05:09 | because that would make the rendering messy |
---|
0:05:11 | but you can see where the cameras went and where the points are, and that's by standard methods; this is a complete system |
---|
0:05:17 | you can join those points up, and in a second we're gonna do that |
---|
0:05:21 | to make a mesh |
---|
0:05:22 | and the mesh will give you some idea about how |
---|
0:05:25 | the geometry |
---|
0:05:26 | of those points looks |
---|
0:05:28 | yeah, we have a nice mesh |
---|
0:05:29 | and once the points are joined into the mesh you can really see we've got a tremendous amount of |
---|
0:05:34 | information about that object |
---|
0:05:36 | the difference, over the last seven years, between what i'm showing you now and what people do now |
---|
0:05:40 | is scale |
---|
0:05:41 | okay, now, that sort of thing works for a quadrangle of buildings or a city or something of that form |
---|
0:05:47 | here it's a subtract the law |
---|
0:05:49 | and of course we can texture that mesh, and then |
---|
0:05:55 | we have a very cool |
---|
0:05:57 | graphical reconstruction of what it's like, which we could show to other people and we could use in augmented |
---|
0:06:03 | reality applications. you can see other applications here as well: if you want to blow up a block of downtown los angeles, permission is |
---|
0:06:10 | a bit difficult to get, but you can fly a helicopter over it, build a model of it, and blow the model up in |
---|
0:06:15 | a movie on your phone |
---|
0:06:16 | and if you want to join in a movie sequence some real live action |
---|
0:06:24 | to blowing up a model, you need to know the camera path |
---|
0:06:27 | and we can do that as well |
---|
0:06:29 | so there are tremendous applications lurking behind this |
---|
0:06:33 | that's it for reconstruction i'm gonna talk mainly about recognition |
---|
0:06:37 | why do we care about visual object recognition |
---|
0:06:40 | the answer is if you want to act in the world you have to draw distinctions |
---|
0:06:44 | and those distinctions could be of a very simple kind or a very complex kind |
---|
0:06:49 | so if you were building a robot |
---|
0:06:52 | you have this great advantage of vision that it can predict the future |
---|
0:06:57 | you can look ahead of you and you can see things you haven't done yet and figure out what would |
---|
0:07:01 | happen |
---|
0:07:02 | is the ground soft |
---|
0:07:04 | if it is, maybe i'll go around it; oh my god, is that person doing something dangerous |
---|
0:07:08 | does it matter if i run that object over |
---|
0:07:11 | which end of that object is the sharp end |
---|
0:07:14 | and these are really important questions when you act in the world |
---|
0:07:18 | now, for an information system |
---|
0:07:20 | it's just really valuable to be able to search for pictures |
---|
0:07:24 | cluster pictures, or read the pictures to understand what they tell you |
---|
0:07:28 | and all of those are recognition functions; you might not need to be really good at recognition, but you need to |
---|
0:07:34 | build descriptions of what's going on to support them |
---|
0:07:37 | and of course there are general engineering applications, which i'll demonstrate in a second |
---|
0:07:42 | there is this universal fact about vision systems: pretty much any animal that has vision has a recognition system |
---|
0:07:49 | they are often pretty lousy: male horseshoe crabs identify female horseshoe crabs visually |
---|
0:07:57 | but what they're looking for is a dark square |
---|
0:08:00 | if you build the right kind of dark square and leave it lying on the floor of the ocean, a |
---|
0:08:06 | line of amorous male horseshoe crabs will build up behind it, because the vision system just isn't up to the job |
---|
0:08:12 | so you might not have great recognition, but if you've got vision you've got recognition |
---|
0:08:17 | okay as an example of a more general engineering application of vision |
---|
0:08:21 | and i believe strain are not array on we'll talk about this on thursday as well probably in more detail |
---|
0:08:28 | imagine you watch a whole bunch of people |
---|
0:08:31 | and you manage to measure a bunch of stuff as well: you could look at physiological markers, you |
---|
0:08:36 | could listen to the sounds and speech, and you could watch them behaving naturally |
---|
0:08:41 | then what you could do is a bunch of things the first thing is |
---|
0:08:44 | if they behave in a way you don't want, you could feed back |
---|
0:08:48 | the other thing is you could screen |
---|
0:08:51 | so for example autism spectrum disorders are an affliction where if you catch it in children very early |
---|
0:09:00 | you sometimes have better chances of interventions; it would be really nice to screen children very early |
---|
0:09:07 | in life, and it would be very nice to screen everyone |
---|
0:09:10 | what you'd like to be able to do is to say this child needs to see someone who knows what |
---|
0:09:15 | to do, and this child doesn't, and you'd like to do that in a very low-skill way |
---|
0:09:20 | well maybe what you could do is observe them behaving and say, gee, they need to see someone who can |
---|
0:09:25 | tell whether they're really affected |
---|
0:09:27 | and it turns out that models like this, you can apply that story to in-home care |
---|
0:09:32 | to care for demented patients, to care for stroke recovery |
---|
0:09:38 | building design, and so on; models like this look as though they're gonna be really valuable |
---|
0:09:42 | and NSF has put a bunch of money into this sort of thing under the expeditions program, and we |
---|
0:09:47 | hope good things will come |
---|
0:09:49 | here's another example: you might want to take pictures and simply predict word tags |
---|
0:09:54 | why would you like to predict word tags? well, people like to search for pictures with words; lots of pictures |
---|
0:09:59 | don't come with words attached what you might do is look at the picture and say based on various classification |
---|
0:10:05 | machinery and on what i know about how words are correlated |
---|
0:10:09 | and so on, give me a bunch of word tags to associate with the picture; that would be useful |
---|
0:10:14 | and the state of the art in this activity is moderately advanced; we have very good experimental methods |
---|
0:10:21 | we're getting |
---|
0:10:23 | if you actually retrieve images based on predicted word tags, you can get precision estimates in the thirties |
---|
0:10:29 | which may not sound all that impressive, but ten years ago they were at three percent, so you know |
---|
0:10:35 | it's an order of magnitude, which is wonderful, and this is genuinely useful |
---|
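The tag-prediction pipeline described above can be sketched as a toy nearest-neighbour tagger. Everything here, the two-dimensional features, the tag vocabulary, and the voting rule, is invented for illustration; real systems use learned classifiers and word-correlation models:

```python
import numpy as np

def predict_tags(query, db_feats, db_tags, k=3, min_votes=2):
    """Predict word tags for an image by letting its k nearest
    neighbours in a tagged database vote on tags."""
    dists = np.linalg.norm(db_feats - query, axis=1)
    votes = {}
    for i in np.argsort(dists)[:k]:       # the k most similar images
        for tag in db_tags[i]:
            votes[tag] = votes.get(tag, 0) + 1
    # keep only tags that enough neighbours agree on
    return sorted(t for t, v in votes.items() if v >= min_votes)

# toy database: 4 "images" with hand-assigned 2-d features and tags
db_feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
db_tags = [["grass", "cow"], ["grass", "sky"], ["car", "road"], ["car", "sky"]]
print(predict_tags(np.array([0.05, 0.02]), db_feats, db_tags))  # → ['grass']
```

Retrieval by predicted tags then reduces to matching query words against these outputs.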
0:10:41 | but words and pictures affect one another and much more complex ways |
---|
0:10:45 | so there are many interesting problems that are just sort of emerging |
---|
0:10:49 | from the presence of word-and-picture datasets. this is an example due to tamara berg; what you'll see in these |
---|
0:10:55 | are pages from catalogues, with their descriptions, underneath them, of the things in the picture |
---|
0:11:00 | now there aren't any existing vision mechanisms for saying that the thing in the picture is an adorable |
---|
0:11:09 | whatever |
---|
0:11:10 | but we just don't know how to do that |
---|
0:11:12 | the first interesting problem that arises from that is, if you had a whole bunch of catalogues, you might actually |
---|
0:11:17 | be able to fish phrases out of the text |
---|
0:11:22 | and fish descriptions out of the pictures, and build classifiers that could predict "adorable" |
---|
0:11:27 | there's something else going on: if you read these descriptions |
---|
0:11:30 | they're fairly comprehensive descriptions of the objects |
---|
0:11:34 | but they don't tell you what colour they are |
---|
0:11:36 | and furthermore they don't tell you what colour the sash on the dress is |
---|
0:11:40 | and the reason they don't say that is it's blindingly obvious from the picture, so there's no point |
---|
0:11:46 | but from our perspective, if we're looking for things or searching for things or doing things like recommending things to |
---|
0:11:52 | customers |
---|
0:11:54 | being able to pool |
---|
0:11:56 | information jointly over a picture and a description might add real value |
---|
0:12:04 | okay |
---|
0:12:05 | so getting to the end of the kind of summary of vision and i'll show you some stuff about recognition |
---|
0:12:11 | i was asked to describe just recently what every vision person should know, and it's useful "'cause" it gives you |
---|
0:12:16 | a flavour of the discipline |
---|
0:12:18 | the big thing is that vision is really useful, it's really hard, and it's still really poorly understood |
---|
0:12:24 | it's very helpful to know a bunch of |
---|
0:12:27 | it's also very helpful to have a bunch of scepticism: in hot, poorly understood disciplines there is always somebody who comes |
---|
0:12:33 | along |
---|
0:12:34 | with a revolutionary new solution, and they come along every five years or so and then they go away, so |
---|
0:12:40 | a moderate degree of scepticism is valuable |
---|
0:12:45 | opportunism is valuable too |
---|
0:12:47 | right, so vision is difficult because you need to know a lot of stuff |
---|
0:12:52 | and there's a lot of evidence that the knowledge of any one thing doesn't seem to help much |
---|
0:12:56 | there really are a lot of different ideas that are just sort of boiled together, and we'll see some |
---|
0:13:02 | however, the main thing is to know the general principles of vision |
---|
0:13:05 | and that is, what you can deduce from evolutionary examples and from what has been successful in computer vision, which i've put on |
---|
0:13:12 | the slide that's to come, the next one |
---|
0:13:15 | there aren't any |
---|
0:13:16 | well, it's not a subject that has general principles |
---|
0:13:20 | it's just one of those things |
---|
0:13:22 | anybody who offers you a general principle is either a fool or a liar, and you can make |
---|
0:13:27 | your own mind up |
---|
0:13:29 | so now i'm gonna set up a series of discussions about the state of the art in recognition. i like to do this |
---|
0:13:35 | with a conclusion, "'cause" then we know where we're going. so the first thing is: object recognition is subtle, but |
---|
0:13:41 | we actually have really strong methods that work really quite well |
---|
0:13:45 | based on classifiers |
---|
0:13:47 | so rather loosely, we could believe this about object recognition |
---|
0:13:51 | the object categories are fixed and known: this is a cat, that's a cow, that's a motor car; every object belongs |
---|
0:13:59 | to one category, and there are K of them |
---|
0:14:02 | and you can get good training data: so i've got a hundred pictures of cats, a hundred pictures of cows, a |
---|
0:14:07 | hundred pictures of motor cars |
---|
0:14:09 | and then object recognition sort of turns into K way classification |
---|
0:14:14 | and it will turn out that detection turns into lots of classification tasks |
---|
0:14:18 | in that belief space, which has been very valuable, there's a natural programme of research: i'd say you |
---|
0:14:25 | bang together a bunch of features, you do the fitting with classifiers, and you produce a representation |
---|
0:14:32 | and that strategy has been amazingly effective; it's very powerful |
---|
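The recipe just described, bang features together, fit a classifier, then classify K ways, can be sketched with a toy nearest-centroid classifier. The features and categories are made up, and a real system would use a much stronger classifier (an SVM, say) over much richer features:

```python
import numpy as np

def train_centroids(feats, labels):
    """Fit a toy K-way classifier: one mean-feature centroid per category."""
    return {str(c): feats[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(centroids, x):
    """Assign x to the category whose centroid is nearest."""
    return min(centroids, key=lambda c: np.linalg.norm(centroids[c] - x))

# made-up 2-d features standing in for real image features
feats = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array(["cat", "cat", "cow", "cow"])
cents = train_centroids(feats, labels)
print(classify(cents, np.array([0.2, 0.5])))  # → cat
```

The point of the sketch is the shape of the pipeline, not the particular classifier: training data in, per-category model out, K-way decision at test time.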
0:14:36 | let's start with features |
---|
0:14:37 | so the summary of about ten years' work on features comes down to two really important points |
---|
0:14:44 | one is, features need to be illumination invariant: so when the light gets brighter the features shouldn't change all |
---|
0:14:51 | that much, and there's an easy way to do that, which is to look at the orientations of image gradients |
---|
0:14:56 | a second big principle is, the object is never quite where you think it is in the image |
---|
0:15:02 | it's always shifted around a little bit, and that means if you look at the image gradient at a particular |
---|
0:15:07 | point |
---|
0:15:08 | you're not gonna do well |
---|
0:15:09 | instead you want to look at local pools of image gradients |
---|
0:15:13 | or histograms of orientation |
---|
0:15:16 | and it turns out if you take those two principles |
---|
0:15:18 | together |
---|
0:15:20 | and combine them in a fairly natural fashion |
---|
0:15:24 | then you get HOG and SIFT features |
---|
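The two feature principles just stated, illumination-invariant gradient orientations pooled into local histograms, can be sketched minimally. This is a toy version, not the full Dalal-Triggs HOG (no block normalisation, no bin interpolation); the cell size and bin count are arbitrary choices for illustration:

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Toy HOG-style descriptor: gradient orientations (illumination
    invariant) pooled into per-cell histograms (shift tolerant)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    h, w = img.shape
    out = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            m = mag[y:y + cell, x:x + cell].ravel()
            a = ang[y:y + cell, x:x + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            out.append(hist / (np.linalg.norm(hist) + 1e-6))  # normalise per cell
    return np.concatenate(out)

# a vertical edge: all gradient energy lands in one orientation bin
img = np.zeros((8, 8)); img[:, 4:] = 1.0
print(hog_cells(img).round(2))  # → [1. 0. 0. 0. 0. 0. 0. 0. 0.]
```

Because the histogram pools gradients over a whole cell, shifting the edge by a pixel or two barely changes the descriptor, which is exactly the shift tolerance the talk describes.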
0:15:26 | i've shown them here for a series of different pictures; on the left, er, on the one side, sorry, i |
---|
0:15:32 | always get them mixed up, you'll see a woman with a bicycle, and then shown next to it is a |
---|
0:15:37 | SIFT-style feature representation: each of those little blobs is basically a histogram of gradient orientations in a little box |
---|
0:15:45 | so what we're saying is, at the top of that image |
---|
0:15:49 | the gradient orientations go in pretty much every direction locally |
---|
0:15:54 | but then when we get down to the sides of the woman |
---|
0:15:56 | there are lots of gradients |
---|
0:15:58 | that line up along the side of the body |
---|
0:16:03 | the gradients of |
---|
0:16:04 | yeah |
---|
0:16:05 | and there's a lot of contrast around the bicycle |
---|
0:16:08 | and again in this image with the traffic, you can see in the trees the gradients go in all |
---|
0:16:13 | directions |
---|
0:16:14 | but around the car |
---|
0:16:16 | they have structure |
---|
0:16:18 | again in this picture, with the bicycle down the bottom |
---|
0:16:21 | you can see the rough structure of the wheels and the frame reflected in those patterns of orientation |
---|
0:16:27 | and essentially what we do is take this information and bung it into a classifier |
---|
0:16:31 | when we do this we get really quite good results |
---|
0:16:34 | rather good at |
---|
0:16:37 | "'kay", this kind of K-way classification, running up to K of a couple of hundred |
---|
0:16:41 | when we get into the ten thousands, things get very interesting, but |
---|
0:16:45 | you know, we'll see |
---|
0:16:47 | and there are standard datasets for investigating methods and features. caltech one O one, for example, is |
---|
0:16:54 | a set of pictures of a hundred different categories; one hundred and one different categories |
---|
0:16:59 | they're picked somewhat at random from a selection of useful-looking categories, and the main thing here is the error |
---|
0:17:06 | rate, the number of classifications gotten wrong |
---|
0:17:08 | which is now likely about twenty percent |
---|
0:17:13 | if you stick a picture of an isolated object in the caltech one O one list of objects into a |
---|
0:17:19 | good modern method, you're likely to get the right answer out |
---|
0:17:23 | and if the collection of categories you know about is somewhat bigger, you are not as likely to get the |
---|
0:17:28 | right answer out; so the accuracy runs up to the fifties if one's very lucky |
---|
0:17:32 | and has lots of training examples, but you've still got a really good chance of getting the right answer |
---|
0:17:38 | so there are some problems we can do quite well |
---|
0:17:41 | and this machinery extends to |
---|
0:17:43 | really very complicated and non-obvious judgments |
---|
0:17:47 | so you can extend these features to work in space time |
---|
0:17:51 | and what people do now is, they take movies |
---|
0:17:56 | they get the script of the movie, which is marked up with time codes by fans on the |
---|
0:18:01 | internet |
---|
0:18:02 | they time-align these two and then say, okay, here's |
---|
0:18:06 | an action description in the script; look for some features around that that are distinctive in the movie, train a classifier |
---|
0:18:12 | like that |
---|
0:18:13 | and then run it on something else |
---|
0:18:15 | and you can get really quite effective action spotters like that, for complex actions like answering the phone |
---|
0:18:22 | getting out of a car, hugging, kissing, sitting down |
---|
0:18:25 | on the top row are a bunch of true positives |
---|
0:18:28 | on the second row a bunch of true negatives |
---|
0:18:31 | on the third row some false positives. so if you look at the answer-phone false positive, for example, the |
---|
0:18:36 | guy on the bed leaning to the side |
---|
0:18:38 | looks as though he could be sitting on a bed answering a phone; he just doesn't actually have a |
---|
0:18:43 | phone in hand |
---|
0:18:45 | right, and then of course there are also false negatives |
---|
0:18:47 | people answering the phone in unusual circumstances or at a distance |
---|
0:18:51 | so this machinery extends to really quite complicated judgments |
---|
0:18:55 | this machinery can also be used for detection |
---|
0:18:58 | so the way you detect with a classifier is: imagine i have a picture with some interesting things in it |
---|
0:19:03 | that i want to detect |
---|
0:19:05 | what i'm gonna do is take a window of the image |
---|
0:19:09 | correct illumination, estimate orientation, and then bung that window into the classifier and say yes or no |
---|
0:19:16 | and then i'll go to the next window and i'll say yes or no, and keep doing that |
---|
0:19:20 | and i'll find the best detection responses; if they're good enough, i'll report a detection |
---|
0:19:25 | if i want to find a big one, i'll make the image small and search it with a fixed |
---|
0:19:30 | sized window again |
---|
0:19:32 | if i wanna find a small one i'll look at a very high resolution version of the image |
---|
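The multi-scale sliding-window procedure just described can be sketched as below. The classifier here is a stand-in (just mean window brightness) and the "resize" is crude subsampling; both are assumptions for illustration, where a real detector would use a learned HOG-based classifier and proper image rescaling:

```python
import numpy as np

def detect(img, classify, win=16, stride=8, scales=(1.0, 0.5), thresh=0.5):
    """Multi-scale sliding-window detection: slide a fixed-size window,
    classify each window, and to find bigger objects shrink the image
    and search again with the same window."""
    hits = []
    for s in scales:
        small = img if s == 1.0 else img[::int(1 / s), ::int(1 / s)]  # crude resize
        h, w = small.shape
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                score = classify(small[y:y + win, x:x + win])
                if score > thresh:                    # good enough: report it
                    hits.append((x / s, y / s, win / s, float(score)))
    return hits

# toy image with one bright 16x16 "object"; the classifier is a stand-in
img = np.zeros((32, 32)); img[:16, :16] = 1.0
hits = detect(img, classify=lambda window: window.mean())
print(hits)  # → [(0.0, 0.0, 16.0, 1.0)]
```

Note how the hit coordinates are mapped back through the scale factor, so a detection in the shrunken image corresponds to a big box in the original.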
0:19:37 | this recipe, again, is amazingly successful; we are really quite good at detecting moderately complicated objects |
---|
0:19:44 | the standard detector has |
---|
0:19:46 | some |
---|
0:19:47 | additional complexity attached to this description |
---|
0:19:50 | the additional complexity is these little yellow boxes |
---|
0:19:54 | if you look at these columns each column displays the behaviour of the standard detection on at the different categories |
---|
0:20:01 | so the first column, sorry, the first row, i'm getting my rows mixed up |
---|
0:20:05 | the first row is human detection, the second row is bottle detection and the third row is car detection |
---|
0:20:11 | in the first row you'll see the guy about to step in front of the train |
---|
0:20:15 | has had a little light blue box placed on top of him |
---|
0:20:19 | with yellow so |
---|
0:20:20 | then there's a big group of people which has been incorrectly counted; one of them is missed |
---|
0:20:26 | but most of them have boxes on top of them and we know that there are people |
---|
0:20:31 | in the third column of the first row |
---|
0:20:34 | you see somebody hiding behind a bush |
---|
0:20:37 | he's had a box placed on top of him; the obvious monty python joke is so obvious it's not worth |
---|
0:20:42 | making |
---|
0:20:44 | as my colleague rubber cholesky is the site it's claudia cutting edge the detectors on perfect and that she has |
---|
0:20:52 | been marked as a pet |
---|
0:20:54 | in the second row you'll see one of the best bottle detectors at work; we're pretty good at detecting bottles, we |
---|
0:21:01 | can find them even if they're in people's hands or on tables, but we get bottles and people mixed up |
---|
0:21:07 | for quite good reasons: detectors really like |
---|
0:21:11 | strong |
---|
0:21:12 | identifiable, high-contrast curves |
---|
0:21:14 | people have them around the head and shoulders, so do bottles, and they tend to look the same |
---|
0:21:19 | right, so humans and bottles often get mixed up |
---|
0:21:22 | we're also very good at detecting cars, in which case they get mixed up with buses, which is not |
---|
0:21:27 | unreasonable |
---|
0:21:28 | the detectors i referred to here are sort of the standard technology; you can download and run the code, it's |
---|
0:21:34 | all very established and it's widely used |
---|
0:21:39 | a problem with the belief space about recognition that i described is that it is beginning to come apart at the |
---|
0:21:44 | seams, because most of the beliefs are obviously not so |
---|
0:21:47 | right, they're just not true |
---|
0:21:49 | objects belong to multiple categories, good training data might be very hard to get, and that presents serious problems |
---|
0:21:57 | so here's one example; i think this one's all mine |
---|
0:22:05 | well i you like what it's is usually got into |
---|
0:22:07 | i know |
---|
0:22:11 | now, a vision audience is usually going to vapour-lock somewhere roundabout this point, because they know i'm |
---|
0:22:17 | gonna get them from the side, but they don't know which side i'm gonna get them from |
---|
0:22:20 | okay so if you look at them depending on what you please the could it might easily could not the |
---|
0:22:25 | first one is in fact a mighty size the second and the fourth isn't i |
---|
0:22:31 | right, but it is in fact a monkey; i had to check this, i'm not that good on primate |
---|
0:22:35 | taxonomy, but most of these are apes |
---|
0:22:38 | and the one on the bottom row in the second column is a little plastic toy |
---|
0:22:42 | right, so the whole point about categorization here is: the concept is okay |
---|
0:22:48 | a thing can belong to more than one category at the same time perfectly well |
---|
0:22:53 | so what we've inherited from the point of view i described to you |
---|
0:22:56 | is a tremendous amount of information about feature computation and construction |
---|
0:23:01 | we're really good at building and managing and using classifiers |
---|
0:23:05 | and a lot of practical improvements |
---|
0:23:08 | but there are really evil subtleties there, and the next thing is to describe some of the efforts to |
---|
0:23:13 | deal with them |
---|
0:23:14 | so the big questions the really big questions of computer vision that are in play right now |
---|
0:23:20 | what signal representations should we use |
---|
0:23:23 | this is sort of at the early level, before you get to the classifiers and learning stuff |
---|
0:23:28 | to some extent, models: what aspects of the world should we represent, and how should we represent them |
---|
0:23:34 | and then the other, which is: what should we say about pictures |
---|
0:23:37 | and those three questions are really very difficult indeed |
---|
0:23:41 | so let's start looking at |
---|
0:23:43 | the coming technologies and the nasty problems |
---|
0:23:46 | one big issue is the unfamiliar |
---|
0:23:49 | the recipe i described to you really just doesn't deal with the unfamiliar |
---|
0:23:53 | let me show you a little movie of somebody doing something |
---|
0:23:56 | almost certainly you've never seen people doing this before it doesn't happen every day |
---|
0:24:01 | and at the same time it doesn't really present you with any problem |
---|
0:24:04 | right, you might not have a word to describe it, but you know what's going on, and that's |
---|
0:24:09 | fine |
---|
0:24:11 | here's another, more extreme example of something where you really don't see this every day |
---|
0:24:16 | but you can still watch it and it's just all us what's going on |
---|
0:24:21 | and by this point even the donkey is |
---|
0:24:25 | accustomed to it |
---|
0:24:26 | i mean, it's done |
---|
0:24:27 | you don't treat this as impossible |
---|
0:24:29 | because you don't have training data |
---|
0:24:31 | you can deal with the unfamiliar in satisfactory ways, and you probably have put together in your mind a little |
---|
0:24:37 | narrative of what's going on and why they're doing what they're doing, and it's all over and they can get |
---|
0:24:41 | on with it |
---|
0:24:43 | now that's a really |
---|
0:24:44 | baffling thing |
---|
0:24:45 | from the perspective i described to you, we just have no approach to it |
---|
0:24:50 | there are methods that you can use: you can take |
---|
0:24:54 | the stuff i described and rewire it |
---|
0:24:56 | here's an architecture that people are using quite a lot: i take a picture, i do the feature computation |
---|
0:25:03 | and stuff, and instead of building classifiers that say "bird" |
---|
0:25:07 | i build a bunch of classifiers that say the picture has a beak in it, it's got an eye in |
---|
0:25:12 | it, it's got a wing, and it's got a leg |
---|
0:25:16 | the reason i would do that is, if i ran into something else, i might not know what it was |
---|
0:25:21 | but i could say, oh okay, it's got a beak; it might be a feather duster or a bird; it's got |
---|
0:25:26 | feathers; so i can say something useful about it |
---|
0:25:30 | this is kind of neat, because you can then build systems that can make predictions for objects they've never seen |
---|
0:25:36 | before |
---|
0:25:36 | where they haven't even seen that |
---|
0:25:38 | degree of that type of object |
---|
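A toy sketch of the attribute architecture described above: instead of one classifier that names the object, a bank of classifiers each reports one attribute. The attribute names and the linear scorers below are invented for illustration; real attribute classifiers are learned from labelled data over real image features:

```python
import numpy as np

# one invented linear scorer per attribute (real ones are learned)
ATTRIBUTE_WEIGHTS = {
    "has_beak":  np.array([1.0, -0.2, 0.0]),
    "has_eye":   np.array([0.3,  0.9, 0.0]),
    "has_wheel": np.array([-0.5, 0.0, 1.2]),
}

def describe(feature, thresh=0.5):
    """Report every attribute whose classifier fires, so the system can
    say something useful about an object it has no name for."""
    return sorted(name for name, w in ATTRIBUTE_WEIGHTS.items()
                  if float(w @ feature) > thresh)

print(describe(np.array([1.0, 1.0, 0.0])))  # → ['has_beak', 'has_eye']
```

Because the output is a list of attributes rather than a category name, the same machinery produces a useful description even for a type of object that was never in the training set.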
0:25:40 | on the slide, the little yellow boxes are the spatial basis of the predictions in the image, and underneath |
---|
0:25:47 | them are predictions; so that rather baffled-looking animal here, it's reported as having a head, having an |
---|
0:25:53 | ear, having a snout, having a nose and having a mane |
---|
0:25:57 | so it would be able to say something useful about something it had never seen |
---|
0:26:02 | it's hard to get these predictions right |
---|
0:26:05 | you can see on that, er, rat for example, it says it's got a tail, it's got a snout, it's got a leg |
---|
0:26:11 | it also says it's got text on it and it's made of plastic |
---|
0:26:15 | and it says it's got text on it because text is characterized by little dark and white stripes next to each |
---|
0:26:20 | other, and plastic is characterized by uniform bright patches |
---|
0:26:23 | so these predictions are hard to make, but you can make them |
---|
0:26:28 | the other neat thing about this architecture is, if you happen to have seen lots of birds |
---|
0:26:34 | it's relatively straightforward to add something else that says, okay, this really is a bird |
---|
0:26:38 | and again that's in the whole recipe of classification that i described |
---|
0:26:43 | if i do that, i can also look at that list of attributes and say, well gee, it's a bird, but |
---|
0:26:49 | something's missing or something's extra |
---|
0:26:51 | so known objects, things that i know about whose names i know, could be unfamiliar by being different from the |
---|
0:26:58 | typical |
---|
0:26:59 | and if they are different from the typical, it's worth mentioning |
---|
0:27:03 | we can build systems that do that as well: essentially, if we're really sure it's the object and we're really |
---|
0:27:08 | sure it has a missing attribute or an extra attribute, we can say so |
---|
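The reporting of missing and extra attributes described above can be sketched by comparing the attributes detected in an image against a typical-attribute list for the predicted category. The categories and attribute sets here are invented for illustration; a real system would learn or curate these lists:

```python
# invented typical-attribute lists; a real system learns or curates these
TYPICAL = {
    "bird": {"beak", "eye", "wing", "leg"},
    "aeroplane": {"wing", "jet engine", "window"},
}

def report_unusual(category, detected):
    """Compare detected attributes with the category's typical set and
    report what is missing and what is extra."""
    typical = TYPICAL[category]
    return sorted(typical - detected), sorted(detected - typical)

print(report_unusual("bird", {"beak", "eye", "wing", "leg", "wheel"}))
# → ([], ['wheel'])
```

As the talk notes, the semantics need care: "not detected" is not the same as "not present", so a real report should only fire when both the category and the attribute scores are confident.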
0:27:12 | so here i have a bunch of examples from one recent system; the semantics of attributes are all messed |
---|
0:27:18 | up, so that the animal down there is reported as not having a tail, not because there's compelling evidence |
---|
0:27:25 | that it doesn't have a tail, but because we can't see it; that little detail hasn't |
---|
0:27:30 | been sorted out |
---|
0:27:31 | that aeroplane is reported as not having a jet engine |
---|
0:27:34 | and, gloriously, that sheep is reported as not having wool |
---|
0:27:41 | when it has in fact been shorn |
---|
0:27:44 | and you can report extra stuff as well; again, you know, there are questions about the semantics that need |
---|
0:27:49 | to be sorted out here: the bird in the little yellow box on the end there is reported as having |
---|
0:27:54 | an extra leg |
---|
0:27:55 | now, birds never have extra legs, actually, so one should |
---|
0:27:58 | have some more complex interpretation sitting on top but there's a bicycle with whole on an aeroplane with a big |
---|
0:28:06 | and a bus with a fine |
---|
0:28:08 | well within the sort of extra special features of the object and we can report |
---|
0:28:14 | now, one nice thing about this is
---|
0:28:17 | as joe asked recently, there are technologies emerging that say some regions in images actually would like to be objects
---|
0:28:25 | so if a region would like to be an object, then what we can do is take our attribute machinery
---|
0:28:30 | attach it to the region that would like to be an object, and report a description; and that sort of stuff
---|
0:28:35 | is being discussed in the hallways, but doesn't exist yet
---|
0:28:40 | now, the second interesting and disturbing thing about modern vision is what we call the visual phrase
---|
0:28:45 | so meaning comes in clumps
---|
0:28:48 | i talked about object recognition as something where you spot individual objects
---|
0:28:53 | but it's really hard to talk sense about what it means to be an object
---|
0:28:58 | so if you look at this, you could honestly think about that as an object, because if you fish
---|
0:29:03 | around in your head you could come up with a single word to describe it
---|
0:29:08 | but it isn't one thing or two
---|
0:29:10 | should we cut off the sled and then sort of think of the person as a person and
---|
0:29:15 | the sled as a sled? that way lies madness, because we can also cut off the head and say it's a head, cut off
---|
0:29:20 | the jacket and say it's a jacket, cut off the shoes and say they're shoes, and so on
---|
0:29:25 | so what we might want to do is just sort of accept
---|
0:29:28 | that this is a chunk of meaning that is represented by what many people would think of as at least
---|
0:29:33 | two objects
---|
0:29:36 | as a precedent for this, a common notion in vision is that of a scene
---|
0:29:41 | so a scene is a likely stage where particular kinds of objects or particular kinds of activity might occur: things
---|
0:29:49 | like bathrooms or greenhouses or playgrounds or bedrooms
---|
0:29:54 | and we're really quite good at classifying scenes
---|
0:29:56 | so you can use the procedure i described previously: you get a bunch of labeled images of scenes, you
---|
0:30:03 | compute some features, you build a discriminative classifier, and it turns out you can be really good at saying that's
---|
0:30:08 | a picture of a bathroom, that's a picture of a bowling alley, that's a picture of a church
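---|
That recipe, labelled scene images to features to a discriminative classifier, can be sketched end to end on synthetic data. The grey-level histogram feature and the nearest-centroid classifier here are toy stand-ins for whatever features and classifiers real scene systems use:

```python
# Toy version of the recipe: labelled scene "images" -> features ->
# classifier. Images are synthetic arrays, the feature is a grey-level
# histogram, and the classifier is nearest-centroid; all stand-ins.
import numpy as np

def feature(img, bins=8):
    h, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return h / h.sum()

def train(images, labels):
    feats = np.array([feature(im) for im in images])
    classes = sorted(set(labels))
    return {c: feats[[l == c for l in labels]].mean(axis=0) for c in classes}

def classify(img, centroids):
    f = feature(img)
    return min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))

rng = np.random.default_rng(0)
bright = [rng.uniform(0.6, 1.0, (16, 16)) for _ in range(10)]  # "bathroom": bright tiles
dark = [rng.uniform(0.0, 0.4, (16, 16)) for _ in range(10)]    # "bowling": dim interior
model = train(bright + dark, ["bathroom"] * 10 + ["bowling"] * 10)
print(classify(rng.uniform(0.6, 1.0, (16, 16)), model))        # -> bathroom
```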
---|
0:30:14 | and the advantage of doing that is you have some idea of the kinds of things that might happen
---|
0:30:19 | so we've known since the early nineties
---|
0:30:22 | that if you get the scene right
---|
0:30:25 | you can predict where to look for objects
---|
0:30:28 | and although you can't always get it right, here are a couple of examples of the predictions you can make
---|
0:30:33 | one is an outdoor scene where, on the top row, we predict the buildings are sort of on
---|
0:30:40 | the top and the street is on the bottom, and trees are vertical and they might be in front of you
---|
0:30:45 | and the sky tends to be on the top, and the cars will tend to be on the side or in
---|
0:30:49 | the middle, and so on
---|
0:30:51 | now, not all of these predictions are right; there aren't any cars here
---|
0:30:55 | but they tell you where to look for cars, if there were any
---|
0:30:59 | and that seems to be helpful
---|
0:31:01 | now, thinking about scenes is great, but we've talked about meaning as coming in clumps at two scales
---|
0:31:07 | one scale is the scene, the whole image
---|
0:31:10 | and the other is individual objects, however little we like what it means to be an object
---|
0:31:14 | and it turns out very recently that there has
---|
0:31:17 | come good practical evidence that there might exist useful clumps of meaning between the scene and the
---|
0:31:24 | object, and these are referred to as visual phrases
---|
0:31:27 | they're composites
---|
0:31:29 | so a visual phrase is a composite where the composite is easier to recognise than its parts
---|
0:31:34 | so one useful visual phrase is a person drinking from a bottle
---|
0:31:38 | it turns out it's much easier to detect a person drinking from a bottle than it is to detect a
---|
0:31:44 | person or to detect a bottle
---|
0:31:46 | because people who drink from bottles do special things
---|
0:31:50 | right, they hold their arms and heads in special configurations, and so on
---|
0:31:55 | the same goes for things like a person riding a bicycle: it's much easier to detect a whole person riding
---|
0:32:00 | a bicycle than it is to detect the person and the bicycle and then reason about spatial relations
---|
0:32:06 | because the appearance is constrained by the relation
---|
0:32:10 | so when you have this observation, you get into a serious mess about what to report
---|
0:32:17 | about an image
---|
0:32:18 | so we might build a person detector, we might build a horse detector, and we might also build a
---|
0:32:24 | person-riding-a-horse detector
---|
0:32:26 | we have to figure out which, if any, of them is right; if we're really lucky, the
---|
0:32:30 | person-riding-a-horse detector will report in the same place
---|
0:32:33 | as the person detector and the horse detector, and we have to figure out just how many people, just how
---|
0:32:37 | many horses, and just how many people riding horses there are
---|
0:32:40 | so what we do is rack up a whole bunch of detectors
---|
0:32:44 | and then go through a second phase, which is currently referred to as decoding, where
---|
0:32:49 | we say
---|
0:32:51 | based on all of the evidence of the detectors, i'm willing to believe you, and you
---|
0:32:57 | and that judgement is again a discriminative judgement: we essentially take the responses of nearby detectors, report them to
---|
0:33:05 | the current detector, and construct a second classifier which decides, should we believe you
---|
0:33:10 | and you can get quite good answers out of that procedure
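---|
A minimal sketch of that decoding stage: each raw detection gets rescored by a second judgement whose inputs include the responses of nearby, related detectors. The support weights below are made up for illustration; in a real system they would be learned:

```python
# Sketch of "decoding": each raw detection is rescored by a second stage
# whose input is the responses of overlapping, related detectors.
import math

def iou(a, b):
    """Intersection-over-union of two boxes (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a; bx0, by0, bx1, by1 = b
    ix = max(0, min(ax1, bx1) - max(ax0, bx0))
    iy = max(0, min(ay1, by1) - max(ay0, by0))
    inter = ix * iy
    union = (ax1-ax0)*(ay1-ay0) + (bx1-bx0)*(by1-by0) - inter
    return inter / union if union else 0.0

# Hand-set weights: how much a nearby detection of one class supports another.
SUPPORT = {("person", "person riding horse"): 2.0,
           ("horse", "person riding horse"): 2.0}

def decode(detections):
    """detections: list of (cls, score, box). Returns rescored list."""
    out = []
    for cls, score, box in detections:
        context = sum(w * max((s for c2, s, b2 in detections
                               if c2 == other and iou(box, b2) > 0.3),
                              default=0.0)
                      for (c1, other), w in SUPPORT.items() if c1 == cls)
        out.append((cls, 1 / (1 + math.exp(-(score + context - 1.0))), box))
    return out

dets = [("person", 0.2, (10, 10, 30, 60)),
        ("horse", 0.3, (5, 30, 50, 80)),
        ("person riding horse", 0.9, (5, 10, 50, 80))]
for cls, s, _ in decode(dets):
    print(f"{cls}: {s:.2f}")
```

Here the weak person and horse detections get boosted because a strong person-riding-a-horse detection overlaps them, which is the "should we believe you" judgement in miniature.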
---|
0:33:14 | it turns out they help quite a lot |
---|
0:33:16 | so if you look at the pictures
---|
0:33:19 | the top row pictures are detector responses without any decoding, without a global vision of what's going on; you
---|
0:33:26 | can see a sofa and a bunch of people, and the dog score is very small
---|
0:33:31 | if one then says, okay, i'm going to look at the totality of detector responses, which includes more than that
---|
0:33:37 | and try to find a consistent selection that makes sense, then you get a dog, because you can say that
---|
0:33:43 | there's a fair amount of evidence that you've got a dog lying on the sofa, because you've got something that
---|
0:33:47 | looks a bit like a person and a bit like a dog, and you've got a dog-lying-on-the-sofa response, and
---|
0:33:51 | so that's also a dog
---|
0:33:53 | you can significantly improve detection procedures by this kind of global view |
---|
0:34:00 | another thing that gives a global view that significantly improves detection performance and scene understanding is geometry
---|
0:34:08 | so if we know something about the geometry
---|
0:34:11 | we can really improve detectors; so on the slide with the blue line on it, i have an
---|
0:34:18 | image with the horizon marked, and
---|
0:34:19 | i want to build a pedestrian detector; you can see the boxes around pedestrians
---|
0:34:24 | and cars
---|
0:34:25 | now, the thing about horizons
---|
0:34:27 | is that in perspective cameras, things that get closer to the horizon from below must be smaller
---|
0:34:34 | or else they're bigger in three D
---|
0:34:36 | what that means is, if i want to detect
---|
0:34:39 | a pedestrian and i think it's a big one, it has to be lower in the image
---|
0:34:44 | and the small ones have to be higher
---|
0:34:47 | furthermore, if i get some pedestrian detector responses
---|
0:34:51 | i can look at them and say, well, the big ones are here and the small ones are there, and that helps
---|
0:34:55 | me estimate the horizon
---|
0:34:57 | and if i estimate the horizon and my detector reports jointly
---|
0:35:02 | i can get much better responses
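---|
The constraint at work here can be written down: under perspective, a pedestrian whose feet sit at image row v (rows growing downward) has image height roughly a * (v - v0), where v0 is the horizon row. So a least-squares fit over a few detections recovers the horizon, and detections that disagree badly with the fit can be rejected. All the numbers below are synthetic:

```python
# Under perspective, pedestrian image height h grows linearly with the
# distance of the feet from the horizon row v0: h = a * (v - v0).
# Fitting a line to detections recovers a and v0 by least squares.
import numpy as np

def fit_horizon(foot_rows, heights):
    a, c = np.polyfit(foot_rows, heights, 1)  # h = a*v + c
    return -c / a                             # horizon row v0 = -c/a

# Synthetic detections generated from a true horizon at row 100:
true_v0, true_a = 100.0, 0.5
v = np.array([150.0, 200.0, 300.0, 400.0])
h = true_a * (v - true_v0)                    # heights 25, 50, 100, 150
print(fit_horizon(v, h))                      # -> ~100.0

def consistent(v_foot, height, v0, a, tol=0.5):
    """Is a detection's height plausible given the estimated horizon?"""
    pred = a * (v_foot - v0)
    return abs(height - pred) <= tol * max(pred, 1e-6)

# A big "pedestrian" hovering above the horizon is rejected:
assert not consistent(v_foot=60.0, height=80.0, v0=true_v0, a=true_a)
```

This is the mechanism that rules out pedestrians hovering in the sky and rules back in small detections near the horizon.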
---|
0:35:05 | so, for example, on the top row are the local detections; the yellow ones are pedestrians, the green ones
---|
0:35:11 | are cars, and if we just tested against a threshold, we'd have these improbable pedestrians hovering in
---|
0:35:17 | the sky
---|
0:35:19 | but from that and other detector information we can estimate a horizon
---|
0:35:23 | pedestrians have their feet on the ground most of the time, and that just rules out all those false positives
---|
0:35:29 | up there, and it rules in some small detections close to the horizon, because they're about the right size
---|
0:35:36 | similarly, if we go looking at a scene with cars and people in it
---|
0:35:41 | you'll notice, this is the one on the bottom
---|
0:35:44 | by estimating the horizon
---|
0:35:46 | several detector responses, the little dotted red ones for the pedestrians
---|
0:35:51 | have gotten boosted
---|
0:35:52 | because we know that even though the image data didn't look all that great, it really is at the right
---|
0:35:59 | size and in the right place to be a pedestrian, and that gives us just a little bit more confidence
---|
0:36:05 | now, geometry is wonderful stuff; there are lots of geometric estimates that are making detection better right now
---|
0:36:12 | one thing is you can pretend that the room is a box
---|
0:36:15 | then, using a variety of standard methods
---|
0:36:18 | you can estimate the box even if the room isn't exactly one
---|
0:36:22 | you can then estimate the box, and when you estimate the box, you can get some idea of where the floor is
---|
0:36:27 | so over there we've got a room with a box painted on it; you can see the box isn't quite right
---|
0:36:32 | nonetheless, because you've got the box, we can figure out what the walls look like, what the floor looks like
---|
0:36:37 | and what the ceiling looks like
---|
0:36:38 | so the red is one wall, the yellow is another wall, the green is the floor
---|
0:36:46 | the blue is the ceiling, and the purple is stuff that is none of the above, what we call
---|
0:36:52 | clutter
---|
0:36:53 | things that you might bump into and such
---|
0:36:56 | so firstly, this gives an account of free space
---|
0:37:02 | but another thing you could do is take that and say, well, because i know the
---|
0:37:06 | box
---|
0:37:07 | i can use standard methods to ask
---|
0:37:09 | what the faces of boxes inside the room would look like
---|
0:37:15 | if i looked at them frontally
---|
0:37:17 | so say i want to build a bed detector; it turns out the people who did this, hedau and
---|
0:37:21 | colleagues
---|
0:37:22 | actually have the world's best bed detector, which sounds like sort of a slightly eccentric thing to have, but
---|
0:37:27 | there's a principle here, and you'll see it being useful in a second: if i want to build a good
---|
0:37:31 | bed detector, and i just look at images
---|
0:37:34 | i have to deal with the fact that the bed might appear at different orientations
---|
0:37:39 | and because it appears at different orientations, it's going to look different
---|
0:37:43 | but if i know the box of the room, i can say beds sit with their axes along the walls
---|
0:37:48 | they typically have one face against the wall of the room
---|
0:37:51 | therefore i'm going to rectify using the box of the room, so the faces of the bed are frontal
---|
0:37:57 | and i can now remove a
---|
0:37:59 | source of ambiguity in my features and build a better detector
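---|
That rectification step amounts to a homography: map the four image corners of a face of the room box to a frontal rectangle, then measure features in the warped frame. Below is the standard four-point direct linear transform estimate, applied to a made-up wall quadrilateral:

```python
# Four-point DLT homography: maps the image quadrilateral of a room-box
# face to a frontal rectangle, so features can be measured frontally.
import numpy as np

def homography(src, dst):
    """src, dst: 4x2 point arrays. Returns 3x3 H with H mapping src -> dst."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u*x, u*y, u])
        A.append([0, 0, 0, -x, -y, -1, v*x, v*y, v])
    _, _, Vt = np.linalg.svd(np.array(A))
    H = Vt[-1].reshape(3, 3)     # null vector of A, up to scale
    return H / H[2, 2]

def apply(H, pt):
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]          # back from homogeneous coordinates

# Image corners of a slanted wall face (made-up quadrilateral) ...
wall = np.array([[120, 80], [300, 40], [310, 260], [125, 230]], float)
# ... mapped to a frontal 200 x 180 rectangle:
frontal = np.array([[0, 0], [200, 0], [200, 180], [0, 180]], float)
H = homography(wall, frontal)
print(np.allclose(apply(H, (120, 80)), (0, 0), atol=1e-6))   # True
```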
---|
0:38:04 | now, the thing that's nice about that
---|
0:38:06 | is that when you want to know where the beds are, you want to know something about where the room is
---|
0:38:10 | because beds do not penetrate the walls of rooms
---|
0:38:14 | so what i can
---|
0:38:15 | do is estimate the room and the beds simultaneously, and come up with quite good estimates as to where furniture
---|
0:38:23 | is in pictures of the room; so here at the top you see an estimated box
---|
0:38:29 | in the middle you see a bed that's estimated without thinking about where the box is, without re-estimating the box
---|
0:38:36 | and at the bottom you see a joint estimate of bed and room box
---|
0:38:40 | and that joint estimate does somewhat better; it's sort of three or four percent, but it's there
---|
0:38:47 | now, the nice thing about boxes
---|
0:38:49 | is you can do other things with them as well
---|
0:38:51 | so very recently, kevin karsch has shown
---|
0:38:54 | that if you know the box of a room, you can figure out where the lights are
---|
0:38:59 | and you can figure out what the albedo is on the sides of the room, whether it's black
---|
0:39:02 | or white, or
---|
0:39:04 | green or red
---|
0:39:06 | and if that's the case, and you know where the lights are, you can stick other stuff into the room
---|
0:39:12 | so i'll go backwards and forwards
---|
0:39:13 | we put some pieces of computer graphics stuff in the room, and you'll notice that the statue is behind the ottoman
---|
0:39:21 | and as a result it's occluded, and the lighting is right
---|
0:39:25 | now, one thing about this which is kind of fun: if you can do it for a static thing
---|
0:39:29 | you can do it for moving stuff
---|
0:39:31 | so here's a picture of a billiard room from
---|
0:39:34 | flickr, and you can just play billiards on the table
---|
0:39:39 | here's another picture from flickr; everything i'm showing you comes from a single picture
---|
0:39:43 | here's another picture from flickr, and a little glowing ball has managed to get into the picture and is going
---|
0:39:48 | to explore it
---|
0:39:49 | you'll notice it gets reflected in the mirror there
---|
0:39:52 | it casts shadows the way it should
---|
0:39:55 | and when it flies under the table, the light behaves the way it should
---|
0:39:59 | so these kinds of simple geometric inference
---|
0:40:02 | can support amazing functions; the usefulness of this is pretty obvious: you can stick furniture into pictures of your
---|
0:40:09 | living room
---|
0:40:10 | and if you're inclined to do such things, you can shoot aliens in your
---|
0:40:15 | dining room in a computer game
---|
0:40:19 | so let's look at the last sort of big and puzzling principle that's kind of emerging in modern vision
---|
0:40:24 | and that is selection
---|
0:40:26 | what should we say
---|
0:40:28 | so a couple of years ago, julia hockenmaier went out and collected a whole bunch of images
---|
0:40:34 | and then put them on mechanical turk, and got people who were pretty qualified english speakers
---|
0:40:40 | this is kind of important, otherwise things get a bit funny, and asked them to write a sentence about the
---|
0:40:44 | picture
---|
0:40:45 | and then what you do is you get multiple sentences about a single picture
---|
0:40:50 | and you look at the sentences
---|
0:40:52 | and the striking thing about those sentences is their consistency
---|
0:40:57 | people presented with this picture talk about two girls sitting and talking; one of them is holding something; they're
---|
0:41:04 | chatting; they're wearing jeans; but they don't talk about the steps
---|
0:41:08 | they don't talk about the specular reflections in the window at the back of the image, they don't talk about
---|
0:41:13 | the two people in the window, and they don't talk about the chewing gum on the ground
---|
0:41:18 | they're capable of looking at this thing and saying, this is important
---|
0:41:22 | this is what's worth mentioning, and this is not
---|
0:41:25 | and they do it very consistently
---|
0:41:27 | now, understanding that is terribly important, and the reason it's important is that
---|
0:41:30 | pictures are all about selection
---|
0:41:32 | and if your model is to record every object in the picture, then you're dead, because your report is
---|
0:41:37 | too big
---|
0:41:37 | so we need to know what's worth mentioning
---|
0:41:41 | and we can do some of this
---|
0:41:42 | there is a fair amount of work on predicting sentence-level descriptions of images or video
---|
0:41:48 | so, for example, abhinav gupta and colleagues took video of baseball games
---|
0:41:54 | and they used methods similar to the discriminative methods i described to identify who's hitting, who's catching, who's
---|
0:42:01 | running
---|
0:42:02 | now, they also built a little
---|
0:42:05 | generative model of baseball: essentially, you can do this and then that; once you've done this, that could happen or
---|
0:42:11 | that could happen, and so on
---|
0:42:13 | and you can think of it as being represented by a tree of events, and some structural rules that allow
---|
0:42:19 | you to rearrange the tree; and then what you do is you say, okay, i've got these detector responses
---|
0:42:25 | these are the structural rules of the game, let me generate a structure that explains those responses; and of course
---|
0:42:31 | if i can generate that structure, i can generate things that, you know, without close inspection, look like
---|
0:42:38 | descriptions of the game
---|
0:42:40 | now, no sportscaster would emit something like "the pitcher pitches the ball, the batter hits it, and then simultaneously the batter
---|
0:42:48 | runs to the base and the fielder runs towards the ball, the fielder catches the ball"; it's not the
---|
0:42:53 | way people talk
---|
0:42:55 | but at the same time, it is
---|
0:42:57 | a description of what's going on
---|
0:42:59 | that you could use to produce something that's sane
---|
0:43:03 | and it's a fairly detailed description of what's happening
---|
0:43:07 | we can generate sentences for ordinary pictures, although it's still a bit rough and ready; there are methods that
---|
0:43:14 | essentially say, i'll go from an image space to some sort of intermediate space of detector responses
---|
0:43:21 | and then i'll go from a sentence space
---|
0:43:23 | to that same intermediate space of detector responses, and then i'll try to align sentences and images in that
---|
0:43:30 | space and report the best matching sentence
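---|
That matching idea can be sketched very simply: map the image (via its detector responses) and each candidate sentence into the same space of tags, score pairs by overlap, and keep the best match. The tags and sentences below are invented for illustration:

```python
# Sketch of "meaning space" matching: the image and each candidate
# sentence are both mapped to a set of tags; pairs are scored by
# Jaccard overlap and the best-matching sentence is reported.
def to_meaning(tags):
    return set(tags)

def score(image_meaning, sentence_meaning):
    inter = image_meaning & sentence_meaning
    union = image_meaning | sentence_meaning
    return len(inter) / len(union) if union else 0.0

image = to_meaning({"animal", "sleep", "grass"})      # detector responses
candidates = {                                        # sentence -> its tags
    "a cow sleeps in a grass field": {"animal", "sleep", "grass"},
    "a man stands next to a train": {"person", "stand", "train"},
    "a laptop on a desk": {"laptop", "desk"},
}
best = max(candidates, key=lambda s: score(image, to_meaning(candidates[s])))
print(best)   # -> a cow sleeps in a grass field
```

Notice that the winning sentence can still be wrong in its specifics (cow versus sheep) while matching the extracted meaning well, which is exactly the failure mode shown next.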
---|
0:43:33 | the kind of results one gets are shown here; so for that top picture, the detectors say things like "sleep on
---|
0:43:41 | ground", "animal sleep in grass", and so on, and the kind of sentences one sees generated say something like that
---|
0:43:48 | as expected
---|
0:43:50 | okay, so people may mark things up inaccurately, to say the least
---|
0:43:53 | but you might also get "cow in grass field", which is not right, it's a sheep, but you know, it's not
---|
0:43:58 | bad
---|
0:44:00 | the third one down
---|
0:44:03 | "a man stands next to a train on a cloudy day"; it looks wonderful
---|
0:44:08 | until you realise that the man is actually a woman
---|
0:44:10 | so, you know, you can make minor mistakes, because sentences are really compact
---|
0:44:15 | sources of information, and sometimes you make howlers; so this is not, in fact, a laptop connected to a
---|
0:44:22 | black monitor; there really isn't all that much black in the photo
---|
0:44:27 | more recently, tamara berg and colleagues enriched the sentences significantly by joining this machinery to machinery about attributes
---|
0:44:36 | and were able to produce (again, you know, we're not doing sentence generation; that should be fairly obvious
---|
0:44:41 | from this end)
---|
0:44:43 | descriptions of pictures that look like this
---|
0:44:45 | "there are two aeroplanes; the first shiny aeroplane is near the second"
---|
0:44:50 | again, we're not doing sentence generation
---|
0:44:52 | but if you did do sentence generation, you might see there's enough meaning that's been extracted from the image that
---|
0:44:57 | you could turn it into a reasonable form
---|
0:45:01 | "there are one dining table, one chair and two windows; the wooden dining table is by the wooden chair, and against the
---|
0:45:06 | first window"
---|
0:45:08 | the kind of objection you would raise to that is that it is too much information, and there's no selection, as opposed to
---|
0:45:15 | it being wrong
---|
0:45:17 | okay, now i'm going to show you a movie to
---|
0:45:21 | illustrate how far this sort of selection seems to go in human vision; it's a fairly wrenching movie, so the first
---|
0:45:27 | thing is just to warn you that nobody was hurt
---|
0:45:30 | watch it once
---|
0:45:32 | and then we'll think about it; so it's clearly a surveillance movie of a train platform
---|
0:45:47 | and now is where it gets interesting
---|
0:45:50 | okay, here's the question: how many adults were on the platform, and what were they doing
---|
0:45:56 | right, audiences always give sort of a variety of answers, somewhere
---|
0:46:03 | in the two-to-seven range; the information just isn't in your head
---|
0:46:07 | you look at that thing, and it is clear what's important and it's clear what's not important, and you're really
---|
0:46:11 | good at clamping down on what's not
---|
0:46:14 | and the important stuff looks like: what outcome do we expect, and how do other people feel
---|
0:46:19 | this feeling thing is not just because we're nice people and we care about what other people feel; it's because
---|
0:46:24 | it gives you a really good idea of what they're going to do next, which matters
---|
0:46:28 | a lot to what we'd like to know
---|
0:46:30 | and, of course, what's going to happen to the baby
---|
0:46:32 | i'll show the whole sequence again
---|
0:46:35 | nobody was hurt, the child was not hurt; it says something about how good padding in baby carriages can be
---|
0:46:41 | but watch it again
---|
0:46:57 | and the train arrives
---|
0:46:59 | now, as i said, i wouldn't show this if the child had been hurt; it's quite well known that
---|
0:47:03 | the baby carriage ended up upside down and was pushed along; the child was annoyed, but not seriously damaged
---|
0:47:10 | if you look at this, your ability to predict the behaviour of that woman who just nearly threw herself in
---|
0:47:17 | front of the train
---|
0:47:18 | is pretty good: she's going to react in kind of a strange way for the next ten minutes
---|
0:47:23 | what you do is you look at this, you identify what's important; well, we're not going to notice this
---|
0:47:29 | guy, because he isn't important
---|
0:47:31 | and you build a little narrative around it, and you focus on that
---|
0:47:36 | we don't know how to do that; we are trying to, but we don't know how to do that yet
---|
0:47:39 | so let me run through some crucial open questions as we move towards the end
---|
0:47:46 | one is dataset bias
---|
0:47:48 | so
---|
0:47:49 | a distinctive feature of vision is that frequencies in datasets
---|
0:47:53 | misrepresent applications
---|
0:47:55 | for a whole bunch of reasons: the labels are wrong
---|
0:47:58 | the things that are chosen to get labelled are not uniform; people collect things in very specific ways
---|
0:48:05 | and this is not a charge; nobody goes out there and does wicked things with data collection; but it's
---|
0:48:10 | a real issue
---|
0:48:12 | so the bias is pervasive, and we know it's a big deal in vision datasets, because antonio torralba and
---|
0:48:18 | alyosha efros produced this wonderful paper this year
---|
0:48:22 | that proved a good classifier can tell which dataset an image comes from
---|
0:48:28 | which is very scary news indeed
---|
0:48:31 | and if you're a smart vision researcher, you can do it too, very quickly; so you have a little
---|
0:48:35 | test there: name the dataset the picture comes from; people run at about sixty to seventy percent; classifiers are a little
---|
0:48:41 | bit weaker
---|
0:48:43 | size doesn't make bias go away
---|
0:48:45 | if you get a really big dataset, that doesn't mean it's an unbiased dataset, and it might make things worse
---|
0:48:51 | because you might become complacent
---|
0:48:52 | so when i collected these pictures from google, you've got about twenty-three million
---|
0:48:59 | pictures of lions; here are the top
---|
0:49:02 | i don't know however many
---|
0:49:04 | and you might think they're unbiased, but have a close look; the kinds of things you could deduce from
---|
0:49:09 | these pictures about lions are, of course, fairly off
---|
0:49:13 | there were two pictures of lions on horseback
---|
0:49:15 | there's a lion lying down with a lamb
---|
0:49:17 | there's another one
---|
0:49:19 | with a person putting a hand on it
---|
0:49:23 | and there's a lion with a man, that sort of thing
---|
0:49:25 | that's on the first page
---|
0:49:27 | so if you used that as your source of lion information, you'd be in serious trouble; that's just not lions
---|
0:49:35 | this is an effect of editorial bias: people are more interested in weird pictures of lions than in
---|
0:49:41 | common ones
---|
0:49:43 | the problem is, this blows huge holes in what we know about machine learning
---|
0:49:47 | so machine learning is based on a form of induction that says the future is going to be like the
---|
0:49:52 | past
---|
0:49:54 | and if you can't make the future like the past, then you've got a problem
---|
0:49:58 | and current machinery just doesn't sort of go to this
---|
0:50:01 | this place
---|
0:50:04 | there is good reason to believe that this issue is pervasive in object recognition: the world cannot be like
---|
0:50:10 | the training dataset, because many things are rare; that's why unfamiliar things are common, and we must deal with them
---|
0:50:15 | and of course, if many things are rare, then this exaggerates bias
---|
0:50:19 | so gang wang produced a little histogram
---|
0:50:22 | that said, okay, for all the objects in a marked-up dataset that's common in vision, how many instances are there
---|
0:50:30 | and there's a small number of objects that have, you know, four thousand five hundred instances or so, but very quickly you're
---|
0:50:36 | down in the tail
---|
0:50:38 | and after that, most objects appear two or three times in this dataset; so most objects are rare
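---|
The shape of that histogram is easy to reproduce: if object labels follow a Zipf-like law, a quick count shows that most observed labels occur only a handful of times. This is synthetic data standing in for the real annotations:

```python
# The long tail: draw synthetic annotations with Zipf-like label
# frequencies and count how many observed labels are rare.
from collections import Counter
import random

random.seed(1)
labels = [f"obj{i}" for i in range(1, 201)]
weights = [1.0 / i for i in range(1, 201)]     # label i has weight 1/i
annotations = random.choices(labels, weights=weights, k=2000)

counts = Counter(annotations)
rare = sum(1 for c in counts.values() if c <= 3)
print(f"{rare / len(counts):.0%} of observed labels appear 3 times or fewer")
```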
---|
0:50:44 | this should be a fairly familiar phenomenon
---|
0:50:46 | but it wasn't really an issue in vision until recently
---|
0:50:50 | there are several things you might do about bias
---|
0:50:53 | you could think about appropriate feature representations, and what i described about illumination invariance is one form of doing that
---|
0:51:00 | if your features are invariant to illumination, then the fact that your dataset is biased in illumination just doesn't matter
---|
0:51:07 | another thing you might do is build appropriate intermediate representations
---|
0:51:11 | with those intermediate representations, you might be able to make unbiased estimates of classifiers even when the objects
---|
0:51:19 | are rare
---|
0:51:20 | and that's one way of interpreting this attribute machinery
---|
0:51:23 | and the other thing is, if you have good representations of things like geometry
---|
0:51:28 | you just might be able to escape the effects of dataset bias
---|
0:51:34 | so, a last conclusion, and then we're almost done
---|
0:51:39 | object recognition links to utility in complex ways that are not terribly well understood yet
---|
0:51:44 | so the
---|
0:51:45 | biggest question in computer vision right now is, what should we actually say about visual data
---|
0:51:52 | a picture goes in, or a video goes into a recognition system; question: what should come out
---|
0:51:59 | one answer is a list of everything that's in the picture; that's a silly answer; there are too many things in
---|
0:52:03 | the picture
---|
0:52:04 | if i look at this room in front of me, it's silly to be describing the nut on the bolt
---|
0:52:09 | that holds the emergency exit
---|
0:52:12 | sign together
---|
0:52:14 | so the right answer is, well, a useful representation of reasonable size, which is a lousy answer, because
---|
0:52:19 | we don't know what it means to be useful, and we don't know how to make the size reasonable
---|
0:52:25 | it seems that object categories depend on utility
---|
0:52:28 | so when i talked about that monkey
---|
0:52:31 | well, it could also be a plastic toy, but which other category it occupies is irrelevant
---|
0:52:37 | it really just doesn't matter; we're not that interested in it, so why care
---|
0:52:41 | if you look at this little fellow, who turned up in my discussion of breaking a bottle; recently somebody pointed out
---|
0:52:48 | that that's a beer bottle; so, you know, you could think of that as a person, or a child
---|
0:52:52 | or a beer drinker
---|
0:52:53 | or a tourist, or an obstacle, or a potential
---|
0:52:58 | threat
---|
0:52:59 | or, you know, you see that and it's something you have to drive around
---|
0:53:03 | so just depending on what you're doing that object occupies a wide range of different potential categories |
---|
0:53:10 | so what i talked about suggests
---|
0:53:13 | the emergence of a new belief space about object recognition; we're sort of heading in this direction, and it
---|
0:53:18 | looks as though it's going to be interesting when we get there
---|
0:53:21 | and in that belief space, categories are really fluid
---|
0:53:25 | they're opportunistic devices to aid generalisation; they're affected by your problems and by utility
---|
0:53:31 | things can belong to many categories
---|
0:53:33 | some people would refer to this as a cellphone, or a smartphone; if i flung it into the audience, it
---|
0:53:38 | would turn into a projectile immediately
---|
0:53:41 | and in fact, the fact that it was a smartphone would have nothing to do
---|
0:53:45 | with whether it was a projectile
---|
0:53:47 | so at the same time the same instance can belong to different categories; sorry, at different times it can
---|
0:53:52 | belong to different
---|
0:53:54 | categories; and when we talk about objects as being special within the category, that's meaningful
---|
0:54:01 | it's not like all birds are the same
---|
0:54:03 | some are interesting because they're missing tails, others are interesting because they have special feathers, other birds
---|
0:54:10 | are alarming because they're inside this room flying around, as we had just before the talk
---|
0:54:15 | many categories seem to be private
---|
0:54:18 | and many characterisations might be, because
---|
0:54:20 | i might think about some things differently than you, and if we don't talk about it, it really just doesn't matter
---|
0:54:27 | and in turn, that suggests that recognition
---|
0:54:29 | is not really just discrimination; it's constantly coping with the unfamiliar
---|
0:54:34 | in the presence of massive and unreasonable bias
---|
0:54:38 | and we need new tools and machinery to do it
---|
0:54:42 | so i'm done; i've run through my major points
---|
0:54:45 | and it remains only to point out that if you want more information, you can get it here
---|
0:54:50 | but if somebody tries to sell you the one with the brown cover, they're a villain, because that's
---|
0:54:55 | the first edition, and that's ten years old; the second edition appeared physically in november
---|
0:55:01 | so they do exist, and they're around, and it's full of quite up-to-date information about the state of recognition
---|
0:55:07 | and thanks: what i described has been supported by numerous agencies and organisations, including the office of naval research
---|
0:55:15 | and the national science foundation
---|
0:55:18 | and we're done
---|
0:55:36 | oh |
---|
0:55:37 | just a quick question about size |
---|
0:55:39 | so the issue when the person was misrecognized as a bottle |
---|
0:55:43 | or the issue you know this is a miss recognition when we go well that's something just the wrong scale |
---|
0:55:49 | but size is really difficult; it's really difficult to tell how big something is
---|
0:55:53 | oh okay |
---|
0:55:56 | what about human vision
---|
0:55:58 | yes, is it the same
---|
0:56:01 | we know that people are amazingly good at making size judgements
---|
0:56:04 | so |
---|
0:56:06 | the main literature about this is
---|
0:56:09 | literature describing things that they get wrong
---|
0:56:14 | we don't know how they do it |
---|
0:56:15 | and we don't have methods right now in vision, that is in computer vision
---|
0:56:19 | that can do size estimates satisfactorily
---|
0:56:25 | one reasonable resolution to the person/bottle example is, you know, a bottle is just a lot smaller than a person
---|
0:56:32 | but how do you know how big the thing you see is |
---|
0:56:35 | in an absolute sense? well, one argument runs: i look at some kind of big-scale geometric context
---|
0:56:42 | around it, use it to make some estimate of the camera and where things are, and that tells me
---|
0:56:48 | something about the size, and if i get really gross size mismatches then i can say no, that isn't gonna
---|
0:56:53 | work |
---|
0:56:55 | right now nobody can do that in a satisfactory way; i would regard it as something that's sort of in
---|
0:57:01 | the air, coming
---|
0:57:02 | wouldn't |
---|
0:57:05 | i would think in three or four years time we might do |
---|
0:57:08 | coarse size judgements moderately well
---|
0:57:12 | more detailed size judgements i think are still very mysterious
---|
0:57:15 | they do require putting together a whole bunch of contextual machinery because of the scaling effects of perspective
---|
0:57:22 | what looks like a small bottle in an image might just be a massive one a long way away, so
---|
0:57:26 | you need some notion of the space that it occupies |
---|
0:57:29 | and that's one of the attractions, by the way, of why i wanted to show you that fun movie of the things
---|
0:57:34 | moving around in the room |
---|
0:57:35 | one of the attractions of that movie is
---|
0:57:38 | when you have that degree of understanding of space you probably can make size predictions
---|
0:57:44 | and that you could use them to drive recognition, but as far as i know there's nothing like that right now
---|
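the geometric reasoning sketched in this answer (estimate the camera, infer absolute size, reject gross mismatches) can be illustrated with a toy pinhole-camera check; the focal length, depth, pixel measurement, and per-category height ranges below are invented for illustration, not taken from the talk:

```python
def estimated_height_m(pixel_height: float, depth_m: float, focal_px: float) -> float:
    """Pinhole camera model: real height = pixel height * depth / focal length."""
    return pixel_height * depth_m / focal_px

def plausible(category: str, height_m: float) -> bool:
    """Reject gross size mismatches against rough per-category priors (illustrative numbers)."""
    priors = {"person": (1.0, 2.2), "bottle": (0.1, 0.5)}  # assumed ranges in metres
    lo, hi = priors[category]
    return lo <= height_m <= hi

# A 200-pixel-tall detection at 10 m depth with an 1100-pixel focal length
# comes out around 1.8 m tall: plausible for a person, not for a bottle.
h = estimated_height_m(200, 10.0, 1100.0)
print(round(h, 2), plausible("person", h), plausible("bottle", h))
```

the point of the sketch is only the structure of the argument: an image measurement plus a depth estimate yields an absolute size, which a category-specific prior can then veto.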
0:57:54 | so i |
---|
0:57:55 | just |
---|
0:57:56 | i |
---|
0:57:57 | the data sets are biased, unfairly biased, towards things that are interesting
---|
0:58:02 | and i'm wondering why in computer vision we don't use the data sets as the vocabulary from which to describe |
---|
new images
---|
0:58:10 | the bias is obviously towards something that people are drawn to, and it seems that the data itself
---|
0:58:17 | could be the vocabulary with which you describe, yeah, that is
---|
0:58:21 | you describe an image in terms of its representation in this huge dataset |
---|
0:58:26 | so on |
---|
0:58:28 | i think that when i mean this is just a |
---|
0:58:31 | setting right because |
---|
0:58:35 | different agendas react to this very differently
---|
0:58:38 | so if you think about computer vision as something you do when you stick a camera
---|
0:58:43 | on your head and you walk around, well
---|
0:58:45 | then the bias that i showed you just matters
---|
0:58:48 | but if you think about computer vision as something where what i do is use google images to interpret
---|
0:58:54 | more google images
---|
0:58:55 | the whole issue of bias is just not an issue, right, because
---|
0:58:59 | one is a fair sample of the other
---|
0:59:02 | there is very little explicit writing about what you're referring to, but there is a lot of work that implicitly
---|
0:59:09 | takes it into account
---|
0:59:11 | so much of what i've talked about in recognition actually |
---|
involves
---|
0:59:18 | some interesting use of iconography
---|
0:59:22 | which is a way of talking about what you're talking about
---|
0:59:25 | we don't have a good enough understanding
---|
0:59:28 | of that issue to be able to talk about it
---|
0:59:31 | clearly. so, you know, there are two kinds of convention: one is which lions are interesting
---|
0:59:36 | this one's doing this, that one's riding a horse, that sort of thing
---|
0:59:40 | and the other is how we tend to photograph lions
---|
0:59:44 | you won't see all that many pictures of
---|
0:59:47 | you know, a lion photographed
---|
0:59:49 | three quarters on, with the shoulder dominating the picture
---|
0:59:52 | and it seems like one possible iconographic convention is different from another one
---|
0:59:56 | one of them, if you like, is interestingness in terms of properties, and that's semantic stuff
---|
1:00:02 | and the other is |
---|
1:00:04 | characters |
---|
1:00:07 | we just don't have the language to separate those and talk about them sensibly
---|
1:00:11 | yeah, again i think it's very much on the agenda because of the need to separate these
---|
1:00:16 | you know, if you really want to learn about the world from google images
---|
1:00:20 | you're gonna have trouble |
---|
1:00:21 | and |
---|
1:00:22 | we know that we don't really have |
---|
1:00:26 | so it's sort of inconclusive, but that's the best i can do, sorry
---|
1:00:32 | a comment, a comment on what you said about the utility of what matters in a
---|
1:00:38 | picture: you said what matters in the picture depends on the utility to the viewer
---|
1:00:43 | but yet it seems like when you gave an image today, the image of the two girls
---|
1:00:49 | to several people, they came up with pretty much the same description, so there seems to be a sort of
---|
1:00:55 | baseline utility which is sort of context independent; i was wondering if you could comment on that. and i think
---|
1:01:02 | you're right |
---|
1:01:03 | so there's a fair amount of experimental work
---|
1:01:07 | on
---|
1:01:08 | what people select to mention
---|
1:01:11 | the situation is a little bit murky because
---|
1:01:14 | it's hard to do the experiments exactly right and to be precise
---|
1:01:18 | but some evidence suggests the kinds of things that predispose people to mention things
---|
1:01:23 | oh |
---|
1:01:24 | they're really interested in people, to begin with
---|
1:01:27 | and you can explain that, because people have the potential to affect you
---|
1:01:32 | yes |
---|
1:01:33 | they're sort of always interesting, a kind of baseline
---|
1:01:38 | another thing is that big things tend to be mentioned
---|
1:01:44 | i |
---|
1:01:44 | things that are unusual: you know, if you have a small rhinoceros in a downtown street view people are gonna
---|
1:01:51 | say gee, you don't see that very often, and therefore mention it
---|
1:01:54 | and those seem to be rough
---|
1:01:56 | principles for baseline utility but i |
---|
1:02:01 | we do not yet have the
---|
1:02:03 | class of understanding required to say well, okay, there's a baseline utility and then there's also a component that's linked to
---|
1:02:09 | the immediate task |
---|
1:02:11 | well i would guess that |
---|
1:02:13 | that's the situation
---|
1:02:14 | if one wanted to take a very extreme point of view you could say |
---|
1:02:18 | the right way to do vision is with reinforcement learning, because that's the way nature did it: you just shoot every
---|
1:02:24 | vision system in the head if it doesn't do everything right |
---|
1:02:27 | the downside of that one is it took nature an awfully long time
---|
1:02:31 | and, you know, opening up these utility issues and getting a better understanding of the principles seems to be important
---|
1:02:41 | again sorry surveillance the understanding |
---|
1:02:48 | question |
---|
1:02:49 | so i mean |
---|
1:02:51 | obviously we've all spent the hour sort of comparing how vision people do their stuff and how
---|
1:02:56 | speech people do their stuff |
---|
1:02:58 | and the two things that kind of make speech recognition work, in my view, at a very abstract level: one is
---|
1:03:05 | that we model how the various units that we're trying to recognise change in context for instance you know phones |
---|
1:03:13 | depend,
---|
1:03:14 | right? how they're realised depends on what other phones they occur next to; and then we really use in a
---|
1:03:20 | massive way |
---|
1:03:21 | the this what you called joint modeling you know we model how phones occur together how words occur together how |
---|
1:03:28 | high-level units like topics and other linguistic units at various levels all interact and have co-occurrence statistics
---|
1:03:39 | that can inform the units within them
---|
1:03:42 | so this joint modeling that you just touched on is really massively important for speech recognition
---|
1:03:49 | and so these two aspects the modeling of how things change as a as a function of context and then |
---|
1:03:55 | modeling the context itself |
---|
1:03:57 | and its statistics: do you see that as being
---|
1:04:02 | still having a long way to go, or is it just not something that works as well in
---|
1:04:07 | the vision domain? can you draw some comparisons there? you've put your finger on a really nasty issue
---|
1:04:13 | we know about context we've been talking about context since the eighties |
---|
1:04:18 | and then the question is sort of how, what, and why, and under what circumstances, and where you get the
---|
1:04:24 | contextual statistics and all that jazz |
---|
1:04:26 | and there is
---|
1:04:29 | a tremendous amount of work on that topic |
---|
1:04:32 | the |
---|
1:04:35 | i guess a reasonable summary view is
---|
1:04:39 | clever use of contextual information
---|
1:04:43 | often improves
---|
1:04:45 | a particular function just a little bit
---|
1:04:48 | but there is no example anyone knows of where context just hits the issue out of the park
---|
1:04:53 | and i'm using the word context in the broadest possible sense of various kinds of co-occurrence
---|
1:05:00 | so the geometric stuff: so for example you can make pedestrian detectors a little bit better by knowing
---|
1:05:08 | about geometry, and the little bit is worth having; like, that's one person who doesn't get run over, or whatever
---|
1:05:14 | but i know of no example in vision where
---|
1:05:19 | things get a lot better through heavy-duty contextual information. now you could argue about that, and
---|
1:05:25 | people do argue, in two ways. one argument is
---|
1:05:29 | well, you're not using enough contextual information; if you use much richer contextual models and more detail and the like
---|
1:05:36 | things will get better, and there are whole research programs based on that
---|
1:05:41 | hypothesis |
---|
1:05:42 | the other argument is: well, those elaborate structures
---|
1:05:47 | become increasingly subject to issues of bias, issues of variance in estimation, and all that jazz
---|
1:05:53 | and basically what you win with one hand you lose with the other, and you're sort of back where
---|
1:05:58 | you started
---|
1:05:59 | i would say the jury is still out on this question; it's very firmly on the agenda, it's
---|
1:06:04 | very aggressively studied
---|
1:06:07 | and my own bet would be that contextual information really matters
---|
1:06:12 | but it also really matters which contextual information you use and which you ignore
---|
1:06:17 | and that second choice is pretty hard
---|
1:06:21 | we don't really have the machinery that says this is the good stuff, this is the bad stuff
---|
1:06:27 | i |
---|
1:06:28 | you know, it's not easy to meaningfully contrast vision and speech; they're just different activities, different
---|
1:06:36 | communities doing different things
---|
1:06:38 | but i would say |
---|
1:06:41 | we have a bafflingly rich selection of potential contexts to use
---|
1:06:47 | everything from camera geometry to geometric context |
---|
1:06:51 | to special properties of texture, or co-occurrence statistics of objects, or object and scene co-occurrences, and the like, and
---|
1:07:00 | one possible source of the difficulty is we just don't know what to select from that menu
---|
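the "little bit better" use of context described in this answer (reranking detector scores with object co-occurrence statistics) can be sketched concretely; the labels, scores, and co-occurrence weights below are all invented for illustration, not real statistics:

```python
# Toy contextual reranking: boost a detection when objects that commonly
# co-occur with it are also confidently detected in the same scene.
cooccur = {  # assumed co-occurrence weights, purely illustrative
    ("person", "bicycle"): 0.3,
    ("person", "bottle"): 0.05,
}

def rerank(detections):
    """Add a context bonus to each detector score from co-detected objects."""
    out = {}
    for label, score in detections.items():
        bonus = 0.0
        for (a, b), w in cooccur.items():
            if a == label and b in detections:
                bonus += w * detections[b]
        out[label] = score + bonus
    return out

scores = {"person": 0.6, "bicycle": 0.8}
print(rerank(scores))  # the person score gains a small boost from the bicycle
```

note that the effect is deliberately small: the context term nudges scores rather than overriding the detector, which mirrors the "improves a particular function just a little bit" observation.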
1:07:12 | this is addressed first to you, and then to jeff
---|
1:07:17 | the mechanism, i don't know if you heard jeff's talk yesterday morning, on
---|
1:07:25 | these segmental conditional random fields, right, and the idea he's proposing, which is, you know, basically to model
---|
1:07:34 | speech by incorporating information from multiple detectors
---|
1:07:41 | using the segmental random fields. i mean, i actually don't know enough to know whether that was inspired by the
---|
1:07:48 | vision world and migrated to speech, or vice versa, but i was wondering if
---|
1:07:54 | both of you could comment as to what commonalities you see between those two approaches
---|
1:08:02 | and whether there is anything, you know, you think you might have seen in jeff's talk, or, jeff, whether you see
---|
1:08:09 | anything here, you know, based on what you heard from david, for a little bit of cross-pollination between the
---|
1:08:17 | two areas
---|
1:08:18 | so i think |
---|
1:08:19 | yeah |
---|
1:08:21 | and i guess jeff is next to a microphone. i think from my perspective there are strong resonances and harmonies
---|
1:08:28 | and one of them here is an idea that's pervasive in vision, which is
---|
1:08:33 | if you can carve up a picture into pieces that make sense
---|
1:08:38 | you can get |
---|
1:08:39 | much more information about the picture
---|
1:08:41 | because you've got spatial support over which to pool features and the like
---|
1:08:46 | there |
---|
1:08:48 | i'd say
---|
1:08:49 | most serious vision people believe that if you could do a good job of
---|
1:08:54 | carving pictures up
---|
1:08:56 | everything would get better
---|
1:08:58 | i argue with that belief because there's no evidence to support that view
---|
1:09:03 | and |
---|
1:09:05 | it's reasonable to say that the people who believe it simply hold an untested, unsupported belief, or the wrong
---|
1:09:11 | statistic. anyway, so you know we're sort of in a position where smart people think it should work out
---|
1:09:18 | but right now none of the best |
---|
1:09:20 | detection or classification methods takes any account of spatial support; they just look at the picture as a whole
---|
1:09:26 | i think that will change
---|
1:09:29 | i will go to my grave believing that if it hasn't changed we've done something wrong and we'll come right |
---|
1:09:35 | later on |
---|
1:09:36 | but it hasn't changed yet, and that's a very disturbing feature of the vision landscape
---|
1:09:41 | so i think there's potential there, but nobody's demonstrated it yet, would be my reaction
---|
1:09:46 | i don't i i've got the light in my lexicon see if that's just one oh yes |
---|
1:09:52 | i |
---|
1:09:53 | and yeah, so i thought that was very interesting, and
---|
1:09:57 | it was, i think there are many points of commonality; two things struck me, one of them was in
---|
1:10:06 | the vision case
---|
1:10:07 | and it seemed that the attributes were much clearer
---|
1:10:11 | than we have in the speech case; for example, there's
---|
1:10:16 | has wings, has a beak, has wheels
---|
1:10:20 | those are high level attributes |
---|
1:10:23 | that we can sort of rattle off
---|
1:10:26 | just by thinking about the problem and i'm not sure that we have the same attributes |
---|
1:10:31 | available to us |
---|
1:10:33 | when looking at the spectrogram or the speech signal
---|
1:10:37 | and the other thing that occurred to me was |
---|
1:10:40 | that perhaps in the vision case
---|
1:10:43 | there's an interesting extension to the ideas, which we're dealing with in the speech case, which has to do with the sequential
---|
1:10:50 | aspect of things
---|
1:10:52 | for example if you're working instead of with a fixed image with the video where you have a sequence of |
---|
1:10:58 | scenes, and you might wanna segment that into segments using some of the attributes that exist within the segments
---|
1:11:08 | so |
---|
1:11:09 | the |
---|
1:11:10 | responding to one,
---|
1:11:11 | the attributes discussion, and
---|
1:11:16 | what attributes are natural:
---|
1:11:18 | in an hour talk like this one i can only summarize that
---|
1:11:22 | but |
---|
1:11:26 | it's easy to write down a couple of hundred |
---|
1:11:29 | it's not clear that they're independent of each other, and it's not clear that they cover the game by any manner of means
---|
1:11:34 | we don't really have a story about what you do if you don't know what the natural attributes are
---|
1:11:40 | the story currently that people use is: if you can come up with something that's discriminative, it's gonna
---|
1:11:45 | be an attribute one way or another, and we'll call it an attribute and go on
---|
1:11:49 | but there there's actually |
---|
1:11:52 | a moderately interesting vision problem where we sort of know we don't have attributes and would like to, and the
---|
1:11:59 | question of developing attributes for things for which it is hard to write down a list is a big deal for us, and
---|
1:12:06 | i think if we can learn about it we would be pleased to learn
---|
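the attribute idea raised in the question ("has wings, has a beak, has wheels") is often operationalized by matching predicted attribute probabilities against per-category attribute signatures; a toy sketch follows, where the attributes, signatures, and probabilities are all invented for illustration:

```python
# Toy attribute-based categorisation: categories are described by binary
# attribute signatures; an image's soft attribute scores are matched to
# the category whose signature they agree with best.
signatures = {
    "bird":    {"has_wings": 1, "has_beak": 1, "has_wheels": 0},
    "bicycle": {"has_wings": 0, "has_beak": 0, "has_wheels": 1},
}

def classify(attr_probs):
    """Pick the category whose signature best agrees with the attribute scores."""
    def agreement(cat):
        sig = signatures[cat]
        # contribute p where the signature says 1, and (1 - p) where it says 0
        return sum(p if sig[a] else 1 - p for a, p in attr_probs.items())
    return max(signatures, key=agreement)

# A bank of attribute detectors might output soft scores for one image:
print(classify({"has_wings": 0.9, "has_beak": 0.7, "has_wheels": 0.1}))  # prints bird
```

the appeal, as the question notes, is that the vocabulary is human-readable; the difficulty the answer raises is exactly that for many domains nobody can write down a list like `signatures` in the first place.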
1:12:11 | as to whether time helps segmentation
---|
1:12:15 | but it |
---|
1:12:16 | again |
---|
1:12:17 | spatio-temporal segmentation of videos
---|
1:12:21 | it
---|
1:12:22 | doesn't seem to be much better than anything we know how to do on non
---|
1:12:27 | spatio-temporally segmented videos
---|
1:12:30 | people like |
---|
1:12:31 | you know, i'd say most of the serious people in vision believe that's because we're misunderstanding something
---|
1:12:37 | but we don't know what it is and we don't know how to fix it
---|
1:12:44 | just on what you said: there is a section of the community that does believe in feature detectors, like articulatory feature detectors
---|
1:12:55 | you know, in terms of, whether, i'm not saying it's right or wrong, but there was
---|
1:13:02 | that part of the community that looks at
---|
1:13:04 | speech recognition from that viewpoint, which is a little more similar. one thing i wasn't sure of, this is just a
---|
1:13:10 | clarification, in all that you talked about, in what manner do you produce these features; i presume, are these
---|
1:13:18 | hard features that are being produced, that is, either it's there or it's not, or were these all soft decisions
---|
1:13:24 | that were extracted? so is there like a set of ten billion possible things
---|
1:13:31 | and is there a probability that's thresholded, or do you make a decision: here it's a patio, there's a septic tank, et
---|
1:13:38 | cetera et cetera |
---|
1:13:41 | well, the nice thing about
---|
1:13:42 | a field like this:
---|
1:13:44 | if you make a list of, you know, a bunch of
---|
1:13:48 | potential answers,
---|
1:13:49 | somebody will have a paper about any combination
---|
1:13:53 | usually what people do is report
---|
1:13:59 | one alternative:
---|
1:14:01 | you know, it's a pedestrian or it's not a pedestrian, it's a cat or it's not; but there's
---|
1:14:05 | a fair amount of interest in for example the top five |
---|
1:14:08 | there are a bunch of applications where
---|
1:14:11 | as long as you get a ranking that's good and you get the right thing close to the top of the ranking then
---|
1:14:15 | you're okay and people are very interested in that one there's another class of activity which is look if i |
---|
1:14:22 | build these detectors i can actually think of the output |
---|
1:14:26 | as being features, and what i'm gonna do is, i'm gonna pretend i'm building detectors and then i'll look at
---|
1:14:31 | the responses |
---|
1:14:33 | and treat them as features and use them for a completely different activity. so essentially all the alternatives you describe appear
---|
1:14:41 | in someone's paper somewhere |
---|
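the alternatives described in this answer, reporting a top-k ranking rather than a single hard decision, and reusing raw detector responses as features for a different task, can be stated concretely; the labels and scores below are invented for illustration:

```python
def top_k(scores, k):
    """Report the k best-ranked labels instead of one hard decision."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

scores = {"pedestrian": 0.7, "cat": 0.2, "bottle": 0.55, "bicycle": 0.4}
print(top_k(scores, 2))  # ['pedestrian', 'bottle']

# Alternatively, keep the whole vector of detector responses, with no
# thresholding, and hand it to a completely different downstream task:
feature_vector = [scores[label] for label in sorted(scores)]
print(feature_vector)  # raw scores in a fixed label order
```

the fixed label order matters: a downstream classifier consuming `feature_vector` needs every image encoded with the same coordinate for the same detector.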
1:14:43 | and i wouldn't say there's any consensus about what the best thing is |
---|
1:14:47 | which is unfortunate; it's not like, you know, you do this and you're okay; this is not really settled
---|
1:14:56 | a difference between speech and, as i see it, images is that all the images that are in datasets
---|
1:15:03 | seem to be sort of high-quality images; no one seems to post their crappy pictures on the web
---|
1:15:08 | and so i wonder how well some of these techniques work when the pictures are
---|
1:15:12 | poor quality, blurry, overexposed or underexposed
---|
1:15:15 | "'cause" in speech we have a lot more of a sort of |
---|
1:15:19 | variability it seems like of quality which affects the performance of our system |
---|
1:15:26 | so |
---|
1:15:27 | i mean this is what was what was also with it |
---|
1:15:31 | i |
---|
1:15:33 | the fact is there's an awful lot of pretty cruddy pictures and cruddy videos out there, and often
---|
1:15:38 | a look at youtube will reassure you on this point
---|
1:15:41 | and some things a hot this |
---|
1:15:49 | we |
---|
1:15:52 | the things that make our feature computations
---|
1:15:56 | fail are very different from
---|
1:15:58 | the acoustical phenomena that make
---|
1:16:01 | your feature computations give you problems, but there are some points of contact
---|
1:16:07 | we benefit quite a lot from time so for example just one moderately good example if you're interested in human |
---|
1:16:17 | activity recognition |
---|
1:16:20 | if you think about things like soccer field |
---|
1:16:23 | a long view of a soccer field with a player running across the field, you really just can't resolve the arms
---|
1:16:29 | and legs |
---|
1:16:29 | you've got motion blur to worry about, the player is about one pixel across anyhow; it's just a mess
---|
1:16:35 | but if you look over a longer time scale you can get a fairly good picture of what's going on
---|
1:16:41 | just looking at the sequence of pixels on the motion and pixels |
---|
1:16:45 | so i think |
---|
1:16:47 | some of the losses of resolution might not be as destructive as some of the acoustic effects that you encounter
---|
1:16:53 | but i'm not sure that that's true |
---|
1:16:55 | there are whole areas in vision that are basically dead in the water as a result of
---|
1:17:02 | interreflections of light
---|
1:17:05 | whereas i think multipath acoustic distortion probably isn't the biggest thing in your life; there are other things to worry
---|
1:17:10 | about |
---|
1:17:11 | so it depends on the kind of situation
---|
1:17:15 | there's a lot of interest in low resolution pictures that agencies care about, or for pictures that come out of
---|
1:17:23 | forward looking infrared sensors for example |
---|
1:17:25 | for |
---|
1:17:26 | somewhat alarming reasons |
---|
1:17:33 | i |
---|
1:17:34 | yeah |
---|