0:00:13For the past fourteen years the IEEE has presented the IEEE Jack S. Kilby Signal Processing Medal,
0:00:20which some of us refer to as the Kilby Medal.
0:00:24It is what the IEEE refers to as a major medal.
0:00:28It was founded by the IEEE Signal Processing Society and has been generously funded since its creation
0:00:34by the Texas Instruments company.
0:00:36Jack Kilby's life as an engineer was an inspiration to many who decided to pursue electrical
0:00:41engineering as a career.
0:00:43While he is no longer among us,
0:00:45his influence continues.
0:00:48As I have already mentioned, because it is a major medal,
0:00:51the IEEE Jack S. Kilby Signal Processing Medal is presented by the IEEE
0:00:56at the IEEE medals and awards ceremony, which will be held this year in August in San Francisco, California.
0:01:03However, it has been the practice
0:01:05of the medal's founding society,
0:01:07that would be us,
0:01:08to present a special commemorative plaque
0:01:11from the society
0:01:12to the Kilby Medal recipient.
0:01:15This year the recipient of the two thousand eleven IEEE Jack S. Kilby Signal Processing Medal
0:01:21is Ingrid Daubechies.
0:01:24Professor Daubechies is receiving this honour for pioneering contributions to the theory and applications of wavelets
0:01:32and filter banks.
0:01:34Ingrid could not be with us today; she will receive the medal at the IEEE medal ceremony in
0:01:39August in San Francisco,
0:01:42but I do recommend we go for a round of applause in absentia.
0:01:54Ladies and gentlemen, that concludes today's ceremony. I want to thank you all for attending the IEEE
0:02:00Signal Processing Society
0:02:02awards ceremony. Congratulations again to the IEEE Fellows, the recipient of the
0:02:07IEEE James L. Flanagan Speech and Audio Processing Award, and the recipient of the IEEE Jack S. Kilby Medal.
0:02:15The society holds its awards ceremony annually at ICASSP.
0:02:19I look forward to seeing you again next year.
0:02:22Thanks again.
0:02:36i john for something thing all these works
0:02:39and can grow to your workplace
0:02:41So this is closing
0:02:42the official opening
0:02:44ceremony,
0:02:45and that the and let me thank our musicians mister hour and mister of ski
0:02:50for making this a a more you and
0:02:55Okay, thank you.
0:02:56We are on time.
0:02:57It's a miracle.
0:02:59We are moving on to the technical program of our conference,
0:03:02the first plenary talk.
0:03:04I will welcome to the podium our plenary speaker,
0:03:08Henry Tirri from Nokia,
0:03:10and Professor Visa Koivunen from Helsinki University of Technology,
0:03:14who will present our plenary speaker.
0:03:16and should six
0:03:19Q
0:03:21Ladies and gentlemen, I have the great pleasure to introduce our plenary speaker.
0:03:27Doctor Henry Tirri is senior vice president and head of Nokia Research Center.
0:03:32Nokia Research Center's task is to enable new business opportunities for Nokia.
0:03:38yeah race the responsible for
0:03:40you you know we're white that first
0:03:42struck in the
0:03:43and work closely we or not give a a in to promote open in the base
0:03:49and working phone research is in collaboration with we we equal global research universities and
0:03:55is
0:03:56i believe that means that expense
0:03:58what of time not only in is of this but also at the airports and a
0:04:05He holds a PhD in computer science from the University of Helsinki,
0:04:09and he joined Nokia in
0:04:11do got from for as the results go in software applications well
0:04:15His previous positions include working at AT&T and as a visiting research scientist
0:04:21at NASA Ames.
0:04:23where can going to the wells role
0:04:25called the also for email
0:04:28a or and cool or able or for more than the or as would a i of
0:04:32a a personal computer science social science statistics and holes like that
0:04:37some of these papers are are are very can the school of single produce or so
0:04:41including machine learning and and uh
0:04:44uh
0:04:47um
0:04:48okay okay
0:04:49He has also been a visiting professor at UC Berkeley and Stanford universities.
0:04:54His other interests include longboard surfing and snowboarding,
0:04:58and I think that is why he likes working in California.
0:05:03i that the crib leads to work in a and risky more much of a is and i enjoy every
0:05:08ladies and gentlemen of and it could be very oh do you know ahead
0:05:17Okay.
0:05:19Good morning.
0:05:20Those of you who arrived from the US, I know
0:05:24it is very early in the morning. I came from the European side, but
0:05:29as
0:05:31was mentioned, I have spent a lot of time on planes, so I am actually not sure which time zone I am
0:05:36in.
0:05:39um
0:05:40When I was thinking about
0:05:41what is the topic
0:05:43that I would like to touch on for such a, how would I say,
0:05:47wide-scope audience,
0:05:49I chose data.
0:05:51And there are multiple reasons for choosing data.
0:05:54One was that I spent
0:05:56more than twenty-five years of my life searching for the magical dataset
0:06:00that could prove that my methods are better than the others.
0:06:05um
0:06:06Unfortunately that search,
0:06:08of course,
0:06:09was a difficult one, not only because of maybe the problems with my methods
0:06:15but because of the access to data.
0:06:18and um
0:06:20Having worked long in the field
0:06:23and looking at multiple datasets, I know that
0:06:26what we usually end up with
0:06:28in that type of situation is fitting
0:06:32our methods to the particular datasets that we have,
0:06:36or generating synthetic datasets, which we know will have their own
0:06:40problems when you use them. So my field was
0:06:43a lot of what you would call machine learning, or
0:06:45information-theoretical learning, or Bayesian learning.
0:06:49One of the reasons I was so keen on Bayesian learning, as you will hear
0:06:54later on during the conference, was that, as I had so little data, the Bayesian
0:06:59methods were somewhat easier methods to apply than the frequentist methods.
0:07:05So I decided to talk about data
0:07:07today,
0:07:08just because I think it is a very common topic for
0:07:11most of the things that you see in the conference.
0:07:14uh
0:07:15But I am taking a very different approach. I have spent, again, a lot of time
0:07:23writing papers on the different mathematical properties of learning
0:07:29and also implementing the different algorithms,
0:07:32and today I am going back to my roots of hacking.
0:07:36So I am celebrating
0:07:37this year forty years of hacking,
0:07:39which is a long time,
0:07:41and taking a little bit of a systems perspective, as you will see.
0:07:46Now, let's start.
0:07:49The title is Making Sense of a Zettabyte World. You might wonder what those green things
0:07:54there are that are moving. It is not the worm game, the famous worm game that used to be
0:07:58in the old days on the mobile phones.
0:08:01So if you look at those green things,
0:08:05they reveal something. It is actually a nice description or representation of mobile devices that
0:08:12would be moving,
0:08:14the cool
0:08:15going around that
0:08:17white
0:08:17area, which is the congress centre.
0:08:20So those would be the sensors that would be driving around the roads, in the devices
0:08:27that you carry with you.
0:08:28so
0:08:29here really
0:08:30Typically nowadays I call them mobile computers, because I am a computer scientist, so everything is a
0:08:35computer to me.
0:08:37now this is
0:08:38synthetic data
0:08:40as we say
0:08:41but let's
0:08:42zoom in a little bit.
0:08:45so this is real data
0:08:47and this is real data
0:08:49as collected by the Nokia devices
0:08:52in Prague
0:08:55by looking at the navigation requests
0:08:58of the different devices
0:09:00in real time,
0:09:01you know,
0:09:02superimposing them on this plane with the coordinates
0:09:06and the timing.
0:09:08As you can see, there is no map
0:09:11underneath,
0:09:13but after a while you see the thing developing nicely into the
0:09:17map of Prague.
0:09:22so
0:09:23if we go further
0:09:28and zoom out,
0:09:30this is a similar picture
0:09:33of Europe
0:09:35as seen
0:09:36by billions of queries
0:09:38of the Ovi
0:09:40Maps
0:09:41navigation.
0:09:42Notice, by the way, there are some interesting cultural differences if you start mining this
0:09:47as a topic of data.
0:09:49um
0:09:50The green areas are in France and also in Russia,
0:09:53so in those areas the people are more interested in navigation, as opposed to southern Europe,
0:10:00where the point-of-interest searches, meaning
0:10:02finding restaurants or
0:10:04the like, are much more common.
0:10:08So there is a multitude of data that I
0:10:10face in my
0:10:11current life.
0:10:13it's also an a i the be do prove each to be an and here where this they are
0:10:19and that search of mine
0:10:21is
0:10:22starting to be fulfilled.
0:10:24We have access to data
0:10:27in a
0:10:29totally
0:10:30new way
0:10:32to to to develop of computers to to develop and of a multiple
0:10:36different aspects that i that touched
0:10:40So first I want to
0:10:42discuss
0:10:43with you
0:10:45some of the sources of this type of data that is available to you.
0:10:49My purpose is to challenge you
0:10:51later on to think about
0:10:54what you need to do,
0:10:56what you need to think about when you in real life want to address
0:11:01datasets like this
0:11:02and use them and,
0:11:05sort of, make sense of this data.
0:11:12So first of all, I showed you now
0:11:15data that was based on the GPS
0:11:18location information
0:11:20and the time code.
0:11:22Actually, a mobile device
0:11:25already has multiple different
0:11:28sensor data at the same time.
0:11:30We know that they have accelerometers.
0:11:33Actually, for me, camera and audio are also sensors.
0:11:38So each of these sensors that you add to
0:11:41this device
0:11:43that is now so prevalent
0:11:45adds a new layer,
0:11:47and there are more and more sensors coming in. With the different radios that we are adding to
0:11:52these devices
0:11:53you have more and more
0:11:54sensor data coming from the same sources,
0:11:58but added
0:11:59to this base layer
0:12:02of location and time.
0:12:04Why do I consider location and time to be different from the others?
0:12:08Because,
0:12:09from my perspective,
0:12:11location and time condition
0:12:14the usefulness of a lot of the data.
0:12:16So if you don't know your location,
0:12:18many of the applications, for example asking where the closest
0:12:23beer
0:12:24bar is,
0:12:25don't make very much sense.
0:12:27So location and time are very fundamental,
0:12:30but beyond that
0:12:32you can add more and more layers,
0:12:34and you end up with these piles of data,
0:12:38which of course,
0:12:39if you collapse them,
0:12:41show what an immense amount of data we have available
0:12:44already from this type of domain.
0:12:52that about it
0:12:55the
0:12:57Well, at least, pardon my language, but this is a room full of geeks, so most
0:13:01of us are
0:13:02so excited about the different numbers.
0:13:05I was always puzzled,
0:13:08when I studied mathematics, that there are these different terms for the same numbers, and of course in
0:13:13particular computer scientists are particularly excited about using
0:13:17powers of two.
0:13:19So we talk about the zettabyte.
0:13:20Now the zettabyte is an interesting number because at this time,
0:13:25currently, approximately this year,
0:13:29although much of the data is in fact copies of each other,
0:13:33we are producing one point two zettabytes of digital information.
0:13:38But this number, as any large number,
0:13:42tends to be
0:13:43difficult to grasp.
0:13:45so um
0:13:47Before we try to look at the different sources of this,
0:13:51let's try to reflect a little bit on what it is. So,
0:13:55okay,
0:13:56it is
0:13:57approximately, close enough for physicists,
0:14:00ten to the power of twenty-one, so we have twenty-one zeros,
0:14:03but that does not tell us too much.
0:14:05So typically,
0:14:07if you want to get an idea of a big number, you should relate it to
0:14:12some measure that you know.
0:14:15So let's
0:14:16say that one byte
0:14:18of this would be one meter,
0:14:19and a good question to you is, okay, if
0:14:22we have
0:14:23a zettameter,
0:14:26how long is that?
0:14:29What would that distance be if we start from here?
0:14:32Is it from here to the Moon?
0:14:36Here to Jupiter?
0:14:40Here to Alpha Centauri?
0:14:44Okay, so those who are very clever and fast and know something about astronomy
0:14:49would figure out that
0:14:50this,
0:14:50a zettameter,
0:14:52is actually
0:14:55the same as the diameter of the Milky Way,
0:14:59approximately,
0:15:01which is about a hundred thousand light years.
0:15:04So it is a big number.
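The one-byte-per-meter comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: the light-year length is an assumed rounded constant that does not appear in the talk, and the function name is made up for this example.

```python
# Assumed constant (not from the talk): metres in one light year, rounded.
METERS_PER_LIGHT_YEAR = 9.4607e15

# SI prefix "zetta" = ten to the power of twenty-one.
ZETTA = 10**21


def zettameters_to_light_years(zettameters: float) -> float:
    """Convert a length in zettameters to light years."""
    return zettameters * ZETTA / METERS_PER_LIGHT_YEAR


one_zettameter_ly = zettameters_to_light_years(1.0)
print(f"1 zettameter is roughly {one_zettameter_ly:,.0f} light years")
```

Under that assumed constant the result lands a little above one hundred thousand light years, which agrees with the approximate Milky Way diameter quoted in the talk.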
0:15:06Now I actually prefer the somewhat
0:15:09more mundane references that you can find on the net, which we found when
0:15:16we were looking at the numbers.
0:15:18So one is that
0:15:20a zettabyte is the amount of information if all the people on Earth
0:15:24would be tweeting twenty-four seven for a hundred years.
0:15:30Or
0:15:31it would be like seventy-five billion
0:15:35sixteen-gigabyte
0:15:36iPads
0:15:38full of data,
0:15:40which would actually fill eighty-four times
0:15:42the Mont Blanc tunnel.
0:15:46But my favourite is that, if you like TV shows,
0:15:49it would be like watching, you know, the
0:15:52series,
0:15:56the TV series Twenty-Four,
0:15:58for a hundred and twenty-five million years
0:16:01continuously.
0:16:04Talk about
0:16:06getting bored a bit, probably.
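The tablet comparison is easy to verify against the one point two zettabytes mentioned earlier. This is a minimal sketch that assumes decimal units (one gigabyte taken as ten to the ninth bytes); the variable names are illustrative.

```python
# Figures quoted in the talk.
TABLET_COUNT = 75 * 10**9        # seventy-five billion devices
BYTES_PER_TABLET = 16 * 10**9    # sixteen gigabytes each (decimal GB assumed)

BYTES_PER_ZETTABYTE = 10**21

total_bytes = TABLET_COUNT * BYTES_PER_TABLET
total_zettabytes = total_bytes / BYTES_PER_ZETTABYTE

print(f"{total_bytes:.3e} bytes = {total_zettabytes:.1f} zettabytes")
```

With decimal units the product comes out at exactly 1.2 times ten to the twenty-first bytes, so the two figures in the talk are consistent with each other.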
0:16:09someone that
0:16:10Okay.
0:16:11Now,
0:16:13the standard answer when
0:16:15we look at this type of large dataset is that, hey,
0:16:20we should be using approximations, we should be using sampling, we should be doing, you know,
0:16:26not exact things; we should be somehow,
0:16:29you know, manipulating
0:16:31the dataset. And that is correct.
0:16:34It is actually a very old idea. What I have on this slide is a very old tablet,
0:16:41eight thousand years old.
0:16:42It is the Babylonian tablet
0:16:44that in fact
0:16:46shows
0:16:47an approximation
0:16:49of the square root,
0:16:51of a unit
0:16:52square, sorry, the diagonal
0:16:56of a unit square,
0:16:58uh,
0:16:58and
0:16:59that allows us
0:17:01in fact
0:17:01to do
0:17:02uh
0:17:03square root calculations
0:17:05for construction and
0:17:07and and complex
0:17:09So approximate measures are of course a very old thing.
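To make the "old approximation" concrete, here is a small sketch of the averaging iteration usually called the Babylonian (or Heron's) method for square roots. The talk does not name the tablet; the sexagesimal digits 1;24,51,10 used below are those of the well-known tablet YBC 7289, and treating that as the tablet on the slide is an assumption. The function names are illustrative.

```python
import math


def babylonian_sqrt(x: float, steps: int = 6) -> float:
    """Approximate sqrt(x) for x > 0 by repeatedly averaging a guess with x / guess."""
    guess = x if x > 1 else 1.0
    for _ in range(steps):
        guess = 0.5 * (guess + x / guess)
    return guess


def sexagesimal_to_float(digits: list[int]) -> float:
    """Interpret base-60 digits; the first digit is the integer part."""
    return sum(d / 60**i for i, d in enumerate(digits))


# Assumption: digits as found on tablet YBC 7289 for the diagonal of a unit square.
tablet_value = sexagesimal_to_float([1, 24, 51, 10])
iterated_value = babylonian_sqrt(2.0)

print(f"tablet digits : {tablet_value:.8f}")
print(f"iteration     : {iterated_value:.8f}")
print(f"math.sqrt(2)  : {math.sqrt(2):.8f}")
```

The four base-60 digits already agree with the true diagonal to better than one part in a million, which is the point being made: an approximation can be entirely adequate for practical work.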
0:17:14Now unfortunately
0:17:16approximate measures also, as we know, lose information.
0:17:20In certain cases that is perfectly fine;
0:17:23in certain cases
0:17:24it causes biases that we know will be very hard.
0:17:29But the amount of data that we are talking about today
0:17:33by force
0:17:34requires us
0:17:35to go to these approximate methods, of course.
0:17:39oh good have
0:17:40I was pressing the wrong button.
0:17:43So let's look a little bit at the sources of the data now.
0:17:47This is a
0:17:48different picture from the previous ones,
0:17:51because this is not showing the absolute capacity;
0:17:55it is showing the relative capacity of the type of data that is available to you.
0:18:01so
0:18:04In the old days, when I remember computer networks started,
0:18:08remember, forty years of hacking,
0:18:10at that time
0:18:11there was a lot of FTP traffic going on.
0:18:15Email became very popular in the early eighties.
0:18:19There was something like telnet; I don't know how many
0:18:22remember anything like that.
0:18:24And
0:18:25FTP was by far the dominant data that was available, meaning file transfer
0:18:31from one place to another.
0:18:33That is up to nineteen ninety.
0:18:37In the early nineteen nineties,
0:18:38we all know,
0:18:40one of the
0:18:42still annoying facts for computer scientists is that the physicists
0:18:45introduced the HTTP protocol and the web; it was not the computer
0:18:50scientists,
0:18:51as it should have been,
0:18:52and
0:18:53the web was born.
0:18:54And as you can see,
0:18:58the
0:18:59FTP part
0:19:00starts
0:19:02diminishing, diminishing proportionally. Remember,
0:19:04this is proportional; the absolute amounts are going up all
0:19:08the time.
0:19:09And the web is grabbing a bigger and bigger share.
0:19:13Newsgroups are pretty happy,
0:19:14telnet is sort of disappearing,
0:19:17email keeps sort of constant.
0:19:19and in that
0:19:22If you go further,
0:19:24to ninety-five,
0:19:25the web has already captured half of the traffic,
0:19:29and you see in the upper corner something interesting,
0:19:32like data
0:19:34appearing from individual
0:19:37peer-to-peer communication of the computers,
0:19:40which did not use to be the case
0:19:42in the past,
0:19:43because we only had these
0:19:45few mainframes.
0:19:48And if we go even further,
0:19:53to two thousand,
0:19:55we see that the video
0:19:59and video information
0:20:01starts to grab a larger and larger share
0:20:05of the digital traffic.
0:20:06Now the web is still strong,
0:20:09but video, the purple part,
0:20:12is
0:20:13is
0:20:14just
0:20:15emerging
0:20:15there at the corner,
0:20:18peer-to-peer
0:20:19and web dominating,
0:20:20but
0:20:21it is growing very fast.
0:20:23So if we go further,
0:20:25of course, to two thousand five, you see that the video is getting bigger,
0:20:30and, I don't know, because of the various different types of legal issues and other issues,
0:20:36the sort of percentage of peer-to-peer
0:20:39is anyway not growing anymore, it is staying the same, and the web is going down.
0:20:43And if you arrive at two thousand ten,
0:20:46one could argue now that one of the most important and interesting data sources that we have
0:20:53is the video,
0:20:55and there is no sign of it
0:20:58going down at the moment.
0:20:59Actually,
0:21:00if you believe Cisco,
0:21:02which of course
0:21:03can be a little biased,
0:21:05the video will be so dominating in the next couple of years in the network traffic that it will
0:21:10be the majority of it,
0:21:12and the web
0:21:13is in fact going down
0:21:14proportionally,
0:21:16which is quite obvious if you think about
0:21:19the bits
0:21:20required for a normal web page.
0:21:24So then
0:21:25the majority
0:21:27of the type of data
0:21:29that is
0:21:31moving around in these networks
0:21:33is obviously video data.
0:21:35So a lot of the things
0:21:37that I used to be interested in, which were related to
0:21:40sort of pattern matching in text
0:21:43or some structure in the files of music,
0:21:46are actually replaced now by interest in this type of mining
0:21:51or processing of video.
0:21:55So that was the web.
0:21:59now
0:22:00speaking about video, I just want to mention something, because it touched me so much and is
0:22:05related to this talk about video.
0:22:08I want to touch on something that really captured my heart in March in Long Beach at TED.
0:22:15This was Deb Roy
0:22:17from the MIT Media Lab,
0:22:19who was capturing ninety thousand hours of video of his child growing up
0:22:26and mining that video
0:22:27in such a way that, for example, he could show
0:22:30in a sped-up manner
0:22:32the development of the word
0:22:35water
0:22:37in, you know, the language development of the child.
0:22:41Again, a unique experiment,
0:22:44but related to the topic,
0:22:46even more interestingly, his company, Bluefin,
0:22:50is working on
0:22:51and delivering
0:22:54sort of analytics or visualization
0:22:59of both the TV broadcast and the social networking traffic,
0:23:06basically linking
0:23:08something that is
0:23:10shown on TV to the discussion
0:23:13that you have on the net.
0:23:21okay
0:23:23In a little bit different domain, as a reference, this is the Large Hadron Collider, for those of you who
0:23:28have not seen that
0:23:30picture.
0:23:31um
0:23:33I do remember
0:23:34that basically when the Large Hadron Collider and the data grid were planned,
0:23:41there was a lot of talk about the capacities. Now first of all, the Hadron Collider
0:23:46has a hundred and fifty million sensors,
0:23:49so that is a lot of sensors, and we all know of course that these sensors are also producing
0:23:53the data
0:23:54with that,
0:23:55at a very rapid speed.
0:23:58So
0:24:00the actual
0:24:01approximate
0:24:03amount of data you are talking about in this structure is one petabyte per second.
0:24:09I do remember that in the original specs, when we started doing the data grid, I was there
0:24:13in this huge European Union
0:24:16consortium, which has its benefits, as is normal in the European Union,
0:24:20"'cause" so a a is
0:24:22we were talking about four point five terabytes per second, so
0:24:25the numbers have gone up.
0:24:28Okay,
0:24:29a hundred and fifty million sensors,
0:24:30cool. So
0:24:31this is physics experiments,
0:24:33this is science,
0:24:35big science, you know.
0:24:37What does it have to do with, you know, the regular world or whatever? This is a very special device,
0:24:42an expensive device.
0:24:45Let us come back. This is now a picture of the whole world,
0:24:50related to the pictures you saw already of, you know, Prague and Europe,
0:24:55um
0:24:55of the different queries
0:24:58on
0:24:59the navigation. This is based on twenty billion queries.
0:25:03now
0:25:04remember what I showed you earlier.
0:25:06There are about one point two billion devices currently out there, and of course the number of
0:25:12mobile devices is four point something,
0:25:15or more than four billion.
0:25:17If each of these one point two billion devices has ten sensors,
0:25:23it is ten,
0:25:24more than ten
0:25:25billion
0:25:26sensors.
0:25:28These ten billion sensors,
0:25:30although they do not feed
0:25:32the system
0:25:33with the same speed as the Large Hadron Collider would,
0:25:36are still a
0:25:37super substantial amount of
0:25:39data that is available for us.
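The sensor comparison is a back-of-the-envelope multiplication, sketched below with the figures quoted in the talk. Ten sensors per device is the speaker's own round assumption, and the variable names are illustrative.

```python
# Figures quoted in the talk.
DEVICES = 1_200_000_000          # about one point two billion devices
SENSORS_PER_DEVICE = 10          # the speaker's round assumption
COLLIDER_SENSORS = 150_000_000   # a hundred and fifty million sensors

mobile_sensors = DEVICES * SENSORS_PER_DEVICE
ratio_to_collider = mobile_sensors / COLLIDER_SENSORS

print(f"mobile sensors: {mobile_sensors:,}")
print(f"ratio to the collider's sensor count: {ratio_to_collider:.0f}x")
```

This only compares sensor counts. As the speaker notes, the per-sensor data rate of the phones is far lower than that of the collider, so the ratio says nothing about total throughput.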
0:25:43And this is,
0:25:44I am not talking about the future somewhere;
0:25:47I am talking about
0:25:49the actual
0:25:50today.
0:25:52I am not saying that all that sensor information is now already collected in one place,
0:25:58but it really, really, really
0:26:01is showing the potential and the different
0:26:04uh
0:26:05paths that we have in the future.
0:26:08I mentioned the different types of sensors in these mobile computers,
0:26:13and I mentioned the sensors that are related to a lot of the user interface, or
0:26:18very different types of
0:26:20sort of positioning and so on.
0:26:24But there is an interesting new
0:26:26source
0:26:28that adds
0:26:29to this sensor
0:26:31world,
0:26:32and that is the cognitive radio. I saw that there are some papers on cognitive radio in this conference
0:26:36as well.
0:26:38I just want to point that out from this
0:26:39data perspective. For those of you who do not know, by the way, cognitive radio is the
0:26:44dynamic allocation of the radio spectrum
0:26:48in such a way that the device itself can actually choose
0:26:52which part of the spectrum it is using
0:26:54for its signal
0:26:56transmission.
0:26:57It actually can be used for a lot of things, and for the sensor data too.
0:27:01So
0:27:02the detailed
0:27:04introduction of cognitive radio will already bring
0:27:08again a new, very interesting source of sensor information, which is in the infrastructure itself.
0:27:16So the traditional picture of having the devices talking to a
0:27:20tower,
0:27:22a cell tower, where we already know something about the signal strengths that they can hear and
0:27:27how the cell tower can recognise the device,
0:27:30is going to change
0:27:32to a picture which is much more of a mesh,
0:27:35where devices are aware of each other's
0:27:39presence,
0:27:39or partially aware of each other's presence,
0:27:42in different types of radio spectrum.
0:27:44Now this fingerprinting information adds another layer,
0:27:48again,
0:27:49which is inherent to the billion-device
0:27:52mobile computer infrastructure that we have.
0:28:00well
0:28:02Another source of data
0:28:03available for us all
0:28:06is social media.
0:28:08There are currently about nine hundred million social media users in the world.
0:28:14Of course,
0:28:14if you look at that,
0:28:16that
0:28:17in principle means that there are something like one to one point five billion visits to social networks every
0:28:24day.
0:28:24Each of these visits
0:28:26leaves a trace of an operation.
0:28:30And of course,
0:28:32if you want to divide this, we know that the majority of this is coming from a single
0:28:37source,
0:28:39Facebook,
0:28:40seven hundred million currently.
0:28:44But the important part here is that there are about thirteen billion
0:28:48or more pieces of content accessed
0:28:52by these users,
0:28:53and this is the richest context we have, because this is,
0:28:59as you know, in Facebook,
0:29:01all sorts of information of all different types.
0:29:04It is both images, it is video, it is textual data,
0:29:09it is different links; it is sort of informative in a very different way.
0:29:15Additional things that you can of course
0:29:18see in the social space are that we have about sixty billion tweets
0:29:22expected in two thousand eleven,
0:29:26and this sixty billion
0:29:28uh
0:29:29is
0:29:30still a growing number, because we have four hundred sixty thousand new Twitter accounts daily, which
0:29:36by the way is not the growth rate, because
0:29:38there are also people
0:29:39that drop their accounts,
0:29:41but it still shows that
0:29:43the actual social media population
0:29:46is growing.
0:29:50And of course,
0:29:51back to our favourite,
0:29:52video:
0:29:53a lot of the traffic
0:29:57in the picture that I showed you earlier
0:30:01comes from YouTube,
0:30:02but of course in areas like the United States a lot of it comes also from Netflix and so on.
0:30:08So thirteen million hours of video
0:30:12on YouTube: that does not look like a very big number when you think that I was talking about billions there,
0:30:17but you have to remember that these videos
0:30:19are small
0:30:21snippets or so,
0:30:22so these thirteen million hours
0:30:25are much more
0:30:27in the number of videos that we have.
0:30:30um G
0:30:31do
0:30:33pushed on on but
0:30:36So thirteen million hours of video on YouTube,
0:30:41thirty-five hours of new video uploaded per minute.
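For a sense of scale, the upload rate can be turned into an annual figure. This is a rough sketch that assumes the rate of thirty-five hours of video per minute holds constant for a 365-day year, which the talk does not claim; the names are illustrative.

```python
# Rate quoted in the talk.
UPLOAD_HOURS_PER_MINUTE = 35

# Assumption: the rate is constant over a 365-day year.
MINUTES_PER_YEAR = 60 * 24 * 365

hours_uploaded_per_year = UPLOAD_HOURS_PER_MINUTE * MINUTES_PER_YEAR
years_to_watch = hours_uploaded_per_year / (24 * 365)

print(f"{hours_uploaded_per_year:,} hours of new video per year")
print(f"about {years_to_watch:,.0f} years of continuous viewing")
```

At that constant rate a single year of uploads would take roughly two thousand years to watch end to end, which is why the speaker sees a future in video mining rather than in viewing.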
0:30:44So those of you working on video mining,
0:30:47you have
0:30:47a great future
0:30:49ahead of you.
0:30:50Now, I am an optimistic and positive person, so I have liked, all my life, that things go
0:30:56up into the upper right corner.
0:30:58I like things growing, I like things becoming more challenging, I like things becoming faster,
0:31:04smaller,
0:31:05bigger, and so on.
0:31:08What is the problem?
0:31:10Well, the problem is that,
0:31:13as opposed to, you know,
0:31:15having this thing on paper
0:31:19as a formula,
0:31:21or even as a calculation in your machine,
0:31:24we are talking about real systems here,
0:31:28and this data exists somewhere.
0:31:31We need to access it, we need to handle it, and if we want to make use of
0:31:36it,
0:31:36we need to build systems that are able to
0:31:41do that.
0:31:43What happens if you are not careful in building these systems?
0:31:48That is a crash.
0:31:50This,
0:31:52actually the cover is from older days, but in two thousand ten we know that
0:31:57there was the thousand-point
0:32:01decline in the New York Stock Exchange in a very short period of time,
0:32:06which was actually a result of complex
0:32:08algorithmic
0:32:10computer software
0:32:12that was of course doing what it is supposed to do, competing on the market
0:32:18at superhuman speed,
0:32:21using the available data that they have,
0:32:23looking at weak signals, and sort of
0:32:26causing the crash.
0:32:29So, as I always point out,
0:32:32it is nice to write a paper
0:32:35and have a good learning algorithm
0:32:37and a good predictive model,
0:32:39but you need to be
0:32:40much more certain when you start applying that
0:32:43in the real world and you cause
0:32:46some interventions in the real world.
0:32:48The two are very different.
0:32:52So, what I want to talk about: I was talking about the what now,
0:32:56what is the data available.
0:32:58I would like to touch a little bit on
0:33:00the how, and why I want to do it for this audience. I mean, I am not talking at an operating systems
0:33:04conference, I am not talking at a networking conference, I am not talking to
0:33:08the people even in my former field, the database
0:33:12community.
0:33:15When I was a computer science undergrad,
0:33:18I was taught about the memory-computing trade-off,
0:33:22and a little bit later I was taught about
0:33:24paging and, you know, virtual memory and so on.
0:33:29My point is that
0:33:32we are now
0:33:33unfortunately in a different architecture.
0:33:37We are using a different system, and when we are writing our algorithms
0:33:41and when we are running them,
0:33:43we need to take into account
0:33:45the nitty-gritty details, which are beyond the Turing machine, even beyond the von Neumann
0:33:51sort of traditional computer model.
0:33:54We need to look at that aspect if we really want to work in the real world
0:33:58with this data.
0:34:01There is a gap
0:34:02between
0:34:04what is the practice, engineering practice,
0:34:07in manipulating data in the so-called internet companies, for example,
0:34:13and the work we do, the very advanced work we do,
0:34:16in sophisticated methods of understanding data.
0:34:20Sometimes
0:34:22the gap is smaller,
0:34:23sometimes it is much larger.
0:34:25What I will talk about
0:34:27with you
0:34:28are all the important aspects of these systems
0:34:33that we should take into account
0:34:35when we are writing our algorithms,
0:34:36when we are,
0:34:37you know, building them for the zettabyte world,
0:34:40not building them for the
0:34:42uh
0:34:43well like my favourite was the or dataset that a over fit it so badly
0:34:49Elasticity.
0:34:50One of the things that has changed
0:34:52dramatically is that
0:34:54when we are building things, we do not need to
0:34:56build them
0:34:57in such a way that we have to make sure that the maximum requirement is somehow covered,
0:35:05because in the old days, when you had a computer, it had a certain amount of computing power,
0:35:10a certain amount of memory,
0:35:12and it was a box.
0:35:14Now elasticity is a notion
0:35:16that of course has sort of sneaked in with the cloud.
0:35:21so L L this T allows you to use dynamic to computation power or
0:35:26of a larger or a in today
0:35:28is such a way that you don't have to
0:35:30uh uh so the the
0:35:33three D term mine you model the competing power do you use
0:35:37now this is analogous to the same
0:35:39development that you had in the i guess introductory data structures course
0:35:43where one has
0:35:45fixed tables
0:35:48and then the dynamic tables
0:35:50and you define a dynamic table and you don't have to care that you will
0:35:54ever overflow
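The fixed table versus dynamic table analogy can be sketched in a few lines. This is only an illustrative sketch, not anything shown in the talk: the class name DynamicTable, the doubling policy and the copies counter are assumptions chosen to show why the caller never has to fix the maximum size in advance.

```python
class DynamicTable:
    """Growable table that doubles its capacity when full (hypothetical helper)."""

    def __init__(self, capacity=1):
        self.capacity = capacity
        self.size = 0
        self.slots = [None] * capacity
        self.copies = 0  # total elements moved during all resizes

    def append(self, item):
        # Grow first if the table is full, so the caller never sees an overflow.
        if self.size == self.capacity:
            self._grow()
        self.slots[self.size] = item
        self.size += 1

    def _grow(self):
        # Allocate twice the space and move the existing elements over.
        new_slots = [None] * (2 * self.capacity)
        for i in range(self.size):
            new_slots[i] = self.slots[i]
            self.copies += 1
        self.slots = new_slots
        self.capacity *= 2


table = DynamicTable()
for n in range(1000):
    table.append(n)

# The total copy work stays below twice the number of appended items,
# which is the usual amortised argument for doubling.
print(table.size, table.capacity, table.copies)
```

The elastic use of computing power described here follows the same idea: capacity is acquired when the demand appears instead of being fixed up front.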
0:35:56so this
0:35:58feature where you need more data
0:36:01and you just grab
0:36:02more of the computing power
0:36:04and then
0:36:05at the same time of course this is meaningful only because you have multiple users at the same time
0:36:11sharing
0:36:12this particular pool
0:36:14you will goal uh to a much lower T
0:36:16so in some sense this elasticity
0:36:19has um
0:36:22allowed us to do things which were not at all
0:36:26feasible in the past
0:36:29it was not that long ago i think about
0:36:32seven eight years ago
0:36:34we were running multinomial pca
0:36:38on a fixed cluster for search and stuff actually doing sort of
0:36:44search engine probabilistic modeling all the work of course is on a hundred million
0:36:52documents or more
0:36:53and a typical run
0:36:55there was
0:36:56limited by the computing power that we had
0:36:59so we had to run like three weeks in a row
0:37:02in a small cluster to get a model of like fifteen million
0:37:05or twenty million documents
0:37:07those of you who know pca
0:37:10or multinomial pca know that unless you do something clever
0:37:13it's actually quite a computationally intensive thing
0:37:16and there was no way of doing any kind of dynamic things so all our experiments were
0:37:22sort of restricted by the computing power at the university which was not that great
0:37:26so we had to work on
0:37:29but the elasticity as a historical remark is actually a very old notion
0:37:35and
0:37:36my
0:37:37very favourite quote from the science fiction literature
0:37:41as many of the computer science ideas have appeared very early in literature
0:37:45is this
0:37:46first
0:37:48commercially sold
0:37:50um
0:37:52story by arthur c clarke
0:37:54from astounding science fiction called
0:37:57rescue party
0:38:00and rescue party
0:38:02was telling
0:38:03about
0:38:05a race
0:38:06called the paladorians a place called palador with the race from palador
0:38:11that had a collective mind
0:38:12and depending on the problem in the universe
0:38:16that race
0:38:17collected more minds
0:38:19dynamically to solve the problem
0:38:21so at the end of the day if you had a really big problem
0:38:24telepathic connection
0:38:26in the different places in the universe allowed them to solve much harder problems
0:38:31this was really really early
0:38:33so the sort of
0:38:35it always amazes me how
0:38:37well some of the science fiction authors reflect
0:38:41the future
0:38:43a second aspect
0:38:45as well as elasticity you might think okay so it's the cloud stuff
0:38:49is the robustness argument
0:38:51the robustness argument is a very complex one because it basically depends
0:38:58on
0:38:59what you want to do
0:39:01so this is a nice result by eric brewer from cal
0:39:05and it says that depending on
0:39:07which dimensions you are interested in
0:39:09meaning you're accessing some data
0:39:12and you want to do a consistent access
0:39:14or you should be able to access when partitions are allowed in the network
0:39:19or the data should be always available if needed
0:39:23you cannot satisfy all of these at the same time
0:39:28of the
0:39:29you can only go in this
0:39:31different
0:39:31corners or different borders of the triangle in such a way that
0:39:36if you want to do for example search
0:39:39you're actually very partition tolerant
0:39:42and you have a certain type of consistency but not all things are always available
0:39:48or bittorrent where you don't care about consistency at all
0:39:52you just
0:39:53basically are doing
0:39:55transfers but
0:39:56you are very tolerant of partitions as those who want to legally
0:40:01impose against this type of piracy know
0:40:04and things are very available
0:40:07and on the other hand in the
0:40:08distributed databases that many many years ago i was working on the consistency and availability
0:40:14are very important
0:40:16but they were not very tolerant of partitions
0:40:19now why is this important this is important because we are talking about highly distributed just remember
0:40:24i was talking about one billion
0:40:26devices
0:40:27that are sitting in different parts of the world
0:40:30connected by different networks
0:40:32actually allowing different types of
0:40:36malfunctions and there is then
0:40:39there is no
0:40:40consistent everything is up all the time
0:40:43notion at all
0:40:44so when you are building your algorithms
0:40:46they cannot be based
0:40:49upon
0:40:49getting all the necessary information
0:40:53by necessity of course i assume that you already figured out that in this
0:40:57real world
0:40:58the algorithms have to be online
0:41:01batch algorithms of taking
0:41:04you know these twenty billion queries and running them and doing it
0:41:07is usually not the way to go because the response times
0:41:11for the problems that you want to solve
0:41:13are not
0:41:14feasible
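The difference between a batch and an online algorithm can be shown with a small sketch. This is an assumption-laden illustration, not the speaker's method: it uses Welford's well known running mean and variance update as a stand-in for any online computation, and the class name RunningStats and the sample stream are made up for the example.

```python
class RunningStats:
    """Online mean and population variance (Welford's update), one item at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Each observation is folded in once and can then be discarded.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.count if self.count else 0.0


# Hypothetical stream of measurements arriving one by one.
stream = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

stats = RunningStats()
for value in stream:
    stats.update(value)  # an answer is available after every update

# Batch version for comparison: needs the whole data set in memory first.
batch_mean = sum(stream) / len(stream)
batch_var = sum((v - batch_mean) ** 2 for v in stream) / len(stream)

print(stats.mean, stats.variance, batch_mean, batch_var)
```

The online version keeps constant state and can respond at any point in the stream, which is the property asked for when not all the information is ever available at once.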
0:41:16so we had elasticity and robustness
0:41:18but the
0:41:19one thing that is so dear to my heart
0:41:22is energy
0:41:23so i
0:41:24i preach this every place i've been
0:41:26now for the last two years i'm preaching it here too
0:41:30and i'm pointing out if i were a student
0:41:33i would go to this field of energy
0:41:36efficient computing
0:41:37so my argument is that the current architecture as we will see
0:41:42is fundamentally
0:41:44imposing similar even theoretical sort of boundaries
0:41:49to computing as we used to have with memory
0:41:54and computing
0:41:55energy
0:41:57is this so
0:41:58so let's look at the real life situation why things become difficult let's take an example
0:42:04of a processing that you
0:42:08you know
0:42:08normally you are used to this type of
0:42:10processing where you have a single box now we are in a world where we have multiple devices that are
0:42:16connected to each other
0:42:17and your algorithm your great algorithm doing the video stuff
0:42:22has a choice
0:42:24where to go
0:42:25where to execute
0:42:28so let's look at this problem
0:42:31in particular because it is very important
0:42:36if you look at the experience domain
0:42:39the experience dimension
0:42:41you can of course run everything in the device itself
0:42:45which is you know this type of mobile computer you could do the video editing here
0:42:50you get the video you edit it here
0:42:53you get the display
0:42:55well we know that in the case of larger things
0:42:59this will be very very very slow the user experience will be pretty bad and then in addition there
0:43:05might be some other
0:43:06sort of user experience issues but it's basically limited
0:43:11of course we can do peer to peer so we can steal somebody else's computing interesting idea
0:43:17taking a little bit more computational power from the neighbourhood
0:43:20to do the video editing
0:43:23fine
0:43:25or
0:43:25what we can of course do is that we think okay we have this
0:43:29elasticity up there
0:43:31we have the cloud so we send the video
0:43:34the actual data
0:43:36up to the cloud to be processed there where you can do the fast
0:43:40processing of course there you now have
0:43:43to transmit
0:43:45so in the user experience dimension one of the things that you want to usually
0:43:50from a user perspective
0:43:52to
0:43:53optimize is the time
0:43:55and the end user doesn't care where it happens you just
0:43:58as soon that these meeting
0:44:00we are living in a very be a
0:44:05well
0:44:05this is the pure experience perspective
0:44:07if you look at the economics perspective
0:44:10of this one
0:44:12again
0:44:13if you do everything in the device itself notice that i have now divided it in such a way that the
0:44:17lower one is the device the upper one is the cloud and in the middle is the transmission
0:44:22somebody has to pay for the cloud
0:44:24so at the end of the day it usually doesn't come from
0:44:27from nowhere
0:44:29so
0:44:29you have to pay for the economics of the cloud
0:44:32but again from a user perspective
0:44:37you are also paying for the transmission the datacom
0:44:40in most places
0:44:42so that will also be expensive
0:44:45but then we come to the very fundamental question
0:44:49if you do
0:44:51the video editing on the device
0:44:54you're running out of energy in the current
0:44:57mode very fast and we all know already that even taking pictures
0:45:01not even editing the video will run out our batteries very fast
0:45:05so okay
0:45:06not a very
0:45:07not a very feasible thing
0:45:11so what about if we just
0:45:12put it on the cloud
0:45:13so i mean then we don't run out sort of
0:45:16the next thing is we can steal our neighbour's energy
0:45:20doesn't make you very popular
0:45:22because the person who wants to make a call next time and doesn't have any energy anymore
0:45:27probably doesn't like very much the idea that he or she has to support your
0:45:31video editing
0:45:34if you go to the cloud side
0:45:37we all know the problems already that
0:45:39this is like a
0:45:41no free lunch situation
0:45:43the cloud
0:45:45server farm uses also energy and it might be energy
0:45:49that is not immediately affecting you
0:45:51but as a total
0:45:54somebody will have a problem
0:45:55with the energy and we're talking about green data centres nowadays a lot
0:46:00and
0:46:00this is also fundamentally
0:46:03an interesting question that how much can you concentrate
0:46:07in one place because you see in the
0:46:10bottom up approach
0:46:12you're introducing a new energy source every time you're introducing a new device that does the computing
0:46:18in the cloud
0:46:20the energy load grows linearly
0:46:23at the cloud side with respect to the customers it has to serve
0:46:28but even more fundamentally
0:46:30this trade off
0:46:32in real life
0:46:34tends to be controlled by the fact that sending bits
0:46:38takes much more energy than computing
0:46:41partly this is because of the coding that we have
0:46:45you know we are so far from the
0:46:48channel limit that basically we essentially
0:46:52have to put
0:46:53much more power than we would need to
0:46:55to send the bits to correct the noise errors that we have
0:46:59but basically
0:47:01this balance is going to dictate
0:47:04almost all of the data manipulation i've been talking about
0:47:08the balance between what you compute locally and the energy
0:47:13where do you put it where you have the energy over here
0:47:16and then what's the transmission energy
0:47:19that you are using if you think about rendering for example your screens directly from the cloud
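The balance between computing locally and transmitting can be written down as a toy comparison. This is a hedged sketch under strong assumptions: the function names, the linear energy model and every number below are hypothetical and only illustrate the shape of the trade off, and the model counts device side energy only (the cloud's own energy use is left out).

```python
def local_energy(operations, joules_per_op):
    """Energy spent on the device if the whole job is computed locally."""
    return operations * joules_per_op


def offload_energy(bits_up, bits_down, joules_per_bit):
    """Energy spent on the device radio if the job is shipped to the cloud."""
    return (bits_up + bits_down) * joules_per_bit


def cheaper_on_device(operations, bits_up, bits_down, joules_per_op, joules_per_bit):
    """True when local computation costs the device no more than transmission."""
    return local_energy(operations, joules_per_op) <= offload_energy(
        bits_up, bits_down, joules_per_bit
    )


# Made-up illustrative job: the cost per transmitted bit is assumed to be
# much higher than the cost per operation, as argued in the talk.
job = dict(operations=10, bits_up=5, bits_down=5, joules_per_op=2, joules_per_bit=3)
print(cheaper_on_device(**job))
```

With these made-up units the device spends 20 on local computation and 30 on transmission, so computing locally wins; raising the operation count or lowering the per bit cost flips the decision.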
0:47:26and
0:47:27i wanted to point out that
0:47:29academically
0:47:31energy is a fundamental thing
0:47:33energy is something you cannot cheat
0:47:35so it's very appealing theoretically
0:47:37in real life systems
0:47:39the economics and experience
0:47:42are the ones
0:47:44that really also dictate what will be used in practice
0:47:51so my questions
0:47:53the sort of problems that i want to leave with you
0:47:56is that
0:47:57basically
0:47:59we know
0:48:00and i have shown that we have multiple sources of data available
0:48:05the question is that
0:48:06how do we capture parse and analyse them on the fly meaning
0:48:11that we want to do things online
0:48:14in radically different sources this is
0:48:17you know sensor fusion is one term that people use
0:48:21and different
0:48:22communities use different terminology
0:48:24to merge the heterogeneous attributes of sources of data
0:48:28the second question is the architecture question
0:48:31what would it take to build these architectures that are actually robust
0:48:35and have strong elastic properties
0:48:37how do you write algorithms in such a way that
0:48:39and
0:48:40even if you're running in a single machine
0:48:42it would run with one billion devices
0:48:46across the
0:48:47maybe with the cloud component
0:48:50the third question
0:48:51is that
0:48:52how do we tackle this energy efficient computing
0:48:56because in fact
0:48:58and
0:48:59like i said
0:49:00from the theoretical perspective almost like the perspective that you had with the von neumann model
0:49:06you can look at energy as a
0:49:08different component there and do a lot of analysis
0:49:12even though we need to balance environmental concerns business practicalities
0:49:17and the end user experience
0:49:19for this and just
0:49:20now
0:49:22why is energy so fundamental to me
0:49:25it's fundamental for the reason that this is the first time
0:49:29in my lifetime
0:49:31we are reaching the levels that i can argue that i can foresee services
0:49:36that cannot be built not because of the cost reasons but because of the reason that we do not
0:49:41have enough energy on earth
0:49:43to run a
0:49:44six billion user service
0:49:47what is run on the device and what is run on the back end
0:49:52so this is what i wanted to leave you with today
0:49:55i just want to remind you that my
0:49:58medic search
0:49:59for the medical data sets
0:50:02that quest actually has been filled
0:50:04i have always think than things and and a well
0:50:08exciting me there
0:50:10lying there
0:50:10and now i see that after twenty five years of my life
0:50:15i haven't solved very many things related to this
0:50:18this is
0:50:19i've solved a lot of things related but i don't think
0:50:22with you guys in the community with the great phd students that i've had the privilege to work
0:50:26with
0:50:27but basically
0:50:29as i said
0:50:30i now know
0:50:32that we are facing an era
0:50:34so there are almost need
0:50:37to our exist then is that we as a research communities
0:50:40need to address in a different way
0:50:43okay
0:50:44so
0:50:44that's what i want to say thank you very much for your attention
0:50:55thank you for a very exciting presentation
0:50:58we have very
0:51:00challenging research problems to work on with the whole community here
0:51:05so problem
0:51:07in in them
0:51:09scientific uh uh uh four
0:51:11as
0:51:13for
0:51:14global
0:51:14a local uh
0:51:18energy consumption and problem of the whole or
0:51:21and we have time for a couple of questions so please
0:51:30okay
0:51:31as mike so
0:51:32excellent
0:51:33i
0:51:33very much
0:51:34oh
0:51:35two
0:51:36two we
0:51:37a
0:51:37so i know it's your
0:51:38or good thing of be important in this environment
0:51:41are privacy and security
0:51:43and so i
0:51:44i so the like
0:51:46my own close
0:51:48excess
0:51:48there
0:51:50have
0:51:51yeah
0:51:51for all network
0:51:53we work
0:51:54yes
0:51:55and
0:51:56i deliberately chose
0:51:57not to say the words privacy and security
0:52:00as somebody characterised me a long time ago that henry
0:52:04if we reset aspect you always doing is violating
0:52:08i
0:52:08C really people's privacy and that's of them
0:52:11and no i take it very seriously i just decided to leave it for the questions because i knew
0:52:16that the question would come
0:52:17and
0:52:21of course
0:52:22first of all
0:52:23from the data science perspective
0:52:26the more we get information
0:52:28the more i can reverse engineer
0:52:30that's a fundamental fact and i could even do it in ways that
0:52:34people think you know you anonymise things you add random
0:52:38noise but if you can predict the noise model you can reverse engineer a lot of stuff and you could
0:52:42do very complex things
0:52:44now
0:52:45for me first of all privacy is always a trade off
0:52:50there is a certain pay off you get from something
0:52:53and a certain cost that you have
0:52:54if the cost
0:52:55is higher than the pay off you should not do it so the cost for your privacy
0:52:59and you should be able to first of all know and always be able to opt out
0:53:04that's the first
0:53:07but the second i of point out that that learned that and never thought so much was that the or
0:53:11a lot of things where you could do a a or trained
0:53:15how to get and alice
0:53:16without violating any kind of privacy ask
0:53:19you basically
0:53:20at hating like this traffic
0:53:22sure sure the ear
0:53:23that is just totally anonymous
0:53:25no idea who is there you don't know how the individual points are followed
0:53:29in a sequence
0:53:30just points in time in the data set you don't know that they are coming from the same source
0:53:35you can still do a meaningful analysis
0:53:37that's the first good news
0:53:39second good news for the research community is that there are privacy preserving mechanisms
0:53:44that you can build
0:53:45in this
0:53:46sort of traffic
0:53:48ways of handling this thing so it's a future research problem
0:53:52and
0:53:52the third question is that
0:53:56i usually do the challenge how many of you guys in this room have actually turned off your cookies
0:54:02i mean just to be very popular
0:54:05just to be very popular but also this some if fit of not turning of them more relays in this
0:54:09and that again
0:54:10so i E each very cultural location only what happens but i hope we you
0:54:15and and my main point always is
0:54:17uh and it is nice
0:54:19you should know
0:54:21and you should be able to opt out
0:54:24but if this it twice
0:54:25if you want to uh a a a a get some benefit out of that information that is available
0:54:29security is a different issue
0:54:32i think everybody shares the question of the security problems the concern of the security problems
0:54:38that we have
0:54:40and these are very complex issues again cultural even regulatory issues
0:54:45that we face in different ways in europe
0:54:47in the us in any nation
0:54:49my labs are working and i'm working on growth economies too
0:54:53as well and i can tell you that these issues are very different in different
0:54:56markets
0:54:57security protocols are a good research topic too i think the privacy concern tends to be the higher
0:55:04one among the people we know all the problems that currently
0:55:08facebook and google and other sites are facing
0:55:12i decided not to overemphasize this because i was talking from a science perspective
0:55:18and a research perspective but we have to be of course conscious about that in the work
0:55:23we do too
0:55:26right right no short question
0:55:28and hopefully a ask