0:00:13trends in multimedia signal processing
0:00:15uh i'm phil chou
0:00:17i'm the chair of the MMSP TC so i get to be the moderator today
0:00:22we have three panelists
0:00:24uh they are all from
0:00:26the subcommittee on technical directions in our TC
0:00:30their responsibility is to look for
0:00:33new trends
0:00:34to make sure that the TC remains
0:00:36relevant and vital
0:00:38uh
0:00:40francesco de natale is the uh is the chair of that
0:00:44committee
0:00:45uh and is professor of telecommunications at the university of trento
0:00:49his interests are in
0:00:50media retrieval and video analysis for behaviour understanding
0:00:54and among other things he was technical co-chair of ICIP two thousand five in genoa italy
0:01:00uh and since two thousand nine he's been coordinating a large european project on media retrieval
0:01:06eckehard steinbach is a full professor
0:01:09in the department of electrical engineering and information technology
0:01:12at the technical university of munich
0:01:14his interests are in the areas of audio visual and haptic information processing and communication as well as networked and interactive
0:01:22multimedia systems
0:01:23among other things he was technical program co-chair
0:01:27for the MMSP
0:01:29workshop held in saint-malo france last year
0:01:32and then uh enrico magli is uh associate professor in the department of electronics
0:01:38at the politecnico di torino in italy his interests are in compressed sensing remote sensing
0:01:44error resilient coding
0:01:46distributed source coding and security for images and video
0:01:49and among other things he is technical co-chair uh
0:01:52uh of the international conference on multimedia and expo ICME
0:01:56two thousand twelve in melbourne australia
0:02:00so uh
0:02:02the important thing about a panel in my opinion is to get audience participation
0:02:06uh so we will leave some
0:02:07time for sure uh to have you interact with the panelists and poke them
0:02:12so prepare your
0:02:14questions and figure out which panelist you want to address
0:02:17um but i did want to cover some basic ground so i asked each of them
0:02:21for just about six minutes maximum
0:02:23each uh on some of the things that
0:02:26are compelling to them individually
0:02:28and then we'll get to your
0:02:30questions
0:02:31so uh without further ado let's
0:02:34go on to
0:02:36francesco
0:02:42and
0:02:43uh time is short so i will have just to
0:02:47quickly introduce uh some uh ideas but before that uh i want to tell you the general idea
0:02:53that is behind this presentation
0:02:55uh that is uh
0:02:58information and communication technologies and in particular multimedia
0:03:02are more and more
0:03:03uh cross-disciplinary and this is probably
0:03:06the most evident trend of the last years
0:03:09um
0:03:10cross-disciplinary doesn't mean just uh
0:03:12uh between disciplines inside
0:03:14uh ICT but also with disciplines that we are
0:03:19not used to dealing with so for example
0:03:22psychology or social sciences uh or cognitive sciences
0:03:26and uh we should be aware that the
0:03:29more uh we create links and synapses between these different fields uh the more we can be
0:03:35innovative in the future also in our uh own fields
0:03:38and i will give you some uh
0:03:40um uh some hints about this uh the first one is related to the field of uh
0:03:45media search and retrieval
0:03:47uh we know that the signal processing community was able to provide great advances in the last years
0:03:53providing
0:03:54for example uh descriptors for images videos music and so on
0:04:00and uh also
0:04:02uh models uh and metrics to measure the distance also for
0:04:07uh bridging the semantic gap
0:04:09but
0:04:10uh we need much more in the future if we want to solve the
0:04:15problem
0:04:16uh and looking at the uh most recent research in the field
0:04:21uh we can see that the answer can be found
0:04:24uh looking also to research areas which are near to ours or even not so near
0:04:30to us uh first of all for example the representation of knowledge
0:04:35uh so for example how to represent the context uh which is a
0:04:39um uh
0:04:41and to be
0:04:42lot of possibilities to enrich our knowledge about the data
0:04:46uh but still uh there are no uh reliable models for doing that
0:04:50um um
0:04:51also the social domain many people are working on uh exploiting the knowledge contained in social media
0:04:58for example in social networks
0:05:00um i so you've that the gain information about the uh
0:05:04uh but they gain the uh uh a no with the made so solution is a variable you this field
0:05:09uh a i do we are but the uh a like this kind of course it's a to the uh
0:05:13a great techniques and we will need more complex and more sophisticated models that can account
0:05:20for
0:05:20uh this kind of information and that allow
0:05:24um solving problems like scalability or search with the
0:05:29uh media uh sources of information like the internet uh and also dealing with diversity
0:05:36so being able not only to find near duplicates or similar images but also
0:05:41uh images or videos or uh music that represent the concept that we
0:05:47have in a very
0:05:48um
0:05:50um diversified uh and uh complete way
0:05:55yeah a P K shall include the
0:05:57uh
0:05:58what media search engine or a uh uh a uh and and and other fields like for example more by
0:06:03sir or i and i i like
0:06:06just some examples i don't have the time to go into detail about all of these
0:06:11just let me uh say that we have a very
0:06:16large european project that is dealing with this problem if you visit the site
0:06:20uh W W that will do a couple out that the doctor you you can find additional information about the
0:06:27uh a other feel the of uh uh in the race the where
0:06:31uh in that beside that in the uh can be really
0:06:34the the way you of boosting the performance of core at is he's is that of uh a a a
0:06:38a um i'm get it is or
0:06:40a more recently a also called the a a a get the awareness less or called media i by roman
0:06:46so source also for
0:06:47this means that we have a
0:06:49lot of information available
0:06:51uh we are
0:06:54able to capture from the environment much more information than the information that
0:07:00a human can even perceive when staying in the environment but
0:07:05all of this information is not able to
0:07:09bridge the gap between the knowledge that you have of the world and the knowledge that the
0:07:13system can have of the world
0:07:15again again or uh what what kind of uh a um uh a problems are open the and can be
0:07:21exploit is feel for example interaction between uh a a a people not just a understanding it is for example
0:07:28or but this painting get
0:07:30the view of C i i and this thing in the verse that they you are involved in more about
0:07:34one per of for example
0:07:36uh social bits and on
0:07:39um i understand the ml show
0:07:41same the P all of the people and and using price uh of course uh uh uh making it a
0:07:47is that the a a four or the or also
0:07:50but in in in that the you need models uh uh all very the rush shows so uh
0:07:54uh be of yours set so
0:07:57um
0:07:58well
0:08:01we are more or less a yeah eggs so i stiff the example of a a a a i like
0:08:04go to the uh last but
0:08:06which is interfacing interfaces are the last mile so we have to fill
0:08:12the gap between applications which are very sophisticated and the user
0:08:17and are so in this case uh a a a a there are a lot of of all full but
0:08:21do you need is so for example uh for designing got the faces which are
0:08:25uh called that list up make sure uh a a uh uh a accessible and usable or
0:08:30uh and uh and can provide the also access to a showing the complex environment double or environment that all
0:08:37um
0:08:38have to very the get a a a a more complex them but
0:08:43the problem is stiff
0:08:44and uh
0:08:45but
0:08:46i i have to close
0:08:47uh uh what's makes a look at the
0:08:49a in the in the different life of but they are still there but
0:08:53uh the way of looking at the problem maybe is is changing from my
0:08:57a
0:08:59yeah
0:08:59sorry to rush you
0:09:01time is
0:09:02uh
0:09:03a lot
0:09:08yeah thanks phil
0:09:10i want to add two trends to this discussion this is of course my very biased personal opinion
0:09:15but these are
0:09:16the two current trends
0:09:18so the first one addresses our quest for immersive communication
0:09:22and if you look at the past years we've seen tremendous progress in the area of telepresence
0:09:26systems and this was mainly driven by
0:09:29advances in display and capturing devices
0:09:32highly efficient audio video coding and the ever increasing uh bandwidth of our communication networks
0:09:38but um if you look at these conversational services that we run or these telepresence
0:09:42systems
0:09:43uh we consider them to be bidirectional
0:09:47in fact if you look at the actual audio visual data communication it's really only unidirectional
0:09:52whatever information flows from left to right and from right to left
0:09:56is more or less unrelated
0:09:58i believe that truly immersive
0:10:01communication requires that we are not only able to uh
0:10:04be present in a remote environment
0:10:07but also that we are able to physically interact with objects or other people in this environment
0:10:12and this we can only achieve if we
0:10:14augment traditional audio and uh video
0:10:17um modalities by the haptic modality which addresses our sense of touch
0:10:23and here
0:10:24the situation changes quite dramatically because now the forward path and the backward path are no
0:10:29longer independent
0:10:31every information you send uh in one direction here
0:10:34has a direct impact on the information that flows back to you
0:10:38because the paths are coupled through the environment or through the human
0:10:41and as many
0:10:43she
0:10:44and i but so that the and the ms P can to use an in a good position to address
0:10:49these challenges
0:10:50just a a a few as that might be interested for interesting for you to look at it
0:10:55it's the about up an M P three for have ticks yeah something that is exploiting the put set to
0:11:00a limitations of human haptic perception
0:11:02rather than what we are used to you and your
0:11:05that's to find objective quality metrics something that is to get a virtual in existing in a have the communication
0:11:12that's that the record and play back of a physical interaction session
0:11:16something that is fundamentally different from the easy recording and replay of audio and video
0:11:21but just to do with the couple of of the two path
0:11:23and other topics spread but were familiar with like our a control ever was so and C and set are
0:11:28and the them to to yeah have have to communication
0:11:33the second trend that i would like to mention here is mobile visual localisation
0:11:38so basically the idea is
0:11:39to recover uh your location
0:11:42from images or video that you capture from a mobile device that is equipped with a camera
0:11:47and i would say the most promising approach in this area
0:11:51is
0:11:52to infer your location by retrieving the most similar image
0:11:55from a geo-tagged reference database so it boils down to a content based image retrieval problem
0:12:00now you might say content based image retrieval is something that has been studied extensively during the past many years
0:12:06so what is it that the MMSP community could contribute to that field
0:12:10yeah
0:12:11so let's take a
0:12:12system perspective here
0:12:14one way of running such a system would be
0:12:17you extract salient features on your mobile terminal
0:12:20you compress them you transmit them to a server in the network
0:12:23you perform the actual image retrieval step on the server for instance using
0:12:29the bag of visual words
0:12:31and return a location estimate to the mobile terminal
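Editor's sketch of the retrieval step described above: local feature descriptors are quantised to their nearest visual word, the counts form a histogram, and the server returns the geo-tag of the reference image whose histogram is most similar. Everything in it is an assumption made for illustration only: the three-word vocabulary, the two-entry reference database, the coordinates and all function names are hypothetical, and a deployed system would more likely use a learned vocabulary with many thousands of words and an inverted index.

```python
import math
from collections import Counter

def quantize(descriptor, vocabulary):
    """Index of the nearest visual word (Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))

def bow_histogram(descriptors, vocabulary):
    """Visual-word counts, normalised to unit L2 norm."""
    counts = Counter(quantize(d, vocabulary) for d in descriptors)
    hist = [float(counts.get(i, 0)) for i in range(len(vocabulary))]
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / norm for h in hist]

def localize(query_descriptors, vocabulary, reference_db):
    """Server side: geo-tag of the reference image most similar to the query."""
    q = bow_histogram(query_descriptors, vocabulary)

    def similarity(entry):
        # Cosine similarity, since both histograms have unit norm.
        return sum(a * b for a, b in zip(q, entry["hist"]))

    return max(reference_db, key=similarity)["location"]

# Hypothetical toy data: 2-D "descriptors" and made-up coordinates.
VOCAB = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
REFERENCE_DB = [
    {"location": (10.0, 20.0),
     "hist": bow_histogram([(0.1, 0.0), (0.9, 0.1), (1.0, 0.0)], VOCAB)},
    {"location": (30.0, 40.0),
     "hist": bow_histogram([(0.0, 0.9), (0.1, 1.0), (0.0, 0.1)], VOCAB)},
]

estimate = localize([(1.0, 0.1), (0.9, 0.0)], VOCAB, REFERENCE_DB)
```

In this split only the descriptors would travel over the uplink, which is why their compression matters for the round trip time.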
0:12:35if you look from the system perspective at the round trip time the upload of compressed features
0:12:39uplink capacity is typically quite constrained
0:12:42and the process but in but F T
0:12:44we'll probably never be able to get this running at frame rate with uh location
0:12:49at twenty five
0:12:51um updates per second
0:12:52so this community could look at the system perspective
0:12:57and basically reconsider the question
0:13:00which process should run at which point
0:13:02yeah what is happening on the mobile device what is happening in the network
0:13:06and what type of information is exchanged between the two to be able to get
0:13:11a push or can at a local solution
0:13:13uh i'm going to yeah we different
0:13:16but channel that the assumption so i'm that the at which is also it would be interesting to look
0:13:20that for must be community it is that now the query we in which that you have that you capture
0:13:26and the corresponding
0:13:28yeah
0:13:28i i for a or location
0:13:30that can be there sure but with different to just to illustrate that
0:13:34can to sit than in the names also required that is to kind of run most outside of our campus
0:13:39if you to it just what the see that chat was
0:13:42but cars but that's true
0:13:45to a sparse way
0:13:46we have uh a mobile device so there might be motion blur
0:13:49the query image and the reference data might be recorded at different seasons winter and spring
0:13:56um um
0:13:57the on the cover innovation
0:13:58objects in the scene might change
0:14:00so the two images that match actually might be dissimilar in most parts
0:14:05and you have to focus on the really similar parts
0:14:08different that
0:14:09oh some cooking
0:14:11a a program here
0:14:12in the MMSP uh sessions i could identify papers that address this issue
0:14:19can be it after one in and the S P L two session
0:14:23and two were already presented yesterday so i would say there is actually quite some activity in
0:14:28the C of that with the like vector
0:14:30from
0:14:33thank you eckehard
0:14:43so my presentation will be mostly focused on multimedia delivery
0:14:47i tried to make a list of all the important aspects for multimedia delivery and the
0:14:52list is impressive because multimedia delivery actually spans
0:14:56several different application domains and technology domains
0:14:59so we have a T V C for the signal representation network coding cross layer
0:15:04optimization
0:15:05uh communication and networking and HTTP and RTP for streaming
0:15:09several networks and several applications so
0:15:12for today i picked
0:15:14a couple of interesting things and so what i'll be talking about is mainly cooperation via peer to peer
0:15:19networking
0:15:20and heterogeneity uh and adaptation to heterogeneity using HTTP streaming
0:15:25so peer to peer has been around for some time now
0:15:29and essentially relates to exchanging information in a cooperative way using logical network connections that are built on top of
0:15:36the physical network connections
0:15:38and a strong trend i see here that is going to significantly improve peer to peer performance is the design
0:15:43of better overlays if you look at the example on the left for example
0:15:47typically the construction of an overlay is completely unaware of the underlying network topology and so a peer
0:15:53might be connected to another peer on the other side of the world and this increases delay and
0:15:58decreases performance
0:15:59whereas it would be possible to use geographic information to connect peers that are close to each other
0:16:04in order to reduce the number of hops
0:16:07so
0:16:07that obviously improves the performance and
0:16:10this is an important thing and there is an ongoing standardisation effort on that
0:16:14called application layer traffic optimization or ALTO in the IETF which is handling that
0:16:20and an ALTO server essentially is a database that takes as input a pair of peers
0:16:25and provides as output a set of metrics like bandwidth and delay that allow to estimate
0:16:30the actual geographical distance between the two
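Editor's sketch of how a peer could use such a pairwise cost lookup when building its neighbour list: ask for the cost of each candidate pair and keep the cheapest ones. This does not reproduce the actual ALTO protocol messages. The `COST_MAP` dictionary, the city names used as peer identifiers, the cost values and the `rank_peers` function are all hypothetical stand-ins for whatever the server would return.

```python
def rank_peers(me, candidates, cost_map):
    """Order candidate peers by the cost reported for the pair (me, peer)."""
    def cost(peer):
        # Pairs the map does not know about are treated as most expensive.
        return cost_map.get((me, peer), float("inf"))
    return sorted(candidates, key=cost)

# Hypothetical cost map: lower means "closer" (fewer hops, lower delay).
COST_MAP = {
    ("turin", "milan"): 5,
    ("turin", "munich"): 20,
    ("turin", "melbourne"): 300,
}

# Keep the two cheapest neighbours instead of picking peers at random.
neighbours = rank_peers("turin", ["melbourne", "munich", "milan"], COST_MAP)[:2]
```

An overlay built this way prefers nearby peers, which is the effect on hop count and delay that the talk points to.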
0:16:34more than that
0:16:35peer to peer also faces problems such as peer reputation trust and security
0:16:40some of those problems have found a solution in the social networking area
0:16:45and so uh i think an interesting thing
0:16:49is adding the social networking aspect to peer to peer
0:16:52that is used in order to help solve such problems
0:16:56have an important thing to use uh this is true for a to be a but also um P to
0:17:00P systems
0:17:01future systems should look at context information about the user such as position direction and speed of the user
0:17:08because that information essentially defines the kind of informational content
0:17:12that we are interested in at a given time for example
0:17:16if i take a train for a short commute i'm not likely to be willing to watch a two hour
0:17:21movie but rather something like a short content
0:17:24and that should be applied to the search engines as well in such a way that when we
0:17:29get a list of best matches for our query
0:17:32those best matches are um
0:17:34um
0:17:34actually
0:17:35tailored to the current context of the user not just general preferences
0:17:40the other topic is HTTP streaming so for a long time multimedia delivery
0:17:45research has been focused on the use of the RTP protocol over UDP
0:17:49but then we had a proliferation of several different networks like WiMAX
0:17:53three point five G wifi and so on and those networks are
0:17:58uh managed by different operators and due to security problems they now have firewalls and
0:18:02firewalls are
0:18:03not likely to let UDP packets go through
0:18:07so this has given rise to a lot of research on HTTP streaming
0:18:12which is markedly different from RTP because in a typical RTP session
0:18:16it is the server that pushes packets to the client
0:18:20whereas in HTTP streaming it is the client that pulls packets from the server so
0:18:24that's
0:18:24different
0:18:25and it causes new problems including for example adaptation or rate control
0:18:30which is not server based any more but it's client based
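Editor's sketch of what client based rate control can look like in a pull model: before requesting the next segment, the client compares the throughput it measured on recent downloads with the bitrates the server offers and picks the highest one that fits. The bitrate ladder, the 0.8 safety factor and the function name `choose_bitrate` are assumptions for illustration and are not taken from any particular player or standard.

```python
def choose_bitrate(available_kbps, measured_throughput_kbps, safety=0.8):
    """Pick the highest offered bitrate that fits within a fraction of the
    measured throughput; fall back to the lowest one if nothing fits."""
    budget = safety * measured_throughput_kbps
    feasible = [rate for rate in sorted(available_kbps) if rate <= budget]
    return feasible[-1] if feasible else min(available_kbps)

# Hypothetical ladder of encodings offered by the server, in kbit/s.
LADDER = [300, 700, 1500, 3000]

next_rate = choose_bitrate(LADDER, measured_throughput_kbps=2000)  # budget is 1600
```

Because every client runs this decision independently, several clients sharing a bottleneck can influence each other's throughput measurements, which relates to the multiple access problem raised a little later in the talk.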
0:18:33so there are several challenges for HTTP streaming
0:18:37we have the issue that HTTP streaming systems are not quite as good performance wise yet
0:18:42uh with respect to conventional systems
0:18:45and if someone proposes a new solution it is difficult to demonstrate that it is better than the
0:18:50existing solutions because it's difficult to perform large scale experiments on these systems
0:18:56a since it's so essential
0:18:59there's also a problem with multiple access where users are competing for the same bandwidth
0:19:03because each user wants to get the maximum share of the bandwidth and this can create
0:19:08problems like oscillation of the available bandwidth over time that has to be addressed
0:19:13and there's also a problem of cross layer optimization where
0:19:17yeah where for each T P to have a
0:19:20but i than the conventional T P three
0:19:23also there are several standardisation issues like MPEG DASH for example
0:19:27that are of interest
0:19:29so there's a nice paper which has just been published a couple of months ago in ieee internet computing
0:19:34that provides a nice introduction to this topic
0:19:36so these
0:19:38so that's the end there are many more and my colleagues have touched upon some of them so
0:19:43mobile search
0:19:44there are problems in multimedia applications of cloud computing and cognitive radio
0:19:50there's the problem of quality of experience in multimedia delivery and many more so that's just my
0:19:56biased opinion on current trends
0:19:58that's it
0:20:05so now we come to the fun part
0:20:07thank you very much
0:20:09a four
0:20:13i
0:20:14uh your questions uh and uh have you figured out who you want to address your questions to
0:20:20um
0:20:21who would like to go first
0:20:23you could step up to the microphone or you could just raise your hand and
0:20:26she
0:20:36oh
0:20:42one of the challenges that you pointed out uh
0:20:45again and again and found in all these applications
0:20:48is this semantic gap
0:20:50it has been there for a long time
0:20:53so
0:20:54is a yeah and just and is if you don to to know what in the future we can
0:20:59at least three part of these K
0:21:01some of things that the
0:21:02such compute but
0:21:04so in the you models
0:21:05in the middle
0:21:06is a a a do a to sort of these is month again
0:21:10can have a you about yet
0:21:14is that are just anybody in any of them because is a a a a i think you calling problem
0:21:19when users in a loop first semantic
0:21:26a what but the or or at up and to to do that the uh in in in many possible
0:21:30way so uh uh uh not a uh a a in general uh what what they be a a a
0:21:35can be very in uh use of for the use uh is not too
0:21:39uh a single user in the loop but uh
0:21:42the social uh the possibility of exploiting the information contained in a
0:21:48social network is a very promising direction
0:21:52um there are up then so of course a the big the big probably used these out to get the
0:21:57bills they them because the they that are not public of a
0:22:01hmmm
0:22:02a problems connected to the privacy of the the the so you can get the image of the like form
0:22:08but this the there is a lot of information for example the possibility of using dogs uh
0:22:13or uh sites like uh wikipedia for example that would be
0:22:17a relationship between feedings so uh and and all
0:22:20kind of information on that can be used at that
0:22:23in the statistical together stays the doing a each of the knowledge about the a about the B D a
0:22:27in general that and and call them
0:22:31you want to
0:22:33is
0:22:34has anybody used mechanical turk to uh
0:22:36do any multimedia processing
0:22:41oh so that's one way to put humans in the loop then
0:22:44i think i've seen some uh some work in that direction
0:22:48other questions or comments
0:22:55right
0:23:01do you see significant differences
0:23:03in the quality of
0:23:04uh the just noticeable differences in haptic coding
0:23:08between normal and blind users
0:23:11as a blind user is presumably using the
0:23:14sense a lot more
0:23:16and probably has no of first get used a haptic interfaces are a balance as the also that is as
0:23:21option
0:23:23yeah
0:23:24you're perfectly right there's lots of studies that show
0:23:27that if at one point for instance you lose your sight
0:23:32then you go through a complete uh adaptation process
0:23:37and rely more on your haptic feedback for instance when navigating
0:23:42in space or you know interacting with
0:23:45and of course i think we can see that with our kids now as they grow up
0:23:48with these touch displays
0:23:51and you have already haptic interfaces that allow you to
0:23:54sense force feedback vibrations and so on
0:23:57of course when you grow up with that you have a very different attitude to
0:24:00these types of signals
0:24:02for some of us and i'm not sure how many of you in the room have ever experienced
0:24:06haptic feedback
0:24:07to be not to many because we or
0:24:09quite similar
0:24:11but to the man is a natural to have that in in the can work for int
0:24:15so like
0:24:16i guess this next generation is much better prepared to deal with these things
0:24:21than we are
0:24:27any other uh
0:24:28more questions
0:24:36so i
0:24:37have one
0:24:38um
0:24:39so what is the um
0:24:41state and direction of work in multimodal
0:24:44perception
0:24:46and its use in multimedia applications
0:24:53it maybe i can say look at home
0:24:56to in the sense of what you be sure and and haptic
0:24:59and
0:25:00and to um
0:25:01because of the what's been known at least in the cycle cool
0:25:04a have the six were and it is you that if you have the sense of touch also being the
0:25:08true
0:25:09see
0:25:10but the degree of
0:25:12um
0:25:13i a few of to get the S
0:25:14that's one thing
0:25:16i a high degree of presence
0:25:18so you from a press in a virtual environment environment when the in a real environment
0:25:22but then there are also cross modal effects
0:25:26that you can exploit
0:25:27yeah
0:25:28is a to emissions been have a lot some the be input
0:25:33and the situation the maybe you men have realise for instance on the the sense of touch
0:25:38this really depends even with by an application on the specific state in
0:25:42just to an idea when you for instance approach
0:25:45a virtual environment with a virtual and if vector
0:25:48assume of
0:25:50oh have a tracking to of its the surface is something the use does
0:25:53you or
0:25:55as can touch with this sir first
0:25:57and the bush or a few bags and is not a member them and then
0:26:00and to experience the roughness and the you know the L as to city and and and and these kinds
0:26:06of characteristics of the surface
0:26:08takes a and if you would ask the you how you experience that environment no he would say normal i'm
0:26:13have lee experience
0:26:14think get so even with an application that can dramatically change from one time instant
0:26:19um um to another
0:26:21there's also what's called um multimodal integration or perceptual learning
0:26:26where you basically learn to fuse
0:26:28modalities at different levels
0:26:30there's a low level um
0:26:32sensing type of fusion that is happening
0:26:35but then there's also at the decision level
0:26:38uh a kind of fusion
0:26:39but of this
0:26:40this particular a
0:26:42let's a observation is is is a and on the brochure sure but i have a contradicting stimulus in to
0:26:47haptics
0:26:48and then it's very interesting how the brain actually decides what is really then the perception you have
0:26:53this is something that is ongoing research also in neurology
0:26:57uh aiming to find out
0:26:59in which situation which of the different modalities is really dominant and might override
0:27:04or combine with all others
0:27:07think
0:27:13other questions
0:27:17is anyone working in multimedia networking
0:27:23no nobody's transmitting media over networks
0:27:28no cloud computing anything
0:27:34okay so no questions for enrico
0:27:36i have
0:27:39uh i have another question
0:27:41um
0:27:42i'm
0:27:43wondering
0:27:44what's on the horizon in terms of
0:27:47uh
0:27:48sensors and also rendering
0:27:51devices
0:27:52that we ought to be aware of
0:27:57uh
0:28:03in sensors there's been a lot of interest in compressed sensing recently
0:28:08uh
0:28:09compressed sensing allows to acquire
0:28:10signals using random projections so that's i mean
0:28:14a simple and powerful way of representing signals
0:28:17so that's uh potentially interesting in sensor networks for example where
0:28:23sensors are very power constrained you can't make many computations
0:28:27you
0:28:27might not be able to perform the compression on the sensor because you don't have
0:28:31that
0:28:31computational power
0:28:33so in that area for example compressed sensing might help
0:28:37a
0:28:38construct a to be so can most for example
0:28:41that are much uh easier to handle
0:28:43in terms of the processing than conventional cameras much less
0:28:46samples
0:28:47so much less memory and computational requirements
0:28:50and so on
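Editor's sketch of the idea of acquiring a signal through random projections: the sensor only computes a small number of inner products with a random plus or minus one matrix, and the heavier recovery work happens elsewhere. To keep the example short it recovers a signal with a single nonzero entry by one correlation step, which is the first step of matching pursuit and not a general compressed sensing solver. The sizes, the seed and all names are assumptions for illustration.

```python
import random

def random_projection_matrix(m, n, seed=0):
    """m x n matrix with independent +1/-1 entries (seed fixed for repeatability)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]

def measure(phi, x):
    """Sensor side: m inner products, far fewer than the n signal samples."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

def recover_one_sparse(phi, y):
    """Receiver side: pick the column most correlated with y and its coefficient."""
    m, n = len(phi), len(phi[0])

    def corr(j):
        return sum(phi[i][j] * y[i] for i in range(m))

    j = max(range(n), key=lambda k: abs(corr(k)))
    return j, corr(j) / m  # every +1/-1 column has squared norm m

N, M = 64, 16            # signal length and number of measurements (hypothetical)
x = [0.0] * N
x[7] = 3.0               # a signal with one nonzero sample
phi = random_projection_matrix(M, N)
y = measure(phi, x)      # only M numbers leave the sensor
index, value = recover_one_sparse(phi, y)
```

The selected column always reproduces the measurements exactly for such a signal; whether `index` is the true position depends on no other random column coinciding with it up to sign, which is unlikely at these sizes but not guaranteed.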
0:28:52and rendering
0:28:53anything
0:28:58any new rendering devices
0:29:06okay any um
0:29:07final questions
0:29:09it's your opportunity
0:29:14okay so we've gone for thirty minutes uh i hope that's uh been of some interest to
0:29:18you uh thanks very much for attending