0:00:13 | Welcome, ladies and gentlemen, to this experts session on trends in audio and acoustic signal processing.
---|
0:00:23 | It is great that so many of you came, and thank you in advance for postponing your lunch break a bit. I hope the two of them will make it interesting.
---|
0:00:34 | I was just thinking that we could also use this opportunity to do some advertisement for our TC, which is the TC on Audio and Acoustic Signal Processing.
---|
0:00:44 | As I am not really prepared for this, please take the whole thing as advertisement for our TC, and whoever wants to get involved, please contact us.
---|
0:00:55 | There are various ways of getting involved in our activities, and of course we first would like to tell you about what this is.
---|
0:01:04 | So, in my role as the chair of this TC, I would like to present to you two experts, who are also from our TC, and who represent the acoustic signal processing community and the audio community in a very specific and, I think, very renowned way.
---|
0:01:21 | I would first like to point to Patrick. Can you please step forward, so that you can be seen?
---|
0:01:30 | Patrick Naylor is from Imperial College London, and I think the most important thing about him right now is that he just recently co-authored the first book on speech dereverberation.
---|
0:01:45 | For everything else, you might look at his slides, which also have very nice pictures.
---|
0:01:52 | And on the other hand we have Malcolm, who is well known in the audio and especially the music community, although his scope actually goes much beyond that; he is from Yahoo Research.
---|
0:02:07 | I should not forget to mention that actually both have ties to both worlds: Malcolm is also teaching at Stanford, and Patrick also has industry relations.
---|
0:02:20 | So, without further ado, I would say I should stop.
---|
0:02:28 | Well, thanks very much for coming along to this session. I hope it is going to be interesting for you.
---|
0:02:36 | We tried to think about what you might expect from this kind of session, and I have to say that the idea of trends is a very personal thing.
---|
0:02:45 | So we are going to present what we personally think are hopefully interesting things, but obviously, given the time constraints, we cannot cover everything.
---|
0:02:58 | Some of these things are easy to define, like counting papers as a measure of activity, or counting achievements, maybe in terms of accepted papers rather than submitted papers.
---|
0:03:07 | Some of them are much less, how do you say, listable; they are more soft concepts. But we will try to go around this a little bit and see what we can find.
---|
0:03:21 | So the first thing we did was to look at the distribution of submissions to the Transactions on Audio, Speech and Language Processing.
---|
0:03:30 | I have plotted this out. There is a lot of detail on this pie chart, but the thing to note from it is that there are some big subjects which are very active within our community in terms of the amount of effort going into them.
---|
0:03:45 | Speech enhancement is a big one and has been for a long time. Source separation continues to be very active; we have had ICA sessions here at ICASSP.
---|
0:03:58 | Microphone array signal processing is still very big, showing up as something like thirteen percent of submissions.
---|
0:04:05 | Content-based music processing, or let us just call it music processing: music is huge for us now and continues to grow, as we will hear in a moment.
---|
0:04:17 | This is a real evolution that we are seeing, maybe even a revolution, in our profile of activities. We could also look at audio analysis as a big topic.
---|
0:04:30 | The ones that I have highlighted there are the ones that we are going to try to focus on in this session; as I mentioned, we cannot possibly focus on everything.
---|
0:04:39 | So that leads us to music. Music has become very big here, as Patrick mentioned, and this year at ICASSP there are three sessions, as you can see listed there.
---|
0:04:47 | There are a number of reasons I thought were worth highlighting, just because it is interesting to see how the field develops.
---|
0:04:54 | One of the reasons is that the EDICS, which is how people describe their papers when submitting to the conference, was changed to include music as a subject. It is a rather bureaucratic reason, but it probably has a lot to do with the fact that there are more music papers now at ICASSP, and I think that is a good thing.
---|
0:05:15 | A second reason is that there is a lot more content to work with; music is easy to work with, as we all own large collections.
---|
0:05:22 | And the third reason is that it has become very commercially relevant in the last few years. iTunes and Pandora are two examples of companies who are making a lot of money from music ideas.
---|
0:05:37 | As I mentioned, the data is easy; we all have large CD collections. One of the things that is difficult about music is that it is all copyrighted: all the stuff we want to work with is proprietary.
---|
0:05:50 | One way that the community has dealt with this is something I will talk about in a little bit.
---|
0:05:56 | But another way the community has worked with this is to create what is called the Million Song Dataset. The idea of this is to distribute features of the songs, not the actual copyrighted material.
---|
0:06:11 | Correct me if I am wrong, but I think it is about a hundred features per song, and they vary over time.
---|
0:06:19 | Columbia and The Echo Nest provide this database online. There is a lot of data there that people can use; it is freely available, it is a very large database,
---|
0:06:29 | and I expect we will see more and more papers that use this database.
---|
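The feature-only distribution model described here, per-segment descriptors instead of audio, can be illustrated with a toy sketch. The array shapes, the 12-dimensional timbre-style vectors, and the duration weighting below are hypothetical stand-ins, not the dataset's actual fields or API:

```python
import numpy as np

def song_level_descriptor(segment_features, segment_durations):
    """Collapse time-varying per-segment features (n_segments x n_dims)
    into one duration-weighted song-level vector."""
    w = np.asarray(segment_durations, dtype=float)
    w = w / w.sum()                      # normalize durations into weights
    return w @ np.asarray(segment_features, dtype=float)

# Toy "song": 4 segments, 12 timbre-like dimensions each (made-up numbers).
feats = np.tile(np.arange(12.0), (4, 1)) * np.array([[1.0], [2.0], [2.0], [1.0]])
durs = [0.5, 0.25, 0.25, 0.5]            # seconds per segment
desc = song_level_descriptor(feats, durs)
```

Working with compact descriptors like this, rather than raw audio, is what lets a feature-only corpus sidestep the copyright problem.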
0:06:34 | MIREX, the Music Information Retrieval Evaluation eXchange, has been the best thing for the scientific component of music analysis and music processing.
---|
0:06:42 | This is the list of tasks that are being worked on for the two thousand eleven competition.
---|
0:06:48 | Data matching is a big issue, and what the MIREX people do is provide an environment at the University of Illinois where people can run their algorithms on a large database of songs.
---|
0:07:00 | The songs never leave the University of Illinois. So instead of getting the data, running your algorithms, and sending the results back, you send your algorithm to the University of Illinois, written in a particular environment, a Java environment, in a format they will tell you about.
---|
0:07:14 | They then run the algorithm on their machines and their clusters and give you back the results.
---|
0:07:18 | I want to highlight three tasks, circled here, that are very important and very popular.
---|
0:07:26 | One is audio tag classification: how you tag audio with various things. Is it happy? Is it blues? Anything you can think of can be a tag, and people work on that very hard.
---|
0:07:38 | Multiple fundamental frequency estimation and tracking has been popular since before MIREX started, but MIREX, with its common database, has really raised the scientific level, so now people can compare things on common ground.
---|
0:07:53 | And the other one is audio chord estimation. In a sense a chord is just another tag, but a very specialized kind; it helps people understand music, and people work on it a lot.
---|
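Audio chord estimation of the kind evaluated in MIREX is commonly approached with chroma features. A minimal sketch, matching a 12-bin chroma vector against binary major/minor triad templates; this is a deliberately simplified stand-in for real submissions, which use richer templates and temporal smoothing:

```python
import numpy as np

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Binary 12-bin templates for the 24 major and minor triads."""
    labels, temps = [], []
    for root in range(12):
        for name, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            t = np.zeros(12)
            t[[(root + i) % 12 for i in intervals]] = 1.0
            labels.append(f"{NOTES[root]}:{name}")
            temps.append(t / np.linalg.norm(t))
    return labels, np.array(temps)

def estimate_chord(chroma):
    """Return the triad label whose template best correlates with chroma."""
    labels, temps = chord_templates()
    c = chroma / (np.linalg.norm(chroma) + 1e-12)
    return labels[int(np.argmax(temps @ c))]
```

A chroma frame dominated by the pitch classes C, E, and G would come back as `C:maj` under this matcher.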
0:08:03 | Something else that has happened, and has been very evident this year, is a lot of work on separation and analysis, and there are a lot of very novel, different approaches.
---|
0:08:13 | This particular graphical model is from a paper; forgive me, I may get the attribution wrong.
---|
0:08:23 | It shows a sequence of notes along the top, so in this case you have the score: you know what is being played, and that is hard information to get.
---|
0:08:30 | Then you are generating data about the harmonics from there, so you have the amplitude, the frequency, and the variance of the Gaussian in the spectral domain, which are combined.
---|
0:08:45 | And then you have the observable variables; these are the spectral slices.
---|
0:08:49 | What you are trying to do, given the note sequence, is build, or find, the emission probabilities that describe the music.
---|
0:09:01 | From that you can do a lot of very interesting work; you can do things like the tagging we mentioned, for things like emotion and genre.
---|
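The emission model described, Gaussians in the spectral domain placed at a note's harmonics, can be sketched as follows. The amplitude decay, variance, and residual model are illustrative choices of mine, not the paper's actual parameterization:

```python
import numpy as np

def harmonic_template(f0, freqs, n_harm=5, sigma=10.0):
    """Expected magnitude spectrum for a note: Gaussian bumps at integer
    multiples of f0, with a simple 1/h amplitude decay."""
    spec = np.zeros_like(freqs, dtype=float)
    for h in range(1, n_harm + 1):
        spec += (1.0 / h) * np.exp(-0.5 * ((freqs - h * f0) / sigma) ** 2)
    return spec

def emission_log_prob(slice_, f0, freqs, noise_var=0.01):
    """Log-probability of an observed spectral slice under the note model,
    assuming an i.i.d. Gaussian residual around the template."""
    resid = slice_ - harmonic_template(f0, freqs)
    return -0.5 * np.sum(resid ** 2) / noise_var

freqs = np.linspace(0, 2000, 512)
obs = harmonic_template(220.0, freqs)   # a clean A3-like spectral slice
```

Given a known note sequence, scoring each slice against the right note's template this way is what lets the model explain the observed spectrogram.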
0:09:10 | And something that is close to my heart, but shows the kind of work being done in this area, is some work on audio morphing.
---|
0:09:19 | The question that the authors wanted to ask was: what is the right way to think about audio perception in morphing?
---|
0:09:29 | If you do morphing well, the path in feature space should be a line. If you are morphing from one position to another position, the feature moves along a line in the original domain, and you want the same sort of thing to happen in the auditory domain.
---|
0:09:45 | The graph shown here on the left, and the print quality is poor so this is just to give you a sense of it, shows a range of line spectral frequency envelopes.
---|
0:09:56 | On the right-hand side are all the perceptual measures that have been calculated based on these LSFs.
---|
0:10:05 | What they are doing is looking for the one that gives a straight line, which you can see in the middle there. Some of these work better than others, and I think that research is still being pursued.
---|
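The "straight line in feature space" criterion can be made concrete with a small sketch: interpolate linearly between two envelope feature vectors, map the path through a candidate perceptual transform, and measure how straight the mapped path is. The 2-D features and the log transform below are made-up examples, not the measures from the paper:

```python
import numpy as np

def morph_path(a, b, steps=11):
    """Linear interpolation between feature vectors a and b."""
    lam = np.linspace(0.0, 1.0, steps)[:, None]
    return (1 - lam) * a + lam * b

def straightness(path):
    """1.0 for a perfectly straight path: endpoint distance divided by
    the summed lengths of the individual segments."""
    seg = np.linalg.norm(np.diff(path, axis=0), axis=1).sum()
    return np.linalg.norm(path[-1] - path[0]) / seg

a = np.array([200.0, 900.0])     # made-up 2-D "envelope" features
b = np.array([600.0, 2500.0])
linear = straightness(morph_path(a, b))           # identity mapping
warped = straightness(np.log(morph_path(a, b)))   # a nonlinear "perceptual" map
```

A representation where the mapped path stays close to 1.0 is, by this criterion, a good domain in which to morph.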
0:10:17 | Right, so the Audio and Acoustic Signal Processing TC covers quite a wide range of areas, which are, I have to say, to me very exciting; I hope you feel that same excitement about the technologies that are being developed.
---|
0:10:34 | I think we see a trend in which a lot of the research that has been in the laboratory for many years is now starting to come to the point of applications, industrial applications, and we heard about some of these in the plenary.
---|
0:10:47 | In that kind of context, if we look at the research that we do, I ask the question of how much of it is driven by the desire we have for exciting applications, and how much of it is fundamental: how much of it underpins the technology with good algorithmic research.
---|
0:11:08 | So I ask you: is there a happy marriage here?
---|
0:11:14 | And I hope the Duke and Duchess of Cambridge will forgive me for using that photograph, but there is a serious point hiding in this.
---|
0:11:29 | Before we come to the serious point: of course Prince William is very, very pleased, having now found his very fine bride, so he has maximized his expectations, and they had a very happy day.
---|
0:11:48 | But coming back to something a little bit more serious, I think things which look good have to be underpinned by excellence in algorithmic and fundamental research.
---|
0:12:00 | So if there is a trend, perhaps, towards things that look great, let us just not lose sight of the fact that the power behind them is the algorithms that we develop.
---|
0:12:12 | okay |
---|
0:12:13 | So one of the areas of algorithmic research which is very hot, and has been for a long time, is array signal processing as applied to microphones, and maybe also loudspeaker arrays.
---|
0:12:25 | Here we see a number of applications. Hearing aids has been a very busy area for a long time, with many applications as well as excellent underpinning technology.
---|
0:12:36 | I do see now a big branch out into the living room, and the living room means TV; it means entertainment. Perhaps it means an Xbox 360 with a Kinect microphone array; perhaps it means Sky TV.
---|
0:12:51 | So these are new applications which are really coming on stream now, and I think they will start to shape the way that we do research.
---|
0:13:00 | The tasks have not changed that much: we still want to do localization, we still want to do tracking, and we still want to extract the desired source from interference, be that noise or other talkers.
---|
0:13:11 | And then a new task is to try to learn something about the acoustic environment by inferring it from the multichannel signals that we can obtain with the microphone array, and this gives us additional prior information on which we can condition our estimation.
---|
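Extracting a desired source with an array is classically done by delay-and-sum beamforming. A narrowband sketch for a uniform linear array in the far field; the spacing and frequency are arbitrary example values:

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def steering_vector(n_mics, spacing, freq, theta):
    """Far-field phase delays for a uniform linear array; theta is the
    arrival angle measured from the array axis (broadside = pi/2)."""
    m = np.arange(n_mics)
    tau = m * spacing * np.cos(theta) / C
    return np.exp(-2j * np.pi * freq * tau)

def das_response(n_mics, spacing, freq, look_theta, src_theta):
    """Delay-and-sum gain toward src_theta when steered to look_theta."""
    w = steering_vector(n_mics, spacing, freq, look_theta) / n_mics
    a = steering_vector(n_mics, spacing, freq, src_theta)
    return abs(np.vdot(w, a))
```

Steered to broadside, an eight-element array passes the look direction at unit gain while attenuating sources arriving from other angles, which is exactly the desired-source extraction described above.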
0:13:31 | Another issue is what kind of microphone array we should use and how we can understand how it is going to behave.
---|
0:13:38 | People started off perhaps looking at linear arrays, certainly extending into planar, cylindrical, and spherical arrays, and even distributed arrays that do not really have any geometry.
---|
0:13:50 | The design of such arrays, including the spacing of the microphone elements and their orientation, is an important and expanding topic, I think.
---|
0:13:59 | People started off with linear arrays, a bunch of microphones in a line.
---|
0:14:04 | Perhaps this is the well-known Eigenmike from mh acoustics: thirty-two sensors on the surface of a rigid sphere of eight centimeters or so.
---|
0:14:13 | From little laboratory prototypes, these have now come into real products that you can buy and connect to your TV set.
---|
0:14:20 | Sky TV has the opportunity to include microphone arrays for relatively low cost, such that you can communicate using your living-room equipment, at very low cost for communications and hardware.
---|
0:14:39 | The challenge there is that you are probably sitting far away from the microphone. So this is going to be, I think, a really hot application for us in the future.
---|
0:14:52 | Interestingly, people are still doing fundamental research, and I am pleased to see that. Here is a paper I picked out; I cannot say at random, but it caught my eye.
---|
0:15:01 | Here is a problem: given N sources and M microphones, where should you put the microphones?
---|
0:15:09 | In this work, given a planar microphone array, the authors present an analysis which enables one to predict the directivity index obtained for different geometries, and therefore obviously then allows optimization of those geometries.
---|
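The directivity index being optimized there compares on-axis power with the beampattern's average over all directions. A numeric sketch for an arbitrary delay-and-sum array; the geometry, frequency, and grid resolution are made-up inputs, and real analyses use closed forms rather than this brute-force integration:

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def directivity_index_db(positions, freq, look_dir, n_grid=36):
    """DI = 10*log10(|B(look)|^2 / mean of |B|^2 over the sphere) for a
    delay-and-sum pattern; integration uses a sin(theta)-weighted grid."""
    positions = np.asarray(positions, dtype=float)   # (n_mics, 3), meters
    look = np.asarray(look_dir, dtype=float)         # unit look direction
    k = 2 * np.pi * freq / C

    def power(u):  # squared beampattern magnitude toward unit vector u
        return abs(np.exp(1j * k * (positions @ (u - look))).sum()) ** 2 \
               / len(positions) ** 2

    theta = (np.arange(n_grid) + 0.5) * np.pi / n_grid
    phi = (np.arange(2 * n_grid) + 0.5) * np.pi / n_grid
    acc = wsum = 0.0
    for t in theta:
        for p in phi:
            u = np.array([np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)])
            acc += np.sin(t) * power(u)
            wsum += np.sin(t)
    return 10 * np.log10(power(look) / (acc / wsum))
```

A single microphone is omnidirectional, so its DI is 0 dB, while even a two-element broadside pair already buys a few dB; sweeping geometries through a function like this is the optimization the paper does analytically.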
0:15:29 | Okay, so source separation is another hot topic, and has been for a while.
---|
0:15:34 | I thought I should say that obviously trends start somewhere: a trend has to begin with a trend setter.
---|
0:15:42 | I put up this photograph of Colin Cherry simply because I think he used to have the office which is above my office now, so I also feel some kind of proximity effect.
---|
0:15:54 | His definition of the cocktail party problem, in his nineteen-fifties book on human communication, is often quoted in people's papers.
---|
0:16:03 | The early experiments were asking questions about the behavior of listeners when they were receiving two almost simultaneous signals, which he called the cocktail party effect.
---|
0:16:14 | I put the picture up on purpose, because I do not think many people would really have a good image of what a cocktail party was in nineteen fifty. I guess it looks a bit different now.
---|
0:16:29 | But anyway, progress in this area has led us to be able to handle cases where we have both determined and underdetermined scenarios.
---|
0:16:39 | Clustering has been a very effective technique, the permutation problem has been addressed with some great successes as well, and now we are starting to see results in practical contexts where we have reverberation as well.
---|
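The permutation problem, each frequency bin's separated outputs coming back in arbitrary order, is often resolved by correlating amplitude envelopes across bins. A two-source sketch; the clustering used in real frequency-domain BSS systems is considerably more elaborate than this pairwise check against one reference bin:

```python
import numpy as np

def align_permutations(env, ref_bin=0):
    """env: (n_bins, 2, n_frames) amplitude envelopes of two separated
    sources per frequency bin; returns a copy with swapped bins fixed by
    maximizing envelope correlation against a reference bin."""
    ref = env[ref_bin]
    aligned = env.copy()
    for k in range(env.shape[0]):
        keep = (np.corrcoef(env[k, 0], ref[0])[0, 1]
                + np.corrcoef(env[k, 1], ref[1])[0, 1])
        swap = (np.corrcoef(env[k, 0], ref[1])[0, 1]
                + np.corrcoef(env[k, 1], ref[0])[0, 1])
        if swap > keep:
            aligned[k] = env[k, ::-1]   # undo the permutation in this bin
    return aligned
```

The underlying assumption is the classical one: a given source's energy rises and falls coherently across frequency, so envelopes from the same talker correlate across bins.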
0:16:56 | The usual effect of reverberation is talked about in the context of dereverberation algorithms for speech enhancement, and this is something that I have myself tried to address.
---|
0:17:08 | Perhaps we are now at the stage where there is a push to take some of the algorithms from the laboratory and start to roll them out into real-world applications; then we will learn whether they work or not.
---|
0:17:22 | We have to address both the single-channel and multichannel cases, often by using acoustic channel inversion, if we can estimate the acoustic channel.
---|
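Acoustic channel inversion, when the channel can be estimated, amounts to designing an inverse (equalizing) filter. A single-channel least-squares sketch with a toy two-tap minimum-phase channel; real room responses are thousands of taps, non-minimum-phase, and need multichannel methods:

```python
import numpy as np

def ls_inverse_filter(h, filt_len, delay=0):
    """Least-squares inverse of impulse response h: find g such that
    conv(h, g) approximates a unit impulse at the given delay."""
    h = np.asarray(h, dtype=float)
    n_out = len(h) + filt_len - 1
    # Convolution matrix: column j is h shifted down by j samples.
    C = np.zeros((n_out, filt_len))
    for j in range(filt_len):
        C[j:j + len(h), j] = h
    d = np.zeros(n_out)
    d[delay] = 1.0
    g, *_ = np.linalg.lstsq(C, d, rcond=None)
    return g

h = [1.0, 0.5]                       # toy minimum-phase channel
g = ls_inverse_filter(h, filt_len=32)
eq = np.convolve(h, g)               # should be close to a unit impulse
```

For this channel the equalized response collapses to essentially a delta, which is the ideal outcome the speaker alludes to; estimating `h` blindly in a real room is the hard part.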
0:17:33 | And although the slide is titled speech enhancement, of course reverberation is widely used, both positively and with negative effects, in music as well, so let us not lose sight of that.
---|
0:17:48 | The other factor which I wanted to touch on here was synergy.
---|
0:17:53 | Interdisciplinary research is often a favored modality, and in our community we can see some benefits coming from cross-fertilization of different topic areas.
---|
0:18:04 | For example, dereverberation and blind source separation: we start to see papers where these are jointly addressed, with some good leverage from both types of techniques.
---|
0:18:19 | Equally, dereverberation coupled with speech recognition, where a classical speech recognizer is enhanced such that it has knowledge of the models of clean speech but also has models for the reverberation, and by combining these is able to make big improvements in word accuracy.
---|
0:18:45 | So I want to talk a bit about a recurring theme that I have been seeing over the last two years, both in this community and elsewhere, but I thought I would mention it here first, and that is sparsity.
---|
0:18:57 | And no, we are not talking about my hair.
---|
0:19:03 | The first time I saw this was in the matching pursuit work that was presented here in ninety-seven; I think it was first done in the IEEE Transactions on Signal Processing in ninety-three.
---|
0:19:14 | At the time I thought it was interesting but a dumb idea, so now I have to correct myself.
---|
0:19:21 | But it has shown up in a number of interesting places in the work that has been done at ICASSP and elsewhere.
---|
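Matching pursuit itself is a short algorithm: greedily pick the dictionary atom most correlated with the residual, subtract it, and repeat. A sketch using an orthonormal DCT dictionary for clarity; the interesting applications use overcomplete dictionaries, where the greedy choice matters far more:

```python
import numpy as np

def dct_dictionary(n):
    """Orthonormal DCT-II atoms as the columns of an n x n matrix."""
    k, m = np.meshgrid(np.arange(n), np.arange(n))
    D = np.cos(np.pi * (m + 0.5) * k / n)
    D[:, 0] *= np.sqrt(1.0 / n)
    D[:, 1:] *= np.sqrt(2.0 / n)
    return D

def matching_pursuit(x, D, n_iter):
    """Greedy sparse decomposition: returns (coefficients, residual)."""
    r = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ r
        j = int(np.argmax(np.abs(corr)))
        coeffs[j] += corr[j]            # atoms have unit norm
        r = r - corr[j] * D[:, j]
    return coeffs, r

D = dct_dictionary(32)
x = 2.0 * D[:, 2] + 1.0 * D[:, 5]       # a signal built from two atoms
coeffs, resid = matching_pursuit(x, D, n_iter=2)
```

With an orthonormal dictionary the two atoms are recovered exactly in two iterations; with a redundant dictionary the same loop yields a sparse, non-unique decomposition, which is the point of the technique.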
0:19:28 | Compressed sensing, a few years ago, was probably the best example.
---|
0:19:34 | But in this community, and the sorting here is still loose, we have seen it in things such as deep belief networks: sparsity has been a big part of the work that has been done on deep belief networks in machine learning, and I think that has been interesting.
---|
0:19:51 | And in a lot of the papers that we saw this year, L1 regularization is a way of providing solutions that make sense when you have a very overdetermined, very complex basis set.
---|
0:20:06 | So I titled this area sparsity, but it is probably better described as sparsity in combination with overcomplete basis sets, and I think that combination is interesting.
---|
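The L1-regularized solutions mentioned here are typically computed by iterative soft thresholding (ISTA). A minimal sketch of the lasso problem, minimizing 0.5*||Ax - y||^2 + lam*||x||_1; production solvers use accelerated or coordinate-descent variants, but the shrinkage step is the same:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm: shrink toward zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista(A, y, lam, n_iter=200, step=None):
    """Iterative soft-thresholding for min 0.5*||Ax - y||^2 + lam*||x||_1."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L for the gradient step
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

It is the soft-threshold step that zeroes out small coefficients, which is exactly how L1 regularization produces the sparse, sensible solutions over a complex basis set that the talk describes.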
0:20:21 | and session before this |
---|
0:20:22 | um in the work by a um |
---|
0:20:24 | i i new in cr |
---|
0:20:26 | um using a cortical representation to um |
---|
0:20:30 | um |
---|
0:20:31 | to model sound |
---|
0:20:32 | and |
---|
0:20:33 | and courts is probably the original um |
---|
0:20:36 | a sparse representation |
---|
0:20:37 | um |
---|
0:20:38 | it predates all of us |
---|
0:20:40 | and and the idea is that you wanna represent sound with the least amount of of biological energy |
---|
0:20:46 | and what seems work well there is to use bikes there are |
---|
0:20:49 | represent of are very um |
---|
0:20:52 | a a distinct sound atoms and how the top put together is still a matter discussion |
---|
0:20:56 | but uh |
---|
0:20:57 | i think is the been gone be you know sing |
---|
0:20:59 | and the way a uh a new but and ch has been using that is two |
---|
0:21:03 | take noisy speech and input if you these kind of um this very overcomplete complete basis set |
---|
0:21:09 | and then |
---|
0:21:10 | um |
---|
0:21:12 | phil to it |
---|
0:21:13 | you and in we regions |
---|
0:21:15 | that that are |
---|
0:21:17 | likely to contain speech |
---|
0:21:19 | and so |
---|
0:21:20 | in a sense |
---|
0:21:21 | um it's a it's a wiener filter but it's in a very rich environment |
---|
0:21:25 | where it's very easy to separate um speech from noise and things like that |
---|
0:21:28 | and what's on the bottom is is noisy speech the kind of feel to that makes sense for speech |
---|
0:21:32 | which for example has a a lot of energy rather forwards modulation rate |
---|
0:21:36 | and then the clean clean speech on uh on the op |
---|
0:21:40 | um |
---|
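The "keep the regions likely to contain speech" step is, as the speaker says, Wiener-like. A sketch of the per-coefficient gain that applies in any transform domain, given a noise-power estimate; the crude SNR estimate below is an illustrative choice, and decision-directed or learned variants are where the real work is:

```python
import numpy as np

def wiener_gain(power, noise_power):
    """Per-coefficient Wiener gain xi / (1 + xi) from an a priori SNR xi,
    here crudely estimated as max(power / noise - 1, 0)."""
    snr_post = power / np.maximum(noise_power, 1e-12)
    xi = np.maximum(snr_post - 1.0, 0.0)
    return xi / (1.0 + xi)

# Three coefficients: strong speech, borderline, and noise-only.
coef_power = np.array([100.0, 1.0, 0.5])
gains = wiener_gain(coef_power, noise_power=1.0)
```

Strong coefficients pass nearly unchanged while noise-level ones are zeroed; the richer and sparser the basis, the cleaner that separation becomes, which is the argument being made for the cortical representation.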
0:21:40 | The deep belief networks are interesting, I think for similar reasons; this all ties together.
---|
0:21:46 | What is shown on the left-hand side is a little bit of a waveform that has been applied to a restricted Boltzmann machine, which is just a way of saying that they learn a weight matrix that transforms the input, on the bottom here, to an output on top, through that weight matrix, with a little bit of a nonlinearity in there.
---|
0:22:11 | You can learn these things in a way that reconstructs the input, so you find basis vectors such that, given the hidden units, you can reconstruct the visible units.
---|
0:22:28 | People have been doing this in the image-processing domain for a long time, and these are some results in the waveform domain that are new this year.
---|
0:22:36 | There is a bunch of things that often look like Gabor-like basis functions, but the interesting thing is that you start to see some very complex features. This is in the spectral domain, and you get these things that have two frequency peaks, which might be akin to formants.
---|
0:22:53 | And they were applying that to speech recognition, and I think that is an interesting direction.
---|
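The RBM pass described, a weight matrix and a nonlinearity up to the hidden units and back down to reconstruct the visible ones, fits in a few lines. This sketch uses mean-field activations only (no sampling), and the CD-1 update is included just to show the shape of the learning signal, not as a faithful training recipe:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_reconstruct(v, W, b_h, b_v):
    """One up-down pass: visible -> hidden means -> reconstructed visible."""
    h = sigmoid(v @ W + b_h)           # hidden unit mean activations
    v_rec = sigmoid(h @ W.T + b_v)     # reconstruction of the visible layer
    return h, v_rec

def cd1_update(v, W, b_h, b_v, lr=0.1):
    """Contrastive-divergence-1 step on the weights (mean-field version)."""
    h0, v1 = rbm_reconstruct(v, W, b_h, b_v)
    h1 = sigmoid(v1 @ W + b_h)
    return W + lr * (np.outer(v, h0) - np.outer(v1, h1))

n_vis, n_hid = 8, 4
W = np.zeros((n_vis, n_hid))           # all-zero weights, just for the sketch
b_h, b_v = np.zeros(n_hid), np.zeros(n_vis)
v = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)
h, v_rec = rbm_reconstruct(v, W, b_h, b_v)
```

After training, the rows of `W` are the learned basis vectors; on waveforms they come out looking like the Gabor-like and formant-like atoms shown on the slide.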
0:22:58 | I am going to go out on a limb here, because I think the reason that sparsity is important is that it gives us a way of representing things that we cannot represent as well in other domains.
---|
0:23:10 | We all grew up with the Fourier transform domain. What is on the left-hand side are two basis functions, a basis of just two frequencies, and with those two basis functions you can represent the entire subspace: the point that is shown there can be anywhere in that subspace, and you can do all those things.
---|
0:23:26 | It is a very rich representation, as we all know; as long as you satisfy the Nyquist criterion you can do anything.
---|
0:23:33 | But I think that is the problem with a dense representation like that. An alternative is to look at something like an overcomplete basis and just pick out elements that you have seen before.
---|
0:23:44 | So here are some synthetic formants. The way I like to think about these things working is that if you build a system that exploits sparseness, whether it be a deep belief network, whether it be matching pursuit, whatever your favorite implementation technology is, you can learn patterns that look like these formants.
---|
0:24:01 | What is on the left is one vowel with one vocal tract length, and on the right-hand side is a different vowel with a different vocal tract length.
---|
0:24:15 | A system with a sparse overcomplete representation is just going to learn these kinds of things: it is going to learn vowels with different vocal tract lengths; it is not going to learn the entire space.
---|
0:24:24 | So if you want to process things, and you are working in this space, then only things that are valid sounds you have seen before will be represented by the sparse basis vectors, and you can do useful things with that. I think that is why this is going to be an important trend and an important direction for our community.
---|
0:24:44 | So one of the things we wanted to do was to reach out to different sectors of our topic area and put in some hopefully interesting quotations from experts in those fields.
---|
0:24:55 | Here is one that comes from NTT, the telecommunications company; thank you for this quote: remaining challenges in source separation could include blind source separation for an unknown or dynamic number of sources.
---|
0:25:14 | You will notice that I have artificially inserted Cherry's photograph on the wall of the lab.
---|
0:25:22 | uh into the E how what areas so if we think about mixed signal I sees |
---|
0:25:27 | uh the the guys at the working on those uh |
---|
0:25:31 | functionalities |
---|
0:25:32 | really support what we want to do |
---|
0:25:34 | uh so i think that that's important to to listen to the heart guys as well |
---|
0:25:39 | so from uh wolfson microelectronics |
---|
0:25:41 | uh moore's law is driving dsp speed and memory capacity enabling the implementation of sophisticated dsp functions |
---|
0:25:49 | resulting from years of research |
---|
0:25:51 | the end user experience |
---|
0:25:53 | uh maybe this is a wish rather than the reality of the moment |
---|
0:25:56 | the end user experience is one of natural wideband voice communications devoid |
---|
0:26:01 | of acoustic background noise and unwanted artifacts |
---|
0:26:04 | seems to me like the hardware manufacturers are on our side |
---|
0:26:09 | um we heard uh a little bit this morning about the uh xbox kinect |
---|
0:26:13 | uh ivan tashev |
---|
0:26:15 | thanks |
---|
0:26:15 | for this uh contribution here the applications of sound capture and enhancement and processing technologies shift |
---|
0:26:23 | oh it's a paradigm shift |
---|
0:26:24 | shift gradually from communications |
---|
0:26:28 | which is where they |
---|
0:26:29 | originated and have their home |
---|
0:26:31 | mostly uh towards recognition and building natural human-machine interfaces |
---|
0:26:38 | uh and he highlights mobile devices |
---|
0:26:41 | cars and living rooms |
---|
0:26:42 | as key application areas |
---|
0:26:45 | malcolm you get the last word |
---|
0:26:46 | well i don't need the last word but we we have one more slide and we can decide whether |
---|
0:26:50 | this is the last word from |
---|
0:26:51 | um steve jobs or from uh lady gaga |
---|
0:26:54 | but in either case the message is the same and there's large commercial applications for the work that we're doing |
---|
0:26:59 | it started with um MP3 which enabled this market |
---|
0:27:03 | but there's still a lot of things to be done in terms of finding music |
---|
0:27:06 | um |
---|
0:27:07 | adding to things um understanding |
---|
0:27:09 | what people's listening needs are so we really haven't talked about that very much |
---|
0:27:12 | but |
---|
0:27:13 | um |
---|
0:27:14 | this is an information retrieval task but it's not a text information retrieval task you know people looking for things that are audio |
---|
0:27:18 | themselves whether they be songs or or music or whatever |
---|
0:27:22 | um they're audio signals and and working with them is an important thing to do |
---|
0:27:25 | and so |
---|
0:27:26 | um i think both lady gaga and steve jobs can have a final word |
---|
0:27:30 | so thank you |
---|
0:27:39 | so |
---|
0:27:40 | thank you |
---|
0:27:41 | malcolm and |
---|
0:27:42 | patrick |
---|
0:27:43 | now we have very little time for discussion but we certainly should not miss this opportunity |
---|
0:27:49 | to hear other voices as well as those we mentioned |
---|
0:27:52 | obviously these views are not completely balanced |
---|
0:27:56 | how could they be |
---|
0:27:58 | so maybe somebody in the forum would like to add |
---|
0:28:01 | something and we can |
---|
0:28:03 | uh have a little discussion on it |
---|
0:28:06 | anybody |
---|
0:28:08 | yeah |
---|
0:28:13 | uh thank you for that great summary |
---|
0:28:15 | uh i just want to add one more thing i think uh |
---|
0:28:18 | we have two eyes and two ears and they work together |
---|
0:28:21 | and i think cross-modal issues are |
---|
0:28:24 | uh likely to be very important |
---|
0:28:27 | the eyes direct the ears and the ears direct the eyes and so on and |
---|
0:28:30 | likewise i think uh audio research and uh vision research should not |
---|
0:28:35 | proceed separately |
---|
0:28:37 | thanks |
---|
0:28:38 | thank you very much for this comment uh |
---|
0:28:41 | this is certainly something which we highly appreciate and we always like to be in touch with the |
---|
0:28:46 | multimedia guys who don't see uh audio as the only medium |
---|
0:28:50 | um |
---|
0:28:51 | but uh |
---|
0:28:53 | certainly uh there are many applications where we are actually closely working |
---|
0:28:58 | with uh vision people just think about |
---|
0:29:01 | uh source tracking |
---|
0:29:03 | so if you want to track some acoustic sources |
---|
0:29:06 | and the source falls silent then uh you better use your camera |
---|
0:29:11 | so there are |
---|
0:29:12 | uh quite a few applications where it is quite natural to join forces |
---|
0:29:19 | uh you know just uh |
---|
0:29:21 | to reinforce that there was a nice paper i'm struggling to remember who did it |
---|
0:29:24 | where they're looking for joint source |
---|
0:29:26 | joint audiovisual sources and i think that's |
---|
0:29:29 | it's important and |
---|
0:29:30 | it can be easier i mean |
---|
0:29:31 | the signals are no longer a big deal |
---|
0:29:34 | so it's easy to get the disk space and computer power is pretty easy |
---|
0:29:37 | it would be fun |
---|
0:29:42 | uh to follow that uh |
---|
0:29:44 | okay to follow the talks about binaural hearing |
---|
0:29:47 | uh is there any research uh |
---|
0:29:49 | that uses uh binaural signal processing |
---|
0:29:53 | binaural uh for musical signal processing |
---|
0:29:59 | so the question was whether there is any binaural music research um |
---|
0:30:03 | i don't know of any i mean people certainly worry about um synthesizing um |
---|
0:30:08 | um high fidelity sound fields |
---|
0:30:11 | so um |
---|
0:30:13 | um |
---|
0:30:14 | the fraunhofer group for example has been working on on synthesizing |
---|
0:30:17 | you know sound fields that sound good no matter where you are |
---|
0:30:20 | and and so you know there's work with people at stanford |
---|
0:30:22 | who are interested in in computing in in creating 3D sound fields |
---|
0:30:26 | for musical experiences |
---|
0:30:28 | um |
---|
0:30:29 | um but i'm not sure where it's gonna go yeah |
---|
0:30:33 | i mean i i if you'd asked me ten years ago whether we'd have five point one speakers in |
---|
0:30:36 | the living room |
---|
0:30:37 | i would have said no |
---|
0:30:38 | but |
---|
0:30:39 | look what's happened |
---|
0:30:40 | so we better |
---|
0:30:46 | anything else before lunch |
---|
0:30:52 | okay you talked about uh five point one speakers in the living room but um |
---|
0:30:56 | we're seeing a lot of new algorithms that will do uh microphone array processing |
---|
0:31:01 | will we be seeing devices that let us do it |
---|
0:31:03 | i mean like the microsoft kinect has a a few microphones i've seen a few um |
---|
0:31:08 | cell phones that have multiple microphones for noise cancellation will we have more devices that allow us to |
---|
0:31:14 | run better processing algorithms |
---|
0:31:16 | yeah so the question was whether we'll have devices that will have uh |
---|
0:31:19 | uh the ability to allow us to implement |
---|
0:31:23 | yeah |
---|
0:31:24 | so APIs |
---|
0:31:25 | and so on and so forth |
---|
0:31:26 | i i understand from this morning's talks that uh um uh software |
---|
0:31:30 | development kits will be available for the kinect |
---|
0:31:32 | um and that could be a lot of fun |
---|
0:31:34 | um i think uh the hardware is there to enable us to do it and |
---|
0:31:38 | the key point of this i think is one of the trends that uh |
---|
0:31:43 | uh we do see which is a move |
---|
0:31:46 | in audio from single channel to multichannel |
---|
0:31:48 | that's been happening for a while and there is no sign of it stopping |
---|
0:31:52 | and so uh we would expect the facilities |
---|
0:31:54 | uh the processing power |
---|
0:31:56 | the uh interoperability and software development kits to come with that as well |
---|
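The multichannel trend discussed here is what makes even simple array algorithms practical on consumer devices. As a hedged illustration only (the two-microphone setup, the 3-sample steering delay, and the noise levels below are all invented for the example, not from the session), a delay-and-sum beamformer might be sketched as:

```python
import numpy as np

def delay_and_sum(mic_signals, delays_samples):
    """Align each channel by its integer steering delay, then average."""
    n = min(len(s) - d for s, d in zip(mic_signals, delays_samples))
    aligned = [s[d:d + n] for s, d in zip(mic_signals, delays_samples)]
    return np.mean(aligned, axis=0)

rng = np.random.default_rng(1)
target = rng.standard_normal(1000)           # the desired source signal
# two microphones: mic 1 hears the target 3 samples later; independent noise on each
mic0 = target + 0.5 * rng.standard_normal(1000)
mic1 = np.concatenate([np.zeros(3), target])[:1000] + 0.5 * rng.standard_normal(1000)

out = delay_and_sum([mic0, mic1], [0, 3])    # steer towards the source direction
n = out.size
err_single = float(np.mean((mic0[:n] - target[:n]) ** 2))
err_beam = float(np.mean((out - target[:n]) ** 2))
print(err_single, err_beam)  # averaging the aligned channels reduces the noise power
```

Averaging the time-aligned channels keeps the coherent target intact while uncorrelated noise partially cancels, which is why more microphones on phones and living-room devices directly enable better enhancement.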
0:32:05 | any other questions |
---|
0:32:07 | comments |
---|
0:32:09 | i have one uh |
---|
0:32:10 | final remark which comes up |
---|
0:32:13 | increasingly uh |
---|
0:32:15 | and i would like to put that as a challenge because |
---|
0:32:18 | uh these sensor networks are out there and they are |
---|
0:32:21 | uh under discussion |
---|
0:32:24 | in many papers where uh nice uh |
---|
0:32:28 | algorithms are provided always based on the assumption that all the sensors are synchronised |
---|
0:32:35 | um |
---|
0:32:36 | this is a |
---|
0:32:37 | tough problem actually so |
---|
0:32:39 | and we feel in the audio community it would help |
---|
0:32:43 | a lot if somebody could really build devices which make sure that all the audio front ends in a |
---|
0:32:49 | distributed network |
---|
0:32:51 | run synchronously uh are synchronised |
---|
0:32:53 | uh the underlying problem is simply that |
---|
0:32:57 | once you |
---|
0:32:58 | correlate signals from different sensors that |
---|
0:33:01 | um have |
---|
0:33:03 | not exactly synchronous clocks |
---|
0:33:06 | then uh this |
---|
0:33:08 | correlation |
---|
0:33:09 | will fall apart |
---|
0:33:11 | and |
---|
0:33:11 | just look at all the optimization and all the adaptive filtering stuff that we have |
---|
0:33:16 | it's always based on correlations and |
---|
0:33:18 | even higher orders |
---|
0:33:20 | and so uh |
---|
0:33:22 | this problem has to be solved |
---|
0:33:24 | and so if you want to do something really |
---|
0:33:27 | uh good for us then please solve this problem |
---|
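The point about correlations falling apart under clock drift can be made concrete with a small simulation. This is a hypothetical sketch (the sampling rate, duration, and 100 ppm offset are made-up numbers, not from the session): two sensors observe the same waveform, but one clock runs slightly fast, and the zero-lag correlation between the captures collapses as the drift accumulates.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000.0                                  # nominal sampling rate (Hz)
t = np.arange(int(4 * fs)) / fs              # 4 seconds on the reference clock
source = rng.standard_normal(t.size)         # broadband source signal

def sensor_capture(ppm_offset):
    """Capture of the same source by a sensor whose clock is off by ppm_offset."""
    fs_b = fs * (1.0 + ppm_offset * 1e-6)
    t_b = np.arange(int(4 * fs_b)) / fs_b
    return np.interp(t_b, t, source)         # resample on the skewed timeline

def zero_lag_correlation(a, b):
    n = min(a.size, b.size)
    a, b = a[:n], b[:n]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

corr_sync = zero_lag_correlation(source, sensor_capture(0.0))     # perfect clocks
corr_drift = zero_lag_correlation(source, sensor_capture(100.0))  # 100 ppm offset
print(corr_sync, corr_drift)  # the drifted capture decorrelates from the source
```

A 100 ppm offset is only 400 microseconds over 4 seconds, yet that is already several samples at 8 kHz, which is exactly why correlation-based algorithms degrade unless the distributed front ends are synchronised.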
0:33:32 | let's say after lunch |
---|
0:33:34 | after lunch okay |
---|
0:33:36 | thank you very much for attending |
---|