0:00:13 | and run uh uh and uh you know functional |
---|
0:00:16 | um uh for that |
---|
0:00:17 | for thought of that |
---|
0:00:19 | but are |
---|
0:00:19 | uh |
---|
0:00:21 | you |
---|
0:00:22 | tech |
---|
0:00:23 | the right |
---|
0:00:24 | and going |
---|
0:00:24 | for a a |
---|
0:00:27 | know |
---|
0:00:27 | oh |
---|
0:00:28 | and like or use a course there are |
---|
0:00:31 | yeah i and we talk that the real problem uh |
---|
0:00:34 | i wouldn't never seen although all |
---|
0:00:36 | yeah |
---|
0:00:37 | so would change from code work |
---|
0:00:40 | but not what is a type or in the uh D |
---|
0:00:43 | if not from that but |
---|
0:00:45 | and uh we really want you know are still uh try |
---|
0:00:50 | pitch detection is uh |
---|
0:00:53 | and essential for in |
---|
0:00:55 | compute a know |
---|
0:00:56 | auditory scene |
---|
0:00:57 | and not |
---|
0:00:58 | uh |
---|
0:00:59 | for to a meeting |
---|
0:01:01 | but it all the tears system |
---|
0:01:04 | mean a a out one source model don |
---|
0:01:07 | you know that machines and can to use |
---|
0:01:10 | really D |
---|
0:01:11 | a a a a really can |
---|
0:01:13 | do it when they |
---|
0:01:14 | so uh |
---|
0:01:16 | i'm the |
---|
0:01:17 | oh |
---|
0:01:18 | it what do you by the yet very easy uh an a as the cocktail party problem |
---|
0:01:23 | and they are on uh |
---|
0:01:25 | and it's very common from a you |
---|
0:01:27 | uh |
---|
0:01:28 | a a at work |
---|
0:01:30 | vol |
---|
0:01:31 | and to the model |
---|
0:01:32 | a chat or uh |
---|
0:01:33 | and |
---|
0:01:34 | or or be |
---|
0:01:35 | computational auditory scene |
---|
0:01:37 | and how |
---|
0:01:38 | so the real to replicate the |
---|
0:01:41 | yeah |
---|
0:01:41 | oh |
---|
0:01:42 | so once or |
---|
0:01:44 | one day |
---|
0:01:45 | she |
---|
0:01:46 | so that in the uh |
---|
0:01:48 | a a a a a a a model |
---|
0:01:50 | face |
---|
0:01:51 | we have several stages a frequency analysis |
---|
0:01:54 | and we are looking for discriminative features |
---|
0:01:57 | a model speakers namely each frequency once that all of that |
---|
0:02:02 | spatial diversity |
---|
0:02:03 | all that but with and then |
---|
0:02:05 | some some of the |
---|
0:02:06 | and grouping |
---|
0:02:07 | at the end |
---|
0:02:08 | have some um |
---|
0:02:10 | a a well we're |
---|
0:02:11 | a a one one or |
---|
0:02:13 | we would like to apply a mask and when we multiply this mask by the |
---|
0:02:19 | spectrogram of the mixture we can recover the underlying source |
---|
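A minimal sketch of the masking idea mentioned above, assuming a binary time-frequency mask and a magnitude spectrogram stored as nested lists (rows are frames, columns are frequency bins). The names `apply_mask`, `mixture`, and `mask` are illustrative and do not come from the talk.

```python
def apply_mask(mixture, mask):
    """Multiply a mixture spectrogram element-wise by a mask of the same shape."""
    if len(mixture) != len(mask):
        raise ValueError("mixture and mask must have the same number of frames")
    masked = []
    for spec_row, mask_row in zip(mixture, mask):
        if len(spec_row) != len(mask_row):
            raise ValueError("mixture and mask must have the same number of bins")
        masked.append([m * s for m, s in zip(mask_row, spec_row)])
    return masked


# Toy magnitude spectrogram of a two-source mixture (2 frames, 3 bins).
mixture = [[4.0, 1.0, 3.0], [2.0, 5.0, 0.5]]
# Hypothetical binary mask: 1 where the first source is assumed to dominate.
mask = [[1, 0, 1], [0, 1, 0]]

source_a = apply_mask(mixture, mask)
# The complementary mask gives the estimate of the second source.
source_b = apply_mask(mixture, [[1 - m for m in row] for row in mask])
```

With a binary mask each time-frequency cell goes entirely to one source, so the two estimates add back up to the mixture.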
0:02:23 | we are interested in single channel speech separation |
---|
0:02:27 | and you have two sources and one is speaker |
---|
0:02:30 | and we don't have any spatial diversity information for T V |
---|
0:02:36 | oh |
---|
0:02:37 | so |
---|
0:02:38 | and this is based on our previous work |
---|
0:02:41 | we we ah i have a i don't know do you mention to the right model |
---|
0:02:45 | basically uh |
---|
0:02:46 | also uh |
---|
0:02:48 | we propose |
---|
0:02:49 | well well track information |
---|
0:02:51 | another other their discriminative feature you small |
---|
0:02:54 | sort all but there and semantic information you move them more right and we use them when we |
---|
0:03:00 | separate um |
---|
0:03:01 | so these are some prior knowledge that can be trained |
---|
0:03:04 | uh |
---|
0:03:06 | i i one |
---|
0:03:07 | so that |
---|
0:03:09 | using um |
---|
0:03:10 | um |
---|
0:03:11 | but and that would be a record two sources |
---|
0:03:13 | so |
---|
0:03:14 | i |
---|
0:03:14 | method that very well but |
---|
0:03:17 | we did a |
---|
0:03:18 | a good was that in order to have some estimate from on the line |
---|
0:03:22 | each one |
---|
0:03:23 | so |
---|
0:03:26 | mean streets and we work and this uh well |
---|
0:03:29 | uh |
---|
0:03:29 | and then lot |
---|
0:03:31 | a P to which has several |
---|
0:03:33 | feature |
---|
0:03:34 | a |
---|
0:03:35 | five |
---|
0:03:36 | i |
---|
0:03:37 | yeah |
---|
0:03:37 | pitch contours and also a them to individual source |
---|
0:03:41 | so many our proposed works just only |
---|
0:03:45 | it that the contour and |
---|
0:03:46 | do want to |
---|
0:03:47 | on the a |
---|
0:03:49 | contour |
---|
0:03:49 | two |
---|
0:03:50 | individual |
---|
0:03:52 | and uh |
---|
0:03:53 | that is it is assumed that a one one of the on the line sort of is always |
---|
0:03:58 | which make it prediction very |
---|
0:04:00 | and also but that use a different from a a lot of papers |
---|
0:04:05 | essentially for a |
---|
0:04:07 | a music signal in which T |
---|
0:04:09 | time |
---|
0:04:10 | frequency continued to use more pronounced |
---|
0:04:12 | such that the could and is very easy |
---|
0:04:16 | or speech |
---|
0:04:17 | and also this these uh uh uh what is it should be |
---|
0:04:20 | we from the |
---|
0:04:21 | a rebel |
---|
0:04:23 | pitch action for single |
---|
0:04:25 | speaker |
---|
0:04:26 | but |
---|
0:04:26 | we have another source |
---|
0:04:29 | uh |
---|
0:04:30 | and and have some sort of the |
---|
0:04:32 | uh |
---|
0:04:33 | but in each or and we need we do need to recover both |
---|
0:04:37 | so |
---|
0:04:38 | but you a high level |
---|
0:04:40 | well |
---|
0:04:41 | the i don't from and and D tracker |
---|
0:04:43 | uh |
---|
0:04:45 | a a for uh stages |
---|
0:04:47 | section grouping separate |
---|
0:04:50 | interpolation |
---|
0:04:51 | but it can be a little what |
---|
0:04:53 | uh oh |
---|
0:04:54 | but was by we need only to resist |
---|
0:04:56 | also |
---|
0:04:56 | some some sort of a |
---|
0:04:58 | ah |
---|
0:04:59 | to making feature |
---|
0:05:01 | P |
---|
0:05:01 | and |
---|
0:05:02 | so a separate |
---|
0:05:03 | i i and |
---|
0:05:04 | in interpolating for a ah |
---|
0:05:06 | we uh one |
---|
0:05:10 | oh uh i think there's |
---|
0:05:11 | stage a a uh a a more pitch detection |
---|
0:05:15 | oh i mean why by the work of a client |
---|
0:05:19 | oh |
---|
0:05:19 | he's group but that propose a distortion measure for in |
---|
0:05:25 | so uh basically |
---|
0:05:27 | uh a as a why uh |
---|
0:05:29 | there is at all |
---|
0:05:30 | it of course so you in but that thing think the white one |
---|
0:05:36 | section |
---|
0:05:36 | we we our goal is to me white these source and B the new signal is the text of the |
---|
0:05:43 | the of the signal |
---|
0:05:44 | and D or its deviation of the spectral densities for on the line sources |
---|
0:05:49 | and they have a i |
---|
0:05:50 | by an autoregressive model |
---|
0:05:53 | two source are and then |
---|
0:05:55 | we need mean one |
---|
0:05:56 | that that in order to record the each |
---|
0:05:59 | a line |
---|
0:06:00 | a time |
---|
0:06:01 | yeah why the same concept |
---|
0:06:03 | i |
---|
0:06:04 | instead of uh you a C I |
---|
0:06:07 | you with a sinusoidal model and which are more suitable for the past well |
---|
0:06:11 | that's |
---|
0:06:12 | so we yeah |
---|
0:06:13 | she thought what |
---|
0:06:14 | you know the new signal |
---|
0:06:16 | yeah |
---|
0:06:17 | uh so one |
---|
0:06:18 | yeah it's of two |
---|
0:06:20 | a a you know the one source a |
---|
0:06:23 | and our goal |
---|
0:06:24 | for |
---|
0:06:25 | detection they |
---|
0:06:26 | to minimize these |
---|
0:06:28 | a distortion |
---|
0:06:30 | so uh |
---|
0:06:32 | for this that for the |
---|
0:06:34 | the uh uh uh a classic paper by mcaulay and quatieri they show that |
---|
0:06:37 | we can |
---|
0:06:39 | a group symmetry uh "'cause" that this that the in terms of sinusoidal modeling using |
---|
0:06:45 | a sum of |
---|
0:06:46 | sinusoidal signals |
---|
0:06:49 | or or a a a a thing to of peaks |
---|
0:06:51 | the spectrum |
---|
0:06:52 | and that we we present |
---|
0:06:54 | i a a a you all uh be though |
---|
0:06:57 | and that |
---|
0:06:59 | that the you of L O I E the uh |
---|
0:07:02 | location of P |
---|
0:07:04 | for the presentation or the sinusoidal model |
---|
0:07:07 | but the peaks |
---|
0:07:09 | don't occur exactly at integer multiples of the |
---|
0:07:13 | uh |
---|
0:07:14 | fundamental frequency yeah another to out where here |
---|
0:07:17 | and B to a parameter in order to to a exactly match |
---|
0:07:23 | so uh |
---|
0:07:24 | you we have to and i don't for and to uh the location of the |
---|
0:07:29 | and so |
---|
0:07:31 | because we do not have access to the peak locations of the underlying sources |
---|
0:07:35 | so we apply this |
---|
0:07:36 | approximation which we found works pretty well in practice |
---|
0:07:40 | so |
---|
0:07:41 | and to uh |
---|
0:07:44 | or are bits that separate |
---|
0:07:46 | say |
---|
0:07:47 | i |
---|
0:07:47 | a and then you paris |
---|
0:07:49 | and then we assign peaks |
---|
0:07:51 | to each source |
---|
0:07:53 | and then they are very close the |
---|
0:07:55 | sign no ha of the peak to each individual sources |
---|
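The peak assignment described here can be illustrated with a simplified sketch. It assumes two candidate pitch values are already given and assigns each spectral peak to the source whose nearest harmonic is closest, within a tolerance that allows for peaks not falling exactly on integer multiples. The function names, the tolerance of 20 Hz, and the toy peak list are assumptions for illustration; the method in the talk minimizes a distortion measure rather than using this nearest-harmonic rule.

```python
def nearest_harmonic_error(peak_hz, f0_hz):
    """Distance in Hz from a peak to the closest integer multiple of f0."""
    harmonic = max(1, round(peak_hz / f0_hz))
    return abs(peak_hz - harmonic * f0_hz)


def assign_peaks(peaks_hz, f0_a, f0_b, tolerance_hz=20.0):
    """Assign each peak to source 'a' or 'b', or leave it unassigned."""
    assigned = {"a": [], "b": [], "unassigned": []}
    for peak in peaks_hz:
        err_a = nearest_harmonic_error(peak, f0_a)
        err_b = nearest_harmonic_error(peak, f0_b)
        if min(err_a, err_b) > tolerance_hz:
            # Too far from any harmonic of either candidate pitch.
            assigned["unassigned"].append(peak)
        elif err_a <= err_b:
            assigned["a"].append(peak)
        else:
            assigned["b"].append(peak)
    return assigned


# Toy spectral peaks (Hz) that sit near, but not exactly on, the harmonics.
peaks = [101.0, 148.0, 199.0, 402.0, 452.0, 370.0]
result = assign_peaks(peaks, f0_a=100.0, f0_b=150.0)
```

Peaks near a harmonic shared by both pitches (for example 300 Hz here) are ambiguous under this rule; the sketch breaks ties in favour of source `a`, which is an arbitrary choice.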
0:07:58 | and then and |
---|
0:07:59 | oh |
---|
0:07:59 | the |
---|
0:08:01 | only problem to the the to me might you the two pitch a |
---|
0:08:05 | a uh |
---|
0:08:06 | uh |
---|
0:08:07 | points |
---|
0:08:07 | so we we we my station |
---|
0:08:09 | and we got some um |
---|
0:08:11 | estimation for the |
---|
0:08:12 | i but one source for each |
---|
0:08:16 | a are you uh |
---|
0:08:17 | yeah idea of how |
---|
0:08:18 | to uh because that |
---|
0:08:20 | a whole one a for more speak to a the signal |
---|
0:08:24 | ah |
---|
0:08:24 | we have |
---|
0:08:25 | a source here |
---|
0:08:26 | a first one had a week one up to twenty eight |
---|
0:08:29 | the second one |
---|
0:08:30 | nine three |
---|
0:08:31 | i think and he he's there cool are |
---|
0:08:34 | so with the and the more people are integer all |
---|
0:08:37 | pitch frequency |
---|
0:08:38 | are not exactly occurring at the integer multiples of the fundamental frequency |
---|
0:08:43 | mean and the white out uh from to you play around with these |
---|
0:08:48 | and the order to get these uh |
---|
0:08:51 | and do the thinking for that in |
---|
0:08:54 | a a a a a a can with these uh |
---|
0:08:57 | uh |
---|
0:08:58 | and and sort them than the signal we |
---|
0:09:00 | uh to minimizing |
---|
0:09:03 | i the second |
---|
0:09:03 | stop |
---|
0:09:04 | it |
---|
0:09:04 | i of the power was also |
---|
0:09:06 | a we detect a peak detection now or |
---|
0:09:09 | grouping |
---|
0:09:10 | that |
---|
0:09:11 | pitch a a a a a a a one |
---|
0:09:13 | so |
---|
0:09:13 | but yeah he is a large a a a a i don't |
---|
0:09:16 | long detection |
---|
0:09:17 | i in two D can "'cause" you don't want to point |
---|
0:09:20 | you want to be the curve to it |
---|
0:09:23 | so |
---|
0:09:23 | well you here uh that's you in the first frame |
---|
0:09:27 | but we search for a uh and these two or more |
---|
0:09:31 | the second row |
---|
0:09:32 | i |
---|
0:09:33 | or and that the reference of any P |
---|
0:09:36 | i |
---|
0:09:36 | one one in in it or what any um |
---|
0:09:39 | pitch and be |
---|
0:09:40 | we group and to get a |
---|
0:09:42 | and re |
---|
0:09:44 | oh |
---|
0:09:45 | a a a a a very |
---|
0:09:47 | to be for another |
---|
0:09:48 | and |
---|
0:09:50 | or not |
---|
0:09:51 | it can not be grouped into used uh |
---|
0:09:53 | um |
---|
0:09:54 | first |
---|
0:09:55 | so no one one another core |
---|
0:09:57 | a very uh uh uh |
---|
0:09:59 | uh uh |
---|
0:10:00 | that that for you try to each candidate |
---|
0:10:03 | and |
---|
0:10:04 | and and i got from five like |
---|
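The frame-by-frame grouping step described above can be sketched roughly as follows. A candidate in the current frame joins a track that ended in the previous frame if it lies within a frequency threshold of that track's last value, and otherwise it starts a new track. The function name, the 10 Hz threshold, and the greedy matching order are assumptions, not details confirmed by the talk.

```python
def group_into_tracks(frames, max_jump_hz=10.0):
    """Group per-frame pitch candidates (Hz) into tracks by frequency continuity.

    frames: list of lists, one list of candidate pitches per frame.
    Returns a list of tracks, each a list of (frame_index, pitch_hz) tuples.
    """
    tracks = []
    for t, candidates in enumerate(frames):
        for pitch in candidates:
            best_index = None
            best_dist = max_jump_hz
            for i, track in enumerate(tracks):
                last_t, last_pitch = track[-1]
                # Only tracks that ended in the previous frame can be extended;
                # tracks already extended in this frame end at t and are skipped.
                if last_t != t - 1:
                    continue
                dist = abs(pitch - last_pitch)
                if dist <= best_dist:
                    best_index, best_dist = i, dist
            if best_index is None:
                tracks.append([(t, pitch)])  # start a new track
            else:
                tracks[best_index].append((t, pitch))
    return tracks


frames = [[100.0, 200.0], [102.0, 198.0], [104.0], [150.0]]
tracks = group_into_tracks(frames)
```

In this toy input the candidate at 150 Hz in the last frame is too far from any live track, so it opens a third track.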
0:10:08 | now the second stage is that |
---|
0:10:10 | separate |
---|
0:10:11 | so uh |
---|
0:10:13 | we or that |
---|
0:10:14 | do the separation be uh |
---|
0:10:16 | she mean not be track |
---|
0:10:19 | and then compared the |
---|
0:10:20 | we you know we will try |
---|
0:10:23 | if the longest track |
---|
0:10:24 | yeah have in these uh |
---|
0:10:26 | to to the a representation |
---|
0:10:28 | i is that we do we do a right |
---|
0:10:32 | and and the longest track |
---|
0:10:33 | smaller than a threshold |
---|
0:10:35 | he |
---|
0:10:36 | sound in to one group |
---|
0:10:37 | you |
---|
0:10:38 | if not then you read a a of them to the sec |
---|
0:10:41 | i we we basically |
---|
0:10:43 | separate the uh |
---|
0:10:45 | individual tracks |
---|
0:10:46 | two source |
---|
0:10:49 | and that the that |
---|
0:10:50 | state |
---|
0:10:51 | yeah |
---|
0:10:52 | because |
---|
0:10:52 | we have to the |
---|
0:10:54 | you know |
---|
0:10:55 | we have a problem my |
---|
0:10:57 | therefore we do some sort of interpolation in order to recover the missing |
---|
0:11:03 | pitch frequencies |
---|
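As a rough illustration of filling in missing pitch values, the sketch below interpolates linearly between the known values of one track. The talk only says "some sort of interpolation", so linear interpolation is an assumption, as are the function name and the use of `None` for missing frames.

```python
def interpolate_gaps(track):
    """Fill None entries that lie between two known values by linear interpolation.

    Leading and trailing None entries are left untouched (no extrapolation).
    """
    filled = list(track)
    known = [i for i, value in enumerate(track) if value is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = track[left] + weight * (track[right] - track[left])
    return filled


# One pitch value (Hz) per frame; None marks frames with no estimate.
track = [100.0, None, 104.0, None, None, None, 112.0, None]
filled = interpolate_gaps(track)
```

The trailing frame stays empty because there is no later value to interpolate toward, which also avoids inventing pitch values in regions that may be unvoiced.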
0:11:04 | and some time here |
---|
0:11:05 | you here that |
---|
0:11:07 | that might be you to i'm voice signal power like is like that |
---|
0:11:13 | about that |
---|
0:11:14 | you to a second and the these uh |
---|
0:11:17 | and data |
---|
0:11:17 | using the relation |
---|
0:11:19 | requiring covering |
---|
0:11:20 | max |
---|
0:11:21 | uh uh |
---|
0:11:23 | so |
---|
0:11:24 | oh |
---|
0:11:25 | or are some than those from |
---|
0:11:28 | we |
---|
0:11:29 | tire uh a |
---|
0:11:31 | he's try |
---|
0:11:32 | and here this is another uh nice |
---|
0:11:35 | frequency |
---|
0:11:36 | ah |
---|
0:11:36 | and uh we also have another uh a heuristic parameters that with |
---|
0:11:42 | track or the overlapping they can be |
---|
0:11:44 | don't to want source |
---|
0:11:46 | so you the presence of two source |
---|
0:11:48 | and i a lot of |
---|
0:11:50 | the |
---|
0:11:50 | oh |
---|
0:11:51 | the to make the pitch contour which of the exact to the uh uh uh a reference uh one |
---|
0:11:58 | no so uh that's still |
---|
0:12:00 | i |
---|
0:12:01 | that a E |
---|
0:12:02 | can and detect on the line each one |
---|
0:12:05 | so but yeah |
---|
0:12:07 | but not well um |
---|
0:12:08 | results |
---|
0:12:09 | we are sure |
---|
0:12:11 | i one ninety |
---|
0:12:13 | oh |
---|
0:12:15 | like you know |
---|
0:12:16 | um i |
---|
0:12:17 | a combination of gender |
---|
0:12:18 | me mail |
---|
0:12:20 | in a email |
---|
0:12:21 | maybe a met |
---|
0:12:22 | a we with that the uh |
---|
0:12:25 | a target to interference ratio from zero to eighteen db |
---|
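For readers unfamiliar with the term, a mixture at a given target-to-interference ratio can be built by scaling the interfering signal before adding it to the target. The sketch below assumes the ratio is defined on average signal power over the whole utterance; the function names and toy signals are illustrative and the talk does not describe how its mixtures were produced.

```python
import math


def mean_power(signal):
    """Average power of a signal given as a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def interferer_gain(target, interferer, tir_db):
    """Gain for the interferer so that the power ratio equals tir_db."""
    p_target = mean_power(target)
    p_interferer = mean_power(interferer)
    if p_target == 0.0 or p_interferer == 0.0:
        raise ValueError("both signals must have non-zero power")
    return math.sqrt(p_target / (p_interferer * 10.0 ** (tir_db / 10.0)))


def mix_at_tir(target, interferer, tir_db):
    """Return target + scaled interferer at the requested ratio in dB."""
    gain = interferer_gain(target, interferer, tir_db)
    return [t + gain * i for t, i in zip(target, interferer)]


target = [1.0, -1.0, 1.0, -1.0]
interferer = [2.0, 2.0, -2.0, -2.0]
mixture_0db = mix_at_tir(target, interferer, 0.0)
mixture_18db = mix_at_tir(target, interferer, 18.0)
```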
0:12:29 | uh hamming i mean window of black |
---|
0:12:31 | i it is that |
---|
0:12:32 | to bring the of the ten millisecond |
---|
0:12:34 | where live |
---|
0:12:36 | a a a new to segment the signal |
---|
0:12:38 | a reference speech |
---|
0:12:39 | uh uh are a using the uh |
---|
0:12:42 | talking method which is very what was five |
---|
0:12:44 | and accurate |
---|
0:12:45 | and uh the uh the white three |
---|
0:12:48 | a very or uh maybe ross or right |
---|
0:12:51 | and a your mention |
---|
0:12:52 | a previously |
---|
0:12:53 | a voiced unvoiced error rate |
---|
0:12:56 | and |
---|
0:12:56 | separation error |
---|
0:12:57 | we compare this the with the uh one of the or a the to was back |
---|
0:13:02 | deliang wang's group that have |
---|
0:13:05 | um |
---|
0:13:06 | um there |
---|
0:13:07 | have applied some sort of gammatone filterbank with a channel |
---|
0:13:11 | and and another at that or or or or a a a proposed by captain in night |
---|
0:13:15 | for |
---|
0:13:16 | of course |
---|
0:13:16 | a sort of a |
---|
0:13:18 | harmonic suppression or |
---|
0:13:20 | for the so |
---|
0:13:21 | yeah |
---|
0:13:22 | or or or a result uh |
---|
0:13:24 | for error rate versus the target to interference ratio |
---|
0:13:28 | ah |
---|
0:13:29 | have three sets of lot |
---|
0:13:31 | which how one so the result for target |
---|
0:13:35 | and the battle for |
---|
0:13:37 | we are and |
---|
0:13:38 | um |
---|
0:13:39 | we have to lines here a dynamo you go |
---|
0:13:42 | and the uh |
---|
0:13:44 | so that one and um and and uh |
---|
0:13:46 | for one of those and |
---|
0:13:48 | uh the proposed method |
---|
0:13:50 | uh |
---|
0:13:50 | the the that we but that's stand for the uh |
---|
0:13:54 | hmmm |
---|
0:13:55 | and |
---|
0:13:55 | i have some good |
---|
0:13:57 | you know but |
---|
0:13:58 | and it can for all |
---|
0:14:00 | can be nation of mixtures |
---|
0:14:02 | a and five factor |
---|
0:14:04 | a a a from the two other techniques um |
---|
0:14:07 | so if we can see |
---|
0:14:08 | uh |
---|
0:14:09 | and there for the target |
---|
0:14:11 | if we a signal |
---|
0:14:12 | L |
---|
0:14:13 | so |
---|
0:14:14 | a a you is to a a voice as all |
---|
0:14:17 | uh |
---|
0:14:19 | he to incorporate these he met that you can be it and propose anything for |
---|
0:14:23 | the in the unvoiced the only work who worked know so we have to in here |
---|
0:14:29 | um |
---|
0:14:30 | a |
---|
0:14:31 | kinetic |
---|
0:14:31 | see |
---|
0:14:32 | a a a and it's factor uh |
---|
0:14:35 | oh a very well fit to the other that |
---|
0:14:37 | for all combination me male |
---|
0:14:40 | in in a minute |
---|
0:14:41 | and male email |
---|
0:14:42 | um week |
---|
0:14:44 | and the point only in terms of uh |
---|
0:14:47 | separation error |
---|
0:14:49 | uh |
---|
0:14:50 | we we see a you |
---|
0:14:51 | i two method |
---|
0:14:53 | uh |
---|
0:14:53 | how our method |
---|
0:14:55 | very robust against one |
---|
0:14:57 | a a a a a sort of the |
---|
0:14:59 | to like or |
---|
0:15:00 | and uh uh uh you get in separation performance for to uh |
---|
0:15:05 | a method |
---|
0:15:07 | yeah so uh |
---|
0:15:09 | there are a number of issues that should be risk |
---|
0:15:12 | a about this for uh so |
---|
0:15:15 | we have a problem when two pitch contours are crossing each other how |
---|
0:15:19 | can we assign them to different sources when the pitch contours are very close |
---|
0:15:24 | a green |
---|
0:15:25 | i don't know even |
---|
0:15:26 | oh uh uh are the to system can separate them are our uh we are working to improve the performance |
---|
0:15:32 | by applying some prior knowledge about the speakers |
---|
0:15:36 | i believe we can also apply the spatial diversity as another clue |
---|
0:15:41 | we can uh yeah i meant to to do small and improve the for one |
---|
0:15:45 | and some prior knowledge about |
---|
0:15:47 | the |
---|
0:15:48 | there uh |
---|
0:15:49 | i been working on a bayesian inference method |
---|
0:15:52 | the performance |
---|
0:15:53 | yeah |
---|
0:15:53 | i would like to time is uh |
---|
0:15:56 | a a called me |
---|
0:15:58 | not |
---|
0:15:58 | and the only one who provided codes |
---|
0:16:01 | i three D for a a a a researcher or so |
---|
0:16:04 | for really how to compare |
---|
0:16:07 | a with a |
---|
0:16:08 | so |
---|
0:16:09 | and that |
---|
0:16:09 | ah |
---|
0:16:11 | right now week |
---|
0:16:12 | the whole of the code and some demos are available from my page |
---|
0:16:16 | i is |
---|
0:16:17 | really |
---|
0:16:18 | and that one so |
---|
0:16:19 | and now finally to you uh |
---|
0:16:22 | a to taking any question or comment about |
---|
0:16:32 | i |
---|
0:16:48 | but you come in a little bit i |
---|
0:16:50 | do you mean in terms of separation |
---|
0:16:52 | a sequential grouping problem of interest |
---|
0:16:55 | i think that that just to they're not actual |
---|
0:16:57 | separation and six that's right means voice |
---|
0:16:59 | yes |
---|
0:17:00 | the uh |
---|
0:17:02 | a pitch you want to do |
---|
0:17:05 | a speaker |
---|
0:17:06 | and and tracker as signs it to the seconds |
---|
0:17:09 | that as the separation |
---|
0:17:11 | yeah i a this classification be two class |
---|
0:17:14 | and i i i i are or correctly that we would also some that's make any attempt |
---|
0:17:18 | to |
---|
0:17:18 | to solve the problem |
---|
0:17:19 | that |
---|
0:17:21 | a i i i i one and that method that it if you need to do a different at contours |
---|
0:17:26 | for |
---|
0:17:27 | to |
---|
0:17:28 | no |
---|
0:17:29 | sure |
---|
0:17:30 | and |
---|
0:17:31 | yeah yeah maybe a |
---|
0:17:33 | and a |
---|
0:17:34 | we then and can the feast five |
---|
0:17:37 | the fact that you're are getting two contours from two speakers i got it at home |
---|
0:17:42 | it sounds |
---|
0:17:43 | okay |
---|
0:17:47 | so something about two you of to look at it that you like to |
---|
0:17:50 | translating to do most mcclay speak |
---|
0:17:53 | a charter |
---|
0:17:53 | to to look at the model look at that you |
---|
0:17:56 | use |
---|
0:17:57 | was that translates to like C |
---|
0:17:59 | and is that being online have uh |
---|
0:18:04 | just it something i duration of tracks so you you okay yeah i i i i |
---|
0:18:09 | uh marking cooking of S |
---|
0:18:11 | a i think it duration of sentences are about seven |
---|
0:18:15 | to two sec |
---|
0:18:17 | two to sec |
---|
0:18:22 | yeah |
---|
0:18:23 | oh |
---|
0:18:26 | i |
---|
0:18:30 | i |
---|
0:18:31 | i |
---|
0:18:37 | but the method automatically a as the and voices |
---|
0:18:41 | so when you have these uh |
---|
0:18:46 | five |
---|
0:18:48 | the |
---|
0:18:49 | so uh |
---|
0:18:50 | yeah as and what |
---|
0:18:52 | so we |
---|
0:18:54 | reference |
---|
0:18:55 | a a a a a i i the uh |
---|
0:18:58 | and with anything specifically to recognise and work |
---|
0:19:02 | a fact that you don't have a |
---|
0:19:04 | a a to here |
---|
0:19:05 | uh |
---|
0:19:06 | yeah i that we don't have any |
---|
0:19:08 | yeah |
---|
0:19:09 | which |
---|
0:19:11 | i think yeah |
---|
0:19:13 | i thank you very much your |
---|