0:00:13and run uh uh and uh you know functional
0:00:16um uh for that
0:00:17for thought of that
0:00:19but are
0:00:19uh
0:00:21you
0:00:22tech
0:00:23the right
0:00:24and going
0:00:24for a a
0:00:27know
0:00:27oh
0:00:28and like or use a course there are
0:00:31yeah i and we talk that the real problem uh
0:00:34i wouldn't never seen although all
0:00:36yeah
0:00:37so would change from code work
0:00:40but not what is a type or in the uh D
0:00:43if not from that but
0:00:45and uh we really want you know are still uh try
0:00:50pitch detection is
0:00:53essential in
0:00:55computational
0:00:56auditory scene
0:00:57analysis
0:00:58uh
0:00:59for to a meeting
0:01:01but it all the tears system
0:01:04mean a a out one source model don
0:01:07you know that machines and can to use
0:01:10really D
0:01:11a a a a really can
0:01:13do it when they
0:01:14so uh
0:01:16i'm the
0:01:17oh
0:01:18it what do you by the yet very easy uh an a as the cocktail party problem
0:01:23and they are on uh
0:01:25and it's very common from a you
0:01:27uh
0:01:28a a at work
0:01:30vol
0:01:31and to the model
0:01:32a chat or uh
0:01:33and
0:01:34or or be
0:01:35computational of the dreams
0:01:37and how
0:01:38so the real to replicate the
0:01:41yeah
0:01:41oh
0:01:42so once or
0:01:44one day
0:01:45she
0:01:46so that in the uh
0:01:48a a a a a a a model
0:01:50face
0:01:51we have several stages: a frequency analysis,
0:01:54and then we look for discriminative features
0:01:57among the speakers, namely pitch frequency, onsets and offsets,
0:02:02spatial diversity
0:02:03all that but with and then
0:02:05some some of the
0:02:06and grouping
0:02:07at the end
0:02:08have some um
0:02:10a a well we're
0:02:11a a one one or
0:02:13we would like to apply a binary mask; when we multiply this mask by the
0:02:19spectrogram of the mixture, we can recover the underlying source
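
[Editor's illustration] The masking step mentioned here can be sketched as an element-wise product between a time-frequency mask and the magnitude spectrogram of the mixture. This is only a minimal sketch: the function name `apply_mask`, the toy values, and the assumption that the mask is binary are all hypothetical and not taken from the talk.

```python
def apply_mask(mixture_mag, mask):
    """Multiply a time-frequency mask with a mixture spectrogram, cell by cell.

    Both arguments are lists of frames; each frame is a list of frequency bins.
    A mask value of 1 keeps the bin for the target source, 0 discards it.
    """
    if len(mixture_mag) != len(mask):
        raise ValueError("mask and spectrogram need the same number of frames")
    estimate = []
    for frame, mask_frame in zip(mixture_mag, mask):
        if len(frame) != len(mask_frame):
            raise ValueError("mask and spectrogram need the same number of bins")
        estimate.append([m * x for m, x in zip(mask_frame, frame)])
    return estimate


# Toy example (made-up numbers): 2 frames, 3 frequency bins.
mixture = [[0.9, 0.1, 0.4], [0.2, 0.8, 0.3]]
mask = [[1, 0, 1], [0, 1, 0]]
estimate = apply_mask(mixture, mask)
```

In a real system the mask would be derived from the estimated pitch contours; how the talk's method builds it is not recoverable from this recording.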
0:02:23we are interested in single-channel speech separation
0:02:27and you have two sources and one is speaker
0:02:30and we don't have any spatial diversity information
0:02:36oh
0:02:37so
0:02:38and and is based on our previous work
0:02:41we we ah i have a i don't know do you mention to the right model
0:02:45basically uh
0:02:46also uh
0:02:48we propose
0:02:49well well track information
0:02:51another other their discriminative feature you small
0:02:54sort all but there and semantic information you move them more right and we use them when we
0:03:00separate um
0:03:01so these are some prior knowledge that can be trained
0:03:04uh
0:03:06i i one
0:03:07so that
0:03:09using um
0:03:10um
0:03:11but and that would be a record two sources
0:03:13so
0:03:14i
0:03:14method that very well but
0:03:17we did a
0:03:18a good was that in order to have some estimate from on the line
0:03:22each one
0:03:23so
0:03:26mean streets and we work and this uh well
0:03:29uh
0:03:29and then lot
0:03:31a P to which has several
0:03:33feature
0:03:34a
0:03:35five
0:03:36i
0:03:37yeah
0:03:37pitch contours and also assign them to individual sources
0:03:41many previously proposed works only
0:03:45detect the contour and
0:03:46do not
0:03:47assign the
0:03:49contour
0:03:49to
0:03:50individual sources
0:03:52and uh
0:03:53that is, it is assumed that one of the underlying sources is always
0:03:58which makes pitch detection very
0:04:00and also this is different from a lot of papers,
0:04:05especially for
0:04:07music signals, in which
0:04:09time
0:04:10frequency continuity is more pronounced
0:04:12such that the could and is very easy
0:04:16or speech
0:04:17and also this these uh uh uh what is it should be
0:04:20we from the
0:04:21a rebel
0:04:23pitch detection for a single
0:04:25speaker
0:04:26but
0:04:26we have another source
0:04:29uh
0:04:30and and have some sort of the
0:04:32uh
0:04:33and we do need to recover both
0:04:37so
0:04:38but you a high level
0:04:40well
0:04:41the i don't from and and D tracker
0:04:43uh
0:04:45four stages:
0:04:47detection, grouping, separation,
0:04:50interpolation
0:04:51but it can be a little what
0:04:53uh oh
0:04:54but was by we need only to resist
0:04:56also
0:04:56some some sort of a
0:04:58ah
0:04:59to making feature
0:05:01P
0:05:01and
0:05:02so a separate
0:05:03i i and
0:05:04in interpolating for a ah
0:05:06we uh one
0:05:10oh uh i think there's
0:05:11stage: multi-pitch detection
0:05:15oh i mean why by the work of a client
0:05:19oh
0:05:19his group, who proposed a distortion measure for
0:05:25so uh basically
0:05:27uh a as a why uh
0:05:29there is at all
0:05:30it of course so you in but that thing think the white one
0:05:36section
0:05:36we we our goal is to me white these source and B the new signal is the text of the
0:05:43the of the signal
0:05:44and D or its deviation of the spectral densities for the underlying sources
0:05:49and they have a i
0:05:50an autoregressive model
0:05:53two source are and then
0:05:55we need mean one
0:05:56that that in order to record the each
0:05:59a line
0:06:00a time
0:06:01yeah why the same concept
0:06:03i
0:06:04instead of uh you a C I
0:06:07we use a sinusoidal model, which is more suitable for the past well
0:06:11that's
0:06:12so we yeah
0:06:13she thought what
0:06:14you know the new signal
0:06:16yeah
0:06:17uh so one
0:06:18yeah it's of two
0:06:20a a you know the one source a
0:06:23and our goal
0:06:24for
0:06:25detection is
0:06:26to minimize this
0:06:28distortion
0:06:30so uh
0:06:32for this that for the
0:06:34the classic paper by McAulay and Quatieri, they show that
0:06:37we can
0:06:39a group symmetry uh "'cause" that this that the in terms of sinusoidal modeling using
0:06:45a sum of
0:06:46sinusoidal signals
0:06:49or or a a a a thing to of peaks
0:06:51the spectrum
0:06:52and that we we present
0:06:54i a a a you all uh be though
0:06:57and that
0:06:59that the you of L O I E the uh
0:07:02location of the peaks
0:07:04for the representation of the sinusoidal model
0:07:07but the peaks
0:07:09don't occur exactly at integer multiples of the
0:07:13uh
0:07:14fundamental frequency yeah another to out where here
0:07:17and a parameter in order to exactly match
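
[Editor's illustration] The point that spectral peaks do not fall exactly on integer multiples of the fundamental frequency can be sketched as a peak-to-harmonic matching step with a tolerance. This is a hedged sketch, not the speaker's formulation: the function name `match_peaks_to_harmonics`, the relative tolerance of 3 percent, and the example frequencies are assumptions made for illustration.

```python
def match_peaks_to_harmonics(peaks_hz, f0_hz, tolerance=0.03):
    """Pair each spectral peak with the nearest harmonic of a candidate pitch.

    A peak is accepted when its relative deviation from the nearest integer
    multiple of f0_hz is at most `tolerance` (an assumed value).
    Returns a list of (peak_hz, harmonic_number) pairs.
    """
    matches = []
    for peak in peaks_hz:
        k = max(1, round(peak / f0_hz))          # nearest harmonic index
        deviation = abs(peak - k * f0_hz) / (k * f0_hz)
        if deviation <= tolerance:
            matches.append((peak, k))
    return matches


# Made-up peaks: three lie near harmonics of 100 Hz, one (250 Hz) does not.
peaks = [101.0, 199.0, 250.0, 304.0]
matched = match_peaks_to_harmonics(peaks, 100.0)
```

A pitch candidate that explains many peaks this way would score well; the actual distortion measure used in the talk is not recoverable from the recording.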
0:07:23so uh
0:07:24you we have to and i don't for and to uh the location of the
0:07:29and so
0:07:31because we do not have access to the locations of the underlying source peaks
0:07:35so we apply this
0:07:36approximation, which we found to work pretty well in practice
0:07:40so
0:07:41and to uh
0:07:44or are bits that separate
0:07:46say
0:07:47i
0:07:47a and then you paris
0:07:49and then we assign peaks
0:07:51to each source
0:07:53and then they are very close the
0:07:55sign no ha of the peak to each individual sources
0:07:58and then and
0:07:59oh
0:07:59the
0:08:01only problem to the the to me might you the two pitch a
0:08:05a uh
0:08:06uh
0:08:07points
0:08:07so we we we my station
0:08:09and we got some um
0:08:11estimation for the
0:08:12i but one source for each
0:08:16a are you uh
0:08:17yeah idea of how
0:08:18to uh because that
0:08:20a whole one a for more speak to a the signal
0:08:24ah
0:08:24we have
0:08:25a source here
0:08:26a first one had a week one up to twenty eight
0:08:29the second one
0:08:30nine three
0:08:31i think and he he's there cool are
0:08:34so with the integer multiples of the
0:08:37pitch frequency, the peaks
0:08:38are not occurring exactly at the integer multiples of the fundamental frequency
0:08:43mean and the white out uh from to you play around with these
0:08:48and the order to get these uh
0:08:51and do the thinking for that in
0:08:54a a a a a a can with these uh
0:08:57uh
0:08:58and and sort them than the signal we
0:09:00uh to minimizing
0:09:03the second
0:09:03step
0:09:04it
0:09:04i of the power was also
0:09:06a we detect a peak detection now or
0:09:09grouping
0:09:10that
0:09:11pitch a a a a a a a one
0:09:13so
0:09:13but yeah he is a large a a a a i don't
0:09:16long detection
0:09:17i in two D can "'cause" you don't want to point
0:09:20you want to be the curve to it
0:09:23so
0:09:23well you here uh that's you in the first frame
0:09:27but we search for a uh and these two or more
0:09:31the second row
0:09:32i
0:09:33or and that the reference of any P
0:09:36i
0:09:36one one in in it or what any um
0:09:39pitch and be
0:09:40we group and to get a
0:09:42and re
0:09:44oh
0:09:45a a a a a very
0:09:47to be for another
0:09:48and
0:09:50or not
0:09:51it can not be grouped into used uh
0:09:53um
0:09:54first
0:09:55so no one one another core
0:09:57a very uh uh uh
0:09:59uh uh
0:10:00that that for you try to each candidate
0:10:03and
0:10:04and and i got from five like
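
[Editor's illustration] The grouping stage described above is hard to follow in this recording. One common way to group frame-wise pitch candidates into tracks is a greedy continuity rule, sketched below. This is an assumption swapped in for illustration, not necessarily the rule used in the talk; the name `group_into_tracks` and the 10 Hz jump limit are hypothetical.

```python
def group_into_tracks(candidates_per_frame, max_jump_hz=10.0):
    """Greedily link pitch candidates across consecutive frames.

    candidates_per_frame: list of lists of pitch candidates in Hz, one per frame.
    A candidate extends a track when it is the closest unused candidate to the
    track's last pitch and the jump is at most max_jump_hz (assumed value).
    Returns a list of tracks, each a list of (frame_index, pitch_hz).
    """
    finished, active = [], []
    for t, candidates in enumerate(candidates_per_frame):
        unused = list(candidates)
        still_active = []
        for track in active:
            last_pitch = track[-1][1]
            best = min(unused, key=lambda p: abs(p - last_pitch), default=None)
            if best is not None and abs(best - last_pitch) <= max_jump_hz:
                track.append((t, best))
                unused.remove(best)
                still_active.append(track)
            else:
                finished.append(track)          # track ends here
        # Candidates that continue no existing track start new tracks.
        still_active.extend([[(t, p)] for p in unused])
        active = still_active
    return finished + active


# Made-up candidates: two speakers for two frames, then one, then a jump.
frames_of_candidates = [[100.0, 200.0], [102.0, 198.0], [104.0], [250.0]]
tracks = group_into_tracks(frames_of_candidates)
```

The resulting tracks differ in length, which is the kind of information the later separation stage appears to rely on.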
0:10:08now the second stage is the
0:10:10separation
0:10:11so uh
0:10:13we or that
0:10:14do the separation be uh
0:10:16she mean not be track
0:10:19and then compared the
0:10:20we you know we will try
0:10:23if the longest track
0:10:24yeah have in these uh
0:10:26to to the a representation
0:10:28i is that we do we do a right
0:10:32and and the longest track
0:10:33smaller than a threshold
0:10:35he
0:10:36assigned to one group
0:10:37you
0:10:38if not, then we assign them to the second
0:10:41we basically
0:10:43separate the
0:10:45individual tracks
0:10:46into two sources
0:10:49and that the that
0:10:50state
0:10:51yeah
0:10:52because
0:10:52we have to the
0:10:54you know
0:10:55we have a problem my
0:10:57therefore we do some sort of interpolation in order to recover the missing
0:11:03pitch frequencies
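
[Editor's illustration] The interpolation stage can be sketched as filling gaps inside a pitch track. The talk only says "some sort of interpolation", so linear interpolation is an assumption here, as are the function name `interpolate_gaps` and the use of `None` to mark frames with no pitch estimate.

```python
def interpolate_gaps(track):
    """Fill interior gaps in a per-frame pitch track by linear interpolation.

    track: list of pitch values in Hz, with None where no pitch was detected.
    Gaps before the first or after the last known value are left as None.
    """
    filled = list(track)
    known = [i for i, value in enumerate(track) if value is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = (1 - weight) * track[left] + weight * track[right]
    return filled


# Made-up track: two frames are missing between 100 Hz and 130 Hz.
filled = interpolate_gaps([None, 100.0, None, None, 130.0, None])
```

A gap can also correspond to a genuinely unvoiced stretch, so a real system would need a rule for when not to interpolate; that rule is not recoverable from this recording.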
0:11:04and some time here
0:11:05you here that
0:11:07that might be you to i'm voice signal power like is like that
0:11:13about that
0:11:14you to a second and the these uh
0:11:17and data
0:11:17using the relation
0:11:19requiring covering
0:11:20max
0:11:21uh uh
0:11:23so
0:11:24oh
0:11:25or are some than those from
0:11:28we
0:11:29tire uh a
0:11:31he's try
0:11:32and here this is another uh nice
0:11:35frequency
0:11:36ah
0:11:36and we also have another heuristic parameter: when
0:11:42tracks are overlapping, they cannot
0:11:44belong to one source
0:11:46so they indicate the presence of two sources
0:11:48and i a lot of
0:11:50the
0:11:50oh
0:11:51the to make the pitch contour which of the exact to the uh uh uh a reference uh one
0:11:58no so uh that's still
0:12:00i
0:12:01that a E
0:12:02can and detect on the line each one
0:12:05so but yeah
0:12:07but not well um
0:12:08results
0:12:09we are sure
0:12:11i one ninety
0:12:13oh
0:12:15like you know
0:12:16um i
0:12:17a combination of genders:
0:12:18male-male,
0:12:20female-female,
0:12:21male-female
0:12:22a we with that the uh
0:12:25target to interference ratio from zero to eighteen db
0:12:29uh hamming i mean window of black
0:12:31i it is that
0:12:32to bring the of the ten millisecond
0:12:34where live
0:12:36is used to segment the signal
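
[Editor's illustration] Segmenting a signal into overlapping frames with a Hamming window can be sketched as follows. The ten millisecond figure is read here as the frame shift; the 8 kHz sample rate and the 32 ms window length are assumptions, since the recording does not make them clear. The function names are hypothetical.

```python
import math


def hamming(n):
    """Return an n-point Hamming window."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(signal, frame_len, hop):
    """Cut a signal into overlapping frames and apply a Hamming window to each."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames


sample_rate = 8000                      # assumed sample rate in Hz
frame_len = sample_rate * 32 // 1000    # assumed 32 ms window -> 256 samples
hop = sample_rate * 10 // 1000          # 10 ms frame shift -> 80 samples
signal = [1.0] * sample_rate            # one second of a constant test signal
frames = frame_signal(signal, frame_len, hop)
```

Each windowed frame would then go to a spectral analysis step before peak picking.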
0:12:38the reference pitch
0:12:39is extracted using the
0:12:42talking method, which is very robust
0:12:44and accurate
0:12:45and uh the uh the white three
0:12:48the gross error rate
0:12:51and a your mention
0:12:52a previously
0:12:53a voiced/unvoiced error rate
0:12:56and
0:12:56separation error
0:12:57we compare this the with the uh one of the or a the to was back
0:13:02DeLiang Wang's group, who have
0:13:05um
0:13:06um there
0:13:07have applied some sort of gammatone filterbank with a channel
0:13:11and and another at that or or or or a a a proposed by captain in night
0:13:15for
0:13:16of course
0:13:16a sort of a
0:13:18harmonic suppression or
0:13:20for the so
0:13:21yeah
0:13:22or or or a result uh
0:13:24for error rate versus the target to interference ratio
0:13:28ah
0:13:29have three sets of lot
0:13:31which how one so the result for target
0:13:35and the battle for
0:13:37we are and
0:13:38um
0:13:39we have to lines here a dynamo you go
0:13:42and the uh
0:13:44so that one and um and and uh
0:13:46for one of those and
0:13:48uh the proposed method
0:13:50uh
0:13:50the the that we but that's stand for the uh
0:13:54hmmm
0:13:55and
0:13:55i have some good
0:13:57you know but
0:13:58and it can for all
0:14:00can be nation of mixtures
0:14:02a and five factor
0:14:04a a a from the two other techniques um
0:14:07so if we can see
0:14:08uh
0:14:09and there for the target
0:14:11if we a signal
0:14:12L
0:14:13so
0:14:14a a you is to a a voice as all
0:14:17uh
0:14:19he to incorporate these he met that you can be it and propose anything for
0:14:23the in the unvoiced the only work who worked know so we have to in here
0:14:29um
0:14:30a
0:14:31kinetic
0:14:31see
0:14:32a a a and it's factor uh
0:14:35oh a very well fit to the other that
0:14:37for all combinations: male-male,
0:14:40female-female,
0:14:41and male-female
0:14:42um week
0:14:44and the point only in terms of uh
0:14:47separation error
0:14:49uh
0:14:50we we see a you
0:14:51i two method
0:14:53uh
0:14:53how our method
0:14:55very robust against one
0:14:57a a a a a sort of the
0:14:59to like or
0:15:00and uh uh uh you get in separation performance for to uh
0:15:05a method
0:15:07yeah so uh
0:15:09there are a number of issues that should be addressed
0:15:12a about this for uh so
0:15:15we have a problem when two pitch contours are crossing each other: how
0:15:19can we assign them to different sources when the pitch contours are very close
0:15:24a green
0:15:25i don't know even
0:15:26whether our auditory system can separate them. we are working to improve the performance
0:15:32by applying some prior knowledge about the speakers
0:15:36i believe we can also apply spatial diversity as another cue
0:15:41we can uh yeah i meant to to do small and improve the for one
0:15:45and some prior knowledge about
0:15:47the
0:15:48there uh
0:15:49i have been working on a bayesian inference method
0:15:52the performance
0:15:53yeah
0:15:53i would like to time is uh
0:15:56a a called me
0:15:58not
0:15:58and the only one who provided codes
0:16:01i three D for a a a a researcher or so
0:16:04for really how to compare
0:16:07a with a
0:16:08so
0:16:09and that
0:16:09ah
0:16:11right now week
0:16:12the whole of the code it's some demos from a my page
0:16:16i is
0:16:17really
0:16:18and that one so
0:16:19and now finally to you uh
0:16:22to take any questions or comments
0:16:32i
0:16:48but you come in a little bit i
0:16:50do you mean in terms of separation
0:16:52a sequential grouping problem of interest
0:16:55i think that that just to they're not actual
0:16:57separation and six that's right means voice
0:16:59yes
0:17:00the uh
0:17:02a pitch you want to do
0:17:05a speaker
0:17:06and and tracker as signs it to the seconds
0:17:09that as the separation
0:17:11yeah i a this classification be two class
0:17:14and i i i i are or correctly that we would also some that's make any attempt
0:17:18to
0:17:18to solve the problem
0:17:19that
0:17:21a i i i i one and that method that it if you need to do a different at contours
0:17:26for
0:17:27to
0:17:28no
0:17:29sure
0:17:30and
0:17:31yeah yeah maybe a
0:17:33and a
0:17:34we then and can the feast five
0:17:37the fact that you are getting two contours from two speakers i got it at home
0:17:42it sounds
0:17:43okay
0:17:47so something about two you of to look at it that you like to
0:17:50translating to do most mcclay speak
0:17:53a charter
0:17:53to to look at the model look at that you
0:17:56use
0:17:57was that translates to like C
0:17:59and is that being online have uh
0:18:04just it something i duration of tracks so you you okay yeah i i i i
0:18:09uh marking cooking of S
0:18:11a i think it duration of sentences are about seven
0:18:15to two sec
0:18:17two to sec
0:18:22yeah
0:18:23oh
0:18:26i
0:18:30i
0:18:31i
0:18:37but the method automatically a as the and voices
0:18:41so when you have these uh
0:18:46five
0:18:48the
0:18:49so uh
0:18:50yeah as and what
0:18:52so we
0:18:54reference
0:18:55a a a a a i i the uh
0:18:58and with anything specifically to recognise and work
0:19:02a fact that you don't have a
0:19:04a a to here
0:19:05uh
0:19:06yeah i that we don't have any
0:19:08yeah
0:19:09which
0:19:11i think yeah
0:19:13thank you very much