0:00:21Okay, good afternoon everyone.
0:00:24I [name and affiliation unclear] will present you my work
0:00:29on score-informed
0:00:31source separation,
0:00:32which of them we one but no and up on that
0:00:37So basically,
0:00:39this work
0:00:41deals with monaural source separation
0:00:44in musical spectrograms:
0:00:46we try to separate
0:00:49the signal of
0:00:51each instrument in the mixture.
0:00:54But we don't do it blindly: we try to add some extra information,
0:01:01which is extracted from the score of the piece,
0:01:04and this
0:01:05extra information is used to guide the separation process.
0:01:10And
0:01:11here the score is a MIDI file which is aligned,
0:01:15or which is assumed to be aligned, to the audio, and we do not deal
0:01:19with
0:01:20alignment matters. There is a lot of
0:01:23literature about this, and
0:01:25a working system should include a preprocessing step
0:01:31that
0:01:33does the alignment between
0:01:35the score and the signal.
0:01:38Moreover, we only deal with harmonic instruments,
0:01:41so
0:01:42we exclude the modeling of
0:01:45percussive instruments in this model.
0:01:49So basically our system is based on a parametric spectrogram model which is derived
0:01:56from non-negative matrix factorisation,
0:01:58and we use
0:02:00this parametric spectrogram model to decompose the mixture
0:02:03spectrogram.
0:02:06It consists of
0:02:08parametric time-frequency masks which are computed for
0:02:12each instrument,
0:02:14and which are initialised
0:02:16and constrained with the information in the score, and then finely estimated in
0:02:22a way very similar to
0:02:26NMF,
0:02:28so as
0:02:29to fit the mixture spectrogram.
0:02:33Then these parametric masks
0:02:36are used to separate
0:02:38the signal of each
0:02:39instrument using Wiener filtering.
0:02:43So before beginning the talk,
0:02:47I will try to answer the question "why use a score?", because maybe some
0:02:53people will argue that
0:02:55it is cheating, because actually
0:02:57with the signal you don't have the score.
0:03:00But actually
0:03:01there are
0:03:03databases
0:03:04of
0:03:05MIDI files on the internet, and basically you can find
0:03:08almost
0:03:09any MIDI file you want,
0:03:14not any, but a lot
0:03:18of pieces of music, on the net.
0:03:22And it is a very compact
0:03:24description of audio:
0:03:26it is much more compact than the audio itself, so if you can store somewhere
0:03:33the audio, you will be able to store also this very little extra information.
0:03:38Moreover, I would say that in some cases
0:03:42blind separation
0:03:44remains very difficult, and sometimes
0:03:47hopeless, because if you want to separate,
0:03:49for example,
0:03:51two voices of
0:03:53the same instrument,
0:03:54you won't be able to do it blindly.
0:03:59So here is the outline of my talk.
0:04:02First I will remind you of the basic principle of
0:04:05non-negative matrix factorisation,
0:04:07and then I will present the parametric spectrogram model that we derived from
0:04:13NMF.
0:04:15At last I will present our whole score-informed source separation system,
0:04:21and I will present some results
0:04:23of
0:04:24this system.
0:04:25So first let's talk a bit about
0:04:27NMF.
0:04:28NMF is a very powerful rank reduction
0:04:34algorithm
0:04:35that allows
0:04:36to extract
0:04:38redundant patterns in non-negative data.
0:04:41Here our non-negative data is
0:04:50the amplitude spectrogram V.
0:04:53Here you can see
0:04:55a first note which is played alone, and then
0:04:59a second one, and then both are played together.
0:05:02If you try to decompose the spectrogram
0:05:05with non-negative matrix factorisation,
0:05:09you get two matrices.
0:05:12The first one is the atoms matrix, where you extract
0:05:17the template of one note,
0:05:19this template which is very redundant in this
0:05:22data,
0:05:23and the template of the other note. And
0:05:27the other matrix, the matrix
0:05:29H,
0:05:30contains the temporal information, that is, where the notes are.
0:05:36There is a non-negativity constraint on both
0:05:40these matrices,
0:05:42and there is a second constraint,
0:05:44which is that
0:05:46the rank of the product should be
0:05:50lower than
0:05:51the rank of the original data.
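The two-note example described in this part of the talk can be reproduced on toy data. The following is only a minimal sketch, not the speaker's code: the names `nmf`, `W` (atoms matrix) and `rank` are assumptions, and the Euclidean multiplicative updates are a standard textbook choice rather than something stated in the talk.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-12, seed=0):
    """Factorise a non-negative matrix V (frequency x time) as W @ H."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps   # atoms (spectral templates), assumed name
    H = rng.random((rank, n_time)) + eps   # activations over time
    for _ in range(n_iter):
        # Multiplicative updates for the Euclidean cost ||V - WH||^2.
        # Factors stay non-negative because every term is non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram": note A alone, then note B alone, then both together.
note_a = np.array([1.0, 0.0, 0.5, 0.0, 0.2, 0.0])
note_b = np.array([0.0, 1.0, 0.0, 0.6, 0.0, 0.3])
gains = np.array([[1, 1, 0, 0, 1, 1],
                  [0, 0, 1, 1, 1, 1]], dtype=float)
V = np.outer(note_a, gains[0]) + np.outer(note_b, gains[1])

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The rank (2) is lower than the rank a generic 6 x 6 matrix would have, which is the second constraint mentioned above.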
0:05:56So
0:05:57NMF is very powerful to extract redundant patterns from the data, such as notes, as
0:06:03I have shown,
0:06:05and the fundamental property
0:06:07of
0:06:07this technique is the non-negativity constraint,
0:06:10which was shown to provide
0:06:13a very meaningful description of
0:06:17the data.
0:06:18So we will use this
0:06:20non-negativity constraint in our parametric spectrogram model
0:06:25uh to a this plastic job uh D mentioned in uh our algorithm
0:06:30So there are some limitations
0:06:32with
0:06:33NMF.
0:06:34The first one is
0:06:36that
0:06:36it does not manage
0:06:38to deal
0:06:39efficiently with
0:06:41time-frequency variations.
0:06:44As an example, when you have pitch variation over time, it is very difficult to
0:06:49model that accurately with NMF, and so you cannot
0:06:53efficiently model
0:06:55phenomena such as
0:06:56vibrato.
0:06:58And
0:06:59another problem for
0:07:01our approach is that,
0:07:04to do some
0:07:07score-informed
0:07:09source separation,
0:07:12we need a representation which is more linked to
0:07:15the parameters
0:07:16of
0:07:17interest, which are here the fundamental frequencies of
0:07:21the notes.
0:07:24So we decided to
0:07:27develop a parametric spectrogram model.
0:07:31Our parametric spectrogram model is based on a previous one that we presented
0:07:36in this paper,
0:07:38and which is a parametric spectrogram model for only one instrument, so for a single instrument.
0:07:45And
0:07:46to build
0:07:46this model we just asked:
0:07:48what do the atoms look like in a musical spectrogram when most of the elements
0:07:54are
0:07:55instrument notes?
0:07:57and
0:07:57well i uh you don't have
0:07:59uh to to back "'cause" use the
0:08:02don't
0:08:03So
0:08:04most of
0:08:05these elements are harmonic,
0:08:08so
0:08:10the patterns that you would extract with NMF would also be harmonic. So we decided
0:08:16just to put
0:08:18harmonic atoms directly in the non-negative matrix factorisation,
0:08:22and
0:08:24to parametrise
0:08:25them
0:08:26to have access to
0:08:28the parameters of interest, which are the fundamental frequency of
0:08:34the atom
0:08:35and
0:08:36as a global about uh
0:08:38but uh all block
0:08:41of uh that too
0:08:45So
0:08:45our model is a parametric model of spectrogram with
0:08:49synthetic harmonic atoms, and we
0:08:52thus add,
0:08:55in the equation of standard NMF,
0:09:01in the atom matrix,
0:09:05a dependency with respect to the parameter
0:09:08f0, which is the fundamental frequency
0:09:12of the atom.
0:09:13We also add a dependency
0:09:17with respect to
0:09:19time, so basically,
0:09:21in our model,
0:09:23atoms
0:09:24can vary over time, and
0:09:26this
0:09:27makes it possible to model
0:09:29something such as
0:09:31a vibrato.
0:09:33So here is our very simple atom model. Basically we
0:09:37synthesise
0:09:38our
0:09:39atom
0:09:40by
0:09:41taking the Fourier transform of the analysis window we use to compute the spectrogram,
0:09:46and we just centre it on the fundamental frequency and on the frequencies of
0:09:51the different
0:09:52harmonics.
0:09:53We just multiply this
0:09:55Fourier transform of the window
0:09:58with
0:09:59the amplitude of
0:10:01the harmonic, so we get
0:10:04this harmonic atom there.
0:10:07And
0:10:08we thus get
0:10:10this parametric spectrogram for a single instrument, with
0:10:15the parameters:
0:10:19the amplitude of each harmonic,
0:10:21the fundamental frequency
0:10:23of
0:10:24each atom
0:10:26for each time, so this fundamental frequency can vary over time, and here
0:10:30are the activations, which are very similar to the activations
0:10:34in NMF
0:10:35and tell you
0:10:38whether a note is active or not.
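The atom construction described here (copies of the window's Fourier transform centred on each harmonic and scaled by that harmonic's amplitude) can be sketched as follows. This is an illustration under assumptions, not the speaker's implementation: the function names, the Hann window, the sample rate, and the use of the squared magnitude of the window transform are all hypothetical choices.

```python
import numpy as np

def window_response(offsets_hz, window, sr):
    """Squared magnitude of the window's Fourier transform at given frequency offsets."""
    n = np.arange(len(window))
    phases = np.exp(-2j * np.pi * np.outer(offsets_hz, n) / sr)
    return np.abs(phases @ window) ** 2

def harmonic_atom(f0, amplitudes, window, sr, n_fft):
    """One harmonic atom: a shifted copy of the window response per harmonic."""
    freqs = np.arange(n_fft // 2 + 1) * sr / n_fft   # centre frequency of each bin
    atom = np.zeros_like(freqs)
    for h, a_h in enumerate(amplitudes, start=1):
        if h * f0 < sr / 2:                           # skip harmonics above Nyquist
            atom += a_h * window_response(freqs - h * f0, window, sr)
    return atom

sr, n_fft = 8000, 512                                 # illustrative values
window = np.hanning(n_fft)                            # assumed analysis window
amplitudes = np.array([1.0, 0.5, 0.25])               # illustrative harmonic amplitudes
atom = harmonic_atom(440.0, amplitudes, window, sr, n_fft)
peak_hz = np.argmax(atom) * sr / n_fft
```

Because `f0` is a continuous argument, the same function can be re-evaluated at every frame with a slightly different fundamental frequency, which is what lets the atoms follow a vibrato. Multiplying each frame's atom by its activation gives the single-instrument spectrogram.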
0:10:42So when you try to
0:10:44estimate these
0:10:45parameters, in red,
0:10:48we use the
0:10:50beta-divergence cost function,
0:10:53but
0:10:54unfortunately
0:10:56this cost function
0:10:58has a lot of
0:10:59local minima with respect to the fundamental frequency.
0:11:03Of course you have a local minimum
0:11:08at the position of the right fundamental frequency, but also at the octave,
0:11:12the double octave and the fifth,
0:11:15if
0:11:15the notes are very similar. So
0:11:18we cannot do a global optimisation with respect to the fundamental frequency.
0:11:22So we decided to
0:11:24introduce one atom
0:11:25for each MIDI note, so for each note
0:11:29of
0:11:29the chromatic scale,
0:11:31and then the optimisation
0:11:33is done
0:11:34locally, so we will have
0:11:37a fine estimate
0:11:39of the fundamental frequency for
0:11:42each atom and each time.
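As a rough illustration of the two ingredients mentioned here, the sketch below gives the standard definition of the beta-divergence and a grid with one starting fundamental frequency per MIDI note. The function names, the 88-key note range and the 440 Hz reference are assumptions made for the example; the talk does not specify them.

```python
import numpy as np

def beta_divergence(V, V_hat, beta, eps=1e-12):
    """Sum of element-wise beta-divergences d_beta(V | V_hat)."""
    V = np.asarray(V, dtype=float) + eps
    V_hat = np.asarray(V_hat, dtype=float) + eps
    if beta == 0:                                   # Itakura-Saito divergence
        ratio = V / V_hat
        return float(np.sum(ratio - np.log(ratio) - 1))
    if beta == 1:                                   # Kullback-Leibler divergence
        return float(np.sum(V * np.log(V / V_hat) - V + V_hat))
    return float(np.sum(V**beta + (beta - 1) * V_hat**beta
                        - beta * V * V_hat**(beta - 1)) / (beta * (beta - 1)))

def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((np.asarray(note) - 69) / 12.0)

# One atom per note of the chromatic scale; range 21..108 is an assumed piano range.
midi_notes = np.arange(21, 109)
f0_init = midi_to_hz(midi_notes)
```

Starting each atom at its own note frequency and only refining it locally sidesteps the octave and fifth minima of the cost function, since no atom ever has to travel that far.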
0:11:47So here is an example of the decomposition that we can get
0:11:52with our model. Here is the spectrogram
0:11:55of the first bars
0:11:57of
0:12:00Bach's first prelude, which is played by a synthesiser,
0:12:04and if we try to decompose
0:12:05it with our algorithm we get this profile of activations. Here
0:12:10I just present the activations,
0:12:13so
0:12:13you don't have
0:12:15the amplitudes of the harmonics and the fundamental frequency estimates.
0:12:20But as you can see,
0:12:22we can recognise the
0:12:25notes,
0:12:26in red here,
0:12:27which play the
0:12:28first prelude by
0:12:32Bach.
0:12:33So
0:12:33as you can see there are some problems,
0:12:41which are linked to similarities
0:12:44with the octave, twelfth and double octave, but it is not really a problem,
0:12:49as we will see later in
0:12:53our system of source separation.
0:12:56So
0:12:56as you can see here,
0:12:58you have
0:12:59a value of activation for
0:13:02each MIDI note, so basically this
0:13:05looks like
0:13:06a piano roll, and it will be very important in
0:13:09our system
0:13:10that
0:13:11this representation is
0:13:13linked
0:13:14to what you can get with MIDI.
0:13:18So
0:13:19now we have
0:13:20a spectrogram model for a single instrument,
0:13:25which I just presented to you,
0:13:27and
0:13:28we need a mixture model because
0:13:30we want to separate
0:13:31instruments, so we have
0:13:33a mixture with several instruments. The mixture model is very easy:
0:13:38you just take
0:13:40the single-instrument model for each instrument
0:13:43and
0:13:44sum them,
0:13:45and you get
0:13:48the mixture spectrogram model.
0:13:50And
0:13:50we will have to
0:13:53estimate,
0:13:55for each instrument, so for each source k,
0:13:58the fundamental frequency
0:14:00of each atom at each time,
0:14:03the amplitudes
0:14:04of the harmonics for each source, and
0:14:07the profile of activations for each source.
0:14:12The decomposition is done with a multiplicative update algorithm,
0:14:16which consists in
0:14:19minimising a beta-divergence
0:14:21between
0:14:22our original mixture spectrogram and
0:14:26our parametric mixture spectrogram.
0:14:29This algorithm is very similar to
0:14:32the one you have
0:14:35in NMF
0:14:36with multiplicative update rules.
0:14:41Now we have a model for the mixture spectrogram, and I will show you how we can use it
0:14:49to do some score-informed source separation.
0:14:52So we have
0:14:53our mixture spectrogram and
0:14:56our score,
0:14:58and from the score
0:15:01you can know where the notes are
0:15:03for each instrument, so you can very easily
0:15:06build
0:15:07a piano roll,
0:15:09a binary piano roll which just tells you
0:15:13where the notes
0:15:14are.
0:15:16And as I said,
0:15:17the activation matrix
0:15:19of
0:15:21each
0:15:22instrument is closely linked to this piano roll, and we will just
0:15:27use
0:15:28these binary
0:15:30piano rolls
0:15:32as an initialisation for our activations in our model.
0:15:37But as we use
0:15:38multiplicative update rules,
0:15:42if you put a zero somewhere
0:15:45in the activation
0:15:49matrix,
0:15:50it
0:15:54will remain zero all along the iterations,
0:15:57so it is a very hard constraint.
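The effect of the binary piano-roll initialisation can be checked on a small example. This is a sketch under assumptions: `piano_roll`, the note list and the hop size are hypothetical, and a plain NMF activation update stands in for the parametric updates used in the talk. It only demonstrates that entries initialised to zero stay at zero under multiplicative updates.

```python
import numpy as np

def piano_roll(notes, n_pitches, n_frames, hop_s):
    """Binary piano roll from (pitch_row, onset_seconds, offset_seconds) events."""
    roll = np.zeros((n_pitches, n_frames))
    for pitch, onset, offset in notes:
        start = int(round(onset / hop_s))
        stop = int(round(offset / hop_s))
        roll[pitch, start:stop] = 1.0
    return roll

# Hypothetical aligned score events; pitch is a row index, not a MIDI number.
notes = [(0, 0.0, 0.5), (2, 0.5, 1.0)]
H = piano_roll(notes, n_pitches=3, n_frames=10, hop_s=0.1)
mask = H > 0                                  # where the score says a note sounds

rng = np.random.default_rng(0)
W = rng.random((6, 3))                        # stand-in atoms
V = rng.random((6, 10))                       # stand-in mixture spectrogram
for _ in range(50):
    # Multiplicative update: each entry is rescaled by a non-negative factor,
    # so an entry that starts at exactly zero can never become non-zero.
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
```

This is why the score acts as a hard constraint: the algorithm can refine how loud a scored note is, but it cannot activate a note the score does not contain.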
0:16:01So
0:16:02once we
0:16:05initialise
0:16:06our parametric spectrograms, we get some very coarse parametric spectrograms, which are represented here.
0:16:13Then we use
0:16:17our algorithm
0:16:19on the mixture spectrogram
0:16:21to finely
0:16:22adapt
0:16:24these parametric spectrograms
0:16:26to the mixture spectrogram, and basically
0:16:29the sum of
0:16:31these
0:16:32three spectrograms
0:16:33should be
0:16:35very similar to the mixture spectrogram that we have.
0:16:39And
0:16:40so we get
0:16:42these time-frequency masks,
0:16:44and we can separate
0:16:46the tracks
0:16:47using Wiener filtering.
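The final masking step can be sketched as follows, assuming each fitted parametric spectrogram is treated as a non-negative power spectrogram for its source. The function name and the random stand-in data are hypothetical; a real system would apply the masks to the mixture's complex short-time Fourier transform and invert each result back to a waveform.

```python
import numpy as np

def wiener_separate(X, source_models, eps=1e-12):
    """Split a complex mixture STFT X using soft masks built from source models."""
    total = sum(source_models) + eps
    # Each mask is the source's share of the total modelled energy in every
    # time-frequency bin, so the masks sum to (almost exactly) one.
    return [X * (model / total) for model in source_models]

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))   # stand-in STFT
models = [rng.random((5, 4)) + 0.1 for _ in range(3)]                # three instruments
estimates = wiener_separate(X, models)
```

Since the masks sum to one, the separated tracks add back up to the mixture, and each track keeps the phase of the mixture.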
0:16:50So here is
0:16:51an example.
0:16:52It is based on
0:16:54synthetic data,
0:16:55because in that case
0:17:02the signal is perfectly aligned with
0:17:05the MIDI file.
0:17:07So here is a mixture signal.
0:17:13i
0:17:14a
0:17:16So you have
0:17:16three instruments, and using our algorithm and the score
0:17:21we get
0:17:23oh
0:17:24i
0:17:26Here is
0:17:27the bass.
0:17:29i
0:17:30it is a two
0:17:32uh_huh
0:17:34hmmm
0:17:35you
0:17:36and he has that's in the again
0:17:38you can
0:17:39yeah the race
0:17:40also i mean it's i
0:17:41i don't care about
0:17:42oh yeah yeah
0:17:44a model for that, which is
0:17:46like noise, and we only have a model for
0:17:50the harmonic parts.
0:17:51And finally,
0:17:52the clarinet.
0:17:55i
0:18:02So here we compared
0:18:05our algorithm with
0:18:08another algorithm, which is based on
0:18:12probabilistic latent component analysis,
0:18:15and which is
0:18:17somewhat different because you need to synthesise the MIDI tracks
0:18:21first, so you need
0:18:24a synthesiser and you need to know
0:18:28the instrument of
0:18:30each track.
0:18:33So basically we used two datasets which consist of the same MIDI files,
0:18:39which are
0:18:40classical
0:18:42music pieces,
0:18:43but we synthesised each with different sound banks.
0:18:52Our signals are synthesised from the MIDI because we need it to be perfectly aligned to
0:18:59the audio, and
0:19:01here are the results that we obtained.
0:19:04In red is
0:19:06something which is very similar to an oracle,
0:19:09so it is like an upper limit
0:19:12for
0:19:14[unclear]-based
0:19:16separation algorithms.
0:19:18And
0:19:19as you can see,
0:19:20our algorithm
0:19:22performs quite
0:19:23well, and better on this
0:19:26first dataset
0:19:27than the PLCA-based algorithm,
0:19:29and it
0:19:33performs almost
0:19:35the same
0:19:37with the second
0:19:39sound bank.
0:19:40And
0:19:41the main difference
0:19:43in this
0:19:44second sound bank, I think, is that
0:19:46it contains a lot
0:19:47more reverberation, and
0:19:51our algorithm is very sensitive to that.
0:19:55So
0:19:56to conclude, we presented a very efficient way
0:19:59to do score-informed source separation,
0:20:02which is based on a parametric model
0:20:05which makes it possible
0:20:08to access directly the notes, and
0:20:11so to uh on the fine he's the sound
0:20:16I think that now we should focus
0:20:19on
0:20:20other instruments,
0:20:25on those which are not harmonic actually, because
0:20:28our model
0:20:29is only designed to deal with
0:20:32harmonic sounds,
0:20:34and maybe include
0:20:36a more complex
0:20:39model for the timbre of
0:20:42the elements,
0:20:44also
0:20:47in order to make
0:20:48the model
0:20:49more robust
0:20:51to
0:20:52real
0:20:54signals.
0:20:56And
0:20:57maybe we can also try to use extra information, as
0:21:01the PLCA-based algorithm does,
0:21:04knowing
0:21:05the timbre of
0:21:09the instrument and using supervised learning of
0:21:12the templates.
0:21:13So thank you for your
0:21:15attention.
0:21:31i
0:21:32and did you make a is so you need compare compare in the gives you is the algorithm for this
0:21:37source separation and you
0:21:39[unclear] My question is: what does the parametric
0:21:44model bring
0:21:46with respect to the fact that you put zeros in H,
0:21:49the activation matrix?
0:21:51So have you tried running the algorithm just putting zeros in H, in an NMF with a dictionary and H,
0:21:57and compared it with your
0:21:58parametric
0:22:00model?
0:22:01uh i i
0:22:03i don't know this to you crush so that the to that signal to go
0:22:06is the the bottom think there yeah and putting zeros in H
0:22:10yeah and it was a a nice edition which is a constraint to yeah
0:22:15so it's to
0:22:16can uh since and zero not being yeah
0:22:19so you sure you there isn't them longer than we the to building but
0:22:23And then I would like to know if you tried only putting the zeros in H,
0:22:28and what it gives.
0:22:30oh using basically is a very coarse estimate of some mask yeah
0:22:34yeah you the it's what you mean okay but it is the interest of
0:22:39apart from being a fine estimation
0:22:41easy job creation yes
0:22:43Okay. In our results,
0:22:45actually, if
0:22:46you
0:22:47look
0:22:49closely at the spectrogram, there are some
0:22:52variations for the clarinet and for the bass,
0:22:57and so if
0:22:58you
0:23:00don't take
0:23:02these
0:23:03variations into account,
0:23:05you get
0:23:06very bad separation results.
0:23:09And moreover, maybe you can have some problems of
0:23:12tuning,
0:23:13and
0:23:14the fundamental frequency,
0:23:16even if it is
0:23:17not moving over time,
0:23:19can be
0:23:20slightly different from
0:23:22the equal temperament, so if you keep this
0:23:26parameter fixed at its initialisation,
0:23:28it will
0:23:29provide
0:23:30bad separation results, I think.
0:23:33to you
0:23:36Anything else?
0:23:39use also a quite it's so thank you