0:00:21 | Good afternoon everyone, my name is Jinyu Han. |
---|
0:00:24 | I'm from the Interactive Audio Lab at Northwestern University, |
---|
0:00:28 | and I'm going to present melody extraction |
---|
0:00:31 | using probabilistic latent component analysis. |
---|
0:00:36 | This is joint work with Ching-Wei Chen from the Media Technology Lab of Gracenote. |
---|
0:00:41 | So, this is how the talk will be organized. |
---|
0:00:46 | First, I'm going to give a brief introduction of our system, |
---|
0:00:50 | and then I'm going to introduce the model we use in the proposed system. |
---|
0:00:55 | Then I'll describe our system and present the experimental results, |
---|
0:01:00 | and in the end I'll conclude the talk. |
---|
0:01:03 | So in this paper we treat the singing voice as the melody of |
---|
0:01:08 | the song. |
---|
0:01:09 | This is not always true, |
---|
0:01:11 | but it is true in a lot of cases. |
---|
0:01:15 | So this is a very brief overview of our melody extraction |
---|
0:01:19 | system. |
---|
0:01:21 | We have an audio signal, |
---|
0:01:23 | and it is first segmented into |
---|
0:01:26 | non-vocal segments and vocal segments by the singing voice detection |
---|
0:01:30 | module of our system. |
---|
0:01:31 | The non-vocal segments of the audio signal are just the |
---|
0:01:35 | segments that contain only the accompaniment of the song, |
---|
0:01:39 | and the vocal segments are the segments which contain both the singing voice and the |
---|
0:01:45 | accompaniment of the song. |
---|
0:01:46 | Then, in the next step, |
---|
0:01:49 | we train an accompaniment model from the non-vocal segments |
---|
0:01:54 | using PLCA, which I'm going to discuss later in this talk. |
---|
0:01:57 | After we train the accompaniment model, the accompaniment model is applied |
---|
0:02:02 | to the vocal segments |
---|
0:02:04 | to extract the singing voice of the song. |
---|
0:02:06 | In the end, a pitch estimation algorithm is applied to the |
---|
0:02:10 | extracted singing |
---|
0:02:11 | voice to extract the melody of |
---|
0:02:14 | the song. |
---|
0:02:18 | Now I'm going to introduce the model used in our system. |
---|
0:02:25 | To begin here, |
---|
0:02:26 | we start from a single slice of the spectrogram, |
---|
0:02:31 | the spectrogram shown on the right of the figure, |
---|
0:02:34 | and we treat |
---|
0:02:36 | this spectrum |
---|
0:02:37 | as a histogram. |
---|
0:02:39 | This histogram is assumed to be generated by some kind of probability distribution, |
---|
0:02:43 | and it's just |
---|
0:02:45 | a multinomial distribution. |
---|
0:02:48 | If we look at the spectrogram at different frames |
---|
0:02:54 | of the signal, we can see that |
---|
0:02:56 | different frames will have |
---|
0:02:58 | different spectral vectors. |
---|
0:03:01 | So we could possibly just use a separate multinomial distribution for every single |
---|
0:03:08 | frame |
---|
0:03:10 | of the signal, but then |
---|
0:03:12 | we would end up with a lot of |
---|
0:03:15 | components; |
---|
0:03:16 | it would be a huge dictionary. So instead we just use a dictionary of, say, one hundred |
---|
0:03:23 | spectral vectors, so that each |
---|
0:03:25 | spectrum of the song |
---|
0:03:27 | is modeled as a linear |
---|
0:03:30 | combination of those spectral vectors. |
---|
0:03:34 | In this case we have a model which is probabilistic latent component analysis. |
---|
0:03:38 | We have a dictionary of spectral vectors, |
---|
0:03:43 | which we need to learn from the observation, which is the spectrogram of the song. |
---|
0:03:50 | It's called latent component analysis because these parameters are latent; |
---|
0:03:55 | they are latent variables. |
---|
0:03:58 | In this case, these are the mixture weights and these are the spectral |
---|
0:04:02 | vectors, and |
---|
0:04:05 | we model each |
---|
0:04:06 | spectrum as a linear |
---|
0:04:08 | combination of the spectral vectors. |
---|
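To make the model being described concrete, this is the standard form of the PLCA decomposition; it is the textbook formulation, and the exact notation in the paper may differ.

```latex
% Each normalized spectral frame is modeled as a mixture of Z dictionary elements:
%   P_t(f)      : normalized magnitude spectrum at frame t (the "histogram")
%   P(f \mid z) : the z-th spectral vector in the dictionary
%   P_t(z)      : the mixture weight of component z at frame t
P_t(f) \;=\; \sum_{z=1}^{Z} P_t(z)\, P(f \mid z)
```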
0:04:11 | These two sets of parameters of our model can be estimated by the |
---|
0:04:15 | expectation-maximization |
---|
0:04:17 | algorithm. I'm not going into the details of the estimation; you can refer to our paper for the |
---|
0:04:22 | details. |
---|
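For readers who want to see what the expectation-maximization estimation looks like, here is a minimal sketch of PLCA fitted to a magnitude spectrogram with the standard EM updates; the variable names, initialization, and iteration count are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def plca(V, n_components=100, n_iter=200, rng=None):
    """Fit PLCA to a magnitude spectrogram V (freq x time) with EM.

    Returns W (freq x components), the dictionary of spectral vectors P(f|z),
    and H (components x time), the per-frame mixture weights P_t(z).
    Columns of W and columns of H each sum to one.
    """
    rng = np.random.default_rng(rng)
    F, T = V.shape
    W = rng.random((F, n_components)) + 1e-3
    H = rng.random((n_components, T)) + 1e-3
    W /= W.sum(axis=0, keepdims=True)
    H /= H.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # E-step: the posterior P(z|f,t) is proportional to W[f,z] * H[z,t];
        # dividing V by the model prediction carries it into the M-step.
        R = V / (W @ H + 1e-12)
        # M-step: re-estimate dictionary and weights, then renormalize.
        W_new = W * (R @ H.T)          # ~ sum_t V(f,t) P(z|f,t)
        H_new = H * (W.T @ R)          # ~ sum_f V(f,t) P(z|f,t)
        W = W_new / (W_new.sum(axis=0, keepdims=True) + 1e-12)
        H = H_new / (H_new.sum(axis=0, keepdims=True) + 1e-12)
    return W, H
```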
0:04:22 | So now we have the model that we are going to use to |
---|
0:04:26 | do the singing voice extraction. |
---|
0:04:28 | Due to the |
---|
0:04:30 | time constraint, |
---|
0:04:31 | I'm going to just focus on this part |
---|
0:04:34 | of our paper. |
---|
0:04:40 | In this image, this is the audio signal, and we have |
---|
0:04:44 | a singing voice detection algorithm |
---|
0:04:47 | to segment the signal into different parts, |
---|
0:04:50 | which are the non-vocal |
---|
0:04:52 | segments and the vocal segments. |
---|
0:04:54 | This is just an illustrative example; in the real case |
---|
0:04:57 | it's going to be much more complicated. It's not the case |
---|
0:05:00 | that the non-vocal segment is only at the beginning and the vocal part is at the end of the song; |
---|
0:05:04 | it's not like that. I just want to |
---|
0:05:06 | give an |
---|
0:05:07 | illustration of the situation here. |
---|
0:05:09 | After we identify the non-vocal segments, we use a PLCA |
---|
0:05:13 | model to train |
---|
0:05:15 | a dictionary of spectral vectors for the accompaniment. |
---|
0:05:19 | So now we have a dictionary which explains only the accompaniment. |
---|
0:05:23 | Then, in the next step, |
---|
0:05:25 | for the vocal segments |
---|
0:05:27 | we do the PLCA training as usual, |
---|
0:05:29 | except that |
---|
0:05:30 | we fix some of the components |
---|
0:05:33 | to the already pre-trained |
---|
0:05:36 | spectral vectors from the non-vocal segments, |
---|
0:05:38 | and we have some free components to explain |
---|
0:05:42 | the remaining part, |
---|
0:05:44 | which is mostly the singing voice. |
---|
0:05:46 | So in the end we will have two different groups in the dictionary: |
---|
0:05:50 | this group is the pre-trained, |
---|
0:05:52 | fixed |
---|
0:05:54 | components for the non-vocal part, |
---|
0:05:56 | and this part is newly trained from the vocal segments, and it will explain |
---|
0:06:01 | most of |
---|
0:06:03 | the singing voice of the song. |
---|
0:06:05 | At the end we just |
---|
0:06:07 | reconstruct the signal |
---|
0:06:09 | separately: we use |
---|
0:06:12 | the newly learned components |
---|
0:06:14 | to reconstruct |
---|
0:06:18 | the singing voice, and we use the |
---|
0:06:20 | fixed non-vocal components |
---|
0:06:25 | to extract the accompaniment in the vocal segments. |
---|
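Continuing the sketch, this is roughly what the fixed-plus-free training just described looks like: the accompaniment dictionary learned on the non-vocal frames is held fixed while extra free components are learned on the vocal frames to absorb the singing voice, and the voice is then reconstructed from those free components. It reuses the `plca` function from the earlier sketch, and the soft-mask reconstruction at the end is one common choice, assumed here rather than taken from the paper.

```python
import numpy as np

def separate_voice(V_nonvocal, V_vocal, n_acc=100, n_voice=30, n_iter=200, seed=0):
    """Semi-supervised PLCA: fixed accompaniment components plus free voice components."""
    rng = np.random.default_rng(seed)

    # 1. Learn the accompaniment dictionary on the non-vocal frames
    #    (plca() is the EM sketch shown earlier).
    W_acc, _ = plca(V_nonvocal, n_components=n_acc, n_iter=n_iter, rng=rng)

    # 2. On the vocal frames, keep W_acc fixed; update only the free voice
    #    components and all of the mixture weights.
    F, T = V_vocal.shape
    W_voice = rng.random((F, n_voice)) + 1e-3
    W_voice /= W_voice.sum(axis=0, keepdims=True)
    H = rng.random((n_acc + n_voice, T)) + 1e-3
    H /= H.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        W = np.hstack([W_acc, W_voice])
        R = V_vocal / (W @ H + 1e-12)
        W_voice = W_voice * (R @ H[n_acc:].T)      # update free components only
        W_voice /= W_voice.sum(axis=0, keepdims=True) + 1e-12
        H = H * (W.T @ R)
        H /= H.sum(axis=0, keepdims=True) + 1e-12

    # 3. Reconstruct: soft-mask the mixture by the share of model energy that
    #    the voice components explain, then subtract to get the accompaniment.
    W = np.hstack([W_acc, W_voice])
    mask = (W_voice @ H[n_acc:]) / (W @ H + 1e-12)
    V_voice = V_vocal * mask
    V_accomp = V_vocal - V_voice
    return V_voice, V_accomp
```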
0:06:29 | In the end, a simple |
---|
0:06:32 | pitch estimation algorithm is applied |
---|
0:06:34 | to this extracted singing voice to extract the pitch |
---|
0:06:38 | of the song. |
---|
0:06:43 | I want to mention that |
---|
0:06:45 | a very similar |
---|
0:06:47 | system was proposed in this paper. |
---|
0:06:54 | The difference between our system and that system is that our system |
---|
0:06:58 | has |
---|
0:06:59 | a pre-trained |
---|
0:07:01 | singing voice detection module, so it is totally automatic, |
---|
0:07:06 | while in their paper |
---|
0:07:09 | they just manually detected the singing voice and the |
---|
0:07:13 | non-vocal parts, and prepared the training data manually. |
---|
0:07:17 | We also evaluated our system with experiments, |
---|
0:07:21 | and this is just a sample example from this paper. |
---|
0:07:25 | Here is an illustrative example of our system. |
---|
0:07:29 | On the top of the graph |
---|
0:07:34 | is the original song, the mixture, |
---|
0:07:36 | and the second figure is the singing voice extracted after our algorithm |
---|
0:07:41 | is applied to the polyphonic music. |
---|
0:07:44 | The third figure |
---|
0:07:46 | is the original, cleanly separated singing voice, |
---|
0:07:50 | and the last is the melody |
---|
0:07:53 | estimated from |
---|
0:07:55 | our extracted singing voice. |
---|
0:07:58 | Now let's |
---|
0:07:59 | listen to the example. |
---|
0:08:23 | [audio example plays] |
---|
0:08:44 | As you can easily hear, the system is not perfect; it is not able to completely |
---|
0:08:48 | suppress the accompaniment while |
---|
0:08:52 | keeping the singing voice. |
---|
0:08:55 | Nonetheless, |
---|
0:08:59 | the melody in it is much clearer. |
---|
0:09:10 | So, after we extract |
---|
0:09:11 | the singing voice, we just apply a very simple |
---|
0:09:15 | autocorrelation-based |
---|
0:09:17 | pitch estimation algorithm to the extracted signal. |
---|
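As an illustration of the kind of simple autocorrelation-based pitch estimator being referred to, here is a frame-wise sketch; the frame size, pitch search range, and voicing threshold are assumptions chosen for the example, not the settings used in the paper.

```python
import numpy as np

def autocorr_pitch(x, sr, frame=2048, hop=512, fmin=80.0, fmax=1000.0):
    """Estimate a pitch track (Hz, 0 = unvoiced) by frame-wise autocorrelation."""
    lag_min = int(sr / fmax)          # smallest lag to search (highest pitch)
    lag_max = int(sr / fmin)          # largest lag to search (lowest pitch)
    pitches = []
    for start in range(0, len(x) - frame, hop):
        seg = x[start:start + frame] * np.hanning(frame)
        ac = np.correlate(seg, seg, mode="full")[frame - 1:]   # lags >= 0
        if ac[0] <= 0:
            pitches.append(0.0)
            continue
        ac = ac / ac[0]                                        # normalize by energy
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
        # A crude voicing decision: a weak autocorrelation peak means unvoiced.
        pitches.append(sr / lag if ac[lag] > 0.3 else 0.0)
    return np.array(pitches)
```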
0:09:19 | Although we can still hear the accompaniment |
---|
0:09:22 | in the extracted singing voice, |
---|
0:09:27 | the autocorrelation still works: in the example we just heard, it extracted eighty |
---|
0:09:31 | percent of the correct pitches for the singing voice. |
---|
0:09:35 | We did a |
---|
0:09:37 | comparison with two other systems. |
---|
0:09:39 | The first system is a multi-pitch estimation system previously developed in our lab; |
---|
0:09:45 | since it is a multi-pitch estimation system, we treat the first pitch estimated in each frame as |
---|
0:09:52 | the melody of the song. |
---|
0:09:54 | The second system is a singing voice extraction system, |
---|
0:09:58 | and the third row is the result of our system. |
---|
0:10:01 | As we can see, our system has better recall, F-measure and accuracy |
---|
0:10:06 | compared to the other systems, and it has a comparable |
---|
0:10:10 | precision |
---|
0:10:11 | compared to the best system |
---|
0:10:14 | in the precision evaluation. |
---|
0:10:16 | The second |
---|
0:10:18 | system has a relatively low performance. |
---|
0:10:21 | We believe that, |
---|
0:10:24 | for the singing voice extraction algorithm of the second system, we only used |
---|
0:10:29 | the predominant pitch estimation result in this work, and |
---|
0:10:36 | sometimes the extracted pitch |
---|
0:10:40 | is not |
---|
0:10:41 | the singing voice pitch but the pitch of another accompaniment instrument, |
---|
0:10:46 | so maybe it treats the other accompaniment as the predominant pitch. |
---|
0:10:51 | So we believe a tuning of the parameters for the second system would increase its |
---|
0:10:56 | performance. |
---|
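For reference, this is one common way frame-level melody metrics such as precision, recall, and F-measure are computed, counting an estimated pitch as correct when it is within half a semitone of the reference; it is a generic sketch and not necessarily the exact protocol behind the numbers above.

```python
import numpy as np

def melody_frame_metrics(ref_hz, est_hz, tol_cents=50.0):
    """Frame-level precision / recall / F-measure for a melody pitch track.

    ref_hz, est_hz: per-frame pitch in Hz, with 0 meaning unvoiced.
    A frame counts as correct if both tracks are voiced and the estimate is
    within tol_cents (half a semitone by default) of the reference.
    """
    ref_hz = np.asarray(ref_hz, dtype=float)
    est_hz = np.asarray(est_hz, dtype=float)
    ref_voiced = ref_hz > 0
    est_voiced = est_hz > 0
    both = ref_voiced & est_voiced

    cents_off = np.zeros_like(ref_hz)
    cents_off[both] = 1200.0 * np.abs(np.log2(est_hz[both] / ref_hz[both]))
    correct = both & (cents_off <= tol_cents)

    precision = correct.sum() / max(est_voiced.sum(), 1)
    recall = correct.sum() / max(ref_voiced.sum(), 1)
    f_measure = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f_measure
```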
0:11:03 | So, to conclude our paper: first, a probabilistic latent variable model is |
---|
0:11:08 | introduced to model the accompaniment and the lead singing voice in a fully automatic system, |
---|
0:11:13 | and the experimental results show that |
---|
0:11:16 | the melody of the singing voice |
---|
0:11:18 | could be |
---|
0:11:20 | successfully extracted on the dataset |
---|
0:11:22 | used in the paper. |
---|
0:11:24 | And there are some future |
---|
0:11:27 | directions for us. |
---|
0:11:30 | The current singing voice detection |
---|
0:11:33 | algorithm is based on a Gaussian mixture model; |
---|
0:11:37 | it definitely has some room for improvement, so we want to |
---|
0:11:45 | do future research on this. We also want a better pitch estimation algorithm |
---|
0:11:51 | for the singing voice, |
---|
0:11:53 | and a better singing voice detection |
---|
0:11:55 | module. |
---|
0:11:57 | That concludes our paper. |
---|
0:11:59 | As I said, I did this work while I was doing an internship |
---|
0:12:04 | at the Media Technology Lab |
---|
0:12:05 | of the Gracenote company. |
---|
0:12:07 | I want to thank my colleagues |
---|
0:12:09 | at Gracenote for |
---|
0:12:11 | the useful discussions, and we also want to thank |
---|
0:12:15 | the reviewers of the paper for helping to improve the paper. |
---|
0:12:19 | And I also want to send my thanks to those who helped me to improve |
---|
0:12:24 | the presentation. |
---|
0:12:26 | Thank you. |
---|
0:12:39 | Hi, my name is Gaël, from Télécom ParisTech, |
---|
0:12:42 | and I have one question concerning the segmentation. You have not talked too much about the segmentation, but I guess that |
---|
0:12:49 | if you have errors in the segmentation, you would probably have |
---|
0:12:52 | a less good separation between the two. Can you give us |
---|
0:12:57 | some hints on how well the segmentation performs? Sure, sure. |
---|
0:13:03 | The segmentation is based on a Gaussian mixture model. We trained the Gaussian mixture model |
---|
0:13:08 | on about |
---|
0:13:10 | fifty |
---|
0:13:12 | pre-labeled, manually labeled songs, commercial music including pop music and rock |
---|
0:13:18 | music, |
---|
0:13:19 | and we just pre-trained this |
---|
0:13:20 | model once, and then |
---|
0:13:23 | for new tracks |
---|
0:13:25 | we simply use it |
---|
0:13:28 | on that data as it is. |
---|
0:13:30 | The accuracy of the singing voice detection module is around seventy percent. |
---|
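A rough sketch of the kind of GMM-based vocal/non-vocal classifier described in this answer, using MFCC features via librosa and scikit-learn's GaussianMixture; the choice of MFCCs, the model sizes, and the median smoothing are assumptions for illustration, not the configuration used in the paper.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def train_vocal_detector(vocal_frames, nonvocal_frames, n_components=32):
    """Fit one GMM per class on labeled feature frames (n_frames x n_features)."""
    gmm_v = GaussianMixture(n_components=n_components, covariance_type="diag").fit(vocal_frames)
    gmm_n = GaussianMixture(n_components=n_components, covariance_type="diag").fit(nonvocal_frames)
    return gmm_v, gmm_n

def detect_vocal(audio, sr, gmm_v, gmm_n):
    """Return a boolean per-frame vocal / non-vocal decision for one track."""
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13).T   # frames x 13
    vocal = gmm_v.score_samples(mfcc) > gmm_n.score_samples(mfcc)
    # Median-filter the frame decisions to get smoother segments.
    k = 21
    pad = np.pad(vocal.astype(float), k // 2, mode="edge")
    smoothed = np.array([np.median(pad[i:i + k]) for i in range(len(vocal))])
    return smoothed > 0.5
```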
0:13:36 | And as you mentioned, our system depends on the performance of the singing voice |
---|
0:13:41 | detection module, so if the singing voice detection does not work, |
---|
0:13:45 | it will fail, because it will treat some vocal segments as non-vocal segments, and |
---|
0:13:51 | the dictionary we train from that part will also include some singing voice. |
---|
0:13:56 | In that case the |
---|
0:13:58 | system |
---|
0:13:59 | will not work very well, so that's something I want to research |
---|
0:14:03 | in the future. |
---|
0:14:07 | I have a question for you. Your audio example was wonderful, but |
---|
0:14:11 | it seemed like the major drawback was the fact that I could hear the cymbals |
---|
0:14:14 | coming through in both. I wonder why those were popping out. |
---|
0:14:19 | So, one reason is that the system automatically trains |
---|
0:14:24 | the latent |
---|
0:14:25 | dictionary to explain the accompaniment, |
---|
0:14:29 | and maybe one possible reason is that |
---|
0:14:34 | the non-vocal segments do not contain the cymbal, or the cymbal is not predominant |
---|
0:14:39 | in them in this example. |
---|
0:14:41 | Yeah, that's my best guess. |
---|
0:14:44 | Another possible explanation: when we do the segmentation, we try to use |
---|
0:14:51 | the non-vocal segments |
---|
0:14:54 | close to the vocal segments to explain them; we do not want to use |
---|
0:14:58 | a non-vocal segment at the end to explain |
---|
0:15:01 | a vocal segment at the beginning. |
---|
0:15:04 | The assumption is that the sound should be consistent with the |
---|
0:15:07 | nearby accompaniment, but there could be a change over time. |
---|
0:15:15 | Any other questions? |
---|
0:15:19 | [inaudible] |
---|