0:00:17 | good afternoon everyone |
---|
0:00:19 | uh |
---|
0:00:19 | i only a low of i them you have i i them yeah in kind issue sure that when you |
---|
0:00:23 | an inverse |
---|
0:00:26 | so the title of my presentation is cost-sensitive stacking for audio tag annotation and retrieval |
---|
0:00:35 | so here is an |
---|
0:00:37 | example from one famous |
---|
0:00:39 | music tagging website |
---|
0:00:42 | last.fm |
---|
0:00:44 | here we show the |
---|
0:00:46 | sound track let it be and its |
---|
0:00:48 | social tags |
---|
0:00:51 | and these social tags |
---|
0:00:53 | provide rich information for |
---|
0:00:55 | audio annotation and analysis |
---|
0:00:57 | so for example we can train a classifier using |
---|
0:01:01 | uh |
---|
0:01:01 | using the audio |
---|
0:01:04 | uh |
---|
0:01:05 | and the social tags of the audio which we use as the training target |
---|
0:01:10 | so in this paper we focus on two important kinds of information the first one |
---|
0:01:15 | is tag count |
---|
0:01:16 | which means the number of different users who have annotated this tag |
---|
0:01:21 | in this example a larger font |
---|
0:01:25 | indicates a higher tag count |
---|
0:01:28 | and the second information is tag correlation |
---|
0:01:32 | as we know |
---|
0:01:34 | social tags often co-occur |
---|
0:01:39 | so we propose cost-sensitive stacking for exploiting tag count and tag |
---|
0:01:44 | correlation |
---|
0:01:45 | jointly |
---|
0:01:49 | okay we first introduce |
---|
0:01:51 | two music information retrieval tasks |
---|
0:01:54 | the first is |
---|
0:01:55 | audio annotation that is given an audio clip |
---|
0:01:59 | we can make predictions using the tag classifiers |
---|
0:02:04 | so we would know |
---|
0:02:05 | which tags this audio clip should be associated with |
---|
0:02:10 | and the second one is |
---|
0:02:12 | tag-based audio retrieval |
---|
0:02:15 | given a tag |
---|
0:02:17 | query |
---|
0:02:18 | we can output prediction scores using the |
---|
0:02:22 | tag classifier |
---|
0:02:23 | and then we will have a ranking list |
---|
0:02:26 | for the query |
---|
0:02:30 | okay |
---|
0:02:31 | so the first issue is the tag count |
---|
0:02:34 | uh |
---|
0:02:35 | as we know social tags are assigned |
---|
0:02:39 | by people with different levels of musical knowledge |
---|
0:02:43 | so they inevitably |
---|
0:02:45 | contain noisy information |
---|
0:02:49 | we think that tag count information should be considered in automatic |
---|
0:02:55 | music tagging because the count |
---|
0:02:57 | reflects |
---|
0:02:58 | the confidence degree of the tag |
---|
0:03:01 | the higher the tag count |
---|
0:03:02 | the more reliable the tag |
---|
0:03:07 | here we conducted an |
---|
0:03:09 | experiment |
---|
0:03:10 | we selected some high-count tags |
---|
0:03:14 | and some low-count tags |
---|
0:03:16 | and compared the prediction performance |
---|
0:03:21 | according to false negative rate |
---|
0:03:25 | as we can see |
---|
0:03:26 | the false negative rate on the high-count tags |
---|
0:03:30 | is much |
---|
0:03:31 | lower than on the low-count tags |
---|
0:03:33 | okay |
---|
0:03:34 | so we believe there is evidence that |
---|
0:03:38 | high-count tags are more reliable |
---|
0:03:45 | okay we try to convince you by |
---|
0:03:47 | this example here we show the sound track let it be and its |
---|
0:03:52 | social tags |
---|
0:03:54 | and the high-count tags |
---|
0:03:58 | include |
---|
0:04:01 | sixties |
---|
0:04:02 | beatles |
---|
0:04:03 | british |
---|
0:04:03 | classic rock |
---|
0:04:05 | oldies |
---|
0:04:07 | so we believe that |
---|
0:04:08 | these are the more reliable and important tags |
---|
0:04:15 | so in some previous work |
---|
0:04:19 | the tag count is transformed into one or zero by using a threshold |
---|
0:04:24 | and then a binary classifier is |
---|
0:04:27 | trained for each tag to make predictions |
---|
0:04:30 | but |
---|
0:04:30 | this may have some problems |
---|
0:04:35 | the first one is that |
---|
0:04:36 | tag count information is lost |
---|
0:04:38 | a tag assigned twice is treated |
---|
0:04:41 | in the same way as a tag assigned |
---|
0:04:44 | a hundred times |
---|
0:04:46 | the second problem is that |
---|
0:04:48 | it is hard to determine the threshold |
---|
0:04:52 | and the third |
---|
0:04:54 | the third problem is that |
---|
0:04:55 | there will be ambiguity in the class membership |
---|
0:05:00 | of the instances that are near the threshold |
---|
0:05:02 | for example if we set the tag count threshold to ten |
---|
0:05:07 | then an instance |
---|
0:05:08 | whose tag count is |
---|
0:05:10 | ten will be considered as a positive example |
---|
0:05:14 | but |
---|
0:05:15 | if its tag count is nine it will be considered as a negative example |
---|
0:05:21 | we think that this is strange |
---|
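The thresholding problems described above can be sketched in a few lines of Python. This is only an illustration: the function name `binarize` and the tag counts are hypothetical and do not come from the talk's data.

```python
def binarize(tag_counts, threshold):
    """Turn raw tag counts into 1/0 labels with a hard threshold."""
    return [1 if count >= threshold else 0 for count in tag_counts]


# Hypothetical counts of one tag on four clips (not real data).
counts = [2, 9, 10, 100]

# With a threshold of ten, counts 9 and 10 land in different classes.
labels_at_10 = binarize(counts, threshold=10)

# With a threshold of two, counts 2 and 100 become indistinguishable.
labels_at_2 = binarize(counts, threshold=2)

print(labels_at_10, labels_at_2)
```

Whatever threshold is picked, either nearby counts are split apart or very different counts are merged, which is the motivation given for keeping the count as a cost instead.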
0:05:27 | so our question is |
---|
0:05:29 | how to use the tag count information for audio tag annotation and retrieval |
---|
0:05:35 | and our answer is cost-sensitive learning with the tag count as |
---|
0:05:39 | cost |
---|
0:05:43 | so in cost-sensitive learning we are given a training |
---|
0:05:46 | set |
---|
0:05:47 | X Y and C |
---|
0:05:49 | X is the feature vector |
---|
0:05:51 | and Y is |
---|
0:05:52 | the class label and C is |
---|
0:05:55 | the |
---|
0:05:56 | misclassification cost of this example |
---|
0:06:01 | the goal of |
---|
0:06:03 | uh |
---|
0:06:03 | cost-sensitive learning is to learn a classifier |
---|
0:06:06 | which minimizes the expected |
---|
0:06:09 | cost on unseen instances |
---|
0:06:12 | and it is a more general setup |
---|
0:06:14 | of the |
---|
0:06:15 | traditional classification problem |
---|
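One way to write down the objective described above is the following. The symbols $h$ (the classifier), $\mathcal{D}$ (the unknown data distribution) and the index $i$ are notational assumptions; the talk only names $X$, $Y$ and $C$.

```latex
% Training set of N examples with feature vector x_i, label y_i and cost c_i
\{(x_i, y_i, c_i)\}_{i=1}^{N}, \qquad y_i \in \{-1, +1\}, \quad c_i \ge 0

% Learn a classifier h that minimizes the expected cost on unseen instances
\min_{h} \; \mathbb{E}_{(x, y, c) \sim \mathcal{D}} \big[\, c \cdot \mathbb{1}[h(x) \ne y] \,\big]
```

Setting $c = 1$ for every example recovers the usual expected classification error, which is the sense in which this setup generalizes traditional classification.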
0:06:20 | so in our application our goal is to |
---|
0:06:24 | minimize |
---|
0:06:25 | the misclassified tag count for audio tag annotation |
---|
0:06:29 | and retrieval |
---|
0:06:31 | so if |
---|
0:06:31 | one hundred users annotate an audio clip as rock |
---|
0:06:36 | but the |
---|
0:06:37 | classifier |
---|
0:06:38 | outputs a false negative |
---|
0:06:40 | then the cost is one hundred |
---|
0:06:43 | so with cost-sensitive learning |
---|
0:06:46 | we pay more attention to the reliable and important tags |
---|
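The "one hundred users" example can be made concrete with a small sketch. The helper name `misclassified_tag_count`, the three clips and the cost of 1 for the negative clip are all assumptions made for illustration.

```python
def misclassified_tag_count(y_true, y_pred, costs):
    """Sum the cost of every example whose predicted label is wrong."""
    return sum(c for t, p, c in zip(y_true, y_pred, costs) if t != p)


# Hypothetical "rock" annotations for three clips.
y_true = [1, 1, 0]      # binary labels after annotation
costs = [100, 3, 1]     # tag count used as cost; 1 for the negative clip is assumed

# Missing the clip tagged by 100 users versus missing the clip tagged by 3.
miss_popular = misclassified_tag_count(y_true, [0, 1, 0], costs)
miss_rare = misclassified_tag_count(y_true, [1, 0, 0], costs)

print(miss_popular, miss_rare)
```

Both predictions make exactly one mistake, so plain error rate cannot tell them apart, while the misclassified tag count penalizes the first far more heavily.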
0:06:52 | and |
---|
0:06:53 | so we have |
---|
0:06:55 | exploited two cost-sensitive binary classifiers |
---|
0:06:59 | the first one is the cost-sensitive support vector machine |
---|
0:07:03 | in the support vector machine the training error term C |
---|
0:07:07 | will be associated |
---|
0:07:11 | with a cost |
---|
0:07:12 | to |
---|
0:07:13 | i |
---|
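A common way to attach a per-instance cost to the SVM training error is the instance-weighted soft-margin formulation below. It is given as an assumption about what the slide shows, not as the speaker's exact formula; $w$, $b$, $\xi_i$ and $\phi$ are the standard SVM symbols and $c_i$ is the cost of example $i$.

```latex
\min_{w,\, b,\, \xi} \quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} c_i \, \xi_i
\qquad \text{s.t.} \quad y_i \big( w^{\top} \phi(x_i) + b \big) \ge 1 - \xi_i, \quad \xi_i \ge 0
```

With $c_i = 1$ for all $i$ this reduces to the ordinary soft-margin SVM, so a larger tag count simply makes a margin violation on that example more expensive.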
0:07:16 | and the second cost-sensitive classifier |
---|
0:07:19 | is cost-sensitive AdaBoost |
---|
0:07:21 | so here we show the |
---|
0:07:24 | weight update rule |
---|
0:07:27 | in AdaBoost |
---|
0:07:29 | and |
---|
0:07:30 | the weight update of an instance will be proportional to the cost of this instance |
---|
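A minimal sketch of a cost-proportional weight update follows. It takes the standard AdaBoost exponential update and scales it by the instance cost; this is one possible reading of "proportional to the cost" and may differ from the rule on the slide. The function name and the numbers are hypothetical.

```python
import math


def cost_weighted_update(weights, costs, y_true, y_pred, alpha):
    """One boosting round: exponential update scaled by instance cost.

    Labels and predictions are in {-1, +1}. Scaling by cost is an assumed
    variant, not necessarily the rule used in the talk.
    """
    raw = [
        w * c * math.exp(-alpha * y * h)
        for w, c, y, h in zip(weights, costs, y_true, y_pred)
    ]
    total = sum(raw)
    # Normalize so the weights form a distribution again.
    return [r / total for r in raw]


weights = [0.25, 0.25, 0.25, 0.25]
costs = [1.0, 100.0, 1.0, 100.0]   # hypothetical tag counts used as costs
y_true = [+1, +1, -1, -1]
y_pred = [-1, -1, -1, -1]          # the weak learner gets the first two wrong

new_weights = cost_weighted_update(weights, costs, y_true, y_pred, alpha=0.5)
print(new_weights)
```

In this toy round the misclassified high-cost example ends up with the largest weight, so the next weak learner is pushed hardest toward getting it right.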
0:07:41 | okay |
---|
0:07:41 | the second |
---|
0:07:43 | important information is tag correlation |
---|
0:07:47 | in some previous work the tag annotation task is |
---|
0:07:51 | separated into several |
---|
0:07:53 | binary classification problems |
---|
0:07:55 | so |
---|
0:07:57 | this assumes that the tags are independent |
---|
0:08:02 | so |
---|
0:08:03 | the tag correlation information is lost |
---|
0:08:06 | for example we know that hip hop and rap often co-occur |
---|
0:08:11 | for |
---|
0:08:11 | example in |
---|
0:08:13 | our database |
---|
0:08:15 | uh |
---|
0:08:16 | we counted that |
---|
0:08:17 | they co-occur |
---|
0:08:19 | for one hundred and sixty times |
---|
0:08:22 | and there are only seventy and |
---|
0:08:25 | so T six times that |
---|
0:08:27 | they occur |
---|
0:08:28 | alone |
---|
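Co-occurrence counts of the kind quoted above can be computed as in the sketch below. The clip annotations are toy data made up for illustration, and the function name `cooccurrence_counts` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(clip_tags):
    """Count how many clips carry each unordered pair of tags."""
    pair_counts = Counter()
    for tags in clip_tags:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Toy annotations for four clips (not the database from the talk).
clip_tags = [
    {"hip hop", "rap"},
    {"hip hop", "rap", "dance"},
    {"rap"},
    {"rock"},
]

pair_counts = cooccurrence_counts(clip_tags)
print(pair_counts[("hip hop", "rap")])
```

A pair whose joint count is large compared with the number of times either tag appears without the other is a strong hint that the two tags should not be predicted independently.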
0:08:30 | so we propose cost-sensitive stacking to exploit |
---|
0:08:34 | tag count and correlation information |
---|
0:08:36 | jointly |
---|
0:08:39 | so |
---|
0:08:41 | here is how cost-sensitive stacking is done |
---|
0:08:47 | in the first stage we train a cost-sensitive tag classifier |
---|
0:08:53 | for each tag |
---|
0:08:55 | and |
---|
0:08:56 | the stacking classifier uses the outputs of all tag classifiers |
---|
0:09:01 | as |
---|
0:09:01 | its |
---|
0:09:02 | inputs |
---|
0:09:04 | and we use a linear classifier for the stacking classifier |
---|
0:09:08 | so when we learn it |
---|
0:09:11 | we can learn the |
---|
0:09:14 | weights W here |
---|
0:09:16 | and if |
---|
0:09:17 | W i j is greater than zero |
---|
0:09:20 | then it means |
---|
0:09:22 | tag |
---|
0:09:22 | j is positively correlated to tag i |
---|
0:09:27 | so |
---|
0:09:28 | the tag correlation information can be |
---|
0:09:31 | exploited by the stacking classifier |
---|
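The second stage described above can be sketched as a linear combination of the first-stage outputs. Everything concrete here is assumed for illustration: the tag names, the first-stage scores, the weight matrix `W` and the zero biases are hypothetical, and in practice `W` would be learned rather than written by hand.

```python
def stack_scores(first_stage_scores, W, b):
    """Second-stage score for tag i: sum_j W[i][j] * f_j(x) + b[i]."""
    return [
        sum(w_ij * f_j for w_ij, f_j in zip(row, first_stage_scores)) + b_i
        for row, b_i in zip(W, b)
    ]


tags = ["hip hop", "rap", "classical"]

# Hypothetical first-stage outputs f_j(x) for one clip.
first_stage = [0.2, 0.9, 0.1]

# Hypothetical stacking weights: W[i][j] > 0 means tag j supports tag i.
W = [
    [1.0, 0.6, -0.4],   # hip hop is supported by rap, opposed by classical
    [0.5, 1.0, -0.3],   # rap is supported by hip hop
    [-0.2, -0.2, 1.0],  # classical is opposed by both
]
b = [0.0, 0.0, 0.0]

stacked = stack_scores(first_stage, W, b)
print(dict(zip(tags, stacked)))
```

In this toy case the weak "hip hop" score is pulled up by the confident "rap" score through the positive weight between them, which is how a linear stacking classifier can use tag correlation.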
0:09:36 | okay here we discuss our experimental setup |
---|
0:09:42 | so |
---|
0:09:43 | our baseline is our winning method |
---|
0:09:46 | of the MIREX |
---|
0:09:47 | two thousand nine audio tagging task |
---|
0:09:50 | this method does not use cost-sensitive learning |
---|
0:09:53 | and |
---|
0:09:54 | only uses binary classifiers |
---|
0:09:57 | and |
---|
0:09:57 | our experiments basically follow the MIREX |
---|
0:10:02 | two thousand nine setup |
---|
0:10:04 | we use the same forty five tags |
---|
0:10:07 | and we collected the audio from |
---|
0:10:10 | the MajorMiner website |
---|
0:10:12 | which is a web-based music game |
---|
0:10:15 | i we have so many |
---|
0:10:17 | uh a the paper |
---|
0:10:19 | and |
---|
0:10:20 | the parameters can be |
---|
0:10:22 | selected based on inner cross-validation on the training data |
---|
0:10:26 | and we repeat |
---|
0:10:28 | cross-validation one hundred times |
---|
0:10:32 | okay here we show our experimental results |
---|
0:10:35 | audio annotation is evaluated by clip AUC |
---|
0:10:41 | that |
---|
0:10:41 | is |
---|
0:10:42 | given a clip we want the correct |
---|
0:10:46 | tags to be ranked higher |
---|
0:10:49 | and audio retrieval is |
---|
0:10:52 | evaluated by tag |
---|
0:10:53 | AUC and F-measure |
---|
0:10:56 | that is given a tag we want the |
---|
0:10:59 | correct instances to be ranked higher |
---|
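Assuming the ranking metric mentioned above is the area under the ROC curve, it can be computed directly as the fraction of correctly ordered positive/negative pairs. The function below is a plain sketch with made-up scores; for the per-clip version the items are tags of one clip, and for the per-tag version they are clips scored for one tag.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical prediction scores and ground-truth labels.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]

result = auc(scores, labels)
print(result)
```

A value of 1.0 means every correct item is ranked above every incorrect one, which matches the goal stated in the talk of having the correct tags or instances ranked higher.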
0:11:03 | so we have used different classifiers |
---|
0:11:07 | in the first |
---|
0:11:08 | stage |
---|
0:11:09 | including AdaBoost and SVM |
---|
0:11:12 | and the ensemble is a combination of these two |
---|
0:11:16 | these two classifiers |
---|
0:11:18 | and we have compared four methods |
---|
0:11:21 | the first one is our MIREX baseline and the second one is |
---|
0:11:27 | uh |
---|
0:11:28 | cost-sensitive learning only |
---|
0:11:31 | the third is |
---|
0:11:32 | stacking only and the fourth is our proposed |
---|
0:11:35 | cost-sensitive stacking |
---|
0:11:38 | as we can see |
---|
0:11:40 | in all cases cost-sensitive stacking performs |
---|
0:11:44 | better |
---|
0:11:45 | than |
---|
0:11:46 | the other methods |
---|
0:11:52 | and |
---|
0:11:53 | both stacking only and |
---|
0:11:55 | cost-sensitive learning only |
---|
0:11:58 | are better than |
---|
0:12:01 | our MIREX baseline |
---|
0:12:07 | okay so our conclusion is |
---|
0:12:09 | uh |
---|
0:12:10 | tag count and tag correlation are two important kinds of information for social tag prediction |
---|
0:12:16 | of multimedia data |
---|
0:12:19 | and we have first formulated the |
---|
0:12:22 | audio tag prediction task as a cost-sensitive classification problem |
---|
0:12:27 | to minimize |
---|
0:12:28 | the misclassified tag count |
---|
0:12:32 | and we have then formulated the task as a cost-sensitive multi-label classification problem |
---|
0:12:39 | and proposed |
---|
0:12:40 | cost-sensitive stacking to exploit |
---|
0:12:43 | tag count and correlation information jointly |
---|
0:12:47 | and the experimental |
---|
0:12:49 | results show that the new approach |
---|
0:12:53 | outperforms our MIREX two thousand nine winning method |
---|
0:12:59 | so here we have a journal paper so please see the journal |
---|
0:13:06 | version of this paper for |
---|
0:13:08 | more details and extension work |
---|
0:13:12 | of this idea |
---|
0:13:15 | okay thank you |
---|
0:13:37 | uh |
---|
0:13:39 | yeah i have tried two methods the first |
---|
0:13:43 | method is to transform the |
---|
0:13:46 | outputs of SVM and AdaBoost into probabilities and then average the |
---|
0:13:50 | probabilities |
---|
0:13:52 | and |
---|
0:13:52 | uh |
---|
0:13:53 | the second method is to transform the prediction scores into ranks |
---|
0:13:59 | and the final decision uses the |
---|
0:14:01 | average rank |
---|
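The second combination method mentioned in the answer, averaging ranks, can be sketched as follows. The scores are hypothetical, the helper names are made up, and breaking ties by position is an assumption of this sketch.

```python
def to_ranks(scores):
    """Rank 1 goes to the highest score; ties are broken by position."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def average_rank(scores_a, scores_b):
    """Combine two classifiers by averaging their per-item ranks."""
    ranks_a = to_ranks(scores_a)
    ranks_b = to_ranks(scores_b)
    return [(ra + rb) / 2 for ra, rb in zip(ranks_a, ranks_b)]


# Hypothetical scores from two classifiers on the same three clips.
svm_scores = [2.1, -0.3, 0.5]   # unbounded margins
ada_scores = [0.4, 0.1, 0.7]    # on a different scale

combined = average_rank(svm_scores, ada_scores)
print(combined)
```

Working with ranks avoids having to calibrate the two classifiers onto a common scale, which is the step the probability-averaging method needs; a lower average rank means a stronger combined prediction.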
0:14:09 | thank you |
---|