0:00:26 | hi everyone i |
---|
0:00:28 | moneymaking sponsored by |
---|
0:00:29 | i come from the northwestern polytechnical university of china |
---|
0:00:34 | this video is a presentation on my paper |
---|
0:00:38 | although word for work or |
---|
0:00:40 | workshop of the odyssey two |
---|
0:00:42 | thousand and twenty |
---|
0:00:46 | now let's begin |
---|
0:00:47 | the title of this paper is partial auc metric learning based speaker verification back |
---|
0:00:53 | end |
---|
0:00:54 | in other words |
---|
0:00:56 | this paper proposes a shallow metric learning back end algorithm for speaker verification |
---|
0:01:20 | okay i will present it from these four aspects |
---|
0:01:24 | including the metric learning and the motivation |
---|
0:01:28 | the proposed objective function |
---|
0:01:31 | some experimental results |
---|
0:01:34 | and at last i will give some conclusions |
---|
0:01:38 | and i will also introduce several follow-up works of |
---|
0:01:41 | this paper and |
---|
0:01:43 | our future plans |
---|
0:01:48 | first |
---|
0:01:49 | the metric learning and the motivation |
---|
0:01:55 | as illustrated in the title i think if we can answer these two questions |
---|
0:02:01 | the motivation of this paper will be clear |
---|
0:02:06 | the first one is what is metric learning and why we proposed a |
---|
0:02:12 | metric learning based back end algorithm |
---|
0:02:22 | metric learning aims to learn a distance function to measure the similarity of samples |
---|
0:02:28 | for example the mahalanobis distance |
---|
0:02:32 | for speaker verification as displayed in the right figure of this slide |
---|
0:02:39 | we first extract the speaker identity features from the speech by a front |
---|
0:02:45 | end speaker feature extractor |
---|
0:02:48 | such as the i-vector or the x-vector extractor |
---|
0:02:52 | and then we feed them to the metric learning based back end to |
---|
0:02:57 | calculate their |
---|
0:02:59 | similarity scores |
---|
0:03:02 | for the learning of the metric |
---|
0:03:05 | we |
---|
0:03:07 | employed a loss function based on the optimisation of the partial auc as displayed |
---|
0:03:13 | in the left figure of this slide |
---|
0:03:22 | for metric learning i think the first advantage is that the |
---|
0:03:27 | training of the distance function is consistent with the evaluation procedure |
---|
0:03:33 | therefore the back end can directly optimize the |
---|
0:03:38 | evaluation metrics for speaker verification |
---|
0:03:42 | such as the equal error rate the auc |
---|
0:03:47 | and so on |
---|
0:03:50 | second it can be easily combined with the |
---|
0:03:56 | existing front ends for example the i-vector or the x-vector |
---|
0:04:04 | third this shallow metric learning method can be easily extended to the |
---|
0:04:09 | end to end framework |
---|
0:04:18 | the second question i need to answer is |
---|
0:04:21 | what is the partial auc |
---|
0:04:24 | and |
---|
0:04:25 | why the metric learning back end aims at optimising |
---|
0:04:30 | partial auc |
---|
0:04:38 | in the |
---|
0:04:39 | left figure of this slide |
---|
0:04:42 | the partial auc is defined as a small part of the whole area under |
---|
0:04:47 | the roc curve |
---|
0:04:49 | like |
---|
0:04:50 | this area |
---|
0:04:53 | although |
---|
0:04:54 | metric learning can directly optimize the evaluation metrics |
---|
0:04:59 | its implementation faces |
---|
0:05:01 | some difficulties |
---|
0:05:05 | as we all know |
---|
0:05:06 | we need to construct pairwise or triplet training trials with speaker-level |
---|
0:05:11 | labels to train the distance function |
---|
0:05:15 | in metric learning |
---|
0:05:17 | in this situation |
---|
0:05:19 | the number of all possible training trials |
---|
0:05:22 | is very large |
---|
0:05:24 | besides many easily distinguishable trials are unnecessary to the training of the distance function |
---|
0:05:32 | in terms of these difficulties |
---|
0:05:35 | i think |
---|
0:05:36 | the optimisation of the pauc has the |
---|
0:05:40 | following two advantages |
---|
0:05:44 | first |
---|
0:05:45 | it is easy to select the difficult samples by setting alpha to |
---|
0:05:51 | zero |
---|
0:05:54 | and beta to a |
---|
0:05:58 | relatively small value |
---|
0:06:00 | in this way |
---|
0:06:01 | we can also reduce the number of the |
---|
0:06:05 | selected training trials |
---|
0:06:08 | second we can optimize the partial auc of interest according to some specific applications |
---|
0:06:16 | and obviously |
---|
0:06:17 | auc is a special case of partial auc |
---|
0:06:27 | next |
---|
0:06:28 | in the second part i will introduce the objective function of the proposed |
---|
0:06:34 | algorithm |
---|
0:06:44 | in this slide i will introduce how to calculate the partial auc |
---|
0:06:51 | as i have said metric learning needs to construct pairwise trials |
---|
0:06:57 | here we don't discuss how to construct them |
---|
0:07:00 | and we assume that |
---|
0:07:03 | t is an already constructed set |
---|
0:07:06 | here x n and y n are the |
---|
0:07:10 | speaker features of two speech segments |
---|
0:07:14 | l n is the ground truth label |
---|
0:07:17 | if they come from the same speaker |
---|
0:07:19 | l n equals one |
---|
0:07:21 | otherwise l n equals zero |
---|
0:07:26 | besides the function s |
---|
0:07:29 | is used to calculate the similarity |
---|
0:07:32 | of two speaker features |
---|
0:07:35 | here we use the mahalanobis distance function |
---|
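Editor's sketch, not taken from the slides: one way a Mahalanobis-style similarity between two speaker features could be written. The function name `mahalanobis_score`, the matrix name `M`, and the choice to return the negative squared distance (so that a larger value means "more similar") are all assumptions made for illustration.

```python
import numpy as np

def mahalanobis_score(x, y, M):
    """Similarity of two speaker features under a learned matrix M.

    Assumption: the score is the negative squared Mahalanobis distance,
    so same-speaker pairs should receive larger values. M is expected to
    be symmetric positive semidefinite; this is not checked here.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return -float(d @ M @ d)

# With M equal to the identity this reduces to the negative squared
# Euclidean distance.
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
score_xy = mahalanobis_score(x, y, np.eye(2))
```

In a metric learning back end, `M` would be the quantity being trained; here it is simply passed in.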
0:07:40 | the predicted label l hat can be obtained by comparing the similarity |
---|
0:07:47 | score |
---|
0:07:48 | s |
---|
0:07:50 | with theta which is the decision threshold |
---|
0:07:55 | given a fixed value of theta we are able to compute the true positive |
---|
0:08:00 | rate tpr |
---|
0:08:04 | and the |
---|
0:08:07 | false |
---|
0:08:08 | positive rate fpr |
---|
0:08:13 | by varying theta we can get a series of tpr and |
---|
0:08:18 | fpr values |
---|
0:08:19 | which form |
---|
0:08:20 | an roc curve |
---|
0:08:22 | as drawn in this figure |
---|
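Editor's sketch of the thresholding step described above, under the assumption that a trial is accepted when its score is greater than or equal to the threshold. The names `tpr_fpr` and `roc_points` are hypothetical helpers, not functions from the paper.

```python
import numpy as np

def tpr_fpr(scores, labels, theta):
    """True and false positive rates at one decision threshold theta."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    accepted = scores >= theta          # assumed decision rule
    pos = labels == 1                   # same-speaker trials
    neg = labels == 0                   # different-speaker trials
    tpr = float(np.sum(accepted & pos)) / max(int(pos.sum()), 1)
    fpr = float(np.sum(accepted & neg)) / max(int(neg.sum()), 1)
    return tpr, fpr

def roc_points(scores, labels):
    """Sweep theta over the observed scores, highest first."""
    thetas = np.sort(np.unique(scores))[::-1]
    return [tpr_fpr(scores, labels, t) for t in thetas]

scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
points = roc_points(scores, labels)
```

Each returned pair is `(tpr, fpr)`; plotting tpr against fpr over all thresholds gives the ROC curve.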
0:08:27 | naturally we could optimize the entire |
---|
0:08:32 | optimizing the entire roc curve is a natural choice |
---|
0:08:39 | however this is not only costly but also unnecessary |
---|
0:08:43 | because most practical systems |
---|
0:08:46 | work |
---|
0:08:47 | on only a part |
---|
0:08:52 | because most practical systems |
---|
0:08:55 | work |
---|
0:08:56 | on only a part of their roc curves |
---|
0:09:02 | for example |
---|
0:09:04 | a bank security system usually requires a small false positive rate |
---|
0:09:09 | in contrast |
---|
0:09:11 | a terrorist detection system always hopes |
---|
0:09:15 | for a |
---|
0:09:16 | high recall rate |
---|
0:09:21 | so optimizing only the partial auc where the system works is |
---|
0:09:27 | a better choice |
---|
0:09:32 | in this slide |
---|
0:09:34 | given the constructed pairwise training set t |
---|
0:09:38 | p and n are |
---|
0:09:41 | the positive and negative subsets of t |
---|
0:09:45 | then we need to compute a new subset |
---|
0:09:52 | n zero |
---|
0:09:54 | from n by adding the |
---|
0:09:57 | constraint that the value of fpr is between |
---|
0:10:03 | alpha and beta |
---|
0:10:05 | in order to compute n zero we first need to calculate the index for alpha |
---|
0:10:10 | and the index for beta by this formula |
---|
0:10:15 | then all the scores of the negative set |
---|
0:10:22 | are |
---|
0:10:23 | sorted in ascending order |
---|
0:10:25 | and then |
---|
0:10:26 | n zero is selected as a subset of the samples ranked from the index for alpha |
---|
0:10:31 | position |
---|
0:10:33 | to the index for beta position of the sorted scores |
---|
0:10:39 | after obtaining n zero |
---|
0:10:42 | pauc can be calculated as a |
---|
0:10:45 | normalized |
---|
0:10:46 | auc |
---|
0:10:48 | over p |
---|
0:10:49 | and n zero |
---|
0:10:57 | specifically |
---|
0:11:00 | the partial auc is calculated by the |
---|
0:11:04 | second formula |
---|
0:11:05 | of this slide |
---|
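Editor's sketch of how a partial AUC over a false positive rate range [alpha, beta] could be computed from trial scores. The slide's own index formula and sort order are not recoverable from the transcript, so this sketch makes its own assumptions: scores are similarities, negative trials are ranked from highest to lowest score, and the kept negatives are those at zero-based ranks from ceil(|N| * alpha) up to but excluding floor(|N| * beta). The helper names are hypothetical.

```python
import math
import numpy as np

def select_hard_negatives(neg_scores, alpha, beta):
    """Keep the negative trials whose rank falls inside the FPR range."""
    neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]  # highest first
    lo = math.ceil(len(neg) * alpha)    # assumed lower rank bound
    hi = math.floor(len(neg) * beta)    # assumed upper rank bound
    return neg[lo:hi]

def partial_auc(pos_scores, neg_scores, alpha, beta):
    """Fraction of (positive, selected negative) pairs ranked correctly."""
    pos = np.asarray(pos_scores, dtype=float)
    n0 = select_hard_negatives(neg_scores, alpha, beta)
    if len(pos) == 0 or len(n0) == 0:
        raise ValueError("need at least one positive and one selected negative")
    wins = pos[:, None] > n0[None, :]   # indicator of a correct ranking
    return float(wins.mean())

pos_scores = [0.9, 0.6]
neg_scores = [0.8, 0.5, 0.3, 0.1]
pauc = partial_auc(pos_scores, neg_scores, alpha=0.0, beta=0.5)
full_auc = partial_auc(pos_scores, neg_scores, alpha=0.0, beta=1.0)
```

Setting alpha to 0 and beta to 1 keeps every negative trial, which recovers the ordinary AUC as a special case.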
0:11:07 | you |
---|
0:11:09 | i |
---|
0:11:10 | is an indicator function so directly optimising this formula is np-hard therefore we need to |
---|
0:11:17 | relax it |
---|
0:11:20 | elias there's no |
---|
0:11:22 | we relax the calculation function by replacing the indicator function with a hinge loss |
---|
0:11:28 | function |
---|
0:11:32 | here |
---|
0:11:33 | delta is a tunable hyper parameter and it is larger than zero |
---|
0:11:40 | the |
---|
0:11:41 | last formula gives the relaxed loss function |
---|
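Editor's sketch of a hinge relaxation of the pairwise indicator. It assumes a plain (not squared) hinge with a positive margin, called `delta` here for illustration, averaged over all pairs of positive trials and selected negative trials. The exact form on the slide may differ, and the regularization term is omitted.

```python
import numpy as np

def relaxed_pauc_loss(pos_scores, hard_neg_scores, delta):
    """Mean hinge penalty over (positive, selected negative) score pairs.

    A pair contributes zero once the positive score exceeds the negative
    score by at least delta; otherwise the shortfall is penalized.
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(hard_neg_scores, dtype=float)
    margins = pos[:, None] - neg[None, :]
    return float(np.maximum(0.0, delta - margins).mean())

loss = relaxed_pauc_loss([0.9, 0.6], [0.8, 0.5], delta=0.2)
separated = relaxed_pauc_loss([2.0], [0.0], delta=1.0)
```

Unlike the indicator, this surrogate is piecewise linear in the scores, so it can be minimized with gradient-based methods.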
0:11:48 | to prevent |
---|
0:11:50 | to prevent this |
---|
0:11:53 | loss function |
---|
0:11:54 | from overfitting to the training data we also |
---|
0:11:57 | add a regularization |
---|
0:11:59 | term |
---|
0:12:01 | the land that all mean a |
---|
0:12:04 | to the minimization problem |
---|
0:12:08 | finally |
---|
0:12:09 | this green part enlarges the between-class distance |
---|
0:12:13 | and this red part |
---|
0:12:16 | tries to minimize the within-class variance |
---|
0:12:19 | in other words our objective function aims |
---|
0:12:23 | at |
---|
0:12:24 | enlarging the margin |
---|
0:12:26 | between the |
---|
0:12:28 | positive and |
---|
0:12:29 | negative trials while minimizing the within-class variance |
---|
0:12:35 | of the two classes of trials simultaneously |
---|
0:12:42 | in the third part i will give some experimental results |
---|
0:12:47 | this slide displays our experimental settings |
---|
0:12:52 | more details can be found in the paper |
---|
0:12:57 | this table |
---|
0:12:58 | this table lists the comparison results on the cantonese test data set |
---|
0:13:07 | it is seen that the proposed |
---|
0:13:10 | pauc metric achieves better performance than plda |
---|
0:13:14 | given both the i-vector and the x-vector front ends |
---|
0:13:19 | specifically the pauc metric obtains |
---|
0:13:24 | not persons and to twenty percent relative improvement over p lda |
---|
0:13:30 | in terms of the |
---|
0:13:32 | pauc and auc |
---|
0:13:34 | actually |
---|
0:13:38 | respectively |
---|
0:13:40 | moreover |
---|
0:13:41 | it achieves more than eleven percent relative eer reduction |
---|
0:13:49 | and five percent |
---|
0:13:51 | relative dcf reduction over plda |
---|
0:13:58 | table two lists the results on the core task of |
---|
0:14:03 | the sitw data set |
---|
0:14:05 | it is seen that the proposed |
---|
0:14:08 | pauc metric achieves better performance than plda |
---|
0:14:13 | specifically when the x-vector front end is used |
---|
0:14:18 | pauc metric achieves more than |
---|
0:14:21 | eight percent |
---|
0:14:23 | relative pauc |
---|
0:14:25 | improvement over plda |
---|
0:14:28 | if the |
---|
0:14:29 | it also |
---|
0:14:30 | obtains |
---|
0:14:31 | more than |
---|
0:14:32 | twenty percent and the channel or since about it you a was the improvements on |
---|
0:14:37 | the development and evaluation core tasks respectively |
---|
0:14:43 | moreover it achieves |
---|
0:14:46 | ten percent relative eer reduction and the |
---|
0:14:49 | three percent relative dcf |
---|
0:14:52 | reduction |
---|
0:14:53 | although the performance improvement with the |
---|
0:14:56 | i-vector front end is not as significant |
---|
0:15:00 | as that with the x-vector front end |
---|
0:15:03 | the trends with different front ends are consistent |
---|
0:15:10 | this page displays some experimental results |
---|
0:15:15 | which are used to analyze the effect of the hyper parameters on the performance |
---|
0:15:21 | we adopt the |
---|
0:15:22 | grid search to study the impact of the values of the hyper parameters on the performance |
---|
0:15:29 | in the |
---|
0:15:31 | i-vector |
---|
0:15:32 | yes |
---|
0:15:33 | from these two tables we can see that the working region is quite large |
---|
0:15:42 | this figure shows the relative performance improvements over plda |
---|
0:15:48 | in terms of the different |
---|
0:15:50 | values of the hyper parameter |
---|
0:15:52 | in the objective function |
---|
0:15:55 | from this figure we find that the pauc metric is robust |
---|
0:16:01 | to it and the best value is around one point two |
---|
0:16:05 | five |
---|
0:16:14 | finally |
---|
0:16:15 | i will give some conclusions and introduce several follow-up works as well as |
---|
0:16:21 | our future plans |
---|
0:16:33 | in this paper |
---|
0:16:35 | a mahalanobis distance based metric learning back end is proposed to optimize partial auc |
---|
0:16:42 | for speaker verification |
---|
0:16:47 | because directly optimizing |
---|
0:16:50 | partial auc is np-hard |
---|
0:16:53 | we relax it by a hinge loss function |
---|
0:16:56 | experimental results |
---|
0:16:58 | carried out on the |
---|
0:17:00 | nist sre and |
---|
0:17:02 | sitw data sets |
---|
0:17:05 | demonstrate the effectiveness of our proposed algorithm |
---|
0:17:14 | after this work we also made a generalization |
---|
0:17:20 | and a comprehensive analysis |
---|
0:17:23 | of the pauc metric |
---|
0:17:27 | we show me |
---|
0:17:29 | published as the |
---|
0:17:30 | without relative without |
---|
0:17:33 | in this paper |
---|
0:17:37 | besides |
---|
0:17:38 | we also extended the extended the |
---|
0:17:42 | pauc metric to an end to end framework |
---|
0:17:51 | more information can be found in this too |
---|
0:17:54 | more information can be found in this paper |
---|
0:18:00 | in the future |
---|
0:18:03 | we will research more general metric learning based speaker verification algorithms |
---|
0:18:08 | to optimize |
---|
0:18:11 | evaluation metrics |
---|
0:18:13 | in order to |
---|
0:18:15 | further improve speaker verification performance |
---|
0:18:23 | that's all for my presentation |
---|
0:18:26 | thank you for watching |
---|