0:00:26 Hi everyone. I come from Northwestern Polytechnical University.
0:00:34 This is a presentation of my paper for the Odyssey 2020 workshop.
0:00:47 The title of the paper is "Partial AUC Metric Learning Based Speaker Verification Back-end".
0:00:54 In other words, this paper proposes a shallow metric learning back-end algorithm for speaker verification.
0:01:20 I will present it from the following four aspects: metric learning and the motivation, the proposed objective function, some experimental results, and finally some conclusions.
0:01:38 I will also introduce several follow-up works of this paper and our future plans.
0:01:48 First, metric learning and the motivation.
0:01:55 As indicated by the title, I think we can summarize the motivation of this paper by answering two questions.
0:02:06 The first one is: what is metric learning, and why do we propose a metric learning based back-end algorithm?
0:02:22 Metric learning aims to learn a distance function that measures the similarity of samples, such as the Mahalanobis distance.
0:02:32 For speaker verification, as displayed in the right figure of this slide, we first extract speaker identity features from utterances by a front-end speaker feature extractor, such as the i-vector or x-vector extractor.
0:02:52 We then feed them to the metric learning based back-end to calculate their similarity scores.
0:03:02 For the learning of the metric, we employ a loss function based on the optimization of the partial AUC (pAUC), as displayed in the left figure of this slide.
0:03:22 For metric learning, I think the first advantage is that the training of the distance function is consistent with the evaluation procedure; therefore the back-end can directly optimize the evaluation metrics for speaker verification, such as the equal error rate (EER) and the pAUC.
0:03:50 Second, it can be easily combined with existing front-ends, for both the i-vector and the x-vector.
0:04:04 Third, this shallow metric learning method can be easily extended to an end-to-end framework.
0:04:18 The second question I need to answer is: what is the partial AUC, and why does the metric learning back-end aim at optimizing the partial AUC?
0:04:38 In the left figure of this slide, the partial AUC is defined as a small part of the area under the ROC curve, like the shaded region here.
0:04:54 Although metric learning can directly optimize the evaluation metrics, its implementation faces some difficulties.
0:05:05 As we all know, in metric learning we need to construct pairwise or triplet training trials with speaker-level labels to train the distance function.
0:05:19 In addition, the number of all possible training trials is very large.
0:05:24 Besides, many easily distinguishable trials are unnecessary for the training of the distance function.
0:05:32 In view of these difficulties, I think the optimization of the pAUC has the following two advantages.
0:05:44 First, it is easy to select the difficult samples by setting α to zero and β to a relatively small value; in this way, we can also reduce the number of effective training trials.
0:06:08 Second, we can optimize the interesting part of the AUC according to some specific applications; obviously, the AUC is a special case of the pAUC.
0:06:27 Next, in the second part, I will present the important components of the proposed algorithm in detail.
0:06:44 In this slide, I will introduce how to calculate the partial AUC.
0:06:51 As aforementioned, metric learning needs to construct pairwise trials, so here we first see how to construct them.
0:07:03 As denoted in the formula, T is the set of constructed trials, where x_i and y_i are the speaker features of two speech segments and l_i is the ground-truth label: if they come from the same speaker, l_i equals 1; otherwise l_i equals 0.
0:07:26 Besides, the function s is used to calculate the similarity of two speaker features; here we use the Mahalanobis distance function.
0:07:40 Now the predicted label can be obtained by comparing the distance score s_i with a threshold θ.
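The Mahalanobis-style scoring just described can be sketched as follows. This is a minimal NumPy illustration, not the paper's trained model: the sign convention (negated distance so that larger means more similar), the toy feature values, and the choice of the identity matrix for the learnable matrix M are all assumptions made for the example.

```python
import numpy as np

def mahalanobis_score(x, y, M):
    """Similarity score s(x, y) = -(x - y)^T M (x - y).

    Larger (less negative) means the two speaker features are more similar.
    """
    d = x - y
    return -float(d @ M @ d)

# Toy 3-dimensional speaker features (hypothetical values).
x = np.array([1.0, 0.5, -0.2])
y = np.array([0.9, 0.6, -0.1])   # close to x: pretend same speaker
z = np.array([-1.0, 2.0, 0.8])   # far from x: pretend different speaker

# M is the learnable positive semi-definite matrix; the identity matrix
# reduces the score to a negated squared Euclidean distance.
M = np.eye(3)
same_score = mahalanobis_score(x, y, M)
diff_score = mahalanobis_score(x, z, M)
```

With a trained M, comparing such a score against the threshold θ yields the predicted same/different-speaker decision.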
0:07:55 Given a fixed value of the threshold θ, we are able to compute the true positive rate (TPR) and the false positive rate (FPR).
0:08:13 By varying θ, we can get a series of TPR and FPR values, which form the ROC curve shown in the figure.
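The threshold sweep just described can be sketched like this (a small NumPy example; the toy scores, labels, and threshold grid are illustrative assumptions):

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """For each threshold theta, predict 'same speaker' when score >= theta
    and return the resulting (TPR, FPR) pairs tracing the ROC curve."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = labels == 1, labels == 0
    points = []
    for theta in thresholds:
        pred = scores >= theta
        tpr = pred[pos].mean()   # fraction of target trials accepted
        fpr = pred[neg].mean()   # fraction of non-target trials accepted
        points.append((tpr, fpr))
    return points

# Toy trial scores (hypothetical): label 1 = same speaker, 0 = different.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1,   1,   0,   1,   0,   0]
pts = roc_points(scores, labels, thresholds=[0.95, 0.5, 0.0])
```

Sweeping θ from high to low moves along the curve from (TPR, FPR) = (0, 0) toward (1, 1).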
0:08:27 If we were to optimize the entire ROC curve, i.e. the full AUC, it would be not only costly but also unnecessary, because most practical systems work on only a part of their ROC curves.
0:09:02 For example, a voice-based security system usually requires a small false positive rate; in contrast, a terrorist-detection system always hopes to work in a high-recall region.
0:09:21 So optimizing the partial AUC over the working region of interest is a better choice.
0:09:32 In this slide, given the constructed pairwise training set T, let T+ and T- be the positive and negative subsets of T; then we need to compute a new subset T-' of T-.
0:09:54 By the definition of the pAUC, the constraint on the FPR is that it lies between α and β.
0:10:05 In order to compute T-', we first need to calculate the rank bounds k_α and k_β by this formula.
0:10:15 Then all the scores of the negative set are sorted in ascending order, and T-' is selected as the subset of samples whose score ranks fall between the k_α-th and k_β-th positions of the resulting list.
0:10:39 After obtaining T-', the pAUC can be calculated and normalized over the pairs drawn from T+ and T-' respectively.
0:11:00 The partial AUC is calculated by the formula at the bottom of this slide.
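The pAUC computation described above can be sketched as follows. Note this sketch uses similarity scores (higher = same speaker), so the hard negatives are the top-scoring ones and are found by a descending sort; with raw distances, as in the talk, the sort direction flips. The exact rounding of the rank bounds (floor/ceil) is an assumption of the example.

```python
import math
import numpy as np

def partial_auc(pos_scores, neg_scores, alpha, beta):
    """Normalized pAUC over the FPR range [alpha, beta].

    Keeps only the negative trials whose false-positive ranks fall in the
    range, then counts correctly ordered (positive, negative) score pairs.
    """
    neg_sorted = sorted(neg_scores, reverse=True)   # hardest negatives first
    n = len(neg_sorted)
    k_lo = math.floor(n * alpha)                    # rank bounds (assumed rounding)
    k_hi = math.ceil(n * beta)
    neg_sel = neg_sorted[k_lo:k_hi]                 # the subset T-'
    # Indicator over all (positive, selected-negative) pairs; ties count 0.5.
    pairs = [(p > m) + 0.5 * (p == m) for p in pos_scores for m in neg_sel]
    return float(np.mean(pairs))

pos = [0.9, 0.8, 0.6]            # toy same-speaker trial scores
neg = [0.7, 0.3, 0.2, 0.1]       # toy different-speaker trial scores
# alpha=0, beta=0.5 keeps the two hardest negatives (0.7 and 0.3).
pauc = partial_auc(pos, neg, alpha=0.0, beta=0.5)
```

Setting alpha=0 and beta=1 recovers the ordinary AUC, which matches the remark that AUC is a special case of pAUC.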
0:11:07 Here I(·) is an indicator function, so directly optimizing this formula is NP-hard; therefore we need to relax it.
0:11:20 Following a common strategy, we relax the calculation by replacing the indicator function with a hinge loss function.
0:11:32 Here δ is an adjustable hyperparameter, and it is larger than zero.
0:11:41 The last formula gives the relaxed loss function.
0:11:48 To prevent overfitting to the training data, we also add a regularization term, weighted by λ, to the minimization problem.
0:12:08 Finally, the green part enlarges the between-class distance, and the red part tries to minimize the within-class variance.
0:12:19 In other words, our objective function aims at enlarging the margin between the positive and negative trials while minimizing the within-class variances of the two kinds of trials simultaneously.
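The hinge relaxation of the indicator can be sketched as follows. This is an illustrative NumPy surrogate only: the all-pairs pairing of positive and negative scores, the simple mean, and the omission of the λ-weighted regularizer are simplifications of the full objective in the paper.

```python
import numpy as np

def indicator_pauc_loss(pos_scores, neg_scores):
    """Exact 0/1 loss: fraction of mis-ordered (positive, negative) pairs.

    This is what the indicator-based formula measures; optimizing it
    directly is NP-hard because it is piecewise constant.
    """
    diffs = np.subtract.outer(np.asarray(pos_scores), np.asarray(neg_scores))
    return float(np.mean(diffs <= 0))

def hinge_pauc_loss(pos_scores, neg_scores, delta):
    """Hinge relaxation: mean over pairs of max(0, delta - (s_pos - s_neg)).

    A differentiable surrogate for the indicator; delta > 0 is the margin
    hyperparameter, so pairs must be separated by at least delta to incur
    zero loss.
    """
    diffs = np.subtract.outer(np.asarray(pos_scores), np.asarray(neg_scores))
    return float(np.mean(np.maximum(0.0, delta - diffs)))

pos = np.array([0.9, 0.6])   # toy same-speaker scores
neg = np.array([0.4, 0.7])   # toy different-speaker scores
hard = indicator_pauc_loss(pos, neg)        # counts the one mis-ordered pair
soft = hinge_pauc_loss(pos, neg, delta=0.2) # smooth, gradient-friendly value
```

Unlike the 0/1 count, the hinge value decreases smoothly as the positive scores pull above the negative ones, which is what makes gradient-based training of the metric possible.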
0:12:42 In the third part, I will give some experimental results.
0:12:47 This slide displays our experimental settings; more details can be found in the paper.
0:12:57 This table lists the comparison results on the NIST SRE16 Cantonese test set.
0:13:07 It is seen that the proposed pAUC metric learning achieves better performance than PLDA, given both the i-vector and the x-vector front-ends.
0:13:19 Specifically, the pAUC metric learning obtains about nine percent and twenty percent relative improvements over PLDA in terms of the pAUC and the AUC respectively.
0:13:40 Moreover, it achieves more than eleven percent relative EER reduction and five percent relative DCF reduction over PLDA.
0:13:58 Table 2 lists the results on the core tasks of the SITW dataset.
0:14:05 It is seen that the proposed pAUC metric learning achieves better performance than PLDA.
0:14:13 Specifically, when the x-vector front-end is used, the pAUC metric learning achieves more than eight percent relative pAUC improvement over PLDA.
0:14:29 It also obtains more than twenty percent and about ten percent relative AUC improvements on the development and evaluation core tasks respectively.
0:14:43 Moreover, it achieves ten percent relative EER reduction and three percent relative DCF reduction.
0:14:53 Although the performance improvement with the i-vector front-end is not as significant as with the x-vector front-end, the trends with the different front-ends are consistent.
0:15:10 This page displays some experimental results that we used to analyze the effect of the hyperparameters.
0:15:21 We adopted a grid search to study the impact of the values of λ and δ on the performance with the x-vector front-end.
0:15:33 From these two tables we find that the workable region is quite large.
0:15:42 This figure shows the relative performance improvements over PLDA in terms of different values of δ in the objective function.
0:15:55 From this figure we find that the pAUC metric learning is robust to this hyperparameter, and it achieves the best performance at a value around 1.25.
0:16:14 Finally, I will give some conclusions and introduce several follow-up works as well as our future plans.
0:16:33 In this paper, a Mahalanobis distance based metric learning back-end is proposed to optimize the partial AUC for speaker verification.
0:16:47 Because directly optimizing the partial AUC is NP-hard, we relaxed it with a hinge loss function.
0:16:56 Experimental results carried out on the NIST SRE16 and SITW datasets demonstrate the effectiveness of our proposed algorithm.
0:17:14 After this work, we also added score normalization and a comprehensive analysis to the pAUC metric, and we have published the related results following this paper.
0:17:37 Besides, we have also extended the pAUC metric to an end-to-end framework.
0:17:51 More information can be found in these two papers.
0:18:00 In the future, we will research more general metric learning based speaker verification algorithms to optimize the evaluation metrics, in order to further improve speaker verification performance.
0:18:23 That is all of my presentation. Thank you for watching.