0:00:06 More? Really? Okay, good.
0:00:10 Right, okay. So today I'm going to start talking a little bit about unsupervised adaptation with respect to the use of total variability and cosine scoring, as was previously discussed.
0:00:22 But first off: I've only known Najim for a couple of months, and I think a lot of you probably know a lot more about him than I do, but I just want to summarize a couple of things I've observed over the last couple of weeks. I had to get a little bit of prior approval for this, because I wasn't sure this introduction slide was going to be appropriate at all, but it seemed okay, so we're going to go with it.
0:00:50 So this is what happened during the SRE, okay. And this is the main part: upon arrival in Brno... there it goes. Oh, yeah.
0:01:11 But his mind changes really quickly, and I only put this together a couple of hours ago, because this morning he was really up for it. So I decided to predict what's going to happen in a few days, and this is kind of what I'm hoping for. As you may know, he doesn't do that much here, but I'm thinking that maybe, for some reason, we should help him out, and we'll give that a go anyway.
0:01:49 Down to business. The whole idea of my talk is, again, unsupervised adaptation, and the whole motivation behind it is that capturing and characterizing every source of variability is pretty difficult, especially with only one enrollment session.
0:02:10 Now, if we were able to have multiple enrollments of the same speaker, this would help average out more sources of inconsistency and provide a better representation of the speaker, that is, a better speaker model.
0:02:24 And so this brings us to the problem of speaker adaptation in general. In unsupervised speaker adaptation, we are updating our speaker models without a priori knowledge that the utterance we are updating a model with actually belongs to the target speaker, and we do so based on utterances processed during testing. This was incorporated in, I think, the NIST SRE in 2004, or maybe 2005 or so.
0:02:53 Now, in previous work using joint factor analysis, before we began our work with total variability, what we noticed was that there were indeed highly variable scores produced by JFA, and they required normalization, in particular zt-norm. Applying these score normalizations in the unsupervised adaptation domain requires a significant amount of additional computation with each adaptation update; I'll go a little bit more into that in just a bit.
0:03:28 Now, when we began this work with total variability, we were hoping for a certain number of improvements, where we could do unsupervised adaptation with less computation. We take advantage of total variability's use of low-dimensional total factor vectors, or i-vectors (we can debate on who wants to name what later), and of the cosine similarity scoring, which is very quick. Also, there's a set of new score normalization strategies that we wanted to play with, namely the symmetric normalization (s-norm) and the normalized cosine distance that Najim just talked about.
0:04:13 So, a little bit of an outline for this talk. I'm going to go over really quickly what total variability is; Najim did a very good job explaining some of the ideas behind it. Then I'll go on to the unsupervised adaptation algorithm that we came up with, which has gotten decent results, and then we can proceed onward with the score normalization experiments and a little further discussion.
0:04:47 So, total variability has all the components that were shown in the previous talk and that we've probably all seen in the past: we're using factor analysis as a feature extractor, you have a speaker- and channel-dependent supervector, there's intersession compensation with LDA and WCCN, and cosine scoring. At the end of the day we're just going to use w', which is the total factor vector after everything has been applied, such that the cosine scoring is really just the inner product between the two vectors.
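To make that scoring step concrete, here is a minimal sketch in Python with NumPy; the function name is illustrative, and it assumes the intersession compensation (e.g., LDA then WCCN) has already been applied to both vectors:

```python
import numpy as np

def cosine_score(w1, w2):
    # Cosine similarity between two compensated total factor vectors
    # (i-vectors). With compensation already applied, the score
    # reduces to a normalized inner product of the two vectors.
    return float(np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2)))
```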
0:05:26 So, previous work in joint factor analysis on the topic of unsupervised adaptation was done by Yin, Rose, and Kenny. What they did was, given some new adaptation data, to compute the posterior distribution of the speaker-dependent hyperparameters using the current estimates as a prior. What they also did was to set a fixed, predefined adaptation threshold and use log-likelihood ratio scoring. In that paper they also introduced an adaptive t-norm score normalization technique, because what they had observed was a shift in the distribution of normalized scores as more adaptation data was used; in order to be able to use a fixed decision threshold in their decision process, they had to do a new type of normalization.
0:06:25 Now, that was met with a good amount of success, and the results were very promising. That said, implementing the adaptive t-norm normalization requires a good bit of computation, and the normalization parameters have to be recomputed with every adaptation update. And then, lastly, success also depended on the choice of adaptation threshold, which was tuned, though we do that as well.
0:07:04 Now, for us, in order to try to improve upon this work in the context of total variability, what we wanted was to satisfy the following criteria. First, we wanted a simple and robust method for setting an adaptation threshold θ, and what we decided was to set it to be the same as the optimal decision threshold on development data; what we were going to use was the NIST 2006 SRE data. Basically, what we would do is carry out a run without adaptation, find the a posteriori threshold at the point that minimizes the DCF, and set that as the threshold used during testing. I'll get into the details about that in a little bit.
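To illustrate that threshold-setting step, here is a minimal sketch in Python of picking the a posteriori min-DCF threshold from development scores; the cost weights shown (C_miss = 10, C_fa = 1, P_target = 0.01) are the SRE-era defaults and are an assumption on my part, not something stated in the talk:

```python
import numpy as np

def min_dcf_threshold(target_scores, impostor_scores,
                      c_miss=10.0, c_fa=1.0, p_target=0.01):
    # Sweep every observed score as a candidate threshold and keep
    # the one that minimizes the detection cost function (DCF).
    target_scores = np.asarray(target_scores)
    impostor_scores = np.asarray(impostor_scores)
    candidates = np.sort(np.concatenate([target_scores, impostor_scores]))
    best_cost, best_thr = np.inf, candidates[0]
    for thr in candidates:
        p_miss = np.mean(target_scores < thr)    # targets rejected
        p_fa = np.mean(impostor_scores >= thr)   # impostors accepted
        cost = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
        if cost < best_cost:
            best_cost, best_thr = cost, thr
    return best_thr
```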
0:07:57 Next, we wanted to minimize the amount of computation carried out during each unsupervised adaptation update, and total variability already helps with that: we are able to use low-dimensional total factor vectors, and we are able to use cosine similarity scoring. And lastly, our hope was to simplify the score normalization procedures wherever possible.
0:08:25 So really, basically, if we're able to use our total factor vectors, or i-vectors, as point estimates in our speaker space, then, given limited training data, we might obviously not have a perfect estimate of where our speaker really lies. So suppose (this is just a cartoon, so it's not anything rigorous at all) we had our true speaker identity s, which is given by the little circle there, and our estimated one-utterance speaker identity w. The estimate might not be right on the spot where the true speaker identity is, but if we had a good number of these utterances, it would make sense that we should converge towards a better representation of the speaker. Now, this is also assuming a priori that the additional data we have is actually from speaker s.
0:09:27um
0:09:28so
0:09:30as such
0:09:30we
0:09:31decide to propose this
0:09:32on this algorithm here
0:09:34and
0:09:35in an effort to match
0:09:36the
0:09:37the the technical rigour of the previous two presentations
0:09:40for me i decided to pack as much math
0:09:42i could in this
0:09:43in this one slide
0:09:44um
0:09:45but
0:09:45basically
0:09:47it we're saying that we have a set of total factor vectors that are
0:09:50um assumed to pertain to an identity
0:09:52of a known speaker S
0:09:54um
0:09:55and then we have a set
0:09:56of total factor vectors
0:09:58T survive
0:09:59on that are extracted
0:10:00test utterance
0:10:01each
0:10:02of arc a test
0:10:02and
0:10:03with a defined if
0:10:04decision threshold
0:10:05data
0:10:06um we have
0:10:08a new equation for the score which
0:10:10um
0:10:11since
0:10:12this is uh the
0:10:13this notation is just the cardinality so that's is it just the mean of all possible
0:10:17um and then we compared to some threshold and if the threshold
0:10:20of of all the
0:10:22and that threshold of the score exceeds
0:10:25you're you're threshold then you decide
0:10:27that you're
0:10:28that
0:10:29but the utterance
0:10:30current utterance to supply would belong in inside here
0:10:33into the
0:10:33the identity of a new speaker as
0:10:36and
0:10:36you would
0:10:38a you would
0:10:38say yes to that trial and be you would
0:10:41and the
0:10:41um
0:10:43that that new utterance these to buy into you
0:10:45speaker big W sub
0:10:48 Now, what we have is a symmetry that allows for flexibility, and later we will have more discussion on the design of this score combination function, since it can conceivably be made better.
0:11:03to to reiterate what i just said
0:11:05um it's
0:11:06it's actually
0:11:07quite easy so if you had in
0:11:09initial enrolment utterance
0:11:10estimator speaker identity
0:11:12on W
0:11:13there is a one
0:11:14and you have incoming test utterance to supply this is assuming that
0:11:17these are all of your
0:11:18speaker i i didn't is right it is all you have
0:11:21just a single utterance
0:11:22and that you compute a score than if you're
0:11:24your singles what as one is great and data
0:11:27then
0:11:27you just
0:11:28take
0:11:29um you just take a test utterance and you can
0:11:31place it
0:11:32um
0:11:33in with
0:11:33with this and this becomes
0:11:35what you have is now when you you test utterance
0:11:37uh arcs are
0:11:38and you training utterance
0:11:39um
0:11:40that that was
0:11:41just to test
0:11:42and so and that that's how you simply admit
0:11:45um
0:11:45a bit
0:11:46more
0:11:47training vectors into in your set
0:11:50um and so now
0:11:51you had a
0:11:52second test
0:11:53utterance then
0:11:54and you can keep two scores
0:11:56right you have
0:11:57yeah you're initially
0:11:58um estimated
0:12:00speaker identity and this new
0:12:01training i utterance W so uh T one
0:12:04and you can see these two scores and now
0:12:06if you um
0:12:07if that function
0:12:08of those two scores is
0:12:09again greater than your fixed threshold data
0:12:11then you do the same thing and
0:12:15and do you
0:12:16you put another
0:12:17you training utterance
0:12:18yeah
0:12:19 So that's all there is to it. The emphasis behind this approach, again, is that we do not need to change the decision threshold θ. In the past, in related work with more adaptation utterances, in the text-dependent setting, what some work did was to increase the decision threshold with each adaptation utterance. We wanted to keep things as simple as possible, so we decided that we did not want to change the decision threshold θ. Also, right now there's simply no modification of the total factor vectors; all we're actually doing is combining scores.
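As a minimal sketch of the adaptation trial just described, reusing the cosine_score helper sketched earlier (the combination function here is the plain mean, and all names are illustrative):

```python
import numpy as np

def adaptation_trial(speaker_vectors, test_vector, theta):
    # speaker_vectors: the set W_s of i-vectors currently assumed to
    #                  belong to the target speaker (enrollment plus
    #                  any previously admitted test utterances).
    # test_vector:     i-vector extracted from the incoming utterance.
    # theta:           fixed decision threshold set on development data.
    scores = [cosine_score(w, test_vector) for w in speaker_vectors]
    combined = float(np.mean(scores))  # mean over the whole set
    accept = combined >= theta
    if accept:
        # The utterance is admitted as a new training vector; note
        # that no existing vector is modified in any way.
        speaker_vectors.append(test_vector)
    return accept, combined
```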
0:13:02 And so that summarizes total variability and unsupervised adaptation. Now I'll go really quickly through score normalization, which, I am aware, is a very well known topic, so I'm only going to give it a pretty brief review, with a couple of differences in the wording.
0:13:27 So, the idea behind score normalization is that we assume the distributions of target speaker scores and impostor scores follow two distinct normal distributions; however, the parameters of these two distributions are not speaker-independent but target-speaker-dependent, and as such we need to normalize to allow for a universal decision threshold. In zero normalization (z-norm), which is well known, we scale the distribution of scores produced by a target speaker model and a set of impostor utterances to a standard normal distribution.
0:14:00 In test normalization (t-norm), it's basically the same thing, except that, in order to adjust for intersession variability, we scale the distribution of scores produced by a test utterance and a set of impostor models to a standard normal distribution. The idea to keep in mind here is the parallel between the words "utterance" and "model"; I'll discuss how the two are related in the context of total variability in just a little bit.
0:14:26 As for zt-norm, we've already seen that it achieves the best results in a factor analysis based system, and that's what's currently being used as the state of the art.
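For reference, a sketch of the two classical normalizations just reviewed, again reusing the illustrative cosine_score helper; with total variability, models and utterances are both just i-vectors:

```python
import numpy as np

def znorm_params(model, impostor_utterances):
    # z-norm: score the target model against a set of impostor
    # utterances and keep the mean and standard deviation.
    s = np.array([cosine_score(model, u) for u in impostor_utterances])
    return s.mean(), s.std()

def tnorm_params(utterance, impostor_models):
    # t-norm: score the test utterance against a set of impostor models.
    s = np.array([cosine_score(m, utterance) for m in impostor_models])
    return s.mean(), s.std()

def normalized(raw, mu, sigma):
    # Scale a raw score so impostor scores follow a standard normal.
    return (raw - mu) / sigma
```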
0:14:38 So, what we have with zt-norm parameter updates during this model adaptation is the lack of any need for new normalization parameters, because the training utterances that we admit, w_t1 and w_t2, were previously test utterances, so they already have a t-norm associated with them. What we additionally need is obviously a z-norm to be computed, but that's actually it. That means we can simply precompute the z-norm parameters for each test utterance the same way we do it for each target speaker utterance, or target speaker model, and that's all we need to do. In the past we had to compute adapted t-norm parameters, that is, redo the normalization parameters after each adaptation update; here, however, it's very simple and should be much quicker.
0:15:39 Now, the next thing was the difference between utterances and models in the context of total variability. Well, total variability uses factor analysis as a front end, and so the extraction of total factors from an enrollment or a test utterance follows exactly the same process. As such, there's really no difference between an utterance and a model; and, with the symmetry of the cosine similarity behind all that, there is no distinction to be made, so we can think of them all as the same thing.
0:16:19 This brings us to an even more simplified method of score normalization, which is the s-norm, something that Patrick Kenny had proposed. Really, all we do in implementing the s-norm here is to define a new set of impostors, which is simply the union of the z-norm impostors and the t-norm impostors, and we get a new scoring function that looks pretty similar to any other normalization function we have, except that we simply add the two normalized scores, and this becomes our s-norm. The first term refers to the use of W_s, the estimated normalization parameters associated with your model W_s, and in the second term your mu and sigma become simply those of the test utterance. So what this gives us now is a universal procedure for computing normalization parameters, and a correspondingly simple method for score normalization.
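A sketch of that s-norm under the same assumptions as the earlier snippets, with a single impostor cohort of i-vectors serving as both the z-norm and t-norm impostor sets:

```python
import numpy as np

def snorm_score(model, utterance, impostor_cohort):
    # Symmetric normalization: normalize the raw score once against
    # the model's impostor score distribution and once against the
    # test utterance's, then add the two normalized scores.
    raw = cosine_score(model, utterance)
    s_model = np.array([cosine_score(model, x) for x in impostor_cohort])
    s_utt = np.array([cosine_score(utterance, x) for x in impostor_cohort])
    return ((raw - s_model.mean()) / s_model.std()
            + (raw - s_utt.mean()) / s_utt.std())
```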
0:17:28 And then the last type of score normalization that we explored here, the normalized cosine distance, was previously discussed in Najim's talk; that's sort of what it looks like, just as a quick reminder.
0:17:42 So now we have some experiments that we ran. The system that I used here was really the same system that Najim previously had, given that we're working together, so these are your standard parameters for the system setup, and then there's the table of the corpora that we used, which you can take a more detailed look at at some other time. The protocol of our experiments was that we wanted to test our results on the female part of the 2008 NIST SRE data set, and we fixed our decision and adaptation threshold θ as the optimal min-DCF a posteriori decision threshold on development data, the NIST 2006 SRE.
0:18:34 And so these are the ten-second condition results; we cared mostly about the ten-second results here. And these are the results that we see: in terms of minimizing the detection cost function, the adaptation-based zt-norm achieves the best min DCF, whereas the normalized cosine distance that Najim previously discussed also did very well, and did very well for everything else: the English trials for equal error rate, and also the min DCF over all trials.
0:19:17 But at the same time, we can also notice that the s-norm also achieves good results, very competitive, and in some cases even better than zt-norm, at least for the English trials.
0:19:35 So, in order to validate our results, we also tried our work on the long conversational utterances, and in this case Najim's normalization actually swept the results across the board, achieving the best results here.
0:20:00 But at the same time, there are a couple of things to take note of. What we can see here is that our proposed adaptation algorithm is successful in improving performance regardless of the normalization procedure. This is obviously consistent with the notion that unsupervised adaptation, with, of course, an appropriately chosen threshold, should be at least as good as, and hopefully better than, the baseline method without adaptation.
0:20:34 The next thing is that our simplified s-norm approach performs competitively with the more complicated, traditional zt-norm. And of course, what we've seen is that the best result is ultimately obtained using the normalized cosine distance.
0:20:50 And as a result, I think one of the cooler things is that we seem to have come full circle with the study, and the story, of score normalization techniques. In the beginning, we needed normalization techniques in order to better calibrate our scores for a fixed decision threshold; then, once we got to the most complicated, zt-norm, we actually started going backwards and simplifying things, into an s-norm, which is much easier to calculate. And now the parameters that we need are not speaker-dependent at all: in the normalized cosine distance there is no need to calculate the parameters of each score distribution for each speaker; it's just a pretty universal set of parameters that needs to be computed.
0:21:45 And now, there's the bit of future work that I brought up earlier, where we decided that maybe there's a better way to improve our score combination function. These are some basic ideas I want to go over that we're currently working on, but we don't have any significant improvement in our results just yet.
0:22:11 The idea is weighted averaging, because our currently proposed method for combining scores really treats every vector in the set of total factor vectors as equally important. However, at the end of the day, the only vector that unequivocally belongs to the speaker s is the initial enrollment vector, so maybe it makes more sense to weight that vector a little bit higher than the rest of the training utterances that we admit, because the presence of false-alarm adaptation updates, in which test utterances are incorrectly admitted, will have an adverse effect on all subsequent tests. So maybe our score combination function should take the following into account: we weight each score by a coefficient α, where the weight is the unnormalized score, something like the cosine similarity, which ranges between negative one and one, so it can be seen as a way to weight each score.
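As a sketch of this weighted-averaging idea: since the talk leaves the exact weighting open, the choice below (enrollment vector weighted 1.0, each admitted vector weighted by the raw cosine score it was admitted with) is only one illustrative possibility:

```python
import numpy as np

def weighted_combined_score(speaker_vectors, weights, test_vector):
    # weights: one coefficient per vector in W_s, e.g. 1.0 for the
    # original enrollment vector and, for each admitted vector, the
    # raw cosine score (in [-1, 1]) it achieved when it was admitted.
    scores = np.array([cosine_score(w, test_vector) for w in speaker_vectors])
    weights = np.array(weights, dtype=float)
    return float(np.dot(weights, scores) / weights.sum())
```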
0:23:11 And that's sort of what we can look at in a quick visualization, the only one I have time for right now. We can simply look at it this way: your initial identity speaker vector w_1 is the most important one, and the next few vectors might be erroneous, because, based on your threshold, you're only allowing in vectors from the region in the green circle; so if your true speaker identity is s, then you may incorrectly allow some vectors in. However, as you get more and more vectors, we can be more and more certain that they belong to the speaker's identity. So what we would have is: you add a training vector and it shifts the space a little bit; then maybe you add in an incorrect one, but then you add a more correct one, and, after a couple more of these utterances, you can finally see that maybe you get it right. That false-alarm vector, though, maybe you should take it out, or something like that.
0:24:17 And so that's sort of what we're looking at working on in the future; we're still looking to improve the score combination function, and we're open to any ideas. One of the other ideas, though it's not allowed in this protocol per se, is that since we're most error-prone in the beginning, after a while maybe we can go take another look at the training vectors and correct any errors from the beginning. This is easy to do because we don't actually modify the vectors at all.
0:24:47 And so the final summary is that we have proposed a method for unsupervised speaker adaptation in the use of total variability with cosine scoring. We have a simple and efficient method for score combination, with a fixed a priori decision threshold that performs well, and this method can also easily accommodate all the score normalization procedures. And, with respect to score normalization, we discussed some of the newer, non-zt-norm ideas, like the s-norm and the normalized cosine distance that Najim talked about.
0:25:26 So, thanks.
0:25:37 Any questions for Steven?
0:25:48 In a different paper, the last one, proposed by a PhD student, we looked at this problem and tried to show something about using a fixed threshold whenever you adapt the speaker model. Were you thinking about the new cost function proposed by NIST? With the new terms, you don't have the weight at the beginning. What we proposed in that paper was a way to do the adaptation where, rather than comparing scores to a threshold and deciding on adaptation, you try to evaluate the confidence of each trial, weight the trial with that confidence, and use it to gradually update the model between trials.
0:26:47 So the question is whether or not I tried using a confidence weighting, rather than a fixed threshold, to decide whether or not to update the speaker model, and whether that would be a good solution?
0:27:11 If I'm hearing the question correctly, I think that the use of a fixed decision threshold, well, at the end of the day, I think it makes things simple. Using a fixed decision threshold, but at the same time using a varying score combination function that weights the scores of each training utterance that we have, I think that's a pretty good way to do it, but I guess we could talk more about it later.
0:27:56 I would just ask one simple follow-up question: when you compute your weighted score, you compare it against the existing threshold? With this weighting applied to the score, do you think it would still be correct to work with the same threshold?
0:28:17 Computing the score with all of the... with every single trial? I'm not sure I follow. Okay.
0:28:30 I also have an answer to your question about the prior; I have a remark and a comment on this in my presentation tomorrow, so we can discuss it then.
0:28:56 Maybe a last question for Steven? Okay, then it's time for the next speaker.