0:00:07 | i have a |
---|
0:00:08 | no i mcgill or talk about that |
---|
0:00:11 | multiple background model |
---|
0:00:13 | well speaker verification |
---|
0:00:14 | and the author of this paper |
---|
0:00:17 | they are |
---|
0:00:18 | we found the assumption and the downhill |
---|
0:00:21 | and uh and a friend of which are down |
---|
0:00:23 | and the assumption |
---|
0:00:25 | my name is on the other hand i you would |
---|
0:00:27 | talk |
---|
0:00:28 | uh |
---|
0:00:28 | that |
---|
0:00:29 | this |
---|
0:00:30 | um |
---|
0:00:31 | the idea i mean these people are used to run into a single i think all for you |
---|
0:00:36 | and |
---|
0:00:37 | then |
---|
0:00:37 | yeah |
---|
0:00:38 | clean up the only |
---|
0:00:39 | no |
---|
0:00:40 | being |
---|
0:00:41 | yeah this is the one that is having |
---|
0:00:43 | cable |
---|
0:00:43 | first introduction |
---|
0:00:45 | in this introduction |
---|
0:00:46 | i will |
---|
0:00:47 | uh in |
---|
0:00:48 | why we propose this idea and this is also our motivation |
---|
0:00:53 | second |
---|
0:00:54 | we will introduce vocal tract length normalization to speaker recognition |
---|
0:00:59 | and then to to uh idea |
---|
0:01:01 | when we do some |
---|
0:01:03 | and then |
---|
0:01:03 | and this |
---|
0:01:05 | we can fix |
---|
0:01:06 | parenthood |
---|
0:01:07 | that ah |
---|
0:01:08 | finally |
---|
0:01:09 | we can raise the ticket |
---|
0:01:10 | and the multiple background models are proposed |
---|
0:01:14 | and then |
---|
0:01:15 | no |
---|
0:01:15 | i have a cable components |
---|
0:01:19 | first |
---|
0:01:21 | gaussian mixture model |
---|
0:01:23 | gaussian mixture model universal background model |
---|
0:01:25 | is that because speaker or occasions this term |
---|
0:01:28 | is that way |
---|
0:01:30 | forty |
---|
0:01:30 | quality of that case |
---|
0:01:32 | the |
---|
0:01:32 | they all thought |
---|
0:01:33 | this term |
---|
0:01:34 | such as |
---|
0:01:35 | but i do not see it |
---|
0:01:37 | and the whole thing |
---|
0:01:37 | attribute projection |
---|
0:01:39 | is based on |
---|
0:01:40 | um |
---|
0:01:41 | uh |
---|
0:01:42 | but that but that the |
---|
0:01:43 | the |
---|
0:01:44 | at that |
---|
0:01:45 | gmm |
---|
0:01:46 | ubm as its basic |
---|
0:01:48 | structure |
---|
0:01:49 | and the |
---|
0:01:50 | the most important |
---|
0:01:52 | rolling in this |
---|
0:01:53 | in this this |
---|
0:01:54 | this |
---|
0:01:54 | basics but |
---|
0:01:55 | strong is the |
---|
0:01:56 | ubm |
---|
0:01:57 | and we think a complete ubm is supposed to |
---|
0:02:02 | represent the speaker independent feature distribution |
---|
0:02:06 | and uh there are no man starts to counting the quantity of ubm |
---|
0:02:10 | first |
---|
0:02:11 | which we are |
---|
0:02:12 | and misc data |
---|
0:02:14 | and the means we pay for all the data better |
---|
0:02:16 | two three but when it was so yeah |
---|
0:02:18 | as a second |
---|
0:02:19 | right right |
---|
0:02:20 | gender |
---|
0:02:21 | oh channel |
---|
0:02:22 | and then |
---|
0:02:23 | but |
---|
0:02:23 | ubm |
---|
0:02:25 | but uh |
---|
0:02:26 | there may be |
---|
0:02:27 | uh |
---|
0:02:28 | there |
---|
0:02:28 | there |
---|
0:02:29 | there are other approaches |
---|
0:02:32 | first that |
---|
0:02:33 | okay |
---|
0:02:34 | well okay yeah |
---|
0:02:35 | the speaker our unity expensive i might have back |
---|
0:02:40 | such that |
---|
0:02:41 | speech rate |
---|
0:02:42 | speech what we'll |
---|
0:02:43 | emotion |
---|
0:02:44 | well collector and the song |
---|
0:02:46 | but the major differences between the speaker |
---|
0:02:48 | is due to the difference |
---|
0:02:50 | between their average week yeah |
---|
0:02:52 | so in speech recognition |
---|
0:02:54 | vocal tract length normalization |
---|
0:02:56 | is |
---|
0:02:57 | is also used to obtain |
---|
0:02:59 | speaker independent features |
---|
0:03:04 | now here is like |
---|
0:03:06 | uh |
---|
0:03:07 | you're only kills the frequency warping function |
---|
0:03:10 | it's the crew though |
---|
0:03:11 | this is the original frequency and this is the |
---|
0:03:15 | warped frequency |
---|
0:03:16 | and this is what you that current |
---|
0:03:18 | we want |
---|
0:03:19 | and that is what vector |
---|
0:03:21 | is that when we want to |
---|
0:03:23 | they want to get |
---|
0:03:25 | but unfortunately |
---|
0:03:26 | there is no closed form expression for this but still |
---|
0:03:32 | we use |
---|
0:03:32 | this |
---|
0:03:33 | this |
---|
0:03:34 | grid search |
---|
0:03:34 | to get |
---|
0:03:36 | okay yes |
---|
0:03:38 | and that this is what the speech |
---|
0:03:40 | what what the features |
---|
0:03:41 | is that what models |
---|
0:03:43 | and then the range of this value |
---|
0:03:45 | is that they are |
---|
0:03:47 | zero point |
---|
0:03:48 | eight |
---|
0:03:48 | eighty two |
---|
0:03:49 | one point yeah |
---|
0:03:51 | well to waste that's that's |
---|
0:03:53 | zero |
---|
0:03:54 | point |
---|
0:03:54 | zero eight |
---|
0:03:55 | zero two |
---|
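The passage above describes a frequency warping function whose warping factor has no closed-form solution and is found by searching over a range of values. A minimal sketch of that idea follows. The piecewise-linear form of the warp, the cutoff ratio, the grid bounds (0.88 to 1.12), the step (0.02), and all function names are assumptions made for illustration; the talk does not confirm them. In practice `score_fn` would return the likelihood of the warped features under some model, which is not shown here.

```python
def warp_frequency(f, alpha, f_nyquist=4000.0, cut_ratio=0.85):
    """Piecewise-linear warp (assumed form): scale by alpha below a cutoff,
    then join with a straight line so that f_nyquist maps to itself."""
    f_cut = cut_ratio * f_nyquist * min(1.0, 1.0 / alpha)
    if f <= f_cut:
        return alpha * f
    # Linear segment from (f_cut, alpha * f_cut) to (f_nyquist, f_nyquist).
    slope = (f_nyquist - alpha * f_cut) / (f_nyquist - f_cut)
    return alpha * f_cut + slope * (f - f_cut)


def make_grid(lo=0.88, hi=1.12, step=0.02):
    """Candidate warping factors; the bounds and step are placeholders."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 2) for i in range(n + 1)]


def grid_search_alpha(score_fn, grid):
    """Return the candidate warping factor with the highest score."""
    return max(grid, key=score_fn)


# Toy usage: a hypothetical score that peaks at alpha = 1.0.
best = grid_search_alpha(lambda a: -(a - 1.0) ** 2, make_grid())
```

The search is exhaustive because the score is evaluated once per candidate factor, so the cost grows linearly with the grid size.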
0:04:01 | now we look at |
---|
0:04:02 | i would think that |
---|
0:04:03 | her mental state at the paper |
---|
0:04:05 | okay |
---|
0:04:05 | parents were having a a |
---|
0:04:08 | yeah |
---|
0:04:08 | i thought i was on a six |
---|
0:04:10 | corpora encode past |
---|
0:04:11 | condition |
---|
0:04:12 | and in crosschannel conditions |
---|
0:04:15 | the ubm training data were selected |
---|
0:04:17 | from use |
---|
0:04:18 | S I to solve |
---|
0:04:19 | four |
---|
0:04:20 | one side |
---|
0:04:21 | there are about |
---|
0:04:22 | sixties |
---|
0:04:23 | and what it and the sixteen afternoon |
---|
0:04:25 | and that |
---|
0:04:26 | sre two thousand and three and |
---|
0:04:28 | sre two thousand and two corpora |
---|
0:04:30 | yeah about five hundred utterances |
---|
0:04:34 | notice the feature where you mean you |
---|
0:04:36 | and then a fifty |
---|
0:04:37 | sexual mean subtraction feature what you know they're accepted |
---|
0:04:41 | acceleration |
---|
0:04:42 | and that right there |
---|
0:04:43 | i use the |
---|
0:04:44 | so we come back yeah |
---|
0:04:46 | the feature weights that if you choose dimension |
---|
0:04:48 | then hlda is used |
---|
0:04:51 | uh the final |
---|
0:04:52 | dimension of the feature you starting now |
---|
0:04:57 | oh |
---|
0:04:58 | this is the figure of the warping factor distribution |
---|
0:05:01 | we present this encourage us |
---|
0:05:03 | not want to |
---|
0:05:04 | illustrate the difference between male and female |
---|
0:05:07 | we want to |
---|
0:05:09 | focus on the S |
---|
0:05:11 | it's not as bad |
---|
0:05:12 | uh |
---|
0:05:13 | the wedding is |
---|
0:05:15 | uh he's |
---|
0:05:16 | is that |
---|
0:05:17 | uh the value range from this |
---|
0:05:18 | if we used this value to it wide |
---|
0:05:21 | paper |
---|
0:05:22 | two three will be a |
---|
0:05:23 | may be |
---|
0:05:24 | maybe we can get that arnold yeah |
---|
0:05:29 | so |
---|
0:05:31 | that it has that much attention |
---|
0:05:33 | we might need ubm training they turned into a to use pointed it here |
---|
0:05:37 | according to the warping factor for example |
---|
0:05:41 | uh database |
---|
0:05:42 | first the |
---|
0:05:43 | the warping factor is |
---|
0:05:45 | zero point eight eight |
---|
0:05:47 | we have one hundred and the |
---|
0:05:49 | it's it's three utterances |
---|
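The idea stated above, splitting the ubm training data into subsets according to each utterance's warping factor, can be sketched as a simple grouping step. The utterance identifiers, the example factors, and the function name are hypothetical; the talk does not give the actual subset sizes in recoverable form.

```python
from collections import defaultdict


def partition_by_warping_factor(utterances):
    """Group utterance ids by their (rounded) warping factor.

    `utterances` is an iterable of (utt_id, alpha) pairs, where alpha is the
    warping factor already estimated for that utterance.
    """
    groups = defaultdict(list)
    for utt_id, alpha in utterances:
        groups[round(alpha, 2)].append(utt_id)
    return dict(groups)


# Hypothetical example data.
subsets = partition_by_warping_factor(
    [("utt_a", 0.88), ("utt_b", 1.00), ("utt_c", 0.88), ("utt_d", 1.04)]
)
```

Each resulting subset would then be used to train its own ubm.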
0:05:54 | now this is the |
---|
0:05:55 | whole structure of our proposed multiple background model |
---|
0:06:00 | uh i think that this structure is that right |
---|
0:06:04 | gmm ubm |
---|
0:06:06 | uh |
---|
0:06:06 | i'm the think different these days |
---|
0:06:09 | in gmm ubm |
---|
0:06:11 | there is only one ubm |
---|
0:06:13 | and then |
---|
0:06:14 | in this way how ubm who |
---|
0:06:17 | you're you |
---|
0:06:18 | map adaptation |
---|
0:06:20 | each ubm is adapted to generate |
---|
0:06:23 | uh unique |
---|
0:06:24 | a speaker model |
---|
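The step above, adapting each ubm with map adaptation to obtain a speaker model, can be illustrated with the standard relevance-factor update for gaussian means. This is a sketch under assumptions: the talk does not say whether only the means are adapted, nor which relevance factor is used, and the function and argument names are hypothetical. `counts` are the soft occupation counts of the enrolment data per gaussian, and `data_means` are the corresponding posterior-weighted data means.

```python
def map_adapt_means(ubm_means, counts, data_means, relevance=16.0):
    """Mean-only MAP adaptation of a GMM.

    For each gaussian: a = n / (n + relevance), and the adapted mean is
    a * data_mean + (1 - a) * ubm_mean.
    """
    adapted = []
    for mu, n, ex in zip(ubm_means, counts, data_means):
        a = n / (n + relevance)
        adapted.append([a * e + (1.0 - a) * m for e, m in zip(ex, mu)])
    return adapted


# One ubm yields one speaker model; with several ubms the same call is
# repeated once per ubm, giving one (ubm, speaker model) pair each.
speaker_means = map_adapt_means(
    ubm_means=[[0.0, 0.0], [1.0, 1.0]],
    counts=[16.0, 0.0],
    data_means=[[2.0, 4.0], [0.0, 0.0]],
)
```

A gaussian that sees no enrolment data (count 0) keeps its ubm mean, which is why the second component above is left unchanged.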
0:06:26 | and they |
---|
0:06:27 | ubm and the speaker models have warmed up here |
---|
0:06:30 | the only in the test the framework |
---|
0:06:32 | and i is used for |
---|
0:06:35 | yeah ubm and the speaker |
---|
0:06:37 | G G M and to |
---|
0:06:39 | table |
---|
0:06:39 | the |
---|
0:06:40 | but phonetically |
---|
0:06:41 | we shall score |
---|
0:06:46 | that |
---|
0:06:46 | well that's what kinda stuff |
---|
0:06:48 | results |
---|
0:06:49 | baseline baseline performance |
---|
0:06:51 | and that |
---|
0:06:52 | uh |
---|
0:06:53 | with the gender independent gmm ubm system |
---|
0:06:58 | uh |
---|
0:06:59 | the eer for the four test conditions are about ten percent |
---|
0:07:07 | and then |
---|
0:07:08 | wait while the |
---|
0:07:09 | and by the data |
---|
0:07:10 | you show and the |
---|
0:07:12 | with the gender dependent |
---|
0:07:14 | ubm |
---|
0:07:19 | the results |
---|
0:07:20 | if |
---|
0:07:21 | if the gender all pool ubm an agenda of confusion unmatched |
---|
0:07:26 | then |
---|
0:07:27 | the performance |
---|
0:07:28 | i'm not be improved |
---|
0:07:30 | but |
---|
0:07:30 | if the cross gender confusion |
---|
0:07:33 | just contact dave |
---|
0:07:34 | and the days |
---|
0:07:35 | we can |
---|
0:07:36 | how were bad |
---|
0:07:37 | without |
---|
0:07:39 | now this is that we have |
---|
0:07:42 | dependent ubm |
---|
0:07:44 | problem |
---|
0:07:44 | in this table we can see that |
---|
0:07:46 | for the female condition |
---|
0:07:48 | ubm two gave |
---|
0:07:49 | the best result |
---|
0:07:52 | and for the male condition ubm six |
---|
0:07:54 | gave the best result |
---|
0:08:02 | not have a good some performance |
---|
0:08:04 | comparing the ubm two results for female conditions |
---|
0:08:08 | and the ubm six results |
---|
0:08:10 | for many conditions admits that it's not |
---|
0:08:12 | we can find that i |
---|
0:08:13 | are you yeah maze |
---|
0:08:15 | for an aspect and will so that get the training data |
---|
0:08:18 | that are contained in the back the performance |
---|
0:08:21 | in the ubm with |
---|
0:08:22 | all the training before |
---|
0:08:26 | now let's return to this figure again |
---|
0:08:28 | uh |
---|
0:08:30 | we can get or |
---|
0:08:31 | a lot of space |
---|
0:08:33 | but there is wise enough to have his if you will |
---|
0:08:40 | for a test utterance |
---|
0:08:41 | which we are hampered you also but |
---|
0:08:43 | okay |
---|
0:08:44 | racial or just connect the ends |
---|
0:08:47 | and B M can obtain a score vector |
---|
0:08:50 | we can use |
---|
0:08:51 | coffee remastered |
---|
0:08:52 | to obtain the final results |
---|
0:08:54 | when we talk about you and and the contributions of it all in singapore |
---|
0:08:59 | vocal |
---|
0:08:59 | and that's really |
---|
0:09:01 | uh powerful tools |
---|
0:09:02 | back to |
---|
0:09:04 | but here we just want to |
---|
0:09:06 | some simple |
---|
0:09:08 | and the |
---|
0:09:09 | a simple |
---|
0:09:10 | simple |
---|
0:09:11 | uh |
---|
0:09:12 | buster |
---|
0:09:13 | first |
---|
0:09:15 | have a mismatch right |
---|
0:09:17 | uh we just |
---|
0:09:18 | you have a very |
---|
0:09:19 | but uh |
---|
0:09:20 | you look at this thing right |
---|
0:09:22 | the results |
---|
0:09:23 | it's not very good |
---|
0:09:27 | and then we'll back |
---|
0:09:29 | maximum |
---|
0:09:30 | exactly what the master |
---|
0:09:32 | and we use |
---|
0:09:33 | the ubm |
---|
0:09:34 | which |
---|
0:09:35 | do not report is the max |
---|
0:09:37 | as the final score |
---|
0:09:39 | but |
---|
0:09:40 | the |
---|
0:09:40 | yeah we can discuss this also |
---|
0:09:45 | and this is the minimum likelihood ratio method |
---|
0:09:48 | and it gives the best results |
---|
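The scoring described above, where each ubm and speaker gmm pair produces one score for a test utterance and the resulting score vector is reduced to a final score, can be sketched as follows. The exact definitions of the combination methods in the talk are not recoverable from the transcript, so the maximum, minimum, and mean rules below are one plausible reading, and all names and numbers are hypothetical.

```python
def llr_scores(speaker_loglikes, ubm_loglikes):
    """One log-likelihood ratio per (ubm, speaker model) pair."""
    return [s - u for s, u in zip(speaker_loglikes, ubm_loglikes)]


def combine(scores, method="min"):
    """Reduce the score vector to a single verification score."""
    if method == "min":
        return min(scores)
    if method == "max":
        return max(scores)
    if method == "mean":
        return sum(scores) / len(scores)
    raise ValueError("unknown method: " + method)


# Hypothetical average log-likelihoods of one test utterance under
# three speaker models and their three matching ubms.
scores = llr_scores([-10.0, -12.0, -11.0], [-11.0, -12.5, -13.0])
final = combine(scores, "min")
```

Taking the minimum is the conservative choice: a trial is accepted only if every pair supports the target speaker.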
0:09:51 | model based remastered |
---|
0:10:01 | and then the question arises why the minimum |
---|
0:10:04 | likelihood ratio method gave the best result |
---|
0:10:07 | unfortunately |
---|
0:10:09 | we don't |
---|
0:10:09 | know the exact reason |
---|
0:10:11 | uh |
---|
0:10:13 | in |
---|
0:10:13 | intuitively that |
---|
0:10:14 | peak |
---|
0:10:15 | we we we we try to do uh |
---|
0:10:18 | ah |
---|
0:10:18 | combination |
---|
0:10:19 | in |
---|
0:10:20 | jokingly that speaker |
---|
0:10:22 | yeah and i'm actually |
---|
0:10:23 | for the and the the you yeah you would wear both |
---|
0:10:26 | increase if what match |
---|
0:10:28 | has utterance is in court |
---|
0:10:30 | and this is just which transmits meetings bother with no the reason |
---|
0:10:34 | uh |
---|
0:10:35 | we can make it |
---|
0:10:37 | uh |
---|
0:10:38 | the |
---|
0:10:39 | means and the standard |
---|
0:10:41 | deviation of the |
---|
0:10:42 | that |
---|
0:10:43 | oh |
---|
0:10:43 | likelihood ratios |
---|
0:10:45 | oh |
---|
0:10:45 | on sre two thousand and six |
---|
0:10:47 | which each |
---|
0:10:48 | yeah |
---|
0:10:49 | and we put a thinker |
---|
0:10:51 | just like this |
---|
0:10:53 | uh i have to say that |
---|
0:10:55 | this figure is not |
---|
0:10:56 | the reason of this |
---|
0:10:58 | send |
---|
0:10:59 | of this intense |
---|
0:11:00 | uh |
---|
0:11:02 | we just want to know why |
---|
0:11:09 | now |
---|
0:11:09 | we you know the components |
---|
0:11:11 | ah in this paper with was to investigate here |
---|
0:11:14 | the week yeah that's the right term |
---|
0:11:17 | for ubm training data selection |
---|
0:11:20 | experiment |
---|
0:11:21 | short time |
---|
0:11:22 | that the ubm trained with |
---|
0:11:24 | selected data is better than the ubm trained with all the data |
---|
0:11:29 | based on this finding |
---|
0:11:30 | we further propose a multiple background model system |
---|
0:11:34 | yeah right |
---|
0:11:35 | you take multiple speaker gmm and ubm yeah |
---|
0:11:38 | for speaker recognition |
---|
0:11:41 | uh through minimum |
---|
0:11:43 | now we we shall feel with the proposed master |
---|
0:11:46 | and improve the performance |
---|
0:11:48 | i'm used to be |
---|
0:11:50 | but yeah |
---|
0:11:51 | you're right |
---|
0:11:51 | open questions |
---|
0:11:53 | why the minimum likelihood ratio method gave the best results |
---|
0:11:56 | it's just locally experience |
---|
0:11:58 | uh we will be posting the slow but |
---|
0:12:02 | the property is under investigation |
---|
0:12:04 | well techniques to improve the state of |
---|
0:12:08 | uh standard of that |
---|
0:12:09 | this term |
---|
0:12:10 | uh for example if you |
---|
0:12:12 | the yeah the system and the |
---|
0:12:15 | yeah system |
---|
0:12:16 | you are you yeah |
---|
0:12:17 | the performance |
---|
0:12:18 | yeah |
---|
0:12:19 | cool |
---|
0:12:20 | uh we know and the way people |
---|
0:12:22 | to the expression |
---|
0:12:23 | experiment |
---|
0:12:24 | how about that |
---|
0:12:26 | computational cost and this is another |
---|
0:12:29 | sure |
---|
0:12:31 | finally |
---|
0:12:32 | i just talking about |
---|
0:12:40 | you have plenty of time for questions |
---|
0:12:45 | there were so |
---|
0:12:59 | hmmm |
---|
0:13:00 | oh |
---|
0:13:01 | i'm sorry it wasn't clear to me exactly how you are choosing the |
---|
0:13:05 | you have multiple ubms and you are selecting what |
---|
0:13:08 | within each ubm |
---|
0:13:10 | you mean the |
---|
0:13:11 | two semester |
---|
0:13:13 | you have multiple ubms which are built on different datasets to uh address |
---|
0:13:18 | how do you decide what |
---|
0:13:20 | when |
---|
0:13:20 | each year |
---|
0:13:22 | uh |
---|
0:13:23 | yeah |
---|
0:13:26 | it's cigarettes |
---|
0:13:27 | the warping factor is different |
---|
0:13:30 | and the |
---|
0:13:31 | we use the |
---|
0:13:32 | if the |
---|
0:13:34 | hmmm |
---|
0:13:35 | for example that if the |
---|
0:13:37 | uh warping factor is that the airport mine then |
---|
0:13:41 | this is the |
---|
0:13:42 | with some back this paper |
---|
0:13:44 | to train the ubm |
---|
0:13:46 | when there is a website |
---|
0:13:47 | well it does that and this and that it is |
---|
0:13:50 | yeah |
---|
0:13:51 | what during the enrolment and the test the |
---|
0:13:53 | and the |
---|
0:13:54 | the uh |
---|
0:13:55 | we can have |
---|
0:13:56 | oh the tree data and test data and not extracted |
---|
0:14:00 | they are just the scroll back |
---|
0:14:02 | a two |
---|
0:14:02 | uh |
---|
0:14:03 | ubm and the speaker gmm pair |
---|
0:14:11 | so i'm just going to be a a synthesis using the all the user density and the |
---|
0:14:16 | just to use |
---|
0:14:17 | they were easy to combine this discourse is used |
---|
0:14:22 | two |
---|
0:14:22 | to school too |
---|
0:14:23 | if i noticed like usual as well |
---|
0:14:26 | you you |
---|
0:14:29 | more questions |
---|
0:14:39 | uh |
---|
0:14:40 | hmmm |
---|
0:14:41 | uh you know where you use |
---|
0:14:43 | yeah |
---|
0:14:44 | did you |
---|
0:14:45 | for speaker |
---|
0:14:46 | no |
---|
0:14:47 | i mean you mean how many of them is |
---|
0:14:50 | maybe if you assume that you are using |
---|
0:14:54 | you you |
---|
0:14:55 | you did |
---|
0:14:56 | you |
---|
0:14:58 | yeah |
---|
0:15:00 | you know |
---|
0:15:01 | yes |
---|
0:15:02 | i know that's |
---|
0:15:03 | question |
---|
0:15:03 | yeah in fact there are really many mass transit to |
---|
0:15:07 | uh so get data and the |
---|
0:15:09 | i think of like yeah it's just a way of them |
---|
0:15:12 | which has |
---|
0:15:13 | just well then |
---|
0:15:14 | and they are |
---|
0:15:15 | ah |
---|
0:15:16 | true |
---|
0:15:16 | yeah me either |
---|
0:15:18 | right |
---|
0:15:19 | it would be |
---|
0:15:28 | and we have one last question |
---|
0:15:32 | yeah |
---|
0:15:32 | hello |
---|
0:15:33 | i have just one question um |
---|
0:15:36 | you just use the vocal tract length normalisation to select |
---|
0:15:40 | the population for building the different background models |
---|
0:15:43 | and you can also use the acoustically |
---|
0:15:46 | to to select |
---|
0:15:48 | appropriate cohorts for normalisation |
---|
0:15:50 | looking for speakers |
---|
0:15:52 | which uh |
---|
0:15:53 | more close to the |
---|
0:15:55 | actually the speaker |
---|
0:15:56 | how do you own it is pretty maybe comparing |
---|
0:15:58 | using different background models or different population for |
---|
0:16:02 | normalisation with |
---|
0:16:03 | we propose from it |
---|
0:16:07 | i can't |
---|
0:16:08 | i think |
---|
0:16:09 | it's a |
---|
0:16:09 | i'm i'm just |
---|
0:16:10 | the asking these uh |
---|
0:16:12 | it's do you uh do you have |
---|
0:16:14 | this is also the vocal tract length normalisation |
---|
0:16:17 | to select |
---|
0:16:18 | a cohort |
---|
0:16:19 | six |
---|
0:16:20 | of the speaker |
---|
0:16:21 | for normalisation of the score |
---|
0:16:24 | yeah |
---|
0:16:25 | you mean |
---|
0:16:27 | i don't know |
---|
0:16:28 | uh |
---|
0:16:29 | i don't know what you |
---|
0:16:30 | good |
---|
0:16:31 | right |
---|
0:16:31 | the same |
---|
0:16:32 | that would be |
---|
0:16:33 | probably |
---|
0:16:35 | okay yeah she's didn't uh uh we don't want to just the framing of will or will not too close |
---|
0:16:41 | and uh if you |
---|
0:16:42 | yeah |
---|
0:16:42 | very nice |
---|
0:16:43 | did you do that |
---|
0:16:44 | for for you |
---|
0:16:45 | would you only |
---|
0:16:46 | yeah so you remotes and we move the twos |
---|
0:16:49 | that's because figure |
---|