0:00:06 | so i think today you will be happy to see two different points of view: the mathematical point of view, and an engineering solution which is maybe faster and simpler, and which simplifies life for us. |
---|
0:00:25 | the topic today is about how we can make cosine distance scoring — because, you know, Patrick showed this morning that with PLDA you don't need any score normalisation — so we try to understand what is going on in the cosine scoring and how we can move the score normalisation from the score space to the total variability space, the i-vector space. |
---|
0:00:46 | so the presentation is organised as follows: we give an introduction on the contribution of this paper; i will try to define what the total variability space is, what the cosine distance is, and how channel compensation works in this space; after that i will show what score normalisation like z-norm and t-norm is doing in the score space when we do cosine distance scoring, and how we developed a new scoring — the score normalisation is still there, but we just move it to the total variability space; then some experiments and results, and finally a conclusion. |
---|
0:01:29 | so the new low-dimensional speaker representation should make life a lot easier for us, because now we just work in this low-dimensional space — you can try PLDA and other models — and everything can be faster in this new low-dimensional space. |
---|
0:01:48 | and we also showed that cosine scoring doesn't need any target enrollment model: you just extract the i-vectors — the total variability factors — for target and test, compute the cosine distance and compare it to a threshold. it's very easy, there is no complication there, so this makes the decision faster, simpler, less complex. |
---|
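to make the decision rule concrete, here is a minimal sketch of that test (numpy; the vectors and the threshold value are illustrative stand-ins, not the system's tuned values):

```python
import numpy as np

def cosine_score(w_target: np.ndarray, w_test: np.ndarray) -> float:
    """Cosine similarity between the target and test i-vectors."""
    return float(w_target @ w_test /
                 (np.linalg.norm(w_target) * np.linalg.norm(w_test)))

# accept the trial if the score exceeds a tuned threshold
w_tar = np.random.randn(400)   # stand-ins for 400-dim i-vectors
w_tst = np.random.randn(400)
accept = cosine_score(w_tar, w_tst) > 0.2   # 0.2 is purely illustrative
```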
0:02:12 | but to get good results we still need score normalisation — zt-norm or s-norm; in the new version of the system i use s-norm — so we still need it. |
---|
0:02:24 | so in this paper i tried to understand what score normalisation is doing in the cosine distance, and how i can simulate this kind of scoring in the total variability space without going to the score space. |
---|
0:02:39 | this is what i will talk about in this part. we also did some speaker adaptation using cosine distance, but i will not talk about that here — i will just show some results; Stephen, the next presenter, will talk about it, so if you have any question about speaker adaptation you can talk to him, not to me. |
---|
0:03:01 | okay, so as everyone here knows, JFA tries to split the GMM supervector into two parts: one part described by the speaker space, and the second part by the channel space. |
---|
0:03:19 | two years ago, when we were in the Johns Hopkins workshop in two thousand and eight, we tried to see how much speaker information there is in every JFA latent variable — speaker factors, common factors and channel factors. so we took every component of JFA, put it into a support vector machine, and used the cosine distance kernel to see the performance. |
---|
0:03:43 | what was surprising is that with the eigenchannels — the channel factors — when we put them into the SVM we expected to get an error rate of fifty percent, because normally channel factors don't contain speaker information; we found an error rate of twenty. so it means there is speaker information that we are losing in these channel factors. |
---|
0:04:03 | so in order to recover — maybe recover is a strong word — to minimise the impact of this information that we are losing from the speaker factors, the idea of total factors came about. the total factors were born at the Johns Hopkins University workshop. |
---|
0:04:21 | what we did is just, instead of having separate speaker space and channel space, we have one single space — the total variability space — which models both speaker and channel variability. so when we have a target and a test, both get projected together in this total variability space, and we can just compute the cosine distance there. |
---|
0:04:47 | so how is the total variability space trained, and what is different between the speaker space — the eigenvoices — and the total variability space in our case? for the eigenvoices in JFA, every recording of a given speaker is seen as the same speaker, so we put all their recordings together; for the total variability space it is the opposite: each recording of the same speaker is seen as a different speaker, so we try to model both speaker and channel variability at the same time. the only thing is, if you have the eigenvoice algorithm, it is exactly the same — you use the same algorithm, just the training list is different. |
---|
0:05:28 | okay, so for the eigenvoice space we put the data from the same speaker under the same label, while for total variability every recording is treated as coming from a different speaker. |
---|
0:05:39 | there are different ways to estimate the eigenvoices. with relevance MAP, for each recording we estimate the GMM supervector by MAP adaptation — relevance MAP — and then compute PCA. with eigenvoice MAP adaptation, the GMM supervectors are not observable and we estimate everything with an EM algorithm. |
---|
0:06:04 | why are we using the eigenvoice training? because at the Johns Hopkins workshop some people tried different kinds — if i'm not wrong, like relevance MAP and eigenvoice MAP adaptation — for training the speaker factors, and we found that the best was eigenvoice; maybe i'm wrong, you can confirm after. and also eigenvoice training is known to be more powerful for short durations, so maybe that explains why total variability in this case gives a better result than relevance MAP. |
---|
0:06:42 | so when we have a target recording and a test recording, we estimate the total variability factors — the i-vectors — for both, and the scoring is to compute the cosine distance between the two vectors. |
---|
0:06:58 | and then compare to a threshold. but we do have to do channel compensation, so i first do LDA to do some dimensionality reduction — to maximise the between-speaker variability and minimise the within-class, the within-speaker variability, sorry — and after that apply WCCN to do some kind of normalisation in the reduced, LDA-projected space. |
---|
0:07:25 | so for LDA, the projection matrix is defined by solving this generalised eigenvalue problem between the between-speaker variability matrix and the within-speaker variability matrix. |
---|
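as a sketch of this step — assuming you have already accumulated the between- and within-speaker scatter matrices Sb and Sw from the background i-vectors (names illustrative):

```python
import numpy as np
from scipy.linalg import eigh

def train_lda(Sb: np.ndarray, Sw: np.ndarray, n_dims: int) -> np.ndarray:
    """Solve the generalised eigenvalue problem Sb v = lambda * Sw v and
    keep the eigenvectors with the largest eigenvalues as the projection."""
    eigvals, eigvecs = eigh(Sb, Sw)      # generalised symmetric-definite problem
    order = np.argsort(eigvals)[::-1]    # sort eigenvalues, largest first
    return eigvecs[:, order[:n_dims]]    # columns of A, shape (dim, n_dims)

# usage, matching the talk: A = train_lda(Sb, Sw, 200); w_reduced = A.T @ w
```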
0:07:41 | there is only one remark i need to point out here: in the first version of the cosine distance i assumed that the mean over all the speakers is equal to zero, because the total factors have a normal prior distribution; but in this work i actually estimated it. i think you need to compute it, because i found some problems with the new scoring when i don't estimate it. |
---|
0:08:11 | so for the WCCN, what we do is: after estimating the LDA, we project all our background data into this low-dimensional space — we move from four hundred to two hundred dimensions — and after that we use the same background data to estimate the WCCN in the two-hundred-dimensional space. so the WCCN is applied in the projected space, not in the original space. |
---|
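a minimal sketch of this estimation on the projected background set (array names are illustrative; `ivectors` is assumed to hold the LDA-projected vectors row-wise, with parallel speaker labels):

```python
import numpy as np

def train_wccn(ivectors: np.ndarray, speaker_ids: np.ndarray) -> np.ndarray:
    """Cholesky factor B of inv(W), where W is the average within-class
    covariance of the (already LDA-projected) background i-vectors."""
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    speakers = np.unique(speaker_ids)
    for spk in speakers:
        x = ivectors[speaker_ids == spk]
        xc = x - x.mean(axis=0)          # centre each speaker's recordings
        W += xc.T @ xc / len(x)
    W /= len(speakers)
    return np.linalg.cholesky(np.linalg.inv(W))   # B with B @ B.T = inv(W)

# the normalisation is then applied as B.T @ w_reduced
```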
0:08:45 | here is some kind of visualisation of all these steps. there are five speakers — each colour is one speaker, and each point is one recording of that speaker — so this is five female speakers, shown after LDA projection into two dimensions. |
---|
0:09:06 | when you apply the WCCN — it is the same scatter, on the same scale — we are minimising the intra-speaker variability; and when you do the length normalisation of the cosine scoring, you end up on a spherical surface: here is speaker one, here speaker two, and so on. this is like what was explained this morning about length normalisation — all the data lie on the same sphere. |
---|
0:09:43 | this is the diagram of the total variability system. first we use a lot of non-target speakers — a lot of speakers, with several recordings per speaker. i do MFCC extraction, i train the UBM, and then extract the Baum-Welch statistics for all these recordings; after that i train the total variability matrix, then extract the i-vectors for all these recordings, and then i estimate the LDA and the WCCN — based on the i-vectors, not on the UBM. |
---|
0:10:28 | so when i have a target or a test recording, i just extract the MFCCs, use the UBM to extract the Baum-Welch statistics, and with the total variability matrix extract the factors; then i apply the LDA and the WCCN to normalise the new vectors. when you have the test, we do the same process: with the total variability matrix we extract the total factors, project them with LDA and WCCN, then compute the cosine distance and make the final decision. |
---|
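the i-vector extraction box in the middle of this diagram is not spelled out in the talk; for orientation, a common formulation of the MAP point estimate looks like the sketch below. the shapes and names are assumptions (diagonal UBM covariances, first-order stats already centred on the UBM means), not the speaker's code:

```python
import numpy as np

def extract_ivector(T, sigma_inv, N, F):
    """w = (I + T' Sigma^-1 N T)^-1  T' Sigma^-1 F
    T:         (CF, R) total variability matrix
    sigma_inv: (CF,)   inverse diagonal UBM covariances
    N:         (CF,)   zeroth-order stats, repeated per feature dimension
    F:         (CF,)   centred first-order stats"""
    TtS = T.T * sigma_inv                    # T' Sigma^-1, shape (R, CF)
    L = np.eye(T.shape[1]) + (TtS * N) @ T   # posterior precision, (R, R)
    return np.linalg.solve(L, TtS @ F)       # posterior mean = i-vector
```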
0:11:00 | so now i will explain what score normalisation is doing in this space — what score normalisation does when we apply it to cosine distance scoring. let me simplify some equations; this is the cosine distance scoring first. |
---|
0:11:16 | let us define what we call the normalised total factors: the projection by LDA and by the Cholesky decomposition of the within-class covariance matrix, normalised by the length. in this case the cosine distance score becomes just a dot product — i just want to simplify: we have a dot product, okay. and you can see all of this — because in the first paper we said that the total variability matrix is a feature extractor — you can also see all of this as feature extraction, because it does the channel compensation, and the scoring becomes just a dot product. |
---|
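putting the last few slides together: with the normalised factors defined as ŵ = BᵀAᵀw / ‖BᵀAᵀw‖, the cosine score really is a bare dot product. a sketch reusing the A (LDA) and B (WCCN) matrices from the snippets above:

```python
import numpy as np

def normalized_factors(w: np.ndarray, A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """LDA projection, WCCN rotation, then length normalisation."""
    v = B.T @ (A.T @ w)
    return v / np.linalg.norm(v)

# with both sides normalised once, every trial is a single dot product:
# score = normalized_factors(w_target, A, B) @ normalized_factors(w_test, A, B)
```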
0:11:59 | so |
---|
0:12:00 | no |
---|
0:12:01 | if you have you want to see that |
---|
0:12:02 | who started that someone's you know so we have a target speaker and the set of you know utterance okay |
---|
0:12:07 | so we |
---|
0:12:08 | or is it turns you extract the proposed factors |
---|
0:12:11 | okay |
---|
0:12:12 | and need to compute |
---|
0:12:13 | the main |
---|
0:12:15 | come the scores |
---|
0:12:16 | the mean of the scores |
---|
0:12:17 | and the standard deviation of the schools okay |
---|
0:12:19 | so i tried to see what the mean and the standard deviation are really doing — what their value is. for every z-norm impostor i write the score — just the dot product between the target and the impostor — sum them and divide by n: this is the mean. if you simplify that and take the target out, it is just the dot product between the target's normalised i-vector and the mean of the impostors' normalised i-vectors. so this is the mean — here is the impostors' normalised i-vector mean, and n is the number of impostors in the z-norm list. |
---|
0:13:11 | for the standard deviation you do the same process: you have the scores between the target and the z-norm impostors, and you subtract the mean — which is exactly this one, the dot product between the normalised target speaker and the impostors' mean — and if you take the target out, you can see that this term is the covariance matrix of the impostor i-vectors. |
---|
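so the z-norm statistics fall out in closed form: mean = ŵ_target · w̄_imp and variance = ŵ_targetᵀ C ŵ_target, where w̄_imp and C are the mean and covariance of the normalised impostor i-vectors. a sketch of that identity (array names illustrative):

```python
import numpy as np

def znorm_stats(w_target_hat: np.ndarray, impostors_hat: np.ndarray):
    """Z-norm mean and std for one target, computed directly in i-vector
    space; impostors_hat is (n, dim), rows are normalised impostor i-vectors."""
    mu = impostors_hat.mean(axis=0)                      # impostor mean vector
    C = np.cov(impostors_hat, rowvar=False, bias=True)   # impostor covariance
    mean = w_target_hat @ mu
    std = np.sqrt(w_target_hat @ C @ w_target_hat)
    return mean, std

# z-normalised trial score: (w_target_hat @ w_test_hat - mean) / std
```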
0:13:50 | so z-norm — if you put this back into the score normalisation equation — is just shifting the test by the mean of the impostors, and then doing another length normalisation of the target, but this time the normalisation is based only on the impostor covariance. it means that when we do z-norm, we are doing a length normalisation whose direction is based on maximising the distance between impostors. |
---|
0:14:31 | in a similar way you can find that t-norm — where z-norm is shifting the test, t-norm is shifting the target by the impostor mean, and it is doing the length normalisation of the test with the same kind of covariance between impostors. |
---|
0:14:56 | so we built a new scoring — with one assumption, which means it is not exactly zt-norm. we take some background of impostors and compute their mean, and we shift the target by it — that is why we shift the target here — and we also shift the test, the normalised total factors, by the impostor mean; and we length-normalise both the test and the target based on the between-impostor covariance. |
---|
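one reading of the scoring just described — shift both normalised i-vectors by the impostor mean, then scale each by its norm under the impostor covariance; the exact placement of the shift inside the denominator is my assumption, not a quote of the paper's formula:

```python
import numpy as np

def new_score(w1_hat, w2_hat, mu, C):
    """Scoring with the normalisation folded into i-vector space.
    mu, C: mean and covariance of the normalised impostor i-vectors
    (the talk uses a diagonal C in the experiments to speed this up)."""
    a = w1_hat - mu
    b = w2_hat - mu
    return float(a @ b / (np.sqrt(a @ C @ a) * np.sqrt(b @ C @ b)))
```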
0:15:36 | another one is s-norm — i think it was used in Patrick's paper, and it is symmetric. in this case, for us, it is exactly the same: what s-norm is doing can be seen as a symmetric scoring, doing the same thing in both directions — for one side it shifts the test and normalises by the target, and for the other it shifts the target and normalises by the test. so we can do s-norm exactly the same way, without any extra parameter estimation, just in the total variability space. |
---|
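and the symmetric version: applying the z-norm identity in both directions and summing, which is what s-norm does with a single impostor cohort. a sketch under the same assumptions as above:

```python
import numpy as np

def snorm_score(w1_hat, w2_hat, mu, C):
    """S-norm in i-vector space: the shifted, covariance-normalised dot
    product taken in both directions and summed."""
    s1 = w1_hat @ (w2_hat - mu) / np.sqrt(w1_hat @ C @ w1_hat)
    s2 = w2_hat @ (w1_hat - mu) / np.sqrt(w2_hat @ C @ w2_hat)
    return float(s1 + s2)
```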
0:16:12 | this kind of scoring helps a lot to speed up the process: we only compute the cosine distance, and now we can do what zt-norm or s-norm would do without computing all the normalisation scores. |
---|
0:16:30 | so then we did some experiments. we used two thousand and forty-eight Gaussians with dimension sixty — it is an old system that i had, sorry for that. we use four hundred total factors, LDA reduced to two hundred, and the WCCN is applied in the two-hundred-dimensional space. we use something like one thousand utterances for z-norm and two hundred for t-norm; for s-norm we use all the impostors combined together; and for the mean and the covariance of the new scoring we also use all the impostors together, but with a diagonal covariance matrix for the impostors, just to speed up the process for the experiments — we could use the full one. |
---|
0:17:22 | a lot of people ask me how to build the total variability space, so i tried to build this table to show on which databases you can train your LDA and so on. for the UBM we use Switchboard — Switchboard cellular and landline — and we use the two thousand and four and five data. for total variability we use all the data — whatever data you have, the more the better — requiring at minimum speakers that have two recordings to train the total variability matrix. |
---|
0:17:55 | this was the first time for us to use Fisher data to train the total variability matrix, because Patrick tried it in the past with JFA and didn't have success with it. for LDA i use Switchboard and NIST two thousand and four and five, because i try to model the between-speaker variability, so we need more speakers. for the WCCN, what was surprising is that i found the best result with only NIST two thousand and four and five — maybe because in that data we have the same kind of speakers speaking on different phone numbers and telephones compared to Switchboard; maybe this is why we need only two thousand and four and five. |
---|
0:18:33 | okay, so these are the results. i tried the core condition of the two thousand and eight evaluation, with results on the female part only. i just want to show how the score normalisation is working here — i forgot to put the scores without score normalisation, sorry. |
---|
0:18:53 | so this is the original scoring — cosine with zt-norm, as our group did in the past — and when you do the new scoring, with the estimated mean or not, you gain maybe zero point five percent absolute in error rate; but the DCF stays about the same, there is not a very big improvement. however, for all trials we get some kind of a jump, because the core condition here is english trials while all trials include different languages, and here both the error rate and the DCF were very good with this new normalisation-free scoring. |
---|
0:19:32 | so the new scoring gives quite competitive results, and better results on all trials compared to the original scoring. it seems we really can do the score normalisation in the total variability space — there is no problem with that. |
---|
0:19:50 | this is the ten-second ten-second condition. here, compared to the core condition, we find that it helps a lot — it is improving the performance, not for the DCF but for the error rate, and also for the DCF on all trials; the s-norm was not doing very well here on the ten-second condition compared to the core condition. |
---|
0:20:16 | and the conclusion |
---|
0:20:18 | so |
---|
0:20:19 | for this paper i try to uh |
---|
0:20:21 | simplify life |
---|
0:20:22 | again |
---|
0:20:23 | by making the score normalisation and a very this space |
---|
0:20:27 | so which makes the process more simple and more fast |
---|
0:20:30 | if you want to try to optimise the or |
---|
0:20:31 | cosine the some scoring |
---|
0:20:33 | and |
---|
0:20:34 | we do it for |
---|
0:20:35 | for the purpose of doing some speak and some adaptation |
---|
0:20:37 | no that's not that's not up to date a parameter of the |
---|
0:20:41 | but the the that you know how much you know |
---|
0:20:43 | and the answer but adaptation |
---|
0:20:45 | so |
---|
0:20:45 | stephen was talking more about |
---|
0:20:47 | after the start |
---|
0:20:48 | and thank you |
---|
0:20:58 | [inaudible audience question about the score normalisation] |
---|
0:22:24 | [audience] okay, so one of them is selecting, emphasising a subspace based on the differences, right — and i'm wondering if you could modify the normalisation approach such that the two are loosely coupled, as a function of the score? |
---|
0:22:51 | that can be a good point here, because of this length normalisation. if i try to understand it, for example for the z-norm: when i did the cosine distance i was doing LDA and WCCN, so i am removing the within-class variability; but here, when i do this normalisation, i take the direction that maximises the variability between speakers — it can be seen as a between-speaker covariance metric. so it seems like i am losing information between speakers while the WCCN... that's true — when i see this kind of thing, it seems like i am doing something that hurts me. |
---|
0:23:40 | yeah, right, but this is a good point — whether to keep the WCCN or not. at the end it is a projection that i apply, but maybe there is a way that i need to model the interaction of the speakers, right. i don't know how to do that yet. this is an excellent question, yes. okay. |
---|
0:24:12 | [audience] i have a comment regarding this WCCN question: i tried to do length normalisation before the WCCN — i just do the length normalisation first, and then the WCCN — and actually it helped. — i tried that but it didn't help, so we have something to talk about; i think someone else also tried it and found it was not helping. yeah. |
---|
0:24:58 | [audience question, largely inaudible — asking about the mean and variance used in the new scoring, whether it is really equivalent to zt-norm, and whether you actually need to compute them] |
---|
0:26:32 | [audience] so when you do zt-norm, which you explained, i'm just wondering whether the system stays calibrated, and also the units. — okay, that's a good question. i tried to understand what the t-norm is doing in the middle, but i never succeeded. i never succeeded in doing that, but i tried to see if my system — if you compare the results — is not the same as with zt-norm; only the equal error rate changes a little bit. but anyway, it seems what i have is not well calibrated, kind of — not good. |
---|
0:27:18 | if you have any comment about how we can handle that part i will be happy to hear it — because i did it for s-norm, since i needed it in the new version of the system; for zt-norm i had started but i don't know how to do it. |
---|
0:27:41 | and here, one comment: if you are doing, for example, mixed conditions — training on telephone and testing on microphone — you can do this normalisation differently based on which database you are using, which can help you in the cross-channel conditions. right. thank you very much. |
---|