0:00:06 | So, yesterday it was mentioned that it might be necessary to do something to wake up the audience. The suggestion was to do a colour wheel, which I didn't do, and I'm also not going to do that today.
0:00:31 | I was assisted in this work by my colleagues, and we're from South Africa.
0:00:41 | So, this is the second paper out of three, essentially on the same topic. I'll start by defining the topic, then say what is different in our paper from the other two, and then we'll get to the rest.
0:00:59 | Defining what I'm calling the partitioning problem: we're all very familiar with the canonical detection problem, where there are two speech segments and you need to decide, do we have one or two speakers here?
0:01:18 | If you want to generalise that, which is essentially my main interest here, how do we generalise the canonical problem? We allow more than two input segments, and the natural questions are then: how many speakers are there, and how do we divide the segments between speakers?
0:01:42 | Here's an example, the simplest generalisation: we go from two segments to three. The task is then to partition the set of inputs into subsets. This is why we're calling it a partitioning, because that's just the set-theory way of stating the problem.
0:02:07 | Immediately we get five possibilities. It should be quite obvious that there could be one, two or three speakers, and in the two-speaker case there are three ways to partition.
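The five possibilities for three segments can be enumerated mechanically. Here is a minimal sketch (my own illustration, not code from the paper) that lists all set partitions of three labelled segments:

```python
def partitions(items):
    """Recursively enumerate all set partitions of a list of labels."""
    if not items:
        yield []
        return
    head, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # put `head` into each existing block in turn...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[head] + block] + smaller[i + 1:]
        # ...or give `head` a block of its own
        yield [[head]] + smaller

for p in partitions(["A", "B", "C"]):
    print(p)
```

Running this prints exactly the five partitions of {A, B, C}: one three-speaker block, three two-way splits, and the all-singletons case.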
0:02:19 | Partitioning is the most general problem of this kind, if you assume there's a single speaker in each segment. I'm saying it's the most general because, if you have the answer about how the segments are partitioned, you can answer any other kind of detection, verification, identification (open- or closed-set), or clustering problem that you can define within this setting.
0:02:54 | The connection with diarization has already been mentioned; there you also need a segmentation, whereas we're presupposing the segmentation is given.
0:03:05 | So the problem is general, which is good, but you have to be careful, because the complexity explodes. Here's a little table that shows that the number of possible ways to partition the inputs can become very large very quickly.
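The growth in that table is the sequence of Bell numbers (the number of set partitions of n elements). As a sketch of my own, not taken from the paper, they can be computed with the Bell-triangle recurrence:

```python
def bell(n):
    """n-th Bell number: the number of ways to partition a set of n
    elements, computed row by row with the Bell triangle recurrence."""
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]          # each row starts with the previous row's last entry
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
    return row[-1]

for n in range(1, 11):
    print(n, bell(n))
```

Two segments give 2 possibilities, three give 5, but ten already give 115975, which is why exhaustive enumeration only works for a small number of inputs per trial.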
0:03:25 | Again, for the canonical problem there are just two solutions; for our example, which we'll discuss some more, there are five; and we're not going to discuss the last one in full here.
0:03:43 | To get to what's new: there is too much material in the paper to present in a single talk; we managed to squeeze everything into the allotted pages. So I'll just highlight what is new here, and why you might want to go and read the full paper.
0:04:07 | As mentioned, this is identical to what has been mentioned before, and will probably be mentioned after this, so the problem itself is not new.
0:04:20 | So I'm stressing this generality, which I've just mentioned, but then I also have to mention that we're focusing on problems with a small number of inputs per trial, whereas other work, for example, treats the case of a large number of inputs.
0:04:44 | Further, the paper emphasises solutions that deliver probabilistic output, in other words calibrated likelihoods, and we also propose an associated evaluation criterion, which I'm not going to discuss further here.
0:05:03 | Something which I am going to discuss further: our paper gives a closed-form solution to this very general problem, by using a simple additive Gaussian generative model in i-vector space. That's exactly what Patrick explained, except we're not doing it heavy-tailed, just plain Gaussian.
0:05:30 | This model gives us likelihood outputs, and it is tractable, even fast, when we don't do too many segments at once.
0:05:44 | Yesterday in his keynote, Patrick mentioned that using this kind of modelling you can calculate the likelihood for any type of speaker recognition problem. This is exactly what we show in our paper: the formulas are there, and you can use them.
0:06:07 | Again, very briefly: we apply a PLDA model, which models speaker and channel effects as independent, additive multivariate Gaussians over i-vectors. I don't need to explain that every speech segment gets mapped to an i-vector. The reason we call it an i-vector is simply because it's of intermediate size; the "i" is for intermediate: larger than an acoustic feature vector, smaller than a supervector.
0:06:39 | I might also mention, regarding "total variability": these i-vectors cannot be reconstructed to give you the original speech, so in my opinion they don't reflect the total variability in the signal.
0:07:00 | On to the i-vector solution. The hyperparameters of the generative model, in other words the covariance matrices that explain all the variability, have to be trained with an EM algorithm. That's similar to JFA; there's some detail in the paper. So I'm going to concentrate on the scoring, because it's nice and simple with this very simple model.
0:07:32 | We're given a set of segments, each represented by an i-vector: A, B, C. Each dot that I've got here represents a subset. Given a subset, under my generative model we can calculate the likelihood that all of the segments in the subset belong to the same speaker. The details of how to calculate the likelihood of a subset are in the paper.
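To illustrate what such a subset likelihood looks like, here is a one-dimensional sketch of my own (real i-vectors have hundreds of dimensions, and the paper's formulas are the authority; the variance values B and W below are made up). Under the additive model, marginalising out the shared speaker variable leaves a joint Gaussian whose covariance is the channel variance W on the diagonal plus the speaker variance B everywhere:

```python
import numpy as np

def subset_loglik(x, B=1.0, W=0.2):
    """Log-likelihood that all observations in x share ONE speaker, under
    x_i = y + e_i with speaker y ~ N(0, B) and channel e_i ~ N(0, W).
    Integrating out y gives x ~ N(0, W*I + B*ones): a closed form."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    cov = W * np.eye(n) + B * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Two nearby i-vectors: "same speaker" beats "two independent speakers".
same = subset_loglik([1.0, 1.1])
diff = subset_loglik([1.0]) + subset_loglik([1.1])
print(same, diff)
```

The comparison printed at the end is exactly the canonical two-input detection score: the log-likelihood ratio between "one speaker" and "two speakers".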
0:08:07 | I'll show now how to go from the subset likelihoods to the likelihood of a full partitioning of the full set. Again, for the three inputs, take one of the possibilities: the model is simple, so the likelihoods multiply. That's very nice and very comfortable to use. This is all you need to solve all of those problems and get a closed-form solution. Of course, it's not always going to be a good solution, but you get a solution.
0:08:40 | For the three-input example: the three inputs represent a trial, and the output is the five different likelihoods for the five partitioning possibilities.
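Putting the pieces together, here is a hypothetical end-to-end sketch of such a three-input trial, using one-dimensional toy i-vectors and made-up variances (my own illustration, not the paper's code): each partition's likelihood is the product of its subsets' likelihoods, and normalising over the five partitions gives partition posteriors under a flat prior.

```python
import numpy as np

def partitions(items):
    """Enumerate all set partitions of a list of labels."""
    if not items:
        yield []
        return
    head, rest = items[0], items[1:]
    for smaller in partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[head] + block] + smaller[i + 1:]
        yield [[head]] + smaller

def subset_loglik(x, B=1.0, W=0.2):
    """Log-likelihood that all of x share one speaker (1-D toy model)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    cov = W * np.eye(n) + B * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

# Toy trial: A and B look alike, C is far away.
ivecs = {"A": 0.9, "B": 1.1, "C": -1.5}
parts = list(partitions(list(ivecs)))
# A partition's log-likelihood is the SUM of its subsets' log-likelihoods,
# i.e. the likelihoods multiply, because subsets are modelled independently.
logliks = np.array([sum(subset_loglik([ivecs[s] for s in block]) for block in p)
                    for p in parts])
post = np.exp(logliks - logliks.max())
post /= post.sum()          # posterior over the 5 partitions, flat prior
for p, q in zip(parts, post):
    print(p, float(q))
```

With these toy values the partition {A, B} | {C} dominates, as one would hope.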
0:08:58 | This solution is neat and tractable, but, as already mentioned, it blows up if you try to use too many input segments.
0:09:13 | Moving to experimental results: the experimental results on real NIST data are available in the full paper.
0:09:26 | But in the rest of this talk we're going to use an experiment with synthetic data. The reason I didn't put that in the paper is that reviewers, especially the anonymous ones, tend not to like synthetic data. But everybody here is wearing name tags, so the anonymous peers are not here, and I'm going to proceed with my synthetic data experiments.
0:09:58 | This takes the form of a little tutorial in probability theory. The generality of the partitioning problem and the simplicity of the i-vector model are very handy tools for examining a few basic questions one might have about speaker recognition, and I'd like to show you how this works.
0:10:33 | The example we're going to discuss is the NIST unsupervised adaptation task. That's what I promised someone yesterday would be discussed. We're going to analyse it by making it a special case of the partitioning problem.
0:10:56 | The basic problem is that you need more prior information than that which was provided in the original definition of the task. The next several slides are going to be about that.
0:11:14 | The inputs: we're looking at the simplest case. You're given a train segment, which is known to be of the target speaker. You're also given an adaptation segment, which may or may not be from the target speaker, and you're allowed to use it. Then, finally, there's a test segment, and your job is to decide: was this the target speaker or not?
0:11:40 | With three inputs, as mentioned, there are five possibilities for how to partition them. We can group the first two as belonging to the target hypothesis, and the last three as instances of non-target partitions: non-target because the test has a different speaker from the train.
0:12:09 | We need a prior. NIST provided the target prior. We don't need a prior for the train segment; we already know it's of the target speaker. But what about the adaptation segment? That prior was not stated in the original problem.
0:12:31 | We can assemble these two priors, just in the obvious way, to give a full probability distribution over the five possibilities. I've somewhat arbitrarily set the last one to zero, to simplify matters here: you're assuming that if the test segment is not the target, the adaptation segment is also not going to be.
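That assembly can be written out explicitly. This is a sketch with illustrative numbers of my own: the 50/50 split of the non-target mass between its two remaining partitions is an arbitrary choice, just like the zeroed entry described in the talk.

```python
# Inputs: train (known target), adapt (unknown), test (to decide).
# Two priors: p_tar for the test being the target speaker, and
# p_adapt for the adaptation segment being the target speaker.
# Both values here are purely illustrative.
p_tar, p_adapt = 0.5, 0.3

# The five partitions of {train, adapt, test} with assembled priors.
# Following the talk, "adapt is target but test is not" is set to zero:
# if the test is a non-target, the adaptation segment is assumed to be
# a non-target too.
prior = {
    "{train,adapt,test}":     p_tar * p_adapt,        # target; adapt = target
    "{train,test}|{adapt}":   p_tar * (1 - p_adapt),  # target; adapt = other
    "{train,adapt}|{test}":   0.0,                    # zeroed by assumption
    "{train}|{adapt,test}":   (1 - p_tar) * 0.5,      # non-target; adapt with test
    "{train}|{adapt}|{test}": (1 - p_tar) * 0.5,      # non-target; all different
}
for k, v in prior.items():
    print(k, v)
```

The first two entries make up the target hypothesis, the last three the non-target hypothesis, and the whole distribution sums to one.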
0:13:02 | The whole thing assembles like this: the generative model supplies the five likelihoods for the five partitioning possibilities, and then you use, as Patrick said, the basic rules of probability theory, the sum rule and the product rule. But you need those extra priors. This prior, which has not been mentioned before, is needed to properly express the likelihood ratio between the target and the non-target hypotheses.
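As a sketch of that final step (the partition log-likelihoods below are placeholders, not real model output), the likelihood of each hypothesis is a prior-weighted sum over its partitions, and the score is the log of their ratio:

```python
import math

# Placeholder partition log-likelihoods; in practice these come from
# the generative model, one per partition of {train, adapt, test}.
loglik = {
    "{train,adapt,test}":     -10.2,
    "{train,test}|{adapt}":   -11.0,
    "{train,adapt}|{test}":   -12.5,
    "{train}|{adapt,test}":   -13.1,
    "{train}|{adapt}|{test}": -11.8,
}

# Conditional priors over partitions, given each hypothesis (illustrative).
p_adapt = 0.3
target = {"{train,adapt,test}": p_adapt,
          "{train,test}|{adapt}": 1 - p_adapt}
nontarget = {"{train,adapt}|{test}": 0.0,   # zeroed, as in the talk
             "{train}|{adapt,test}": 0.5,
             "{train}|{adapt}|{test}": 0.5}

def hyp_loglik(cond_prior):
    """log sum over partitions of P(partition | hyp) * P(data | partition)."""
    return math.log(sum(w * math.exp(loglik[p])
                        for p, w in cond_prior.items() if w > 0))

llr = hyp_loglik(target) - hyp_loglik(nontarget)
print("log-likelihood ratio:", llr)
```

Note how the adaptation prior p_adapt sits inside the target-hypothesis sum: changing it changes the score, which is exactly the effect the next experiment probes.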
0:13:45 | The experiment that we did was to demonstrate what role this prior plays: what might happen if you assume a bad prior, and how closely it should match the actual proportion in the data that you're working with.
0:14:12 | We use synthetic i-vectors, because we're not interested in examining the data, or indeed the PLDA model. By making synthetic data with the model, the data has a perfect match to the model, and that focuses the experiment on the role of the prior.
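A sketch of how such perfectly matched synthetic data can be generated (the dimensions, speaker counts, and variances here are arbitrary choices of mine, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
B, W = 1.0, 0.2          # between-speaker and within-speaker variances
dim, n_spk, n_per = 50, 200, 3

# Draw a speaker identity per speaker, then i-vectors as
# speaker effect + channel effect. By construction the data
# matches the additive Gaussian generative model exactly.
speakers = rng.normal(0.0, np.sqrt(B), size=(n_spk, dim))
ivectors = np.repeat(speakers, n_per, axis=0) \
    + rng.normal(0.0, np.sqrt(W), size=(n_spk * n_per, dim))
labels = np.repeat(np.arange(n_spk), n_per)
print(ivectors.shape, labels.shape)
```

Because the data is sampled from the model itself, any error observed downstream can only come from the scoring assumptions, in this case the assumed prior.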
0:14:41 | Back to the system diagram: we adjust two things independently. One is the actual proportion of target adaptation segments in the data; the other is the assumed prior for how large that proportion might be. And we evaluate the whole thing via the equal-error rate.
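For reference, the equal-error rate used as the metric here is the operating point where the miss rate equals the false-alarm rate. A small sketch (the score distributions below are synthetic, just to exercise the function):

```python
import numpy as np

def eer(target_scores, nontarget_scores):
    """Equal-error rate: sweep the decision threshold and find where
    the miss rate and the false-alarm rate cross."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    order = np.argsort(scores)
    labels = labels[order]
    # At each threshold position: targets at or below it are misses,
    # non-targets above it are false alarms.
    pmiss = np.cumsum(labels) / labels.sum()
    pfa = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()
    i = int(np.argmin(np.abs(pmiss - pfa)))
    return (pmiss[i] + pfa[i]) / 2

rng = np.random.default_rng(1)
tar = rng.normal(2.0, 1.0, 1000)   # synthetic target scores
non = rng.normal(0.0, 1.0, 1000)   # synthetic non-target scores
print("EER:", eer(tar, non))
```

With unit-variance score distributions two standard deviations apart, the EER comes out near 16 percent, matching the Gaussian overlap one would expect.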
0:15:08 | The results look something like this. On one horizontal axis we have the assumed prior, increasing in this direction; the other horizontal axis is the actual proportion; and the vertical axis, of course, is the equal-error rate.
0:15:28 | This corner here is the best situation to be in: you believe there are many target adaptation segments, there are in fact many, and adaptation works. In the back corner over there, you're saying: I'm not expecting any targets in the adaptation data, so I'm not adapting. The bad place to be is there, where you're assuming you'll find many target adaptation segments, but there aren't any.
0:16:05 | The important thing to realise here is that it's not so bad to assume that there aren't any target adaptation segments, because then you're just back to what you would have done without adaptation; but it is bad to have the mismatch the other way.
0:16:23 | So the prior is important. You might choose to ignore the prior, but it's not going to go away; it's there, influencing things, even if you ignore it.
0:16:37 | That brings me to the conclusion of the talk. Back in the real world, we've already applied this partitioning software to help us find mislabelling in the data which we needed for development for this evaluation. We've only just started on this work; at the workshop that's starting next week, we'll be exploring this problem some more.
0:17:16 | Okay, that's all. Thank you.
0:17:25 | Chair: We have time for some questions or comments.
0:17:32 | Question: If you look at the real case: usually, in a real application, you will have an impostor coming to the system and trying to cheat it, and sometimes you will have the target speaker coming in, along with more non-targets.
0:18:06 | Answer: I agree, and this framework allows for that, because the prior that you plug in to get the final score can be made trial-dependent. So, if you know something about this, you can modify the prior as time progresses.
0:18:29 | Chair: Now we have time for some more questions.
0:18:47 | Question: Could this be used for speaker diarization?
0:18:55 | Answer: If I understood you correctly, you asked whether this is a one-step diarization system. No: I'm assuming here that the segmentation is given, so I assume that there are no segments which have two speakers in them.
0:19:19 | Question: I see. Say we apply a system for unique segmentation and get all the boundaries; could those segments then be used for diarization?
0:19:34 | Answer: I wouldn't recommend it, because, as I pointed out, if you have a thousand segments, then enumerating all the ways blows up; these exact methods are not designed for the large-scale case. But there are other, approximate methods: you could still start from the same Gaussian PLDA model, but then you would need something like variational Bayes to handle a large number of segments. We're going to play with that at the workshop as well.
0:20:15 | Okay, thank you.