0:00:21 | This talk is on a probabilistic PCA based system for dictionary learning and encoding for speaker recognition. |
---|
0:00:49 | I will be presenting this paper. |
---|
0:00:57 | As was pointed out, it is a hybrid factor analysis system, and it uses PPCA basically to simplify the computation-intensive parts of the factor analysis system. |
---|
0:01:12 | As we have seen from the previous talks, the factor analysis system is computation intensive, and this work is essentially about how to simplify some parts of the system so that we gain some advantages while not losing much of the state-of-the-art performance. |
---|
0:01:36 | So before that, I will explain why such a simplification is possible at all, and which perspective on this factor analysis system enables us to simplify those parts. In particular, I will talk about the hyperparameter estimation, which is basically the estimation of the T matrix of the subspace model, the so-called total variability space. |
---|
0:02:05 | At the end we will look at how the performance of the system compares. The i-vector is a very compact representation of the entire utterance: basically you have supervectors, which are fixed-dimensional representations of speech utterances, and these are converted to a low-dimensional i-vector representation. |
---|
0:02:28 | This equation, in which the supervector is modeled through the total variability matrix T and the latent vector w, is basically the representation used in this paper, and I will keep the same notation throughout. |
---|
0:02:40 | For the sake of completeness — most of us already know what the system looks like, but still — I will briefly explain what is happening, and then move on to the perspective that is very important for this work, which is the following. |
---|
0:03:02 | From a speech utterance, consisting of feature vectors, we basically use the GMM-UBM parameters to form the supervector. Once we have the supervectors from the development data, we use them to train the subspace model, that is, the T matrix. Then we extract the i-vectors for any given data and obtain a low-dimensional representation of the speech utterance. |
---|
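In the standard i-vector formulation, the extraction step just described is the posterior mean of the latent factor given the utterance's Baum-Welch statistics. A minimal sketch of that computation, assuming a diagonal UBM covariance and conventional variable names (this is illustrative, not the authors' code):

```python
import numpy as np

def extract_ivector(T, Sigma, N, F):
    """Posterior mean of the latent factor w for one utterance.

    T     : (CF, R) total variability matrix
    Sigma : (CF,)   diagonal of the UBM covariance supervector
    N     : (C,)    zeroth-order (occupation) statistics per mixture
    F     : (CF,)   centered first-order statistics
    """
    CF, R = T.shape
    C = N.shape[0]
    Fdim = CF // C
    # Expand the per-mixture counts to supervector length
    N_sup = np.repeat(N, Fdim)
    TtSi = T.T / Sigma                       # T^T Sigma^{-1}, shape (R, CF)
    # Posterior precision: I + T^T Sigma^{-1} N T
    L = np.eye(R) + (TtSi * N_sup) @ T
    return np.linalg.solve(L, TtSi @ F)
```

Note that only an R-by-R system is solved per utterance, which is what keeps extraction tractable despite the very high supervector dimension.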
0:03:36 | Then, in the testing phase, we try to find the distance — typically the cosine distance — between the i-vector of the target speaker and that of the test utterance. This is the general framework of a speaker recognition system. |
---|
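The scoring step in such i-vector systems is commonly the cosine similarity between the two i-vectors; a minimal sketch (again illustrative, not tied to the authors' implementation):

```python
import numpy as np

def cosine_score(w_target, w_test):
    """Cosine similarity between two i-vectors; higher means more likely
    the same speaker. Both vectors are assumed non-zero."""
    return float(
        w_target @ w_test
        / (np.linalg.norm(w_target) * np.linalg.norm(w_test))
    )
```

A threshold on this score then yields the accept/reject decision.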
0:03:49 | Such a system can actually be viewed as consisting of two stages: dictionary learning and encoding. Using the development data, you estimate a subspace in which all these supervectors lie — the total variability space — and this can be done with different techniques. In the context of this paper, it is viewed as an overcomplete dictionary. Once the dictionary, the T matrix, is learned, we try to encode the data, that is, the supervector that has been observed. So this is the dictionary-learning view of the framework used in this paper, and we will see how decoupling the dictionary learning and encoding stages of the system is motivated. |
---|
0:04:40 | The basic motivation behind this is prior work on the relative importance of the encoding versus the dictionary learning, which views such a system in exactly these two phases: dictionary learning and encoding. |
---|
0:04:57 | So basically the dictionary need not be paired with the same sparse encoding procedure it was trained with. For example, if you take the orthogonal matching pursuit algorithm, which gives sparse vectors, and you train a dictionary using that algorithm, you do not also have to use it as your encoding algorithm. It has been observed that some encoding algorithms work better than others, and the encoder does not necessarily have to be the orthogonal matching pursuit algorithm itself; for example, they observed that a soft-thresholding scheme works better. Coming back to the speaker recognition problem: |
---|
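The soft-thresholding encoder mentioned here is attractive precisely because it is so cheap: correlate the input with every dictionary atom, then shrink small coefficients to zero. A sketch of that encoder (the parameter `alpha` is the shrinkage threshold; this is an illustration of the general scheme, not the authors' code):

```python
import numpy as np

def soft_threshold_encode(D, x, alpha):
    """Encode x against dictionary D (columns = atoms) by soft thresholding.

    One matrix-vector product plus elementwise shrinkage, with no iterative
    pursuit at all -- hence much cheaper than OMP.
    """
    c = D.T @ x                                     # correlation with each atom
    return np.sign(c) * np.maximum(np.abs(c) - alpha, 0.0)
```

The observation in the cited sparse-coding work is that this simple encoder can match or beat the encoder the dictionary was actually trained with.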
0:05:40 | So here is an opportunity to see if we can replace a particular phase that is very computationally intensive, in light of the observations made in that work, just to see if there is scope for any improvement in terms of computational efficiency. |
---|
0:06:02 | We first look at the update equations of the conventional technique, which are taken from the EM steps. In the E-step we estimate the posterior of the latent variable for each utterance; then, accordingly, we update the columns of the T matrix in the M-step. You get the i-vectors of the development data and keep re-estimating T in this way until convergence. So this is the conventional T-matrix estimation; you can see it is computationally demanding, and I will quantify that in terms of computational complexity shortly. |
---|
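The alternating procedure just described can be sketched as follows. This is a toy version under a deliberately simplified model — a single "mixture" and unit noise covariance, with none of the per-mixture Baum-Welch machinery — so it only illustrates the E-step/M-step alternation, not the full conventional estimator:

```python
import numpy as np

def train_T(X, R, n_iter=20):
    """Toy EM for the subspace T with unit noise covariance.

    X : (n_utts, D) centered supervectors (one 'mixture', Sigma = I).
    Alternates between estimating the latent posteriors (E-step) and
    re-estimating T from them (M-step).
    """
    n, D = X.shape
    rng = np.random.default_rng(0)
    T = rng.normal(scale=0.1, size=(D, R))
    for _ in range(n_iter):
        # E-step: posterior of w (precision is shared in this toy model)
        L = np.eye(R) + T.T @ T                 # posterior precision
        Cov = np.linalg.inv(L)                  # posterior covariance
        W = X @ T @ Cov                         # posterior means, (n, R)
        # M-step: T = (sum_x x E[w]^T) (sum_x E[w w^T])^{-1}
        Rww = n * Cov + W.T @ W
        T = X.T @ W @ np.linalg.inv(Rww)
    return T
```

In the real system the posterior precision differs per utterance because of the occupation counts, which is exactly where the cost comes from.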
0:06:46 | Once this is done, we look at an alternative way of estimating the total variability space model, which is probabilistic PCA (PPCA). |
---|
0:06:59 | In the paper that introduced PPCA, several properties of this estimation are worked out, and one of the important ones is that PPCA is a special case of factor analysis in which the noise covariance matrix is isotropic, for example. |
---|
0:07:23 | One of its main aspects is that the computation of the covariance terms in probabilistic PCA is less intensive in terms of computational complexity when it comes to very high dimensional data samples, such as the supervectors here. |
---|
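For reference, under the PPCA model x = Tw + μ + ε with isotropic noise ε ~ N(0, σ²I), the EM updates take the closed form sketched below, following the standard Tipping-and-Bishop-style formulation (this is a generic sketch, not the authors' implementation):

```python
import numpy as np

def ppca_em_step(X, T, sigma2):
    """One EM update of PPCA with isotropic noise variance sigma2.

    X : (n, D) centered data; T : (D, R) current loading matrix.
    Returns the updated (T, sigma2). Only R x R systems are inverted,
    which is what makes this cheap for very high dimensional x.
    """
    n, D = X.shape
    R = T.shape[1]
    M = T.T @ T + sigma2 * np.eye(R)        # R x R, cheap to invert
    Minv = np.linalg.inv(M)
    W = X @ T @ Minv                        # E[w] for every sample, (n, R)
    Sww = n * sigma2 * Minv + W.T @ W       # sum of E[w w^T]
    T_new = X.T @ W @ np.linalg.inv(Sww)
    sigma2_new = (
        np.sum(X * X)
        - 2.0 * np.sum((X @ T_new) * W)
        + np.trace(Sww @ T_new.T @ T_new)
    ) / (n * D)
    return T_new, sigma2_new
```

The contrast with the conventional estimator is that nothing here depends on per-mixture occupation counts, so the per-iteration cost drops sharply.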
0:07:45 | For all these reasons PPCA is attractive; but it has been observed that the performance of plain PPCA is not as good as that of the i-vector technique, that is, the conventional factor analysis technique. We will see how to combine all the observations made so far into a single system. |
---|
0:08:07 | This slide shows the E- and M-steps of PPCA. They are similar to the factor analysis case, except that, as Kenny points out, PPCA does not necessarily assume that the supervector comes from a GMM, so the computations involved differ. |
---|
0:08:38 | Comparing the update steps, we can see there is a huge difference: the PPCA update is much less intensive than that of the conventional technique. |
---|
0:08:53 | Here C is the number of mixtures in the UBM and F is the dimensionality of the features; given the number of development data sequences involved, the conventional estimation becomes very slow, and that is why the PPCA estimation is said to be faster. |
---|
0:09:21 | So what can be done? The question is: is it possible to learn the dictionary using the PPCA method, while encoding using the conventional technique, so that we take advantage of both sets of observations? |
---|
0:09:55 | So, in the proposed approach, the T matrix is estimated using PPCA, but the encoding is done with the conventional technique — the one which makes the assumption that the supervector comes from a GMM. |
---|
0:10:18 | What happens then is that the i-vectors encoded in this way take the GMM occupation statistics into account, which are missing in the plain PPCA representation. |
---|
0:10:38 | So suppose we take the PPCA estimate of T and estimate the i-vectors with it. What is interesting here is the contrast with estimating them using the pure PPCA expression: when they are estimated using the proposed approach, they take the extra information — the occupation statistics and the covariance matrices — into account, and in the middle of that expression there is a kind of normalization, which seems to be really useful, as we will see in the results. |
---|
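The contrast between the two encoders can be made concrete. Under the usual formulations, the pure PPCA encoding ignores the per-mixture occupation counts, while the conventional factor analysis encoding weights T by them — this is the extra normalization referred to above. A sketch with my own variable names (not the authors' code); the two coincide exactly when all counts are one and the noise is unit isotropic:

```python
import numpy as np

def encode_ppca(T, sigma2, x):
    """Pure PPCA encoding: w = (T^T T + sigma2 I)^{-1} T^T x."""
    R = T.shape[1]
    return np.linalg.solve(T.T @ T + sigma2 * np.eye(R), T.T @ x)

def encode_fa(T, Sigma, N_sup, F):
    """Conventional encoding: w = (I + T^T Sigma^{-1} N T)^{-1} T^T Sigma^{-1} F.

    N_sup is the occupation count expanded to supervector length; it is
    exactly the term absent from the PPCA encoder above.
    """
    R = T.shape[1]
    TtSi = T.T / Sigma
    L = np.eye(R) + (TtSi * N_sup) @ T
    return np.linalg.solve(L, TtSi @ F)
```

The hybrid system in the talk trains T cheaply (PPCA) and then encodes with `encode_fa`-style statistics-aware updates.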
0:11:24 | Coming to the experiments: the development dataset used here is rather minimalistic compared to typical state-of-the-art setups. These are the databases used for development and evaluation; some parts of the usual development data are missing. |
---|
0:11:56 | MFCC features were extracted — the standard cepstral coefficients along with their derivatives — and this is the feature extraction configuration that was used. |
---|
0:12:11 | The rank of the T matrix is five hundred, as is typical in i-vector techniques, and the standard dimensionality reduction usually used in these systems, followed by WCCN, was applied before scoring. |
---|
0:12:32 | The timing results support the claim that the PPCA estimation is much faster than the conventional technique. Both systems were implemented in a similar fashion, using the same infrastructure, so the comparison is fair. |
---|
0:13:00 | If you look at the computation needed by the conventional technique versus PPCA, you can see, for different configurations, the difference in the time per EM iteration, and this is exactly the advantage we wanted to exploit. |
---|
0:13:32 | From some preliminary tests, we see that the i-vectors encoded in the way that we have proposed are good enough to be used in speaker recognition. |
---|
0:13:48 | This shows the interspeaker and intraspeaker score distributions; the performance figures show that there is some degradation with respect to the conventional factor analysis system, but the proposed system still holds up reasonably well. |
---|
0:14:18 | One interesting aspect of this work is to find out the relationship between the two kinds of i-vectors, which are extracted in two different ways. To check whether the relationship is linear, we used canonical correlation analysis (CCA). CCA is closely related to mutual information and is often used in its estimation when the relationship between the variables is linear. But if you need to determine whether the relationship is nonlinear — that is, linear only in some high-dimensional space — you can try to use kernel CCA. |
---|
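As a rough illustration of the linear check described here, the first canonical correlation between two paired sets of i-vectors can be computed by whitening each view and taking the largest singular value of the whitened cross-covariance; a low value would suggest the two sets are not linearly related. A self-contained sketch (not the analysis code from the paper):

```python
import numpy as np

def first_canonical_corr(X, Y, eps=1e-8):
    """Largest canonical correlation between paired data sets X and Y.

    X : (n, p), Y : (n, q). Whitens each view and takes the largest
    singular value of the cross-covariance of the whitened variables.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Inverse matrix square root via the eigendecomposition
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    K = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return float(np.linalg.svd(K, compute_uv=False)[0])
```

Kernel CCA follows the same idea after mapping both views into a kernel-induced feature space.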
0:15:09 | What we can see is that the conventional subspace and the PPCA subspace are not linearly related; what that suggests is a nonlinear relationship. And indeed, when you look at the kernel CCA, the i-vectors extracted with the conventional approach and the i-vectors extracted with the proposed approach show a strong relationship in the space generated by the kernel. This is perhaps the most interesting aspect: what it gives us is an opportunity to look at different fusion procedures, so that the performance of both systems together can be improved. |
---|
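Since the two systems carry complementary, non-linearly related information, the simplest way to exploit them jointly is score-level fusion. A minimal sketch of the usual first attempt — z-normalize each system's scores, then take a weighted sum (the weight here is illustrative, not a value from the paper):

```python
import numpy as np

def fuse_scores(scores_a, scores_b, w=0.5):
    """Score-level fusion: z-normalize each system's scores, then mix.

    Normalizing first keeps one system's score scale from dominating.
    """
    za = (scores_a - scores_a.mean()) / scores_a.std()
    zb = (scores_b - scores_b.mean()) / scores_b.std()
    return w * za + (1.0 - w) * zb
```

In practice the weight would be tuned on a held-out development set.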
0:16:03 | In this table the performance of the baseline system is given, together with the pure PPCA system and the proposed hybrid system. If you compare the pure PPCA technique with the proposed one, there is a clear improvement in terms of error rate. |
---|
0:16:46 | So, in summary: PPCA is used to estimate the total variability space matrix, and in doing so the hyperparameter estimation for the speaker recognition system becomes much faster, while the performance stays close to that of the conventional system — the degradation with respect to the baseline was small. With respect to the pure PPCA system, the proposed hybrid gives a clear improvement. One important conclusion is that the i-vectors extracted using the proposed approach are non-linearly related to those of the baseline. |
---|
0:17:55 | [Question from the audience about why the decoupling of dictionary learning and encoding works, and about the use of sparse structure.] |
---|
0:18:15 | The observation was empirical: I do not know the exact reason why decoupling the dictionary learning and the encoding parts works — it is only an empirical observation reported in that work. |
---|
0:18:50 | [The session chair thanks the speaker again.] |
---|