0:00:15Hi. This is a probabilistic PCA based system for
0:00:24dictionary learning and encoding for speaker recognition.
0:00:49I will be presenting this paper. As was pointed out, it is a hybrid factor analysis system,
0:01:03and it uses PPCA basically to simplify the computation-intensive parts of the factor analysis system.
0:01:12As we have seen from the previous talks, the factor analysis system is quite computation intensive,
0:01:18and this work is essentially about how to simplify some parts of the system,
0:01:29so that we gain some advantages while at the same time not losing much of the state-of-the-art performance.
0:01:36So basically, I will first explain why such a simplification is possible,
0:01:47and what perspective on this factor analysis system enables us to simplify those parts,
0:01:53especially the hyperparameter estimation, which is basically the estimation of the T matrix of the subspace model,
0:02:01the total variability space.
0:02:05At the end we will look at how the performance of the system compares.
0:02:10First, a very brief overview of the i-vector framework:
0:02:15basically you have supervectors, which are fixed-dimensional representations of speech utterances,
0:02:22and these are converted to low-dimensional i-vector representations.
0:02:30The subspace model M = m + Tw is basically the representation used in this paper.
0:02:35What I am going to present builds on that model.
0:02:43For the sake of completeness, since most of us here already know what the system is,
0:02:53I will just briefly explain what is happening,
0:02:55and then point out the perspective that is very important for the rest of the talk.
0:03:02From a speech utterance, consisting of feature vectors,
0:03:06we basically use the GMM-UBM parameters to form the supervector.
0:03:15Once we have the supervectors from the development data, we use them to train the subspace model, the T matrix,
0:03:25and then we extract the i-vectors of the test data
0:03:30to get a low-dimensional representation of each speech utterance.
0:03:37In the testing phase, we try to find the acoustic distance between the target speaker and the test utterance,
0:03:44and this is the general framework of a speaker recognition system.
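To make the testing phase concrete, here is a minimal sketch of cosine distance scoring between two i-vectors. This is an illustration, not code from the paper; the function name and threshold convention are my own:

```python
import numpy as np

def cosine_score(w_target: np.ndarray, w_test: np.ndarray) -> float:
    """Cosine similarity between two i-vectors; higher means more similar."""
    return float(w_target @ w_test /
                 (np.linalg.norm(w_target) * np.linalg.norm(w_test)))

# A trial is typically accepted if the score exceeds a tuned threshold.
```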
0:03:49Such a system can actually be viewed as consisting of two stages: dictionary learning and encoding.
0:03:59Here, once you have the development data, the supervectors of the development data,
0:04:05you estimate a subspace in which all these supervectors are assumed to lie, the total variability space,
0:04:11and from one perspective, the one taken in this paper, this is an overcomplete dictionary.
0:04:20Once the subspace, the T matrix, is learned, we try to encode the data, that is, the supervector that has been observed.
0:04:29So this is the dictionary learning and encoding framework that is used in this paper.
0:04:33We will see how decoupling these two stages of the i-vector system is motivated by the literature.
0:04:40The basic motivation is work on the relative importance of encoding versus dictionary learning
0:04:51when the entire system is viewed in these two phases, dictionary learning and encoding.
0:05:00Basically, the encoding need not be done with the same sparse coding procedure the dictionary was trained with.
0:05:06For example, if you take the orthogonal matching pursuit algorithm to obtain sparse vectors
0:05:12and train a dictionary using that algorithm, you do not also have to use it as your encoding algorithm.
0:05:21It has been observed that some encoding algorithms work better than others,
0:05:24and the best one does not necessarily have to be the orthogonal matching pursuit algorithm itself;
0:05:31for example, it has been observed that a simple soft thresholding scheme works better.
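To make the soft-thresholding example concrete, here is a minimal sketch (illustrative, not from the paper): unlike orthogonal matching pursuit, this encoder is one-shot and non-iterative, which is exactly why it is attractive:

```python
import numpy as np

def soft_threshold_encode(x: np.ndarray, D: np.ndarray, lam: float) -> np.ndarray:
    """Encode signal x with dictionary D (atoms in columns) by soft thresholding.

    Correlate x with every atom, then shrink small coefficients to zero;
    there is no iterative pursuit over atoms at all.
    """
    c = D.T @ x                                    # correlation with each atom
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
```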
0:05:40So here there is an opportunity to see if we can replace a particular phase that is very computationally intensive,
0:05:51which is what motivates the observations made in this work.
0:05:56To see if there is scope for any improvement in terms of computational efficiency,
0:06:02we look at the EM algorithm used to train the T matrix.
0:06:08In the E-step, we estimate the i-vectors of the development data with the current T matrix,
0:06:14and in the M-step we accordingly update the columns of T,
0:06:21and we keep re-estimating in this way until convergence.
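As a sketch of the EM iteration just described, here is a simplified version of conventional total variability training in Python. This is an illustration under stated assumptions, not the paper's code: the UBM covariances are assumed to have been absorbed by whitening the statistics, and all names are mine:

```python
import numpy as np

def em_train_T(stats, C, F_dim, R, n_iter=10, seed=0):
    """EM training of the total variability matrix T (simplified sketch).

    stats: list of (N, F) per utterance, where N has shape (C,) with the
           zero-order count of each UBM mixture, and F has shape (C*F_dim,)
           with centered, covariance-whitened first-order stats.
    """
    rng = np.random.default_rng(seed)
    T = rng.standard_normal((C * F_dim, R)) * 0.01
    for _ in range(n_iter):
        A = np.zeros((C, R, R))                 # per-mixture M-step accumulators
        B = np.zeros((C * F_dim, R))
        for N, F in stats:
            # E-step: posterior of w; the per-mixture counts N_c enter here,
            # so this R x R inversion is repeated for EVERY utterance.
            NT = np.repeat(N, F_dim)[:, None] * T
            L = np.eye(R) + T.T @ NT            # posterior precision
            Cov = np.linalg.inv(L)
            w = Cov @ (T.T @ F)                 # posterior mean (the i-vector)
            Eww = Cov + np.outer(w, w)          # posterior second moment
            for c in range(C):
                A[c] += N[c] * Eww
            B += np.outer(F, w)
        # M-step: solve T_c = B_c A_c^{-1} separately for each mixture block
        for c in range(C):
            rows = slice(c * F_dim, (c + 1) * F_dim)
            T[rows] = B[rows] @ np.linalg.inv(A[c])
    return T
```

The per-utterance posterior precision and the per-mixture M-step solves are the expensive parts the talk is pointing at.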
0:06:29So this is where the computational burden of the system lies,
0:06:39and I will try to formalise it in terms of computational complexity in a moment.
0:06:46Once this is done, we look at an alternative to the total variability space model,
0:06:56which is probabilistic PCA (PPCA).
0:06:59PPCA gives PCA a probabilistic interpretation and comes with an EM algorithm for its estimation,
0:07:09and one of its important properties is that it is just a special case of factor analysis,
0:07:17namely the one with an isotropic noise covariance matrix.
0:07:23One of the main aspects is that the covariance computation in probabilistic PCA
0:07:34is less intensive in terms of computational complexity
0:07:40when it comes to very high-dimensional data samples like these supervectors.
0:07:45For all these reasons PPCA is attractive, but it has been observed that the performance of PPCA
0:07:53is not as good as that of the i-vector technique, the conventional factor analysis technique,
0:08:01and we will see how to combine all these observations into a single system.
0:08:07So here are the E and M steps of PPCA.
0:08:16They are similar to the factor analysis case, except that, as Kenny mentions,
0:08:24PPCA does not necessarily assume that the supervector comes from a GMM.
0:08:32This changes the computations involved,
0:08:42and comparing the E and M steps we can see there is a huge difference:
0:08:50the PPCA updates are much less intensive than those of the conventional technique.
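For contrast, here is a minimal PPCA training sketch in the style of Tipping and Bishop's EM (illustrative names, not the paper's code). Because PPCA ignores the GMM structure and assumes one isotropic noise variance, a single shared R-by-R posterior matrix serves every utterance, instead of one inversion per utterance:

```python
import numpy as np

def ppca_em(S, R, n_iter=10, seed=0):
    """PPCA estimation of the subspace T by EM (simplified sketch).

    S: data matrix of supervectors, shape (n, d), one utterance per row.
    Key point: the posterior matrix M below is shared by ALL utterances,
    so there is no per-utterance matrix inversion at all.
    """
    n, d = S.shape
    rng = np.random.default_rng(seed)
    T = rng.standard_normal((d, R)) * 0.01
    sigma2 = 1.0
    X = S - S.mean(axis=0)                        # center the supervectors
    for _ in range(n_iter):
        # E-step: one shared R x R posterior for the whole dataset
        M = T.T @ T + sigma2 * np.eye(R)
        Minv = np.linalg.inv(M)
        W = X @ T @ Minv                          # all posterior means at once
        Sww = n * sigma2 * Minv + W.T @ W         # sum of E[w w^T]
        # M-step: closed-form updates for T and the isotropic noise variance
        T_new = X.T @ W @ np.linalg.inv(Sww)
        sigma2 = (np.sum(X * X)
                  - 2.0 * np.sum((X @ T_new) * W)
                  + np.trace(Sww @ (T_new.T @ T_new))) / (n * d)
        T = T_new
    return T, sigma2
```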
0:08:56Here C is the number of mixtures in the UBM, F is the dimensionality of the feature vectors,
0:09:05and N is the number of utterances in the development data, which is usually very large,
0:09:14and that is why the savings are significant.
0:09:21So what can be done?
0:09:28Given these observations, is it possible to do the dictionary learning using PPCA
0:09:37while doing the encoding using the conventional technique,
0:09:45and thereby take advantage of both sets of observations?
0:09:55In the proposed approach, the T matrix is estimated using PPCA,
0:10:07while the i-vectors are encoded using the conventional technique,
0:10:12which makes the assumption that the supervector comes from a GMM.
0:10:20So the i-vectors that are encoded this way take in the GMM statistics,
0:10:32and that is the key point of this presentation.
0:10:38Suppose we have the PPCA estimate of T.
0:10:44What is interesting here is that if I extract the i-vectors using the plain PPCA encoding,
0:10:55I get one set of i-vectors;
0:10:58but when they are estimated using the proposed approach, they take in extra information,
0:11:08namely the UBM occupation statistics and covariance matrices,
0:11:10and the middle term of that expression acts as a normalization,
0:11:17which seems to be really useful, as we will be seeing in the results.
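A sketch of that hybrid encoding, as I read it from the talk (illustrative names; stats are assumed whitened as in the earlier sketch): T comes from the PPCA EM above, but the i-vector is the conventional GMM posterior mean, so the utterance-dependent middle term enters the expression:

```python
import numpy as np

def extract_ivector_hybrid(T, N, F, F_dim):
    """Conventional i-vector posterior mean, with T trained by PPCA.

    T: (C*F_dim, R) total variability matrix from the PPCA EM sketch.
    N: (C,) per-mixture zero-order (occupation) stats from the UBM.
    F: (C*F_dim,) centered, covariance-whitened first-order stats.
    The middle term inv(I + T' N T) depends on the utterance through N;
    this is the normalization a plain PPCA point estimate lacks.
    """
    R = T.shape[1]
    NT = np.repeat(N, F_dim)[:, None] * T
    L = np.eye(R) + T.T @ NT              # utterance-dependent posterior precision
    return np.linalg.solve(L, T.T @ F)    # i-vector = posterior mean of w
```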
0:11:24Now to the experiments.
0:11:36The development dataset used is quite minimalistic compared to what is typical;
0:11:47these are the databases used, with some parts left out.
0:11:56MFCC features are extracted, basically cepstral coefficients with their derivatives appended;
0:12:04that is the feature extraction setup.
0:12:11The rank of the T matrix is five hundred for both techniques,
0:12:20and the standard cosine distance scoring that is usually used is applied, with WCCN on the i-vectors.
0:12:37Just to support the claim about efficiency, you can see that the proposed training
0:12:42is much faster than the conventional technique.
0:12:49Both systems were implemented in the same fashion, in MATLAB,
0:12:57so that the timing comparison between the two techniques is fair.
0:13:01If you look at the time taken by the two techniques,
0:13:06we see here, for different numbers of EM iterations, the difference in the time taken for estimation,
0:13:19and this is exactly the advantage we wanted to exploit.
0:13:34As a preliminary test, we see that the i-vectors encoded in the way that we have proposed
0:13:43are good enough for being used in speaker recognition.
0:13:48This shows the inter-speaker and intra-speaker distances,
0:13:58and the performance confirms that the degradation
0:14:11with respect to the conventional factor analysis system on this data is still small.
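As a quick illustration of such a sanity check (my own sketch, not the paper's code): split the pairwise cosine distances into intra-speaker and inter-speaker sets; a usable encoding should give clearly smaller intra-speaker distances:

```python
import numpy as np
from itertools import combinations

def intra_inter_distances(ivectors_by_speaker):
    """Pairwise cosine distances, split into intra- and inter-speaker sets.

    ivectors_by_speaker: dict mapping speaker id -> array (n_i, R) of i-vectors.
    """
    def cosdist(a, b):
        return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    intra, inter = [], []
    speakers = list(ivectors_by_speaker)
    for s in speakers:                                  # same-speaker pairs
        for a, b in combinations(ivectors_by_speaker[s], 2):
            intra.append(cosdist(a, b))
    for s, t in combinations(speakers, 2):              # cross-speaker pairs
        for a in ivectors_by_speaker[s]:
            for b in ivectors_by_speaker[t]:
                inter.append(cosdist(a, b))
    return np.array(intra), np.array(inter)
```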
0:14:21One aspect of this work that is interesting is to find out the relationship between the two sets of i-vectors
0:14:27that are extracted in the two different ways.
0:14:32To check whether the relationship is linear, we can use canonical correlation analysis (CCA).
0:14:44Basically, CCA is related to mutual information: direct estimation of mutual information is reliable
0:14:49only when there is a large population of samples, whereas CCA gives a practical handle on linear dependence.
0:14:55If you need to determine whether the relationship is nonlinear, that is,
0:15:03linear only in a high-dimensional space, you can try to use kernel CCA (KCCA).
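Here is a minimal linear CCA sketch using scikit-learn (the talk does not name a toolkit, so this is only one way to run the check): high canonical correlations between the two i-vector sets would indicate a linear relationship, and a kernelized variant probes for a relationship that is linear only in a kernel-induced space:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def canonical_correlations(W_conv, W_prop, k=10):
    """First k canonical correlations between two i-vector sets.

    W_conv, W_prop: (n, R) i-vectors from the conventional and the
    proposed extraction, row-aligned by utterance.
    """
    cca = CCA(n_components=k)
    A, B = cca.fit_transform(W_conv, W_prop)   # projected canonical variates
    return np.array([np.corrcoef(A[:, i], B[:, i])[0, 1] for i in range(k)])
```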
0:15:09What we can see is that the conventional i-vectors and the PPCA-subspace i-vectors are not linearly related;
0:15:16that is what the CCA result suggests.
0:15:21Then, when you look at the KCCA result between the i-vectors extracted from the conventional approach
0:15:35and the i-vectors extracted from the proposed approach,
0:15:40the extent of correlation is high, so the relationship is basically linear in the space generated by the kernel.
0:15:46And this is the most interesting aspect:
0:15:52what it gives us is an opportunity to look at different encoding procedures,
0:15:58so that the performance of such systems can be improved further.
0:16:05In this table, the equal error rate of the baseline system is given,
0:16:14along with that of the pure PPCA system and of the proposed system,
0:16:26and you can see how the three compare.
0:16:36If you look at the PPCA technique and the proposed one,
0:16:40there is a clear improvement in terms of the equal error rate.
0:16:46In summary, we use PPCA to estimate the total variability space matrix,
0:16:54and in doing so we speed up the system considerably,
0:17:00while the performance stays close to that of the baseline:
0:17:08the degradation with respect to the baseline system is small,
0:17:20and with respect to the basic PPCA system the improvement from the proposed i-vector encoding is clear.
0:17:28One important conclusion is that the i-vectors extracted using the proposed approach
0:17:35are non-linearly related to those of the baseline.
0:17:58(Question from the audience, partly inaudible, about whether the kernel-based sparse structure explains the result.)
0:18:15The observation was just that, an observation: I do not know the reason why decoupling
0:18:21the dictionary learning and encoding parts works; at this point it is only an empirical finding.
0:18:50(The session chair thanks the speaker.)