0:00:15 | and so what i'm gonna talk about is also what we did last |
---|
0:00:20 | summer at hopkins, and again this is joint work between myself, doug reynolds, daniel, and alan |
---|
0:00:29 | so, actually, i really hope... well, okay |
---|
0:00:38 | four years ago was my first odyssey ever, and as my first conference presentation, i made a joke |
---|
0:00:45 | at the start of the slides that turned out to be far more memorable than the presentation itself |
---|
0:00:50 | i wanted to do the same this time, but i couldn't come up with any good stories, so |
---|
0:00:55 | i'm gonna give you this picture, which i hope none of you will look like by the end of my presentation |
---|
0:01:03 | okay |
---|
0:01:04 | all right now |
---|
0:01:06 | so what i'm going to talk about is unsupervised clustering approaches |
---|
0:01:09 | for domain adaptation in speaker recognition systems |
---|
0:01:13 | and first off, i guess the title's a bit of a handful, so i'm gonna |
---|
0:01:17 | break it down and explain kinda each piece one at a time |
---|
0:01:21 | so domain adaptation |
---|
0:01:24 | is a setting where |
---|
0:01:28 | most |
---|
0:01:29 | current statistical learning techniques assume, somewhat incorrectly, that |
---|
0:01:33 | the training and test data come from the same underlying distribution, right |
---|
0:01:38 | and |
---|
0:01:38 | so what we know in general is that labeled data may exist in one domain |
---|
0:01:42 | but what we want is a model that can also perform well in a related |
---|
0:01:47 | but say not necessarily identical domain |
---|
0:01:51 | and labeling data |
---|
0:01:53 | in this new domain in particular may be difficult and/or expensive |
---|
0:01:57 | and so what can we do |
---|
0:01:59 | to leverage the original labeled out-of-domain data when building a model to work with |
---|
0:02:04 | this in-domain data |
---|
0:02:07 | so there's nothing new here, everything we've heard before in the previous presentations: speaker |
---|
0:02:12 | recognition systems; once again, we're all rather familiar with the i-vector |
---|
0:02:17 | approach |
---|
0:02:19 | and that's |
---|
0:02:20 | just, you know, your standard segment-length-independent, |
---|
0:02:25 | low-dimensional vector-based representation of the audio |
---|
0:02:30 | and what the i-vector allows us to do is to use large |
---|
0:02:35 | amounts of previously collected and labeled audio to characterize and exploit speaker and channel variability |
---|
0:02:42 | right, and usually that entails the use of, you know, thousands of speakers making |
---|
0:02:47 | tens of calls each |
---|
0:02:49 | so |
---|
0:02:51 | unfortunately it is a bit unrealistic to expect that most applications will have access to |
---|
0:02:56 | such a large set of labeled data from a matched condition |
---|
0:03:01 | and so |
---|
0:03:02 | well, here's, you know, the anatomy of the standard i-vector system; it's very similar, |
---|
0:03:07 | actually almost identical, to what daniel had shown, and yet again the thing |
---|
0:03:13 | to note is that |
---|
0:03:15 | you know, your ubm, your i-vector extractor, and your resulting mean subtraction and length |
---|
0:03:21 | normalization do not require the use of labels |
---|
0:03:25 | what does require some labels are |
---|
0:03:27 | your within-class and across-class covariance matrices |
---|
0:03:31 | and that's where the labels come in, so |
---|
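to make concrete what those labels feed into, here is a minimal sketch, assuming i-vectors in a numpy array with one speaker id per row (illustrative only, not the authors' code):

```python
import numpy as np

def plda_covariances(ivectors, speaker_labels):
    """Estimate within-class and across-class covariance matrices.

    ivectors: (n, d) array; speaker_labels: length-n array of speaker ids.
    """
    mu = ivectors.mean(axis=0)
    d = ivectors.shape[1]
    Sw = np.zeros((d, d))  # within-class (same-speaker) scatter
    Sb = np.zeros((d, d))  # across-class (between-speaker) scatter
    for spk in np.unique(speaker_labels):
        cuts = ivectors[speaker_labels == spk]
        m = cuts.mean(axis=0)
        Sw += (cuts - m).T @ (cuts - m)
        Sb += len(cuts) * np.outer(m - mu, m - mu)
    n = len(ivectors)
    return Sw / n, Sb / n
```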
0:03:34 | that's what we've got. now the first thing i'd like to do, sort of just |
---|
0:03:38 | to, like, paint |
---|
0:03:40 | a larger picture of what we've done, |
---|
0:03:42 | is to demonstrate that mismatch, right, |
---|
0:03:45 | between our two domains |
---|
0:03:47 | similar to the det curve plot that daniel had shown, what we start with |
---|
0:03:51 | is we enroll and score on sre two thousand ten |
---|
0:03:55 | and what we denote as the in-domain set is that of the sre data, |
---|
0:03:59 | i mean, that's all the telephone calls from the mixer oh-four, five, six, and two |
---|
0:04:04 | thousand eight collections |
---|
0:04:06 | now the mismatched out-of-domain data is all the switchboard data, |
---|
0:04:11 | which is all the calls from those collections |
---|
0:04:15 | so in general, |
---|
0:04:17 | some summary statistics there: |
---|
0:04:19 | what we're basically looking at is that the |
---|
0:04:21 | number of speakers, |
---|
0:04:24 | number of calls, average number of calls per speaker, and number of channels |
---|
0:04:28 | that each speaker spoke on are relatively the same. and to help with that as |
---|
0:04:32 | a visualization, |
---|
0:04:35 | this is kind of the normalized histogram of the distribution of |
---|
0:04:39 | the number of utterances per speaker between the two, |
---|
0:04:44 | between the two sets of data. they pretty much all overlap, |
---|
0:04:49 | but in blue is the switchboard and in red is that of the sre |
---|
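a rough sketch of how such a normalized utterances-per-speaker histogram could be produced; the array names are assumptions for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

def utterances_per_speaker_hist(speaker_ids, **plot_kwargs):
    # count utterances per speaker, then plot a normalized (density) histogram
    _, counts = np.unique(speaker_ids, return_counts=True)
    plt.hist(counts, bins=range(1, counts.max() + 2), density=True, **plot_kwargs)

# hypothetical usage, assuming one speaker id per utterance for each corpus:
# utterances_per_speaker_hist(swb_speaker_ids, color='blue', alpha=0.5, label='switchboard')
# utterances_per_speaker_hist(sre_speaker_ids, color='red', alpha=0.5, label='sre')
# plt.xlabel('utterances per speaker'); plt.legend(); plt.show()
```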
0:04:54 | so what we can say is that we would not expect a large performance gap |
---|
0:04:59 | between these two sets of data if indeed |
---|
0:05:04 | our, you know, our training were |
---|
0:05:07 | dataset-independent and robust across datasets |
---|
0:05:11 | so |
---|
0:05:12 | what we found, obviously, is that this is not the case, which is why |
---|
0:05:15 | we ended up having a summer workshop |
---|
0:05:18 | on it. and so just |
---|
0:05:19 | to give a summary of |
---|
0:05:21 | equal error rate results: the rest of the talk will just be using equal error rate to provide |
---|
0:05:25 | a summary set of results |
---|
0:05:30 | what we have is, |
---|
0:05:33 | in red i believe, denoted just |
---|
0:05:36 | the portion of the system that actually requires |
---|
0:05:40 | labels, and i've also |
---|
0:05:42 | shown what we had at hopkins over the summer and what we |
---|
0:05:46 | replicated at mit |
---|
0:05:49 | as well. so you can see that if we use all switchboard to train everything, |
---|
0:05:53 | we'll get a set of results around seven percent equal error rate |
---|
0:05:58 | and if we just use all of the sre |
---|
0:06:01 | we will get around two and a half percent |
---|
0:06:04 | so now if we start varying the ingredients that we used to actually train these |
---|
0:06:07 | systems |
---|
0:06:09 | in particular, say we just switch these two here: we go from switchboard |
---|
0:06:14 | for your whitening parameters, that's the mean subtraction et cetera, |
---|
0:06:20 | and you switch it to use sre, |
---|
0:06:22 | you get a little bit of a gain: you go down from seven percent |
---|
0:06:25 | to five |
---|
0:06:27 | and subsequently, if you |
---|
0:06:30 | stick with |
---|
0:06:32 | switchboard to do your |
---|
0:06:34 | ubm |
---|
0:06:35 | and i-vector extraction, |
---|
0:06:39 | then |
---|
0:06:40 | also |
---|
0:06:42 | keep the sre as your whitening and use the sre labels |
---|
0:06:46 | here, then you basically get down to |
---|
0:06:50 | under two and a half percent, which is actually better than the last row here |
---|
0:06:53 | i'm not gonna try and explain |
---|
0:06:55 | what happens there, but |
---|
0:06:57 | what we decided from then on is that we were obviously going to focus |
---|
0:07:01 | on the performance gap |
---|
0:07:02 | between the use of sre and switchboard labels for computing our within- and |
---|
0:07:10 | across-class covariance matrices |
---|
0:07:12 | so |
---|
0:07:13 | that's what we'll continue on with, |
---|
0:07:16 | and basically this will be the baseline that we've got, and this will be |
---|
0:07:21 | the benchmark that we're trying to hit or even to do better than |
---|
0:07:26 | so |
---|
0:07:27 | the rules for this, what we call the domain adaptation challenge task, are that we're |
---|
0:07:34 | allowed to use switchboard, all the data and all of the labels, |
---|
0:07:40 | and we're allowed to use the |
---|
0:07:42 | sre data but not its labels, |
---|
0:07:45 | and obviously we're gonna evaluate on the twenty-ten sre |
---|
0:07:50 | so before we actually jump into that though, what we'd like to do perhaps |
---|
0:07:54 | is to examine the domain mismatch. i got a lot of questions |
---|
0:07:57 | like, what actually is the difference between these two datasets that might cause such a |
---|
0:08:01 | gap |
---|
0:08:02 | in there |
---|
0:08:03 | and so we |
---|
0:08:04 | began and did a little bit of a rudimentary analysis of what was |
---|
0:08:09 | actually going on |
---|
0:08:09 | and well, some of the |
---|
0:08:12 | clear questions that you might |
---|
0:08:14 | think of are: is it the speaker age, right, |
---|
0:08:18 | or is it perhaps the languages spoken? in particular, switchboard contains only english and was |
---|
0:08:24 | collected over a decade, |
---|
0:08:29 | a decade that preceded that of the sre, and the |
---|
0:08:33 | sre contains more than twenty different languages, right. so the question is whether or |
---|
0:08:38 | not |
---|
0:08:39 | that might have caused some of the shift in variabilities that we see, |
---|
0:08:45 | or the difference in performance. some of this |
---|
0:08:47 | work had been explored previously as well, i believe |
---|
0:08:54 | and what we found, however, was that there was really no |
---|
0:08:59 | effect of either |
---|
0:09:03 | age or language spoken |
---|
0:09:05 | and so, |
---|
0:09:07 | with that, |
---|
0:09:09 | the next step then was to look at something else, |
---|
0:09:12 | which was |
---|
0:09:14 | that of the switchboard data |
---|
0:09:16 | itself. what we found was, well, we realized first off that switchboard |
---|
0:09:21 | was collected in different phases over approximately a decade, and so what |
---|
0:09:27 | happens when we use different subsets, |
---|
0:09:31 | when we just use different subsets to build our models? |
---|
0:09:35 | and so |
---|
0:09:36 | what we ended up finding |
---|
0:09:37 | was |
---|
0:09:39 | the following: if you take switchboard cellular, both parts, and those are |
---|
0:09:45 | the most recent ones, |
---|
0:09:47 | you actually get a better starting baseline. so the previous starting baseline was five and a |
---|
0:09:51 | half percent; |
---|
0:09:53 | you actually get a starting baseline of four point six percent, which is a little |
---|
0:09:57 | bit better. and now if you also add in switchboard phase three, you can |
---|
0:10:03 | actually start all the way down at three and a half percent |
---|
0:10:07 | but then as you keep adding these, i guess you could say maybe |
---|
0:10:11 | older |
---|
0:10:12 | portions of switchboard, you'd actually start doing a bit worse |
---|
0:10:19 | and that's what we found; and i think similar work was also |
---|
0:10:23 | done and presented by hagai during the summer, and |
---|
0:10:28 | ours is a slightly different take on it, but |
---|
0:10:32 | that's kind of what we noticed as we were trying to analyze the |
---|
0:10:36 | mismatch: |
---|
0:10:37 | basically, the differences within switchboard itself, and selecting out some of those particular subsets, |
---|
0:10:45 | might actually |
---|
0:10:47 | affect the baseline performance |
---|
0:10:49 | so then the next question is: alright, should we actually just |
---|
0:10:54 | continue further down this path? |
---|
0:10:57 | and also, secondly, can you actually |
---|
0:11:01 | just find some automatic way of selecting out the out-of-domain data that you actually wanna end |
---|
0:11:08 | up using, okay, |
---|
0:11:09 | to do your initial domain adaptation, |
---|
0:11:12 | or even to just, like, select the labeled data that you want to |
---|
0:11:17 | use that best matches the in-domain data that you have, right |
---|
0:11:21 | and so what we |
---|
0:11:22 | did again, just a couple of naive first exploratory experiments: we said, alright, |
---|
0:11:28 | what if we |
---|
0:11:29 | did an automatic subset selection? so, |
---|
0:11:32 | in particular, |
---|
0:11:34 | first, this is the three and a half percent equal error rate |
---|
0:11:38 | that you get from the cellular |
---|
0:11:40 | and the phases; that's the best we did |
---|
0:11:42 | and this here, the five and a half percent, is approximately what |
---|
0:11:46 | you get if you use all of the data, all the switchboard, and started off |
---|
0:11:51 | there. so instead, if you look at |
---|
0:11:53 | these two lines; let's focus on the blue for a second. that's if you select |
---|
0:11:59 | the proportion of scores, |
---|
0:12:02 | or proportion of i-vectors, that are |
---|
0:12:05 | at the highest probability density function value |
---|
0:12:12 | with respect to the sre. so you select the switchboard, |
---|
0:12:18 | a subset of the switchboard automatically, that is closest in likelihood to the sre |
---|
0:12:24 | marginal, |
---|
0:12:25 | and as you increase the proportion, how would you do in terms of the baseline |
---|
0:12:30 | performance |
---|
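a hedged sketch of the likelihood-based selection just described: fit a single gaussian to the unlabeled in-domain (sre) i-vectors, rank the out-of-domain (switchboard) i-vectors by their density under it, and keep the top fraction. the function and variable names are our own, not from the talk:

```python
import numpy as np
from scipy.stats import multivariate_normal

def select_by_density(swb_ivecs, sre_ivecs, keep_fraction=0.5):
    mu = sre_ivecs.mean(axis=0)
    cov = np.cov(sre_ivecs, rowvar=False)
    gauss = multivariate_normal(mean=mu, cov=cov, allow_singular=True)
    logpdf = gauss.logpdf(swb_ivecs)  # likelihood of each switchboard i-vector under the sre marginal
    n_keep = int(keep_fraction * len(swb_ivecs))
    top = np.argsort(logpdf)[::-1][:n_keep]  # highest-density vectors first
    return swb_ivecs[top]
```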
0:12:31 | and similarly, the lda one |
---|
0:12:33 | is if you took switchboard and |
---|
0:12:37 | sre and you tried to |
---|
0:12:39 | learn just a simple |
---|
0:12:41 | one-dimensional linear separator between the two, and i take the ones that |
---|
0:12:46 | are closest to |
---|
0:12:49 | the sre data and i rank them that way |
---|
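and a similarly hedged sketch of that lda variant: learn a one-dimensional linear separator between the two datasets and keep the switchboard i-vectors whose projections fall closest to the sre side (again, our illustration, not the authors' code):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def select_by_lda(swb_ivecs, sre_ivecs, keep_fraction=0.5):
    X = np.vstack([swb_ivecs, sre_ivecs])
    y = np.array([0] * len(swb_ivecs) + [1] * len(sre_ivecs))
    lda = LinearDiscriminantAnalysis().fit(X, y)
    scores = lda.decision_function(swb_ivecs)  # larger means more sre-like
    n_keep = int(keep_fraction * len(swb_ivecs))
    return swb_ivecs[np.argsort(scores)[::-1][:n_keep]]
```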
0:12:52 | and how well can i do then? basically what we can see is, obviously, |
---|
0:12:55 | if you use all of the scores |
---|
0:12:58 | you've done nothing different, |
---|
0:13:00 | but you know, as you use just some proportion of |
---|
0:13:03 | the likelihood, |
---|
0:13:04 | or proportion of these top-ranking scores, |
---|
0:13:07 | you can actually do a little bit better than our baseline; however, |
---|
0:13:10 | you never approach |
---|
0:13:12 | this three and a half that seemed to be set by this particular, this magical subset |
---|
0:13:17 | that was not found automatically |
---|
0:13:19 | so that was the initial exploration of the domain mismatch that we did |
---|
0:13:23 | now i've covered most of the setup, most of the problem, |
---|
0:13:29 | and now i can continue on with the rest of the work |
---|
0:13:32 | so |
---|
0:13:33 | the bootstrap framework that i'm gonna go over one more time is pretty standard |
---|
0:13:37 | for the domain adaptation: we begin with our prior across-class and within-class hyper- |
---|
0:13:43 | parameters |
---|
0:13:44 | and then we use |
---|
0:13:45 | plda to compute a pairwise affinity matrix |
---|
0:13:49 | on the sre data |
---|
0:13:51 | subsequently we'll do some form of clustering on that pairwise affinity matrix to obtain |
---|
0:13:56 | some hypothesized cluster labels; we'll use these labels to obtain another set |
---|
0:14:01 | of hyperparameters |
---|
0:14:03 | and then we linearly interpolate, |
---|
0:14:08 | as alan showed, and then potentially we iterate |
---|
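a hedged sketch of that bootstrap loop; the helper functions are passed in as stand-ins (our assumptions, not a real api from the talk or paper):

```python
def adapt_hyperparams(swb_params, sre_ivecs, plda_affinity, cluster,
                      estimate_covariances, alpha=0.5, n_iters=3):
    """swb_params: dict of prior within/across-class covariances from labeled switchboard."""
    params = dict(swb_params)
    for _ in range(n_iters):
        affinity = plda_affinity(sre_ivecs, params)    # pairwise plda scores on the sre data
        labels = cluster(affinity)                     # hypothesized speaker cluster labels
        sre_params = estimate_covariances(sre_ivecs, labels)
        # linearly interpolate the in-domain estimates with the out-of-domain prior
        params = {k: alpha * sre_params[k] + (1 - alpha) * swb_params[k]
                  for k in swb_params}
    return params
```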
0:14:12 | just to make this look better: it got mangled |
---|
0:14:14 | between mac and windows, so that's actually how that slide is supposed to look |
---|
0:14:21 | so |
---|
0:14:22 | basically that's the setup, and we'll just run it through some clustering algorithms. i put |
---|
0:14:27 | unsupervised in parentheses 'cause, you know, all clustering algorithms have at least some parameter |
---|
0:14:33 | that you can tune, right |
---|
0:14:35 | so to start off: what we'll find later on is that hierarchical clustering really |
---|
0:14:39 | does do the best |
---|
0:14:41 | however, |
---|
0:14:43 | in light of, you know, the stopping criterion that you choose or the cluster merging |
---|
0:14:46 | criterion, those are kind of up to the user to choose, but we find that |
---|
0:14:50 | with some reasonably appropriate choice, hierarchical clustering does do the best. the two algorithms |
---|
0:14:56 | that we also explored pretty extensively were some graph-based random walk algorithms |
---|
0:15:02 | known as infomap and markov clustering. i'm not gonna |
---|
0:15:06 | go into the details about those, but feel free to ask me offline or |
---|
0:15:09 | at the end of the presentation |
---|
0:15:12 | in those, you know, you basically have a graph where each node is an |
---|
0:15:17 | i-vector, and then you have some edges that contain |
---|
0:15:21 | weights perhaps, and then you do some clustering on those edges |
---|
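as one concrete (but assumed) way to run the hierarchical step: convert the plda affinity matrix to distances and cut the tree at a user-chosen threshold. the 'average' linkage here is just one reasonable merging criterion, not necessarily the authors' choice:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def hierarchical_labels(affinity, stop_threshold):
    dist = affinity.max() - affinity          # turn similarities into distances
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method='average')
    return fcluster(tree, t=stop_threshold, criterion='distance')
```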
0:15:25 | so our initial findings: this is really no different from what was shown |
---|
0:15:30 | previously, but what's mainly true is that in the presence of |
---|
0:15:35 | interpolation, |
---|
0:15:37 | an imperfect clustering is in fact forgivable |
---|
0:15:41 | this here |
---|
0:15:41 | is just the plot that says we took a thousand-speaker subset, |
---|
0:15:46 | and this shows a cluster error, just some measure of cluster error |
---|
0:15:53 | and these, the solid lines in green and red, |
---|
0:15:57 | are if you |
---|
0:15:59 | knew the cluster labels, |
---|
0:16:04 | if the cluster labels were pure and you didn't have to do any automatic clustering |
---|
0:16:06 | and then the other two lines here, in dotted lines, are |
---|
0:16:12 | basically |
---|
0:16:13 | what you would have if you |
---|
0:16:16 | clustered, or stopped your clustering, |
---|
0:16:21 | at different points of the hierarchical tree, okay. and basically the thing is that |
---|
0:16:26 | this bowl is incredibly flat, okay |
---|
0:16:29 | and also the last thing is that |
---|
0:16:32 | alpha star itself is basically the best adaptation parameter, which is what i'll talk about next |
---|
0:16:40 | so, however, |
---|
0:16:42 | one thing that we kinda glossed over so far is that alpha |
---|
0:16:46 | itself needs to be estimated. you can do it via a more principled |
---|
0:16:52 | way, via the counts of |
---|
0:16:55 | the relative dataset sizes, or you can look at it empirically, and you |
---|
0:16:59 | can separate it, you know, you can do your alpha for the within class differently from |
---|
0:17:03 | the alpha of your across class, |
---|
0:17:06 | and that seems to be empirically the case: the better ones seem to be this |
---|
0:17:10 | way. and so you can see we ranged across the alphas on both sides, |
---|
0:17:15 | for the within class and the across class, and found that this is approximately the |
---|
0:17:20 | best for one particular subset of a thousand speakers. however, |
---|
0:17:25 | it seems like |
---|
0:17:27 | alpha star itself is an open, unsolved problem. but actually it's not so bad, |
---|
0:17:31 | because if we rescale this plot to within ten percent of this optimal equal |
---|
0:17:36 | error rate, we can actually find that |
---|
0:17:40 | there's actually a range of values |
---|
0:17:44 | for alpha that would actually yield pretty |
---|
0:17:49 | good results |
---|
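as a worked form of the interpolation being discussed (our notation, allowing the within-class and across-class alphas to differ):

$$\Sigma_{wc} = \alpha_{wc}\,\Sigma_{wc}^{\text{in}} + (1-\alpha_{wc})\,\Sigma_{wc}^{\text{out}}, \qquad \Sigma_{ac} = \alpha_{ac}\,\Sigma_{ac}^{\text{in}} + (1-\alpha_{ac})\,\Sigma_{ac}^{\text{out}}$$

with $\alpha_{wc}, \alpha_{ac} \in [0, 1]$, and $\alpha^{\ast}$ denoting the empirically best pair.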
0:17:52 | so, results so far; i'm running a bit out of time, but |
---|
0:17:58 | basically we get to roughly within fifteen percent of the absolute |
---|
0:18:04 | best; the best we can do with automatic methods is to |
---|
0:18:08 | close that gap by about eighty-five percent |
---|
0:18:12 | so the take-home ideas for now are that, given interpolation, an imprecise estimate of |
---|
0:18:18 | the number of clusters is okay, |
---|
0:18:21 | there is a range of adaptation parameters that would yield reasonable results, and the best |
---|
0:18:25 | automatic system gives us within fifteen percent of a system that has access to |
---|
0:18:29 | all speaker labels |
---|
0:18:31 | now, that fourth point there: between alan's talk and mine, |
---|
0:18:35 | we wondered, well, |
---|
0:18:36 | i mean, for this telephone-to-telephone domain mismatch, simple solutions work already, |
---|
0:18:42 | and what we'd like to do, |
---|
0:18:44 | and what we've been working on, is to explicitly identify the sources of this mismatch |
---|
0:18:49 | and that's kinda ongoing work at the moment. but the question, just like mitch brought |
---|
0:18:53 | up a couple seconds ago at the end of alan's talk: |
---|
0:18:57 | what can we do about telephone-to-microphone domain mismatch? i did this work independently, |
---|
0:19:01 | actually, and did not know that |
---|
0:19:05 | alan and daniel had done this, and what i'm about to show |
---|
0:19:09 | is not in the paper itself, but |
---|
0:19:11 | it's just a little add-on |
---|
0:19:13 | and lastly, what else we can talk about is out-of-domain detection, like, |
---|
0:19:19 | when do i... maybe, what about a system knowing that it actually needs |
---|
0:19:25 | some additional, |
---|
0:19:27 | albeit unlabeled, data, knowing that it cannot perform at the level it |
---|
0:19:32 | usually does? so that's perhaps an instance of, like, outlier detection or something like that |
---|
0:19:38 | that we will also look into; that's sort of a future |
---|
0:19:42 | work kind of thing |
---|
0:19:43 | so |
---|
0:19:45 | what i will really quickly show is a quick visualization using some low-dimensional embeddings |
---|
0:19:54 | and basically what we're gonna start with is: |
---|
0:19:57 | if you have switchboard |
---|
0:19:59 | and sre, these are all the i-vectors in there, and i'm gonna |
---|
0:20:03 | collapse |
---|
0:20:04 | a lot of i-vectors into a very low-dimensional space, which is why it just looks very |
---|
0:20:07 | cloudy at the moment |
---|
0:20:09 | it's harder to |
---|
0:20:10 | fit a lot of points into |
---|
0:20:12 | a small space and still have them preserve their relative |
---|
0:20:17 | distances |
---|
0:20:18 | however, this is |
---|
0:20:19 | if i try to learn, first off, an unsupervised |
---|
0:20:24 | embedding that |
---|
0:20:25 | just takes all the data and learns some low-dimensional visualization here, |
---|
0:20:29 | and then i apply the colorings to this plot |
---|
0:20:32 | so what it shows here is that we have switchboard |
---|
0:20:35 | in blue and we have the sre data in red, and you can kinda see |
---|
0:20:39 | that there is a little bit of separation |
---|
0:20:43 | perhaps, right, but they're also a little bit on top of each other |
---|
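a hedged sketch of this kind of unsupervised two-dimensional visualization; the talk does not name the embedding, so t-sne here is purely our assumption for illustration:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_embedding(swb_ivecs, sre_ivecs):
    X = np.vstack([swb_ivecs, sre_ivecs])
    emb = TSNE(n_components=2, init='pca').fit_transform(X)  # unsupervised: fit on all data, no labels
    n = len(swb_ivecs)
    plt.scatter(emb[:n, 0], emb[:n, 1], s=3, c='blue', label='switchboard')
    plt.scatter(emb[n:, 0], emb[n:, 1], s=3, c='red', label='sre')
    plt.legend()
    plt.show()
```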
0:20:47 | now, to make just one other point that i talked about earlier: |
---|
0:20:53 | if we just took that subset, |
---|
0:20:57 | that magical subset that gave us that three and a half percent, that magical subset |
---|
0:21:01 | of switchboard, we get this in green, and we have the sre in red |
---|
0:21:06 | as well, and so they're pretty uniformly distributed around the sre data itself, |
---|
0:21:10 | right |
---|
0:21:12 | on the other hand, |
---|
0:21:14 | if you just |
---|
0:21:15 | take the remaining amount of data, |
---|
0:21:17 | and we leave it in blue, the old switchboard stuff, |
---|
0:21:20 | they're actually, like, a little farther away from the rest of the sre itself. so |
---|
0:21:25 | that kind of, maybe that gives some idea of how things worked, or why |
---|
0:21:31 | performance was as it was |
---|
0:21:34 | however, if you take a look at |
---|
0:21:37 | telephone and microphone, |
---|
0:21:38 | if you do the same, |
---|
0:21:39 | the same kind of embedding, |
---|
0:21:42 | then |
---|
0:21:44 | you will |
---|
0:21:45 | get a completely different, a much more separated sort of |
---|
0:21:51 | visualization, and that sort of just illustrates that i think telephone and microphone |
---|
0:21:56 | can be a harder problem. however, i guess initial results have also shown that it's |
---|
0:22:00 | actually not as bad as maybe this visualization shows. so i'll stop there |
---|
0:22:05 | and take any questions |
---|
0:22:22 | you said that you found that the language is not the cause of this |
---|
0:22:27 | domain mismatch; how did you find that? |
---|
0:22:31 | let me think, so, |
---|
0:22:32 | like, basically, |
---|
0:22:36 | well, i basically held |
---|
0:22:38 | the different languages out, |
---|
0:22:40 | held the various different languages out of |
---|
0:22:43 | the sre data, and just tried to basically see whether |
---|
0:22:48 | that was, |
---|
0:22:51 | whether that would be, like, distinctly different from that of the |
---|
0:22:56 | sorry, |
---|
0:22:56 | sorry, no: what i basically did was look at it and see |
---|
0:23:02 | whether or not |
---|
0:23:03 | the different languages were clustered together, in a sense |
---|
0:23:09 | in general, that's how we went about trying to tease apart whether or |
---|
0:23:13 | not the languages |
---|
0:23:16 | were a source of the domain mismatch |
---|
0:23:21 | so you looked at this and you could just, like, see that? |
---|
0:23:25 | no |
---|
0:23:29 | let's talk offline about that; i'm actually forgetting some of the details |
---|
0:23:33 | of the language experiment exactly at the moment, but |
---|
0:23:38 | we can talk offline about that |
---|
0:23:40 | sorry |
---|
0:23:46 | at the beginning you had the table, the issue of |
---|
0:23:53 | what you used for training the ubm and so on; |
---|
0:23:57 | did you try this, which is |
---|
0:24:00 | putting in the training both switchboard and sre? |
---|
0:24:03 | ah yes, we did originally, and it wasn't terribly different |
---|
0:24:08 | it does just about the same. okay, thanks. there's really no difference |
---|
0:24:18 | so switchboard and the mixers were collected over a wide range of years, |
---|
0:24:24 | so maybe there's a year-dependent variable that shows the evolution of the telephone network and |
---|
0:24:31 | how |
---|
0:24:32 | speech is transmitted over the telephone now compared to, say, nineteen |
---|
0:24:38 | ninety-nine |
---|
0:24:39 | absolutely, totally; that's actually almost exactly a sentence |
---|
0:24:44 | that we wrote in the paper, and yes, |
---|
0:24:46 | that's a |
---|
0:24:48 | potential, like, a hypothesis that |
---|
0:24:50 | i'm certainly willing to believe, thanks |
---|
0:24:54 | i have a related question |
---|
0:24:56 | the plda has the within- and between-speaker covariance parameters, so |
---|
0:25:03 | which of those most needs to be adapted when moving from switchboard to |
---|
0:25:07 | the mixer, i think that you've shown? |
---|
0:25:13 | let me go with |
---|
0:25:16 | this one, right |
---|
0:25:18 | so |
---|
0:25:19 | the one that most needs to be adapted would be the within-class |
---|
0:25:25 | variability, relative to |
---|
0:25:27 | the across-class, as it's shown here, since |
---|
0:25:32 | the speakers, the speaker distribution, |
---|
0:25:34 | is more or less constant, exactly, but the channels shift, |
---|
0:25:39 | so that's what you need to weight more, the within |
---|
0:25:44 | it's very even |
---|