0:00:06okay, i'm going to talk about the work we did in the scope of the last nist speaker recognition evaluation.
0:00:21this is the outline of my presentation. i will try to motivate and introduce the problem we wanted to face and to solve. since maybe not everybody is familiar with connectionist speech recognition, i will have a look at the very basics of it, and then i will introduce the novel features obtained in this work, which we call transformation network features, for speaker recognition. then we will go to the experiments, the conclusions and some future work ideas.
0:00:56so, the main motivation was that we wanted to participate in this nist evaluation. we saw that the best systems were combining an enormous number of different subsystems. i am not going to mention them all, but you know there are many possible subsystems.
0:01:18among all of them, i was particularly attracted by what are usually called high-level features, which is in close relation with the previous session. basically, these systems use the speaker adaptation transforms employed in asr systems as speaker detection features, and they are proposed as alternatives to the short-time cepstral features that are the most commonly used ones.
0:01:42and you know there is previous work on this; in fact, the work presented in the previous session is very closely related, although it was developed in another framework, with some differences of course. basically, what is done in that work is to use the weights of the mllr transforms to produce feature vectors, concatenate them, and use these coefficients to model the speaker with support vector machines.
0:02:12so, what is the problem, or at least one of the problems? we have always been working with hybrid speech recognition systems based on artificial neural networks, and i will show later some of their characteristics, but the main problem, or the motivation for this work, is that we cannot use the typical adaptation methods, like mllr, that are usually used in gaussian approaches.
0:02:45so what i tried to do in this work, at the very beginning, was to see whether i could do something similar to the mllr transformation for hybrid systems, whether we could use it to obtain speaker information and train a speaker detection system, and then fuse it with some baseline systems on the nist sre telephone-telephone condition.
0:03:11so, let me review some of the hybrid hmm/ann basics, for those of you who are not very familiar with them. basically, we have been working on this for some applications, mainly broadcast news transcription, but also for telephone applications, and for some other languages.
0:03:38the usual way it works is that we replace the gaussian mixtures with a neural network, in our case a multilayer perceptron, and we use its probability estimations, the phone posteriors, as the output probabilities of single-state hmms. and usually we have relatively few outputs, like just phonemes or some other sub-word units, but not many more.
0:04:12the main characteristics are that the neural networks are usually considered good classifiers, that their outputs are easy to combine with other streams, and that they are pretty good for real-time operation, as we will see. on the other hand, we have some problems with context modelling, and also with adaptation: the adaptation methods are not as well established as in gaussian systems.
0:04:37so this is a block diagram of our broadcast news transcription system for american english. you can see several parallel streams with different features: plp features, plp with rasta features, modulation spectrum features and mfcc features. each of them feeds a different multilayer perceptron, well trained with transcriptions and everything, and we merge the several streams with a simple product rule.
0:05:18these posterior probabilities are then used by the decoder, together with a language model, the lexicon and some definitions of the hmms, that is, the relation between each output probability and a phoneme, for instance, to provide the most likely word sequence.
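as a rough sketch of that combination and decoding interface (not the exact recipe of this particular system), assuming each stream's mlp gives a frames-by-phones array of posteriors; the function and array names are just illustrative, and dividing by the phone priors is the usual way of turning posteriors into the scaled likelihoods the hmm decoder consumes:

```python
import numpy as np

def combine_streams(posterior_streams, eps=1e-10):
    """Merge per-frame phone posteriors from several MLP streams with a
    simple product rule (sum of logs), then renormalise per frame."""
    log_post = sum(np.log(p + eps) for p in posterior_streams)
    combined = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    return combined / combined.sum(axis=1, keepdims=True)

def scaled_likelihoods(posteriors, phone_priors, eps=1e-10):
    """Divide P(phone | x) by the phone priors to get likelihoods scaled by
    1/P(x), which the single-state HMM decoder uses as emission scores."""
    return posteriors / (phone_priors + eps)

# illustrative usage: each stream gives a (T, Q) array of posteriors
# combined = combine_streams([plp_post, rasta_post, msg_post, mfcc_post])
# emissions = scaled_likelihoods(combined, priors_from_training_alignment)
```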
0:05:38some characteristics of the system: evaluated on a broadcast news test set, it has been able to run in less than one real time, with a word accuracy of around seventy percent. the units we use are phonemes plus some other phonetic units, we train with around forty hours of audio, and we have a language model that is an interpolation of the acoustic transcripts with text from newspapers, and a relatively small vocabulary.
0:06:14so, since the evaluation uses conversational telephone data, i needed to train a new speech recogniser, and i have to say it is a very, very weak system, because i did not have access to cts speech. basically, what i did was to train new mlp networks with downsampled broadcast news data. there are some other differences from the system i just described: for this work i use a single stream, with mfcc-based features, and only monophone units.
0:06:49i did some very informal evaluations, just to see for myself how it was working on conversational telephone data, and i got very high word error rates. but anyway, this recogniser is used for two purposes: first, to generate a phonetic alignment from the transcriptions provided by nist, and second, to train the speaker adaptation transformations.
0:07:20so, how can we adapt these hybrid mlp networks to the speaker information, or to whatever else? there are several approaches, but basically two of them. the first one would be, starting from a speaker-independent mlp network, to re-run the back-propagation algorithm: we start with a network that is already trained, instead of with random weights, and with the adaptation data we re-adapt the weights. a variant of this is to adapt only some of the weights, for instance the ones that go from the last hidden layer to the output layer.
0:08:03perhaps something more interesting to do is to modify the structure of the mlp network and try not to modify the speaker-independent component. what we can do, for instance, is to add some kind of transformation at the output of the network, at the phonetic level, and try to adapt that to the speaker characteristics; or, on the other hand, we can try to do the same at the acoustic level, and adapt the input features to the characteristics of the speaker-independent system.
0:08:41i had already done some experiments with this last solution, just to verify that it could work, and it works, and i found that it was the best option for our asr application, so i decided to try it also for speaker recognition.
0:08:58here we have a typical mlp network with just one hidden layer: the input layer, the hidden layer and the output layer. how can we train this adaptation, these adaptation matrices? basically, we incorporate a linear input network, a linear layer, at the beginning, and we apply the back-propagation algorithm as usual. i mean, we have data with labels, we make the forward propagation to compute the output of the network, we compute the error with the cost function, and we back-propagate the error through the network. then, when it comes to updating the weights, we only update those of the linear input network, and we keep the speaker-independent component frozen.
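a minimal sketch of that training loop, assuming a frozen speaker-independent mlp object with a forward pass and a routine that back-propagates the output error down to its input; both of those helpers are hypothetical, and the learning rate and epoch count are placeholders, not the values used in the talk:

```python
import numpy as np

def train_lin(X, Y, si_mlp, n_epochs=5, lr=0.01):
    """Train a linear input network (LIN) on one speaker's segment.
    X: (T, d) input frames, Y: (T, Q) one-hot phone targets from the alignment.
    si_mlp: frozen speaker-independent MLP; forward() returns softmax outputs,
    backprop_to_input() returns dE/d(input) without touching its own weights."""
    d = X.shape[1]
    W = np.eye(d)                          # start from the identity transform
    for _ in range(n_epochs):              # fixed number of epochs, no cross-validation
        Xt = X @ W.T                       # forward pass through the LIN
        P = si_mlp.forward(Xt)             # frozen SI network
        grad_in = si_mlp.backprop_to_input(P - Y)   # (T, d) error at the LIN output
        W -= lr * (grad_in.T @ X) / len(X)          # update only the LIN weights
    return W
```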
0:09:50so, let me say something about this transformation as a normalisation. this linear layer is intended to adapt the incoming speech to the representation that best suits the mlp, so it can be considered a kind of speaker normalisation, but with some special characteristics, because we are not imposing any restriction on the normalisation process; i mean, we don't have a target speaker towards which we try to normalise the data.
0:10:22and according to previous works, it seems that it is also architecture dependent: if we train the transformation network with one speaker-independent network behind it, and then we change that speaker-independent network, for instance to one with two hidden layers instead of one, it doesn't work anymore. so it has some kind of dependence on the architecture.
0:10:43well, we train this transformation matrix starting, let's say, from the identity, and when we train it on segments of the same speaker, what we hope to find is that it captures the differences, the characteristics of that speaker, so that we can model them; and i thought that this could be useful for speaker recognition.
0:11:06so, this is how i exactly extract the features: i obtain the phonetic alignment with the nist transcriptions, and i train a speaker adaptation transformation for every segment. a special thing that i do is to remove long segments of silence, to avoid background and channel effects in the resulting features. and instead of the cross-validation that is usually done in mlp training, i just apply a fixed number of epochs and a fixed learning-rate schedule, based on heuristics taken from the asr experiments.
0:11:41another thing i do is that, instead of training a full matrix, i tie the transform across the context. i mean, the input of a hybrid mlp is usually composed of the current frame and its context, so the full square matrix would grow with the number of features and the size of the context window. instead of that, i tie the network so that the same transform is applied to each frame independently of its position in the context. this reduces the size of the transformation; the networks are also smaller and, what is more important, comparable between them.
0:12:23in addition to that, in the resulting feature vector, besides the transform weights, i also stack the feature mean and the feature variance, because it is very usual to do mean and variance normalisation at the input of the mlp. and i do this for the different streams i have: the plp, the rasta-plp, the modulation spectrum and the mfcc features.
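putting those pieces together, a small sketch of how one segment's feature vector could be assembled, assuming one tied d-by-d linear-input matrix per stream as just described; the dictionary and function names are illustrative, and the all-streams concatenation shown here corresponds to the "complete feature vector" experiment mentioned later (for a per-stream system only that stream's part would be used):

```python
import numpy as np

def segment_feature_vector(stream_frames, stream_lins):
    """Build the transformation-network feature vector of one segment.
    stream_frames: stream name -> (T, d) raw features of the segment
    stream_lins:   stream name -> (d, d) tied LIN matrix trained on it."""
    parts = []
    for name, X in sorted(stream_frames.items()):   # fixed stream order
        W = stream_lins[name]
        parts.append(W.ravel())          # flattened tied-transform weights
        parts.append(X.mean(axis=0))     # per-dimension feature mean of the segment
        parts.append(X.var(axis=0))      # per-dimension feature variance
    return np.concatenate(parts)         # one vector per segment, all streams stacked
```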
0:12:49and for modelling i use support vector machines: i take the speaker feature vector as the positive example, and a set of impostor feature vectors as the negative examples. i use libsvm with a linear kernel, and a normalisation of the input feature vectors.
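a minimal sketch of that modelling step, with scikit-learn's linear SVC standing in for libsvm and a simple mean/variance scaling standing in for whatever input normalisation was actually applied; scoring by distance to the hyperplane is a common choice here, not necessarily the exact one used in this system:

```python
import numpy as np
from sklearn.svm import SVC

def train_speaker_svm(target_vecs, impostor_vecs):
    """One-vs-impostors linear SVM for one target speaker.
    target_vecs: (n_pos, D) transformation-network vectors of the speaker,
    impostor_vecs: (n_neg, D) background vectors from other speakers."""
    X = np.vstack([target_vecs, impostor_vecs])
    y = np.concatenate([np.ones(len(target_vecs)), np.zeros(len(impostor_vecs))])
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-8   # input normalisation (assumed)
    svm = SVC(kernel='linear')
    svm.fit((X - mu) / sigma, y)
    return svm, mu, sigma

def score_trial(svm, mu, sigma, test_vec):
    """Signed distance to the separating hyperplane as the trial score."""
    return float(svm.decision_function(((test_vec - mu) / sigma).reshape(1, -1))[0])
```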
0:13:09so, let's go to the experiments. i used the nist sre 2008 data, the short2-short3 task, and only the telephone-telephone condition. i used two competitive systems to verify the usefulness, or not, of this approach.
0:13:26the first one is a quite simple gmm ubm based on cepstral features. i remove non-speech and low-energy frames, based on a bi-gaussian alignment of the log energy, i do mean and variance normalisation over a short window and, well, the typical things in gmm ubm systems. the ubm is trained with data from previous sre evaluations, and i also apply score normalisation.
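for reference, a compact sketch of a gmm-ubm back end of this general kind: a ubm trained on background data, map adaptation of the means to the enrolment frames, and an average log-likelihood-ratio score. scikit-learn's GaussianMixture, a single-pass map update and the relevance factor of 16 stand in for the actual tools and settings, which are not detailed here (the questions later mention that several map iterations were run):

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.mixture import GaussianMixture

def train_ubm(background_frames, n_comp=512):
    """Universal background model on pooled background frames (T x d)."""
    return GaussianMixture(n_components=n_comp, covariance_type='diag',
                           max_iter=50).fit(background_frames)

def map_adapt_means(ubm, frames, relevance=16.0):
    """Classical MAP adaptation of the UBM means to one speaker's frames."""
    post = ubm.predict_proba(frames)              # (T, C) component responsibilities
    n_c = post.sum(axis=0)                        # soft counts per component
    f_c = post.T @ frames                         # first-order statistics
    alpha = (n_c / (n_c + relevance))[:, None]
    return alpha * (f_c / np.maximum(n_c, 1e-10)[:, None]) + (1 - alpha) * ubm.means_

def diag_gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of a diagonal-covariance GMM."""
    d = frames.shape[1]
    diff = frames[:, None, :] - means[None, :, :]
    log_comp = (np.log(weights)
                - 0.5 * (np.log(variances).sum(-1) + d * np.log(2 * np.pi))
                - 0.5 * (diff ** 2 / variances[None, :, :]).sum(-1))
    return float(logsumexp(log_comp, axis=1).mean())

def llr_score(ubm, adapted_means, test_frames):
    """Trial score: average log-likelihood ratio of adapted model vs UBM."""
    return (diag_gmm_loglik(test_frames, ubm.weights_, adapted_means, ubm.covariances_)
            - float(ubm.score(test_frames)))
```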
0:14:00and in addition to that, a more competitive system: a gmm supervector system, where the supervectors are modelled with an svm. i derive the supervectors from the speaker models, and for the negative set i use data from the previous sre evaluations. here i did not apply score normalisation, because i did not see much improvement; probably there is some kind of problem in my configuration or in my conclusion.
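and a short sketch of the supervector construction, reusing the map-adapted means from the previous sketch; the sqrt-weight over standard-deviation scaling is the usual kl-divergence-inspired choice in gmm-supervector svm systems, not something stated in the talk, and the resulting vectors would feed the same one-vs-impostors linear svm routine sketched earlier:

```python
import numpy as np

def gmm_supervector(ubm, adapted_means, kl_scaling=True):
    """Stack the MAP-adapted component means into one long supervector.
    With kl_scaling, each mean is scaled by sqrt(w_c) / sigma_c (common choice)."""
    if kl_scaling:
        scale = np.sqrt(ubm.weights_)[:, None] / np.sqrt(ubm.covariances_)
        return (scale * adapted_means).ravel()
    return adapted_means.ravel()

# usage: sv = gmm_supervector(ubm, map_adapt_means(ubm, enrolment_frames))
# target supervectors (positives) and supervectors from earlier SRE data
# (negatives) then go to the same linear-SVM modelling as before.
```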
0:14:36i did the calibration and the fusion gender dependent, using the focal toolkit by niko brummer, in two steps: first a linear logistic regression calibration for every single system, and later another linear logistic regression for the fusion, in the cases where more than one system is combined. and i did this with a k-fold cross-validation on the same evaluation set, because i did not have a separate development set for calibration.
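a bare-bones sketch of that two-step procedure, with scikit-learn's logistic regression standing in for the focal toolkit; the gender dependence and the k-fold cross-validation are applied around this and are not shown:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_calibration(scores, labels):
    """Step 1: per-system calibration, raw scores -> calibrated log-odds."""
    lr = LogisticRegression()
    lr.fit(np.asarray(scores).reshape(-1, 1), labels)   # 1 = target, 0 = non-target
    return lr

def calibrated(lr, scores):
    """Apply a fitted calibration to new scores."""
    return lr.decision_function(np.asarray(scores).reshape(-1, 1))

def fit_fusion(calibrated_score_columns, labels):
    """Step 2: linear logistic regression over the calibrated scores of
    several systems (one column per system)."""
    lr = LogisticRegression()
    lr.fit(np.column_stack(calibrated_score_columns), labels)
    return lr
```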
0:15:15so, here we have already some results. you can see, in blue, the curves of the individual transformation network systems, based on the different features: plp, rasta-plp, modulation spectrum and mfcc. here you have the minimum detection cost function (i have to say that i use the cost of the sre 2008 evaluation plan, not the newer definition), and this is the equal error rate.
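for reference, the two numbers reported on these plots can be computed from a list of trial scores roughly as below, using the sre 2008 cost parameters (c_miss = 10, c_fa = 1, p_target = 0.01); this is a generic sketch, not the official nist scoring tool:

```python
import numpy as np

def min_dcf_and_eer(scores, labels, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Sweep a threshold over the trial scores and return (min DCF, EER).
    labels: 1 for target trials, 0 for non-target trials."""
    y = np.asarray(labels, dtype=float)[np.argsort(scores)]       # sorted by score
    n_tar, n_non = y.sum(), len(y) - y.sum()
    p_miss = np.concatenate([[0.0], np.cumsum(y) / n_tar])        # targets rejected
    p_fa = np.concatenate([[1.0], 1 - np.cumsum(1 - y) / n_non])  # non-targets accepted
    dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    i = int(np.argmin(np.abs(p_miss - p_fa)))
    return float(dcf.min()), float((p_miss[i] + p_fa[i]) / 2)
```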
0:15:46well, the first thing i want to say about this is that it is not great, but it worked, and anyway i was not sure of that when i started. among the individual systems, we can see that probably the best performance is with the mfcc features, but i don't have a good explanation for it: maybe because the feature size is bigger, but i'm not sure, or simply because that network is better, or it is a better classifier.
0:16:12then i did two other experiments: first, to fuse with linear logistic regression the four individual systems, and, even better, to concatenate the four individual feature vectors and train a single svm on that single transformation network feature vector. and we can see a nice improvement using the complete transformation network feature vector.
0:16:43moving to the next plot: this compares the different baseline systems with the newly proposed transformation network svm (tn-svm). with respect to the gmm ubm, it performs better close to the operating point, but it seems to get worse, or at least closer, as we move towards the equal error rate point. with respect to the supervector system, we have a slightly worse performance close to the operating point, and the behaviour reverses in the other region, towards the equal error rate point.
0:17:30what i think is important from these results is that i can achieve a performance more or less similar to the baseline systems: in some cases a bit worse, in some cases a bit better, but not radically different.
0:17:45so, the final purpose of the work was, in fact, to try to use these features to improve the baseline systems, and these are the results of the combinations. you can see several different combinations; these are the two baselines, and this is the minimum cost and the error rate obtained. we can see that, when we incorporate the transformation network features system, we get some improvement, and that happens with all the combinations here.
0:18:30the conclusions. what i wanted to do in this work is to show that features derived from ann speaker adaptation techniques can be used for speaker recognition, in a very similar way to how the mllr transforms are used with gaussian systems. i have used an adaptation technique called the linear input network, and built feature vectors based on the coefficients of these adaptation transforms and on the mean and variance of the input feature statistics, and they seem to perform reasonably well.
0:19:07with respect to the baselines, we could see a relatively good performance: in some operating points of the det curve it was better, in others it was worse, but more or less similar performances. and we could also verify that it provides some complementary speaker cues to the ones we can get from our baseline systems.
0:19:33with respect to current and future work with these features: we need to try a better asr system, because the one i had was very weak, with a very low word accuracy in fact, both for the phonetic alignment and for the adaptation itself; probably with a better speech recognition system we will have more meaningful features. we did almost no tuning of the network and training characteristics and all these things, so we should probably do something more there, and try to understand the relation between the architecture of the speaker-independent network and the resulting features, or even try other adaptation methods; i did not try the adaptation at the output of the network. we should also probably reduce the dimensionality of these feature vectors, which are a bit large, we could say, and applying inter-session or channel compensation, like nap, could also be really worth trying.
0:20:42another thing we are interested in is to do something similar to what is done for language identification, that is, to use several mlp networks from different languages, to train the transformation networks for each of these languages without a phonetic alignment, and then to put them together in a single feature vector; this way the approach would not need the asr transcriptions, and finally it would also become language independent.
0:21:11that's all.
0:21:24okay, questions?
0:21:30(audience question, largely unintelligible; it asks about the normalisation applied to the features.)
0:22:13no, i just do the rank normalisation of the input to the svm modelling. i apply the same normalisation also when i am testing, but i do not do any other normalisation of the feature vectors, just the rank-based one, which i think is traditional in these support vector machine approaches.
0:22:33(follow-up, partly unintelligible, apparently about whether some features should be weighted or selected.)
0:22:40no, no... well, it's true, yeah, but i assumed that the svm would deal with this, i mean, that it would select the features that are more important. i did not weight them in a different way depending on whether they come from plp or from another stream; i just let the svm learn what it thought was better. i did not do anything in this way.
0:23:08(audience question, largely unintelligible; it seems to concern the fact that the ubm can be trained on much more data than the neural network, and the initialisation and softmax layer of the mlp.)
0:23:41uh, i think it differs... could you repeat it, because i did not get it very well. you are talking about whether i still start with a random initialisation of the mlp network for training?
0:23:56(the questioner refers to the softmax output layer; the rest is unintelligible.)
0:24:11well, i have a softmax output here, and i do not have any other softmax output. anyway, this is the linear input network, and i am not applying any kind of nonlinearity at this point. no, there is no nonlinearity there; i don't know if i am answering you.
0:24:43(question, unintelligible.)
0:24:47no, the input is just the speech features, plp or mfcc, and, sorry, it is the current frame and its context; i am not using anything else, just the features.
0:25:12could you go back to the slide with the table? for the baseline, how many map iterations did you do?
0:25:29ah, i did five, and probably for the supervector that was not a good choice, but... yes, i did five map iterations.
0:25:40so you did five map iterations before deriving your supervectors?
0:25:47yeah.
0:25:49(comment from the questioner, largely unintelligible.)
0:25:59but we verified, well, i am not completely sure, but in the basic gmm ubm, with five iterations we got better results, and even going further we got a slight improvement. we never verified it when we moved to the supervector, though, and i realise, now that you point it out, that this was probably not a good idea. probably, with the configuration i have, it hurt the performance of the supervector system. sure, i realise that.
0:26:35(audience question, largely unintelligible; it asks whether a certain component improves the results and by how much.)
0:27:12no, it did not improve it too much, but probably there is some configuration problem, because i see that people get very nice improvements with it; i don't know if it is because of the telephone-only condition, but they do get the improvement and i don't. let's say i tried with different dimensionalities and it improves, yeah, but it was not moving from, i don't know how much i had here, say from six point five to three; it was less than that.
0:27:48(audience question, unintelligible; apparently about how the svm scores are obtained.)
0:28:25if i did something not right there, it would be because i did not check it carefully; i'm not sure. i am just currently using libsvm, and i think i was using the probability estimation it gives, which is probably not a good idea. but i am using it in both systems based on svm, i mean, in the baseline and also in my proposal, so i think i can improve both in that way, more or less what you were mentioning. and i think that the probability estimate is not as good a score for the speaker detection task. but, well, okay.