0:00:06my name is [name unclear],
0:00:08and i will present
0:00:10the work we do
0:00:14in L D A
0:00:16which is entitled intra-speaker variability effects
0:00:19on speaker verification
0:00:22over the last decade
0:00:24the performance of speaker verification systems
0:00:34has reached a good level,
0:00:38so this permits
0:00:39a large set of practical applications,
0:00:42for example
0:00:43in industry or in forensic applications
0:00:46and all these performance figures
0:00:49are always given as average error rates,
0:00:53and
0:00:55we do not have
0:00:58a lot of studies
0:01:01on the explanation
0:01:03of the performance variability
0:01:05and of the errors
0:01:08there is one important study in that context,
0:01:13which explains the performance variability according to the speaker profile
0:01:18it is well known that,
0:01:20according to the length
0:01:21of the training and testing data, the performance variability is very important
0:01:29and it was proposed
0:01:31to look at the different phonemic content,
0:01:38showing that
0:01:40there is a difference in performance
0:01:42according to
0:01:43this phonemic content
0:01:46for our question,
0:01:49we only work
0:01:50on
0:01:52the training data
0:01:53for
0:01:54one speaker
0:01:55the question is: if
0:01:57we have several excerpts
0:01:59for the same speaker,
0:02:02what is the variability
0:02:04due to the signal sample used to train
0:02:08the speaker model
0:02:10and the other question is
0:02:12what kind of information may explain
0:02:15this difference in performance
0:02:17we propose to study the number of selected frames, the phonemic distribution
0:02:23and the intra-phonemic acoustic differences
0:02:28okay
0:02:29we use a system which is a UBM-GMM approach,
0:02:40and we use
0:02:42the configuration used for
0:02:46the NIST campaigns,
0:02:49but we do not do score normalisation
0:02:52the global idea is to run
0:02:56a lot
0:02:58of trials
0:02:59for the different training samples we have
0:03:02for
0:03:03a speaker,
0:03:04and we select
0:03:05the best training excerpt
0:03:08and the worst training excerpt
0:03:10for each speaker
0:03:12the best
0:03:13training excerpt
0:03:14is the one which
0:03:20minimises
0:03:21the percentage of
0:03:23false acceptance
0:03:25and false rejection,
0:03:27and in the same way the worst one maximises
0:03:29the percentage of
0:03:30false acceptance
0:03:33and false rejection
0:03:35so we have
0:03:36three sets
0:03:38of selections,
0:03:40one
0:03:41named min, another named
0:03:46max,
0:03:47and random
0:03:48here
0:03:48we do different
0:03:50experiments:
0:03:52there are exactly the same speakers,
0:03:54exactly the same testing excerpts,
0:03:57but we change the training excerpt
0:03:59for
0:04:00each set
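A minimal sketch of the selection step described above. It assumes that every training excerpt of a speaker already has a false acceptance and a false rejection percentage measured over its trials, and that the two percentages are simply added to rank the excerpts (the talk does not say how they are combined). The function name `select_sets` and the data layout are hypothetical.

```python
import random


def select_sets(errors_by_speaker, seed=0):
    """Pick one training excerpt per speaker for the min, max and random sets.

    errors_by_speaker: {speaker: {excerpt_id: (false_accept_pct, false_reject_pct)}}
    Returns {"min": {...}, "max": {...}, "random": {...}}, each mapping
    speaker -> chosen excerpt_id.
    """
    rng = random.Random(seed)
    sets = {"min": {}, "max": {}, "random": {}}
    for speaker, excerpts in errors_by_speaker.items():
        # Assumption: the two error percentages are added into one cost.
        cost = {ex: fa + fr for ex, (fa, fr) in excerpts.items()}
        sets["min"][speaker] = min(cost, key=cost.get)   # best excerpt
        sets["max"][speaker] = max(cost, key=cost.get)   # worst excerpt
        sets["random"][speaker] = rng.choice(sorted(cost))
    return sets


example = {
    "spk1": {"a": (2.0, 5.0), "b": (10.0, 20.0), "c": (4.0, 8.0)},
    "spk2": {"x": (1.0, 1.0), "y": (0.5, 0.5)},
}
sets = select_sets(example)
print(sets["min"], sets["max"], sets["random"])
```

The speakers and the testing excerpts stay fixed across the three sets; only the training excerpt attached to each speaker changes.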
0:04:03we do these experiments on two corpora
0:04:07the first is
0:04:09based on NIST
0:04:11two thousand eight,
0:04:12with telephone conversational speech
0:04:15and a length of about
0:04:18two minutes
0:04:21for each
0:04:22sample,
0:04:23and to maximise
0:04:24the number of training excerpts for each speaker
0:04:27we do leave-one-out
0:04:31with this process we have
0:04:36one hundred and seventy-one speakers, for which we have
0:04:40three to twenty models
0:04:45the other corpus we used is BREF one hundred
0:04:49twenty,
0:04:50which is a
0:04:52studio-recorded
0:04:54database,
0:04:56recorded with exactly the same microphone,
0:04:58and it is read speech
0:05:01from newspapers,
0:05:03and all the speakers are native French
0:05:06and we have
0:05:08more
0:05:08females than males,
0:05:10and
0:05:11for each
0:05:12speaker
0:05:13we have
0:05:14training and
0:05:15testing excerpts,
0:05:17and
0:05:21we concatenate
0:05:22some sentences
0:05:24to have more than
0:05:26twenty seconds
0:05:27of selected frames
0:05:29per excerpt
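A rough sketch of how such excerpts could be built, assuming each sentence comes with the duration of its selected frames, that sentences are drawn in random order, and that every sentence is used in at most one excerpt so that training and testing excerpts never share material. The function name `build_excerpts` and the input format are hypothetical.

```python
import random


def build_excerpts(sentences, min_seconds=20.0, seed=0):
    """Concatenate sentences until each excerpt exceeds min_seconds.

    sentences: list of (sentence_id, seconds_of_selected_frames).
    Returns a list of excerpts, each a list of sentence ids.
    """
    rng = random.Random(seed)
    pool = list(sentences)
    rng.shuffle(pool)  # assumption: sentences are concatenated in random order
    excerpts, current, total = [], [], 0.0
    for sentence_id, seconds in pool:
        current.append(sentence_id)
        total += seconds
        if total > min_seconds:  # "more than twenty seconds"
            excerpts.append(current)
            current, total = [], 0.0
    # Leftover sentences that do not reach the threshold are dropped.
    return excerpts


sentences = [("s%d" % i, 6.0) for i in range(10)]
excerpts = build_excerpts(sentences)
print(len(excerpts), [len(e) for e in excerpts])
```

Because each sentence lands in a single excerpt, any split of the resulting excerpts into training and testing is disjoint by construction.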
0:05:33here we analyse the variability due to the training
0:05:40excerpt
0:05:41we see that
0:05:44the equal error rate
0:05:45ranges
0:05:46from
0:05:47four point one percent to
0:05:50twenty-one
0:05:51point nine percent
0:05:52for NIST, and for BREF
0:05:54it ranges
0:05:55from
0:05:56one percent to
0:05:58thirty-three percent
0:06:00we have also done a random
0:06:04set,
0:06:05and the value shown
0:06:06here
0:06:08for BREF
0:06:10is
0:06:11the mean over
0:06:14different random sets
0:06:18there is a
0:06:20very important gap
0:06:21according to the
0:06:24training excerpt
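For reference, the equal error rate quoted above is the operating point where the false acceptance rate and the false rejection rate are equal. A minimal sketch of one common way to estimate it from raw scores, assuming higher scores mean "same speaker" and taking the threshold where the two rates are closest; the function name is hypothetical and this is not necessarily how the figures in the talk were computed.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by scanning every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # A trial is accepted when its score is >= t.
        false_reject = sum(s < t for s in target_scores) / len(target_scores)
        false_accept = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(false_accept - false_reject)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (false_accept + false_reject) / 2
    return eer


targets = [2.0, 1.5, 0.3, 1.0]
impostors = [0.1, 0.5, -0.2, -1.0]
print(equal_error_rate(targets, impostors))  # 0.25
```

Running this once per set (min, max, random) with the same test trials gives the kind of comparison discussed in the talk.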
0:06:28now the important thing is
0:06:30to explain the variability,
0:06:32and the question is what kind of
0:06:34information explains it
0:06:37for the number of selected frames
0:06:39it is possible to do that with
0:06:40NIST
0:06:41and BREF, but for
0:06:43the phonemic distribution and for the intra-phonemic acoustic differences
0:06:46we use
0:06:47only BREF,
0:06:48because it is
0:06:50easier
0:06:51to
0:06:52have this type of information
0:06:56and
0:06:57for NIST we have a significant effect
0:07:00of the number of frames,
0:07:02but it is something that is controlled in BREF one hundred
0:07:06twenty, so it is a relevant factor
0:07:09for an explanation of the difference in performance,
0:07:12but
0:07:13other factors must be important,
0:07:15because we have
0:07:16an even more important gap
0:07:18in BREF one hundred twenty,
0:07:21and it cannot be explained by
0:07:23the number of
0:07:24frames
0:07:27so for the phonetic
0:07:30content,
0:07:31we do a forced alignment
0:07:33[aligner name unclear],
0:07:37and we correct
0:07:39this alignment
0:07:41manually,
0:07:42and to analyse the phonemic content
0:07:45we
0:07:46just,
0:07:47as a first step,
0:07:48count the number of selected frames for each phoneme
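A small sketch of this counting step, assuming the corrected alignment is available as (phoneme, first frame, end frame) segments and that frame selection is stored as one boolean per frame. Both representations and the function name are hypothetical.

```python
from collections import Counter


def frames_per_phoneme(alignment, selected):
    """Count how many selected frames fall inside each phoneme.

    alignment: list of (phoneme, first_frame, end_frame), end exclusive.
    selected: list of booleans, True when the frame was kept by frame selection.
    """
    counts = Counter()
    for phoneme, first, end in alignment:
        counts[phoneme] += sum(1 for kept in selected[first:end] if kept)
    return counts


alignment = [("a", 0, 5), ("t", 5, 8), ("a", 8, 12)]
selected = [True] * 5 + [False, False, True] + [True, True, False, True]
counts = frames_per_phoneme(alignment, selected)
print(counts["a"], counts["t"])  # 8 1
```

One such count vector per training excerpt is the kind of input the statistical comparison between sets needs.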
0:07:52we do a MANOVA
0:07:54with a between-subjects factor, which is the set, and the dependent variables
0:07:58are the numbers of selected frames
0:08:01and
0:08:02we see that
0:08:04there is
0:08:04practically no
0:08:06difference
0:08:06in phonemic
0:08:07content,
0:08:10here for female speakers,
0:08:12between the min, the max
0:08:14and the random sets
0:08:16there is only
0:08:18one
0:08:18phoneme
0:08:21which is relevant,
0:08:22and for the males
0:08:24it is the same thing,
0:08:25so it is not sufficient to explain the gap
0:08:29in performance
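The talk uses a MANOVA, which tests all dependent variables jointly. As a simplified stand-in, the sketch below computes a univariate one-way ANOVA F statistic for a single dependent variable (for example the number of selected frames of one phoneme), with the set as the grouping factor. It is not the analysis from the talk, only an illustration of what "between-subjects factor" and "dependent variable" mean here; the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: {group_name: [values]}, e.g. {"min": [...], "max": [...], "random": [...]}.
    Assumes at least two groups and non-zero within-group variance.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between, ss_within = 0.0, 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


frames_for_one_phoneme = {"min": [1, 2, 3], "max": [2, 3, 4], "random": [3, 4, 5]}
print(one_way_anova_f(frames_for_one_phoneme))  # (3.0, 2, 6)
```

The F value is then compared with the F distribution for the given degrees of freedom; a small F, as in the phonemic content analysis, means the sets do not differ on that variable.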
0:08:32for the intra-phonemic information
0:08:34we use the acoustic features
0:08:38for each phoneme,
0:08:40and it is exactly the same setting with a MANOVA:
0:08:44we have a between-subjects factor, the set,
0:08:47and the dependent variables
0:08:49are the LFCC, the delta and the delta-delta
0:08:52here
0:08:53we have an
0:08:54important significant difference
0:08:56for the LFCC, for all the phonemes,
0:08:59and for the delta
0:09:01there is an
0:09:02important
0:09:04difference
0:09:05for
0:09:06a large majority of phonemes,
0:09:08mainly
0:09:10stops
0:09:10and several vowels,
0:09:12but we do not find a difference
0:09:14for the delta-delta
0:09:16and
0:09:18this type of analysis
0:09:22is challenging, and proves that the intra-phonemic
0:09:25acoustic differences have
0:09:27to be accounted for
0:09:31so when the training excerpt changes
0:09:35we have large performance differences
0:09:39this may not be explained by the number
0:09:41of selected frames:
0:09:42it is a possible factor
0:09:44but not a sufficient factor,
0:09:46and the phonemic distribution does not account for,
0:09:49does not explain,
0:09:51this gap
0:09:52there is a need for investigation
0:09:54to determine the influence
0:09:56of the intra-phonemic
0:09:58acoustics
0:10:00and
0:10:01the question is to draw the link
0:10:04between
0:10:07the intra-phonemic acoustic differences
0:10:09and higher-level
0:10:13phonemic
0:10:14information,
0:10:15and
0:10:16here
0:10:16there are new results
0:10:19since
0:10:20the submission of the paper
0:10:22we see that
0:10:24the intensity is higher
0:10:26in min
0:10:27than in max,
0:10:32it is significant, but if you take the mean
0:10:35of
0:10:36the intensity it is
0:10:37a very small
0:10:38difference
0:10:40there is no difference
0:10:41for the
0:10:43fundamental frequency,
0:10:44the pitch,
0:10:45and you can see the formants here, we do not have a difference,
0:10:50and
0:10:50we also looked at the dispersion of the vowels, and there is no difference
0:10:55for
0:10:56this type of
0:10:57information,
0:10:59and it is the same thing for the spectrum
0:11:03of the fricatives
0:11:05so for future work
0:11:08the question is
0:11:11that the variability may not be only
0:11:14the result
0:11:15of the signal samples,
0:11:17and maybe the system itself
0:11:19is a problem
0:11:22now we are working on the link between the LLR
0:11:27by frame
0:11:28and
0:11:28the phonemic
0:11:29description,
0:11:30to understand
0:11:31what exactly are the
0:11:33good and the bad frames, and
0:11:34whether
0:11:35there is a link
0:11:36with the phonemic information
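To make "LLR by frame" concrete: in a UBM-GMM system, each frame can be scored as the log-likelihood under the speaker model minus the log-likelihood under the UBM, and those per-frame values can then be grouped by phoneme label. The sketch below assumes diagonal-covariance GMMs stored as lists of (weight, means, variances) and one phoneme label per frame; these structures and the function names are hypothetical, not the system used in the talk.

```python
import math


def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of one frame under a diagonal-covariance GMM."""
    log_terms = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for x, m, v in zip(frame, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        log_terms.append(log_p)
    top = max(log_terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def mean_llr_by_phoneme(frames, labels, speaker_gmm, ubm):
    """Average the per-frame LLR (speaker model vs UBM) for each phoneme."""
    sums, counts = {}, {}
    for frame, phoneme in zip(frames, labels):
        llr = gmm_log_likelihood(frame, speaker_gmm) - gmm_log_likelihood(frame, ubm)
        sums[phoneme] = sums.get(phoneme, 0.0) + llr
        counts[phoneme] = counts.get(phoneme, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}


ubm = [(1.0, [0.0], [1.0])]
speaker = [(1.0, [1.0], [1.0])]
print(mean_llr_by_phoneme([[1.0], [0.0]], ["a", "t"], speaker, ubm))
```

Frames with a high LLR support the claimed speaker and frames with a low LLR argue against it, which is one way to read "good" and "bad" frames.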
0:11:40thank you
0:11:50question
0:12:06uh
0:12:07i entered
0:12:08and there's two you said that
0:12:09there was no
0:12:11significant difference between the snr
0:12:15yeah
0:12:15oh do
0:12:16by
0:12:17training try out some good three trials
0:12:20yeah, there is no difference for that
0:12:22there is a difference in the acoustics: for the LFCC
0:12:27we have
0:12:28a significant difference for all the phonemes,
0:12:30but
0:12:31if we want to find the link
0:12:35with higher-level
0:12:36features,
0:12:38we do not find
0:12:39anything
0:12:40so the point is
0:12:42that
0:12:42we have not
0:12:43found it
0:12:44with the descriptions,
0:12:50the features
0:12:51commonly used
0:12:52in phonetic science
0:12:54to describe
0:12:55speech
0:12:56currently we have not found
0:12:58the link between
0:12:59the LFCC
0:13:02and the recognition
0:13:04and the
0:13:06phonetic
0:13:07information,
0:13:09and we do not know
0:13:13why
0:13:13we have this type of gap,
0:13:17and we do not have an explanation
0:13:19at the moment
0:13:20from the acoustic and the phonetic
0:13:24analysis
0:13:26so if you just take your NIST
0:13:28trials, with your selection of
0:13:31training data,
0:13:33and you compare high SNR and low SNR,
0:13:37do you see a difference in performance
0:13:40sorry
0:13:41you take only your NIST trials
0:13:42no no no we we still i mean
0:13:45but eventually you could do yeah yeah
0:13:48yeah we we did something like that in there is to be difference in performance
0:13:53i mean it is what you would expect,
0:13:54low SNR training data should give
0:13:57worse performance
0:13:58buttons
0:13:59you
0:14:00not
0:14:01not a break
0:14:02you rattle basically for exactly the the same
0:14:06but
0:14:07maybe there is not so much but you be the the
0:14:11nice
0:14:13'cause
0:14:13maybe the breath they that there is not so much
0:14:16but maybe it's an hour
0:14:19no
0:14:20very
0:14:21and
0:14:22the variability
0:14:23about the
0:14:25microphone position, for example, there is no variability, right
0:14:28okay
0:14:29no, it is exactly the same microphone, exactly
0:14:32all the people were recorded
0:14:36on the same day, and
0:14:38there is no session variability
0:14:40the only variability
0:14:45is on the speaker,
0:14:47so when we have only the information about the speaker
0:14:51we can have
0:14:52a variation like
0:14:54this,
0:14:55between one
0:14:56and
0:14:56thirty-three percent
0:14:58i think what everybody
0:15:00so
0:15:01it's
0:15:01very
0:15:03and then the question is
0:15:05how to explain that, because
0:15:07if we can
0:15:08have an explanation
0:15:11we can then
0:15:11add a confidence score
0:15:13or something like this
0:15:15that
0:15:15can say:
0:15:17okay,
0:15:19i know
0:15:20the training and i know
0:15:24the testing sample,
0:15:26and i can say,
0:15:28okay, for this one
0:15:31i have a good score
0:15:32but i do not have confidence
0:15:34with
0:15:35this data,
0:15:36but with other data i can have
0:15:39a good
0:15:39score which is well computed
0:15:41and it is
0:15:43the objective
0:15:44of
0:15:45this kind of study
0:15:46it's a good
0:15:55[audience question, inaudible]
0:16:22yeah,
0:16:24the
0:16:25problem is that, as information, we
0:16:28just
0:16:29use
0:16:29the LFCC, the delta and the delta-delta,
0:16:32and that was
0:16:33to check that
0:16:34there is
0:16:36a difference,
0:16:37because at the beginning we did not understand. the question now is the link
0:16:41between
0:16:42the phonetic description
0:16:44and
0:16:45these
0:16:46LFCC
0:16:47which are used, because
0:16:49we know that
0:16:50in the LFCC and the delta we have information,
0:16:53but
0:16:53we have not
0:16:55found
0:16:55a link between
0:16:57the LFCC
0:16:59and the delta
0:17:00and
0:17:01the high-level
0:17:02phonemic information
0:17:04currently i am working on
0:17:07the coarticulation information,
0:17:10and
0:17:11the first experiments i did
0:17:14were only with
0:17:16triphones,
0:17:17analysing
0:17:18the distribution of the triphones,
0:17:20and i do not
0:17:21find a
0:17:21difference
0:17:22so currently i am measuring the locus,
0:17:26to see whether with the locus we have
0:17:30[unclear]
0:17:34the locus,
0:17:37that is,
0:17:38you take the value of the formants,
0:17:41of the second formant,
0:17:43at
0:17:45the beginning of the vowel
0:17:47and at fifty percent of the vowel, and
0:17:52you analyse the relation
0:17:54between
0:17:55the two values,
0:17:57and
0:17:58normally if there is a lot of coarticulation
0:18:06you have a regression
0:18:07of the onset value according to the
0:18:09mid-vowel value, but if
0:18:11there is no coarticulation
0:18:13you have something that is very [unclear]
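The measurement described above corresponds to what phonetics calls a locus equation: a straight line fitted to the second formant at vowel onset as a function of the second formant at the vowel midpoint. A minimal least-squares sketch, assuming F2 values in Hz have already been measured at both points; the function name and the example values are hypothetical.

```python
def locus_equation(f2_onset, f2_mid):
    """Fit f2_onset = slope * f2_mid + intercept by ordinary least squares."""
    n = len(f2_mid)
    mean_x = sum(f2_mid) / n
    mean_y = sum(f2_onset) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_mid)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_mid, f2_onset))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up F2 values (Hz) for one consonant followed by three different vowels.
f2_mid = [1000.0, 1500.0, 2000.0]
f2_onset = [1400.0, 1650.0, 1900.0]
slope, intercept = locus_equation(f2_onset, f2_mid)
print(slope, intercept)
```

In the usual reading of locus equations, a slope close to 1 means the onset follows the vowel closely (strong coarticulation), while a slope close to 0 means the onset stays near a fixed value whatever the vowel (little coarticulation).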
0:18:23[audience question, inaudible]
0:18:56it's a good question
0:18:58here you have the scores
0:19:01for
0:19:03the speakers on NIST two thousand eight
0:19:07there is
0:19:09a difference
0:19:09according to the normalisation,
0:19:12but it is
0:19:13not comparable
0:19:14with the difference
0:19:15we have
0:19:16without normalisation
0:19:18between the sets
0:19:19when we select
0:19:20[unclear]
0:19:24yeah,
0:19:25no,
0:19:29the normalisation
0:19:30is something that we have to do,
0:19:33but the problem is, when we have
0:19:35a database like BREF,
0:19:37it is very difficult, because
0:19:39we do not have
0:19:43a lot of data to be able to do that well
0:19:47and to have
0:19:51different sets for training and testing
0:19:53we do not have a lot of
0:19:55data, so it is very difficult to do
0:19:58the normalisation
0:19:59if we want
0:20:00to have a lot of
0:20:02different
0:20:03training
0:20:05excerpts
0:20:09oh
0:20:10or or what
0:20:11two
0:20:12maybe more to each source model one quarter sometimes you can point to
0:20:18oh
0:20:19um
0:20:20for the concatenation, it is a randomised
0:20:25concatenation
0:20:26we are sure that there are
0:20:28never
0:20:28the same
0:20:29samples
0:20:30for testing and training,
0:20:32but
0:20:34we do not
0:20:36combine them at the moment
0:20:38for example, if your question is
0:20:40whether we have tried to train,
0:20:45to use the best
0:20:47and concatenate it with the bad
0:20:50to have a better
0:20:52model, we have not
0:20:54tried
0:20:56this type of combination
0:20:57a small country
0:20:59you have some recordings of each speaker
0:21:02point
0:21:02time
0:21:05between three and twenty
0:21:08recording yeah
0:21:09and each recording
0:21:10some
0:21:10some some some
0:21:11point in time
0:21:13and
0:21:15according to teach
0:21:16yeah
0:21:17okay
0:21:18strong
0:21:19combining multiple recordings into a model
0:21:22no yeah
0:21:23we have done that,
0:21:26to have
0:21:29samples
0:21:30with
0:21:31two minutes
0:21:32of
0:21:35selected frames,
0:21:38and we do the same thing,
0:21:42selecting the best and the worst with
0:21:45a longer
0:21:47signal,
0:21:48and the results
0:21:50are
0:21:51these ones
0:21:53there is less data, that is why the curve
0:21:56is not
0:21:58so good,
0:21:59but we have,
0:22:01for sets which are not
0:22:02the same,
0:22:04a gap which is important
0:22:07here the equal error rate is less than
0:22:10one percent
0:22:11and here it is five percent, and we have
0:22:15a lot of frames selected
0:22:17which shows more
0:22:19combination of so yeah things from yeah point sometimes or
0:22:25between no no no
0:22:28no,
0:22:31because
0:22:34there is exactly the same testing for
0:22:38this curve
0:22:39and this curve,
0:22:41so it is possible to compare
0:22:43the
0:22:44that's why the
0:22:45posted to
0:22:47i don't know
0:22:51from
0:22:52sessions which
0:22:53you just
0:22:54no i have no information about it
0:22:57because
0:22:58because the
0:22:59what the sample
0:23:00or
0:23:01uh recording in the same
0:23:03it with the same microphone and exactly
0:23:06the same day so if there is
0:23:07no the there is no uh interior stationed viability
0:23:12there is only
0:23:13uh intraspeaker valuable
0:23:16it is controlled that
0:23:17the speaker hon that's a
0:23:19the
0:23:20the one i want to find an optional
0:23:25for example for half an hour or two
0:23:30open or something
0:23:32yes
0:23:33yeah