0:00:14 Hello everyone. Today I will present our work on speaker characterization for the NIST SRE 2019 task. First, some background. Neural network based speaker embeddings, trained on large speaker datasets, have recently demonstrated very good performance. Beyond the mainstream architectures, different network structures, such as convolutional neural networks, have achieved low equal error rates (EER) for speaker embedding, and attention mechanisms have been successful in related fields. So the question we ask is how to better apply these techniques to speaker recognition.
0:01:52 This paper presents speaker characterization using deep neural networks. We first describe the network architecture, then a robust extraction front end for the speaker embeddings, and finally the data variations used during training.
0:02:21 Next, some background on the NIST Speaker Recognition Evaluation (SRE). NIST has conducted this series of evaluations, which have grown steadily larger, since 1996. For real applications, differing channel and recording conditions make it hard to choose the right features, and the evaluation data reflect this kind of realistic, mismatched speech.
0:03:05 The first neural network based speaker embeddings were proposed several years ago, and they have since been improved through better features and training recipes. Today, DNN-based speaker embedding is the mainstream approach to speaker recognition.
0:03:34 A typical speaker embedding neural network structure has two parts. First, frame-level layers process the speech and produce a frame-by-frame representation; this is followed by a statistics pooling layer that aggregates the frame-level outputs over the whole utterance. Second, segment-level layers take the pooled statistics and map them, through fully connected layers, to the final speaker embedding.
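The two-part structure just described, frame-level processing followed by pooling into one segment-level vector, can be sketched minimally. This is a generic illustration with made-up dimensions, not the presenters' exact network:

```python
import numpy as np

def statistics_pooling(frames):
    """Aggregate a variable-length sequence of frame-level features
    (shape [num_frames, feat_dim]) into a single fixed-size
    segment-level vector by concatenating mean and std. dev."""
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    return np.concatenate([mean, std])

# A segment of 200 frames with 512-dim frame-level features
# (stand-ins for the outputs of the frame-level layers).
frames = np.random.randn(200, 512)
segment_vec = statistics_pooling(frames)
print(segment_vec.shape)  # (1024,)
```

The segment-level fully connected layers would then map this 1024-dim statistics vector down to the embedding dimension.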
0:04:20 In this study, we focus in particular on how to obtain a robust segment-level representation from the frame-level network structure. In addition, we use an attention layer in front of the statistics pooling layer. Accordingly, our structure performs self-attentive statistics pooling for the speaker embedding.
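Self-attentive statistics pooling replaces the uniform frame average with learned attention weights. A minimal numpy sketch, where the attention vector `w` stands in for a trained parameter:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_stats_pooling(frames, w):
    """Self-attentive statistics pooling: a learned vector `w` scores
    each frame, softmax turns the scores into attention weights, and
    a weighted mean/std replaces the plain statistics."""
    scores = frames @ w                      # one scalar score per frame
    alpha = softmax(scores)                  # attention weights, sum to 1
    mean = (alpha[:, None] * frames).sum(axis=0)
    var = (alpha[:, None] * (frames - mean) ** 2).sum(axis=0)
    return np.concatenate([mean, np.sqrt(var + 1e-8)])

frames = np.random.randn(200, 512)
w = np.random.randn(512) * 0.01              # stand-in for a trained attention vector
emb = attentive_stats_pooling(frames, w)
print(emb.shape)  # (1024,)
```

In practice the scoring function is often a small MLP rather than a single dot product; the weighted-statistics idea is the same.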
0:05:07 In this study we also address front-end feature extraction, aiming to find good features for speaker embedding. Two acoustic features are commonly used. The first is the mel-frequency cepstral coefficient (MFCC) feature, which is widely used in speech recognition. The second is the mel-scale filter bank feature, which keeps the log filter bank energies directly.
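The mel-scale filter bank mentioned here is a standard construction: triangular filters spaced evenly on the mel scale, applied to the power spectrum. A textbook-style sketch (the 40-filter, 8 kHz, 512-point FFT settings are illustrative assumptions, not taken from the talk):

```python
import numpy as np

def mel(f):        # Hz -> mel
    return 2595.0 * np.log10(1.0 + f / 700.0)

def inv_mel(m):    # mel -> Hz
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=40, n_fft=512, sr=8000):
    """Triangular mel-scale filters for a power spectrum.
    Returns a [n_filters, n_fft//2 + 1] weight matrix."""
    pts = inv_mel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                # rising edge of the triangle
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                # falling edge
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

fb = mel_filterbank()
print(fb.shape)  # (40, 257)
```

Multiplying a frame's power spectrum by `fb` and taking the log gives the filter bank feature; a further DCT step would give MFCCs.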
0:05:46 Because a large amount of training data is needed, two kinds of data augmentation are used. The first is speed perturbation: the original audio file is warped by resampling its data points, so each utterance yields additional training utterances without any new recordings.
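Speed perturbation can be sketched as simple resampling. The 0.9 and 1.1 factors below are the values commonly used in the literature and are an assumption here, not a figure from the talk; linear interpolation stands in for a proper resampler:

```python
import numpy as np

def speed_perturb(signal, factor):
    """Speed perturbation by resampling: factor 0.9 slows the audio
    down (longer output), 1.1 speeds it up (shorter output)."""
    n_out = int(round(len(signal) / factor))
    t_out = np.arange(n_out) * factor        # output sample times in input units
    return np.interp(t_out, np.arange(len(signal)), signal)

# A 1 s, 5 Hz sine at 8 kHz as a stand-in utterance.
x = np.sin(2 * np.pi * 5 * np.arange(8000) / 8000.0)
slow = speed_perturb(x, 0.9)                 # ~8889 samples
fast = speed_perturb(x, 1.1)                 # ~7273 samples
```

Each perturbed copy is treated as a new utterance of the same speaker, multiplying the training data.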
0:06:18 The second kind is reverberation and additive noise. Simulated room impulse responses are used to reverberate the original speech, covering a range of room conditions, and noises such as music and babble are added to the original speech at various levels. Together these augmentations greatly enlarge the training data, which benefits our approach.
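Both augmentations, additive noise at a target SNR and convolution with a room impulse response, can be sketched as follows; the signals and the exponentially decaying synthetic RIR are stand-ins for real corpora:

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix noise into speech at a target signal-to-noise ratio (dB)."""
    noise = np.resize(noise, speech.shape)           # loop/trim noise to length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

def reverberate(speech, rir):
    """Convolve speech with a (simulated) room impulse response."""
    return np.convolve(speech, rir)[: len(speech)]

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)                  # 2 s at 8 kHz, stand-in signal
noise = rng.standard_normal(4000)
rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)

noisy = add_noise(speech, noise, snr_db=10)
reverbed = reverberate(speech, rir)
```

Each augmented copy keeps the original speaker label, so the embedding network learns to ignore noise and room effects.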
0:07:04 By using such augmentation, and in addition seven training corpora, the training set becomes very large. The corpora include NIST SRE data, Switchboard, and the VoxCeleb datasets, and the training data are then artificially enlarged with the augmentation described above. Altogether this gives a very large number of clean-speech utterances from thousands of speakers for our modeling, which is a huge amount of data.
0:07:59 For evaluation we used the NIST SRE 2018 and SRE 2019 sets. These provide the most relevant available training and development data, because the task is conversational telephone speech (CTS): the evaluation speech is telephone speech only. The same front end was used for feature extraction across systems. The task itself is the National Institute of Standards and Technology Speaker Recognition Evaluation, SRE 2019, which was held in 2019.
0:08:59 Experimental results are reported in terms of the minimum decision cost function and the equal error rate on the NIST SRE 2018 development set and the SRE 2019 evaluation set, respectively.
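The decision cost function used for scoring weighs misses against false alarms at a given target prior. A minimal sketch of computing a normalized minimum DCF from target and non-target trial scores; the prior and cost parameters below are illustrative, not the official SRE operating point:

```python
import numpy as np

def min_dcf(target_scores, nontarget_scores, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Minimum detection cost over all score thresholds, normalized by
    the cost of the best trivial (always-accept/always-reject) system."""
    thresholds = np.unique(np.concatenate([target_scores, nontarget_scores]))
    best = np.inf
    for t in thresholds:
        p_miss = np.mean(target_scores < t)      # targets rejected at threshold t
        p_fa = np.mean(nontarget_scores >= t)    # non-targets accepted
        dcf = c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa
        best = min(best, dcf)
    norm = min(c_miss * p_target, c_fa * (1 - p_target))
    return best / norm

tgt = np.array([2.0, 1.5, 1.8, 0.2])             # hypothetical trial scores
non = np.array([-1.0, -0.5, 0.1, 1.6])
print(min_dcf(tgt, non))
```

Lower is better; a normalized minDCF of 1.0 means the system is no better than guessing from the prior.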
0:09:30 This table shows which configurations achieve the best performance. We compare the first and second segment-level speaker embedding systems, and we also compare the different input features feeding each embedding network. The first speaker embedding system is better under some conditions and the second under others, so the two embeddings carry complementary information, and we can use score fusion of the two systems.
0:10:58 Since the filter bank feature performed well, it was used as the main input feature. It can also be noted that the systems with attention in front of statistics pooling consistently helped. Separate systems were trained on each feature, and this table shows that by fusing systems trained on different features we can further improve the performance.
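Score-level fusion of this kind is typically a weighted sum of the subsystems' scores per trial. A minimal sketch with equal weights (in practice the weights would be calibrated on a development set, e.g. with logistic regression):

```python
import numpy as np

def fuse_scores(score_lists, weights=None):
    """Linear score-level fusion: a weighted sum of per-system scores
    for the same trials."""
    scores = np.stack(score_lists)               # [n_systems, n_trials]
    if weights is None:
        weights = np.full(len(score_lists), 1.0 / len(score_lists))
    return weights @ scores

sys_a = np.array([2.1, -0.3, 1.7])               # hypothetical per-trial scores
sys_b = np.array([1.8, 0.1, 2.0])
fused = fuse_scores([sys_a, sys_b])
print(fused)  # element-wise average of the two systems' scores
```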
0:11:51 Finally, the overall recipe: by combining extensive data augmentation, complementary input features, and careful back-end scoring, we built our final submission. That is, the final system fuses the individual subsystems at the score level. This table shows the final fused results on the two evaluation sets; the fusion improves on the single systems.
0:13:01 To conclude, this paper described our submitted systems for the NIST SRE 2019 CTS task. We used a convolutional neural network structure which operates directly on the acoustic features, with an attention layer ahead of statistics pooling, and it showed good performance for speaker embedding. For the front end we used filter bank based feature analysis, and because data augmentation is so important, we enlarged the available training data with the augmentations described. The proposed system achieved good scores on the NIST SRE 2018 development set and the SRE 2019 evaluation dataset.
0:14:22thank you
0:14:23thank you very much