Good morning, everyone. In this talk I will present our work on speaker characterization using deep neural network based speaker embeddings, and the system we developed for the NIST speaker recognition evaluation. I will first give some background, then describe the network structure, the acoustic features and the data augmentation we used, and finally show the evaluation results.
For many years the mainstream approach to speaker recognition was based on Gaussian mixture models. More recently, different neural network structures, such as convolutional networks, have been explored, and neural speaker embeddings such as the x-vector have achieved the lowest equal error rates on many benchmarks. Attention mechanisms have also been introduced into the embedding network. We can build on these ideas to better tackle speaker recognition. This paper proposes speaker characterization using an attentive deep neural network, with the goal of obtaining speaker embeddings that are robust to channel and environmental variation.
Let me first say a few words about the NIST speaker recognition evaluation. NIST has organized this evaluation series since 1996. The evaluation data reflect real applications, with different channels and recording conditions, which makes the task challenging, and the recent evaluations such as SRE 2018 are particularly demanding.
The first neural network based speaker embedding was proposed several years ago, and it was followed by a series of improvements. Today, neural network based speaker embeddings are the mainstream approach to speaker recognition.
Let me now describe the neural network structure used for the speaker embedding. It consists of two parts. In the first part, the input speech is processed frame by frame through several layers to produce a frame-level representation, which is followed by a statistics pooling layer that aggregates the frames of an utterance into a single vector. The second part consists of fully connected segment-level layers, and the speaker embedding is extracted from these layers.
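A minimal PyTorch sketch of this two-part structure is given below. The dilated 1-D convolutions for the frame-level part, the layer sizes, the 512-dimensional embedding, and the speaker-classification head are common x-vector-style assumptions on my part rather than the exact configuration of this system.

```python
import torch
import torch.nn as nn


class XVectorLikeNet(nn.Module):
    """Frame-level layers, statistics pooling, then segment-level layers."""

    def __init__(self, feat_dim=40, embed_dim=512, num_speakers=1000):
        super().__init__()
        # Frame-level part: TDNN-style layers realised as dilated 1-D convolutions.
        self.frame_layers = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=3), nn.ReLU(),
            nn.Conv1d(512, 1500, kernel_size=1), nn.ReLU(),
        )
        # Segment-level part: operates on the pooled utterance-level statistics.
        self.segment1 = nn.Linear(2 * 1500, embed_dim)   # first embedding layer
        self.segment2 = nn.Linear(embed_dim, embed_dim)  # second embedding layer
        self.classifier = nn.Linear(embed_dim, num_speakers)

    def forward(self, feats):
        # feats: (batch, frames, feat_dim) acoustic features.
        h = self.frame_layers(feats.transpose(1, 2))     # (batch, 1500, frames')
        # Statistics pooling: mean and standard deviation over the frame axis.
        stats = torch.cat([h.mean(dim=2), h.std(dim=2)], dim=1)
        emb_a = self.segment1(stats)                     # speaker embedding "a"
        emb_b = self.segment2(torch.relu(emb_a))         # speaker embedding "b"
        return self.classifier(torch.relu(emb_b)), emb_a, emb_b
```

During training the classifier output would be combined with a cross-entropy loss over the training speakers; at test time only `emb_a` or `emb_b` is kept as the speaker embedding.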
In this study we build on this structure to obtain a more robust speaker embedding. In addition, we use an attention mechanism together with the statistics pooling layer, so that the network learns to weight each frame according to how useful it is for characterizing the speaker. Accordingly, the resulting structure produces an attentive speaker embedding.
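A minimal sketch of such an attentive statistics pooling layer follows; the attention network size (`attn_dim`) and the per-channel softmax over frames are assumptions, since the talk does not spell out the exact attention form.

```python
import torch
import torch.nn as nn


class AttentiveStatsPooling(nn.Module):
    """Weighted mean and standard deviation over frames, with learned weights."""

    def __init__(self, in_dim=1500, attn_dim=128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Conv1d(in_dim, attn_dim, kernel_size=1), nn.Tanh(),
            nn.Conv1d(attn_dim, in_dim, kernel_size=1),
        )

    def forward(self, h):
        # h: (batch, channels, frames) frame-level representations.
        w = torch.softmax(self.attention(h), dim=2)       # attention weights per frame
        mean = (w * h).sum(dim=2)                         # weighted mean
        var = (w * h.pow(2)).sum(dim=2) - mean.pow(2)     # weighted variance
        std = var.clamp(min=1e-8).sqrt()
        return torch.cat([mean, std], dim=1)              # (batch, 2 * channels)
```

This module would replace the plain mean and standard deviation pooling in the earlier sketch.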
In this study we also pay attention to acoustic feature extraction, in order to find good features for the speaker embedding. Two acoustic features are considered. The first is the mel-frequency cepstral coefficient, or MFCC, feature, which is widely used in speech recognition. The second is the mel-scale filter bank feature, which keeps the log filter bank energies directly.
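Below is a minimal sketch of extracting these two features with librosa; the 25 ms window, 10 ms hop, 23 cepstral coefficients, and 40 mel filters are typical values I am assuming, not necessarily the exact settings of this system.

```python
import librosa
import numpy as np


def extract_features(wav_path, sr=16000, n_mfcc=23, n_mels=40):
    y, sr = librosa.load(wav_path, sr=sr)
    frame_args = dict(n_fft=int(0.025 * sr), hop_length=int(0.010 * sr))

    # Mel-frequency cepstral coefficients.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, **frame_args)

    # Log mel-scale filter bank energies.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, **frame_args)
    fbank = librosa.power_to_db(mel)

    # Return frames as rows: (num_frames, feat_dim).
    return mfcc.T.astype(np.float32), fbank.T.astype(np.float32)
```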
To make the embedding robust, we also apply data augmentation. Several kinds of augmentation are applied to the original audio files: simulated room impulse responses are used to add reverberation, and noise, music, and babble speech are added to the original recordings at different signal-to-noise ratios. The augmented copies are pooled with the original utterances, which greatly increases the amount and variety of the training data.
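A minimal sketch of the two basic augmentation operations, additive noise at a target signal-to-noise ratio and convolution with a room impulse response, is given below; the function names and the peak normalisation at the end are illustrative choices of mine.

```python
import numpy as np
from scipy.signal import fftconvolve


def add_noise(speech, noise, snr_db):
    # Tile or truncate the noise to match the speech length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]
    # Scale the noise so that the resulting SNR matches snr_db.
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    return speech + scale * noise


def add_reverb(speech, rir):
    # Convolve with the room impulse response and keep the original length.
    reverberant = fftconvolve(speech, rir, mode="full")[: len(speech)]
    # Rescale so the augmented copy has the same peak level as the original.
    return reverberant / (np.max(np.abs(reverberant)) + 1e-12) * np.max(np.abs(speech))
```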
In addition, several corpora are pooled for training, including the NIST SRE and Switchboard collections, together with their augmented copies.
For training we use the clean speech from these corpora, which provides a large number of utterances from a large number of speakers, together with the augmented data, so the total amount of training material is very large; most of it is conversational telephone speech. The same feature extraction described earlier is applied to all of the training data. The systems are evaluated on the National Institute of Standards and Technology speaker recognition evaluation task, using the SRE 2018 and SRE 2019 evaluation sets.
The experimental results are reported in terms of the minimum detection cost function and the equal error rate on the NIST SRE 2018 and SRE 2019 evaluation sets, respectively; the exact numbers are shown on this slide.
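For reference, a minimal sketch of how these two metrics can be computed from trial scores and target/non-target labels follows; the target prior and unit costs are common NIST-style defaults that I am assuming, not necessarily the official parameters of this evaluation.

```python
import numpy as np


def eer_and_min_dcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """scores: trial scores; labels: 1 for target trials, 0 for non-target trials."""
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels)[np.argsort(scores)]   # sort trials by ascending score
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    # Sweep the decision threshold from below the lowest to above the highest score.
    miss = np.concatenate(([0], np.cumsum(labels)))                             # rejected targets
    fa = np.concatenate(([n_nontarget], n_nontarget - np.cumsum(1 - labels)))   # accepted non-targets
    p_miss = miss / n_target
    p_fa = fa / n_nontarget

    # EER: the operating point where miss and false-alarm rates are (nearly) equal.
    idx = np.argmin(np.abs(p_fa - p_miss))
    eer = (p_fa[idx] + p_miss[idx]) / 2.0

    # minDCF: minimum normalised detection cost over all thresholds.
    dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    min_dcf = dcf.min() / min(c_miss * p_target, c_fa * (1 - p_target))
    return eer, min_dcf
```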
This table compares the different configurations. We compare the speaker embeddings extracted from the first and the second segment-level layers, and we also compare the MFCC feature with the filter bank feature. Both embeddings give reasonable performance, and we can use score fusion to combine the systems.
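A minimal sketch of such score-level fusion is shown below; the mean/variance normalisation per system and the equal default weights are illustrative assumptions, and in practice the weights would be tuned or learned on a development set.

```python
import numpy as np


def fuse_scores(score_lists, weights=None):
    """score_lists: list of per-system score arrays over the same trial list."""
    scores = np.vstack(score_lists)
    if weights is None:
        weights = np.full(len(score_lists), 1.0 / len(score_lists))
    # Normalise each system's scores before combining so that scale
    # differences between systems do not dominate the fusion.
    normalised = (scores - scores.mean(axis=1, keepdims=True)) / (
        scores.std(axis=1, keepdims=True) + 1e-12
    )
    return np.asarray(weights) @ normalised
```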
It should also be noted that the attention mechanism helps: by weighting the frames, the attentive pooling improves the performance for each feature. Finally, the best single system is obtained by using the embedding trained on the filter bank feature together with the attentive pooling and the backend scoring.
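The details of the backend are hard to recover from this recording; as a simple stand-in for the backend scoring step, the sketch below length-normalises two embeddings and scores an enrolment/test trial with cosine similarity. A PLDA model is a common alternative that would replace this scoring function.

```python
import numpy as np


def length_normalise(x):
    return x / (np.linalg.norm(x) + 1e-12)


def score_trial(enrol_embedding, test_embedding):
    """Higher score means the two utterances are more likely the same speaker."""
    e = length_normalise(np.asarray(enrol_embedding, dtype=np.float64))
    t = length_normalise(np.asarray(test_embedding, dtype=np.float64))
    return float(np.dot(e, t))
```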
Our final submission is a fusion of these individual systems. This table shows the results of the final fused system on the evaluation sets, compared with the single systems described in this paper.
Next, let me briefly describe our system for the SRE 2019 CTS task. It uses a neural network structure that operates on the acoustic features to extract speaker embeddings, followed by LDA and PLDA scoring in the backend. The embedding network is trained on data from the Mixer 6, NIST SRE, and Switchboard corpora, and data augmentation is again used to increase the amount of available training data. The scores of the proposed system on the SRE 2018 and SRE 2019 evaluation datasets are shown on this slide.
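Assuming the backend of this second system really does include an LDA projection before PLDA scoring, as I read from this part of the recording, the sketch below shows that projection with scikit-learn; the 150-dimensional output is an assumption, and the PLDA model itself is not shown.

```python
# A hypothetical LDA projection of speaker embeddings before backend scoring.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def train_lda(train_embeddings, speaker_labels, out_dim=150):
    """train_embeddings: (num_utterances, embed_dim); speaker_labels: one speaker id per row."""
    # out_dim must be smaller than both the embedding dimension and the
    # number of training speakers minus one.
    lda = LinearDiscriminantAnalysis(n_components=out_dim)
    lda.fit(np.asarray(train_embeddings), np.asarray(speaker_labels))
    return lda


def project(lda, embeddings):
    # Apply the learned projection to enrolment and test embeddings alike.
    return lda.transform(np.asarray(embeddings))
```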
thank you
thank you very much