i everywhere my name used for channel one
i don't formation those are used in university
miss what was done during testing with my advice a trap in the chair
an initial chose john we do
welch and channel shall we shall for small hardly come neighbour countries
the title of our paper is improving embedding-based neural network speaker recognition
so that speaking
we since there is a lie
first
i will introduce speaker recognition and the embedding-based system
then describe the pilot experiments
we decided to anyway more than recognition and a science institute
we also introduce the proposed systems
you at r s i ni and ti variation set
right now we can cluster better
so that's following task
speaker recognition systems can be divided into speaker verification and speaker identification
as shown in the figure
when a speech segment is given to the system
the speaker verification system decides whether it is the target speaker's voice
and the speaker identification system finds out whose voice it is
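The difference between the two tasks can be sketched in a few lines. This is only an illustration: the embeddings, the speaker names, the threshold, and the use of cosine similarity as the score are all assumptions made for the example, not the scoring used in the talk.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(test_emb, target_emb, threshold=0.5):
    """Verification: is the test speech from the claimed target speaker?
    The threshold value is hypothetical."""
    return cosine(test_emb, target_emb) >= threshold

def identify(test_emb, enrolled):
    """Identification: which enrolled speaker matches the test speech best?"""
    return max(enrolled, key=lambda name: cosine(test_emb, enrolled[name]))

# Toy embeddings (made up for illustration).
enrolled = {"spk_a": [1.0, 0.0, 0.2], "spk_b": [0.0, 1.0, 0.1]}
test = [0.9, 0.1, 0.2]

is_target = verify(test, enrolled["spk_a"])  # one yes/no decision
best = identify(test, enrolled)              # one label out of many
```

Verification answers a yes/no question about one claimed identity, while identification picks one label from a closed set of enrolled speakers.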
and sre is the abbreviation of the nist speaker recognition evaluation
the embedding-based system is composed of two main parts
a front-end speaker embedding extractor and a back-end component
many studies show that such a system achieves better performance than traditional methods
such as i-vector
it's also more robust thinking in their attempt to bidirectional across languages and errors
but it also has a significant shortcoming
that is the high computing resource requirement
in this work
we focus on the front-end speaker embedding extractor
especially the neural network and its training process
this is also the source of the high computing resource requirements
we briefly introduce the neural network in the embedding-based system
the figure shows the process of conventional hand the a neural network to capturing data
the frequencies the future starting point twenty one they work as to where there were
there is
korean air same available there
the output of a single and they one there exist a target
we choose to be in clean vad
however to get across speakers in it
then when they were need to be wheelchair
no change yes
the output of the second neural network layer is fed into the classification head
the dimension of the classification head is the number of speakers in the training data
the loss is calculated according to the loss function
and the parameters of the network are updated according to the loss
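A minimal sketch of this classification step, assuming a softmax cross-entropy loss and plain gradient descent (the talk does not confirm which loss or optimizer is used). The sizes and values are hypothetical, and only the classification layer is updated here; in a real system the loss is backpropagated through the whole network.

```python
import math

def softmax(logits):
    """Numerically stable softmax over speaker logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sgd_step(W, emb, label, lr=0.1):
    """One training step on the classification head.
    W has one row per training speaker; emb is the speaker embedding."""
    logits = [sum(w * x for w, x in zip(row, emb)) for row in W]
    probs = softmax(logits)
    loss = -math.log(probs[label])           # cross-entropy (assumed loss)
    for k, row in enumerate(W):
        grad = probs[k] - (1.0 if k == label else 0.0)
        for j in range(len(row)):
            row[j] -= lr * grad * emb[j]     # update according to the loss
    return loss

num_speakers, emb_dim = 3, 4                 # hypothetical sizes
W = [[0.0] * emb_dim for _ in range(num_speakers)]
emb = [0.5, -0.2, 0.1, 0.3]                  # toy embedding of speaker 1
losses = [sgd_step(W, emb, label=1) for _ in range(20)]
```

The number of rows in `W` equals the number of training speakers, which is why the classification head grows with the size of the training set.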
this process is repeated several times and requires a lot of computing resources
when the number of model parameters is larger
more computing resources are required
this is why we designed pilot experiments
next i am going to describe the pilot experiments and their results
we used a low-cost experimental setup
we tried to simplify the use of datasets
only clustering was used to develop a love one with a model
we also avoided using data augmentation
and used mel-frequency cepstral coefficients as features
just trying to keep things simple
all the of by
an energy-based vad is used to remove the non-speech frames
and the remaining features are normalized
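As a rough sketch of this preprocessing, assuming the first coefficient of each frame behaves like an energy term and that normalization means per-utterance mean subtraction (both are assumptions; the talk does not give these details):

```python
def energy_vad(frames, threshold):
    """Keep frames whose first coefficient (assumed to be energy-like)
    is above a hypothetical threshold; drop the rest as non-speech."""
    return [f for f in frames if f[0] > threshold]

def mean_normalize(frames):
    """Subtract the per-dimension mean computed over the kept frames."""
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]

# Toy utterance: three frames, two coefficients each (made-up values).
utterance = [[5.0, 1.0], [0.1, 9.0], [7.0, 3.0]]
speech = energy_vad(utterance, threshold=1.0)   # drops the low-energy frame
features = mean_normalize(speech)
```

Normalizing after the VAD means the statistics are computed on speech frames only, so silence does not shift the mean.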
kinda experiments
we blatantly meetings the training data and we use the neural network training i x
this indian subcellular of computing resources or
in the back-end component
we use gaussian plda and do not use score normalization
and the experiments are evaluated on the sre eighteen evaluation set
we focus on the neural network and its training process
and try some modifications
now i will show these modifications and the corresponding results
the first experiments are on the pooling layer in the neural network
there are several pooling layers to choose from
the conventional statistics pooling layer treats each frame the same way
however
self-attentive pooling uses an attention model to give each frame a different weight
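The contrast between the two pooling layers can be sketched as follows. This is an illustrative assumption, not the attention model from the talk: the scoring vector `v` stands in for a learned attention model, and a single attention head is used.

```python
import math

def weighted_stats(frames, weights):
    """Weighted mean and standard deviation per dimension, concatenated."""
    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    var = [sum(w * (f[d] - mean[d]) ** 2 for w, f in zip(weights, frames))
           for d in range(dim)]
    return mean + [math.sqrt(x) for x in var]

def statistics_pooling(frames):
    """Conventional pooling: every frame gets the same weight 1/N."""
    n = len(frames)
    return weighted_stats(frames, [1.0 / n] * n)

def attentive_pooling(frames, v):
    """Attentive pooling: frame weights come from a softmax over scores.
    The linear scorer v is a hypothetical stand-in for the attention model."""
    scores = [sum(vi * fi for vi, fi in zip(v, f)) for f in frames]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return weighted_stats(frames, weights), weights

frames = [[1.0, 0.0], [3.0, 2.0]]            # toy frame-level features
uniform = statistics_pooling(frames)
pooled, weights = attentive_pooling(frames, v=[1.0, 0.0])
```

With an all-zero scoring vector the attention weights become uniform and the result equals statistics pooling, which shows that attentive pooling generalizes the conventional layer.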
the screenwriter right shows no situation of them are right
the level features on writing them several parts
which are first its own waiting statical radio
we chartered the attention models and the real number so that
you notation mono form a true results are a continuation function
a real mean concerns where it's a virus
the results are shown in the table
we can see the performance of the two attention models we used
and we can also see the impact of different head numbers
by the way
these settings things instead of noise model number of its in the new and their
work architecture we use
this like shown the or iterative minus of mixed
we also tested its performance in combination with real knows and energy
they o'hare cooper for us
and then circumventing used nine million queens the hyper parameters that need
you use the screen incoming is out of some entire with
only one t v news enables quality a sometimes used for this integration
especially when the man architecture is to the relative knowledge
our experiments show that the normalization is very suitable for the neural network
and it always improves the performance
we also evaluated the selection of some hyperparameters
it's also been an initial to increase the best sex and modified and then rescaled
you illustrated
next
i will briefly introduce our proposed system
and move on to the final experiments
the proposed system uses large-scale training data
data augmentation is also used to increase the diversity and quantity of the training data
making quantity very times ago where you know
in addition
changing a character a house also larger
we call so bucks at this we choose and keeping a more parameters as the
neural network architecture
this figure shows the change in loss and accuracy during training
the bottleneck decreases the results of the training set
in the range you want increases the validation set
it can be found that a when the system used previously mentioned second marriages
no components of follows the accuracy in the two sixteen to be consistent
training a model costs a lot of computing resources
different training strategies also need to be decided according to the application scenarios
all conditions strategy we deem so that you think of these two issues
as shown in the vq
and can change in the final model
we use subsystems with different began training strategies
finally
and we should note that we propose salaries toolkit
the fusion system achieves an equal error rate of five point one
six percent on
the sre eighteen evaluation set
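Assuming the figure quoted here is an equal error rate, the metric can be sketched as below. The scores are made up, and the simple threshold sweep is an illustrative approximation rather than the official evaluation tooling.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep thresholds over all observed scores and
    return the point where false accepts and false rejects are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a non-target trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy trial scores (hypothetical, not results from the talk).
targets = [0.9, 0.8, 0.4, 0.7]
nontargets = [0.1, 0.5, 0.3, 0.2]
eer = equal_error_rate(targets, nontargets)
```

A lower value means the system separates target and non-target trials better; perfectly separated scores give an equal error rate of zero.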
to sum up
there is a constant are able to between speaker recognition is instance
we trained without for the j
so this work some ran several more than two marriages where we believe that there
quickly pack
and problem right the adjustments trajectory so according to the experimental results
besides
the results of the fusing system we developed a are also at least square