i everywhere my name used for channel one

i don't formation those are used in university

miss what was done during testing with my advice a trap in the chair

an initial chose john we do

welch and channel shall we shall for small hardly come neighbour countries

The title of our paper is "Improving Embedding-Based Neural-Network Speaker Recognition".

So let's begin.

This is the outline.

First,

I will introduce speaker recognition and the embedding-based neural-network system.

Then I will describe the pilot experiments.

we decided to anyway more than recognition and a science institute

we also introduce the proposed systems

you at r s i ni and ti variation set

right now we can cluster better

so that's following task

Speaker recognition can be divided into speaker verification and speaker identification,

as shown in the figure.

When the system is given an input speech,

the speaker verification system replies whether it is the target speaker's voice,

and the speaker identification system finds out whose voice it is.
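The difference between the two tasks can be sketched in a few lines. This is only a minimal illustration: the embeddings, the speaker names, the threshold, and the use of cosine similarity as the score are all assumptions made for the example, not the scoring used in the talk.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(test_emb, target_emb, threshold=0.5):
    """Verification: is this the claimed target speaker? Returns True or False."""
    return cosine(test_emb, target_emb) >= threshold

def identify(test_emb, enrolled):
    """Identification: which enrolled speaker has the highest score?"""
    return max(enrolled, key=lambda name: cosine(test_emb, enrolled[name]))

# Hypothetical enrolled speakers and a hypothetical test embedding.
enrolled = {"spk_a": [1.0, 0.0, 0.2], "spk_b": [0.0, 1.0, 0.1]}
test_emb = [0.9, 0.1, 0.2]

accepted_a = verify(test_emb, enrolled["spk_a"])
accepted_b = verify(test_emb, enrolled["spk_b"])
best_match = identify(test_emb, enrolled)
```

Verification returns a yes or no decision for one claimed speaker, while identification returns the best candidate from a closed set.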

and sre so abbreviation of when i was speaker recognition

It is an embedding-based system composed of two main parts:

a front-end speech embedding extractor and a back-end speaker matching component.

Many studies show that an embedding-based system has better performance than traditional methods

such as i-vector.

It is also more robust, for example across languages.

But it also has a significant shortcoming,

that is, its high computing resource requirement.

In this work,

we focus on the front-end speech embedding extractor,

especially the neural network and its training process.

These are also the reason for the high computing resource requirement.

We briefly introduce the neural network in the system.

the figure shows the process of conventional hand the a neural network to capturing data

the frequencies the future starting point twenty one they work as to where there were

there is

korean air same available there

the output of a single and they one there exist a target

we choose to be in clean vad

however to get across speakers in it

then when they were need to be wheelchair

no change yes

The output of the second layer is passed to the classification head.

The dimension of the classification head is the number of speakers in the training data.

The loss is calculated according to the loss function,

and the parameters of the network are updated according to the loss.
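As a rough sketch of this training step, the example below computes a softmax cross-entropy loss over a classification head with one output per training speaker and applies one gradient step. The talk does not name the loss function, so cross-entropy is an assumption here, and the logits, label, and learning rate are made-up values. For brevity the step is applied directly to the logits rather than to network parameters.

```python
import math

def softmax(logits):
    """Numerically stable softmax over the speaker classes."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, speaker_index):
    """Negative log-probability assigned to the true speaker."""
    return -math.log(softmax(logits)[speaker_index])

def logit_gradient(logits, speaker_index):
    """Gradient of the loss with respect to the logits: softmax minus one-hot."""
    probs = softmax(logits)
    return [p - (1.0 if i == speaker_index else 0.0) for i, p in enumerate(probs)]

# Hypothetical head with three training speakers; the true speaker is index 1.
logits = [2.0, 0.5, -1.0]
label = 1
learning_rate = 1.0  # assumed value, for illustration only

loss_before = cross_entropy(logits, label)
grad = logit_gradient(logits, label)
updated = [z - learning_rate * g for z, g in zip(logits, grad)]
loss_after = cross_entropy(updated, label)
```

In a real system the gradient is propagated back through every layer, and the step is repeated over many batches, which is where the computing cost comes from.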

This process is repeated several times and requires a lot of computing resources.

When the number of model parameters or the amount of training data is larger,

more computing resources are required.

This is why we designed the pilot experiments.

Next, I am going to describe the pilot experiments and their setup.

We used a low-cost experimental setup.

We tried to simplify the use of datasets.

only clustering was used to develop a love one with a model

We also avoided using data augmentation

and used mel-frequency cepstral coefficients (MFCCs) as features.

just trying to use of the simple

all the of by

An energy-based VAD is applied

to each utterance to drop the non-speech frames,

and the remaining features are normalized.
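The feature preparation described here can be sketched as follows. This is an illustration under stated assumptions: the first coefficient of each frame is treated as a log-energy term, the threshold is an arbitrary value, and the normalization is plain mean subtraction over the kept frames. The talk does not give these details.

```python
def energy_vad(frames, threshold):
    """Keep only frames whose energy term (assumed to be coefficient 0) exceeds the threshold."""
    return [frame for frame in frames if frame[0] > threshold]

def mean_normalize(frames):
    """Subtract the per-dimension mean computed over the given frames."""
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]

# Hypothetical two-dimensional features: [energy, one cepstral coefficient].
frames = [[5.0, 1.0], [0.5, 9.0], [7.0, 3.0]]

speech = energy_vad(frames, threshold=1.0)   # drops the low-energy middle frame
normalized = mean_normalize(speech)          # zero mean in every dimension
```

Normalizing after the VAD step means the statistics are computed on speech frames only, so silence does not shift the mean.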

kinda experiments

we blatantly meetings the training data and we use the neural network training i x

This also saves a lot of computing resources.

In the back-end speaker matching component,

we use Gaussian PLDA and do not use score normalization.

The experiments are evaluated on the SRE18 evaluation set.

We focus on the neural network and its training process,

and try some modern techniques.

Now I will show these techniques and the corresponding results.

The first experiments are on the pooling layer in the neural network.

the thing you are several clusters are pretending probably

The conventional statistics pooling layer gives each frame the same weight.

However,

self-attentive pooling uses an attention model to give each frame a different weight.
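The contrast between the two pooling methods can be sketched as below. Both compute a weighted mean and standard deviation over the frames. Statistics pooling uses equal weights, while attentive pooling takes its weights from a softmax over per-frame scores. The scores here are made-up numbers standing in for the output of an attention model, which is not implemented in this sketch.

```python
import math

def weighted_stats(frames, weights):
    """Weighted mean and standard deviation per dimension, concatenated."""
    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    var = [sum(w * (f[d] - mean[d]) ** 2 for w, f in zip(weights, frames))
           for d in range(dim)]
    return mean + [math.sqrt(v) for v in var]

def statistics_pooling(frames):
    """Conventional pooling: every frame gets the same weight."""
    weights = [1.0 / len(frames)] * len(frames)
    return weighted_stats(frames, weights)

def attentive_pooling(frames, scores):
    """Attentive pooling: weights are a softmax over per-frame attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return weighted_stats(frames, weights)

# Hypothetical frame-level features (two frames, two dimensions).
frames = [[1.0, 2.0], [3.0, 6.0]]

equal = statistics_pooling(frames)
same_as_equal = attentive_pooling(frames, scores=[0.0, 0.0])
focused = attentive_pooling(frames, scores=[0.0, 50.0])  # almost all weight on frame 2
```

With identical scores the attentive version reduces to statistics pooling, which makes the conventional layer a special case of the attentive one.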

the screenwriter right shows no situation of them are right

The frame-level features are divided into several parts,

each of which has its own weighted statistics.

We varied the attention models and the number of heads.

you notation mono form a true results are a continuation function

a real mean concerns where it's a virus

The results are shown in the table.

We can see the performance of the two attention models we used.

We can also see the impact of different numbers of heads.

by the way

these settings things instead of noise model number of its in the new and their

work architecture we use

this like shown the or iterative minus of mixed

we also tested its performance in combination with real knows and energy

they o'hare cooper for us

and then circumventing used nine million queens the hyper parameters that need

you use the screen incoming is out of some entire with

only one t v news enables quality a sometimes used for this integration

especially when the man architecture is to the relative knowledge

In our experiments, normalization is very suitable for the neural network,

and it always improves the performance.

We also evaluated the selection of some hyperparameters.

it's also been an initial to increase the best sex and modified and then rescaled

you illustrated

Next,

I will briefly introduce our proposed system

and move on to the final experiments.

The proposed system uses large-scale training data.

Data augmentation is also used to increase the diversity and quantity of the training data,

making the quantity several times larger.
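One common form of data augmentation is mixing noise into the speech at a chosen signal-to-noise ratio. The talk does not say which augmentation methods were used, so the sketch below is only an example of the general idea, with made-up sample values and a hypothetical helper name `mix_at_snr`.

```python
import math

def power(signal):
    """Average power of a sampled signal."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the mixture has the requested SNR in dB, then add it to the speech."""
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(speech, noise)]

# Hypothetical short signals of equal length.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]

augmented = mix_at_snr(speech, noise, snr_db=10)

# Recover the noise actually added and check the resulting SNR.
added_noise = [a - s for a, s in zip(augmented, speech)]
achieved_snr_db = 10 * math.log10(power(speech) / power(added_noise))
```

Each clean utterance can be mixed with different noises and SNR values, which is how augmentation multiplies the amount of training data.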

In addition,

the neural network architecture is also larger.

we call so bucks at this we choose and keeping a more parameters as the

new linear work architecture

This figure shows the change in loss and accuracy during training.

One curve shows the results on the training set

and the other shows the validation set.

It can be found that when the system uses the previously mentioned techniques,

the loss and the accuracy on the two sets tend to be consistent.

Training a neural network model costs a lot of computing resources.

Different back-end training strategies also need to be decided according to the application scenarios.

Our fusion strategy is designed to take care of these two issues,

as shown in the figure.

In the final model,

we use subsystems with different back-end training strategies.

finally

and we should note that we propose salaries toolkit

The fusion system achieves an equal error rate of 5.16 percent

on the SRE18 evaluation set.

to sum up

there is a constant are able to between speaker recognition is instance

we trained without for the j

so this work some ran several more than two marriages where we believe that there

quickly pack

and problem right the adjustments trajectory so according to the experimental results

besides

the results of the fusing system we developed a are also at least square