0:00:13i everywhere my name used for channel one
0:00:16i don't formation those are used in university
0:00:20miss what was done during testing with my advice a trap in the chair
0:00:25an initial chose john we do
0:00:28welch and channel shall we shall for small hardly come neighbour countries
0:00:35the title of our paper is improving embedding based neural network speaker recognition
0:00:42so that speaking
0:00:51this is the outline
0:00:54first
0:00:55i will introduce speaker recognition and the embedding based systems
0:01:01the scope and the pilot experiments
0:01:04we decided to anyway more than recognition and a science institute
0:01:10we also introduce the proposed systems
0:01:14you at r s i ni and ti variation set
0:01:20right now we can cluster better
0:01:25so that's following task
0:01:29speaker recognition systems can be divided into speaker verification and speaker identification
0:01:38as shown in the figure
0:01:41when the system is given input speech
0:01:44the speaker verification system replies whether this is the target speaker's voice
0:01:51and the speaker identification system finds out whose voice it is
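As an illustration of the two tasks just described, here is a minimal sketch. It assumes each utterance has already been reduced to a fixed-length embedding and that cosine similarity is the score; the names `verify` and `identify`, the toy embeddings, and the threshold of 0.7 are hypothetical and not taken from the talk.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embeddings of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(test_emb, target_emb, threshold=0.7):
    """Verification: is the test utterance from the claimed target speaker?"""
    return cosine(test_emb, target_emb) >= threshold

def identify(test_emb, enrolled):
    """Identification: which enrolled speaker is closest to the test utterance?"""
    return max(enrolled, key=lambda name: cosine(test_emb, enrolled[name]))

# Toy embeddings (assumed values for illustration only).
enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
test_emb = [0.9, 0.1, 0.2]

print(verify(test_emb, enrolled["alice"]))  # yes/no decision for one claimed speaker
print(identify(test_emb, enrolled))         # best match among all enrolled speakers
```

Verification answers a yes/no question about one claimed speaker, while identification searches over all enrolled speakers.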
0:02:02and sre so abbreviation of when i was speaker recognition
0:02:08it is an embedding based method which consists of two main parts
0:02:13the front end speaker embedding extractor and the back end speaker matching component
0:02:21many studies show that such a system usually has better performance than traditional methods
0:02:28such as i-vector
0:02:30it is also more robust across languages and errors
0:02:38but it also has a significant shortcoming
0:02:42that is the high computing resource requirement
0:02:48in this work
0:02:50we focus on the front end speaker embedding extractor
0:02:54especially the neural network and its training process
0:02:59this is also the source of the high computing resource requirement
0:03:08we briefly introduce the neural network in the system
0:03:13the figure shows the training process of a conventional neural network
0:03:21the frame level features are the input to the neural network
0:03:26there is
0:03:27a pooling layer in the network
0:03:31the output of a segment level layer is extracted
0:03:34as the speaker embedding
0:03:38however to get a good speaker embedding
0:03:41the neural network needs to be well trained
0:03:50no change yes
0:03:52the output of the second segment level layer is passed to the classification head
0:03:57the dimension of the classification head is the number of speakers in the training data
0:04:04if the
0:04:06the loss is calculated according to the loss function
0:04:11and the neural network is updated according to the loss
0:04:17this process is repeated several times and requires a lot of computing resources
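The loop described here (classification head, loss, update, repeat) can be sketched as follows. This is a minimal illustration, not the system from the talk: it assumes a single linear classification head, a softmax cross-entropy loss, and plain gradient descent, and the function name `train_step`, the learning rate, and the toy input are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(weights, x, label, lr=0.5):
    """One gradient step on a linear classification head.

    weights has one row per training speaker, so its first dimension
    equals the number of speakers. Returns the loss before the update.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(probs[label])  # cross-entropy for the true speaker
    for k, row in enumerate(weights):
        grad_logit = probs[k] - (1.0 if k == label else 0.0)
        for j in range(len(row)):
            row[j] -= lr * grad_logit * x[j]  # update according to the loss
    return loss

# Toy setup (assumed): 3 training speakers, 2-dimensional segment-level output.
weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
x, label = [1.0, 2.0], 1
losses = [train_step(weights, x, label) for _ in range(5)]
print(losses)  # the loss shrinks as the step is repeated
```

In a real system every step runs through the whole network on a large batch, which is where the computing cost mentioned above comes from.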
0:04:24when the number of model parameters or the training set is larger
0:04:29more computing resources are required
0:04:32this is why we designed the pilot experiments
0:04:38next i am going to describe the pilot experiments and their results
0:04:45this slide shows the experimental settings
0:04:48we tried to simplify the use of datasets
0:04:52only clustering was used to develop a love one with a model
0:04:57we also avoided using data augmentation
0:05:04and used mel-frequency cepstral coefficients as features
0:05:09just trying to use of the simple
0:05:12all the of by
0:05:14we use an energy based vad
0:05:18to remove the non speech frames
0:05:21and the remaining features are normalized
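A minimal sketch of this preprocessing step, under stated assumptions: the first coefficient of each frame is treated as its log-energy, the threshold is fixed, and the normalization is a per-utterance mean subtraction. The talk does not specify the VAD rule or the type of normalization, so `energy_vad`, `mean_normalize`, and all values below are hypothetical.

```python
def energy_vad(frames, threshold):
    """Keep frames whose first coefficient (assumed log-energy) exceeds the threshold."""
    return [f for f in frames if f[0] > threshold]

def mean_normalize(frames):
    """Subtract the per-dimension mean computed over the remaining frames."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]

# Toy frames (assumed): [log-energy, one cepstral coefficient].
frames = [[5.0, 1.0], [0.5, 9.0], [7.0, 3.0]]
speech = energy_vad(frames, threshold=2.0)  # drops the low-energy middle frame
normalized = mean_normalize(speech)
print(normalized)
```

The sketch assumes at least one frame survives the VAD; an utterance with no speech frames would need separate handling.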
0:05:28in the pilot experiments
0:05:30we deliberately limit the training data and the neural network training epochs
0:05:36this saves a lot of computing resources
0:05:44in the back end speaker matching component
0:05:48we use gaussian plda and do not use score normalization
0:05:53and the experiments are evaluated on the sre eighteen evaluation set
0:06:03we focus on the neural network and its training process
0:06:07and try some modern techniques
0:06:11the following slides will show these techniques and the corresponding results
0:06:18the first experiments are on the pooling layer in the neural network
0:06:24the thing you are several clusters are pretending probably
0:06:29the conventional statistics pooling layer gives each frame the same weight
0:06:35however
0:06:36self attentive pooling uses an attention model to give each frame a different weight
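The difference between the two pooling layers can be sketched as follows. This is an illustrative sketch only: in a real system the attention scores come from a small trained network, whereas here they are passed in directly, and the function names and toy frames are hypothetical.

```python
import math

def weighted_stats(frames, weights):
    """Weighted mean and standard deviation per dimension, concatenated."""
    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    var = [sum(w * (f[d] - mean[d]) ** 2 for w, f in zip(weights, frames))
           for d in range(dim)]
    return mean + [math.sqrt(v) for v in var]

def statistics_pooling(frames):
    """Conventional pooling: every frame gets the same weight 1/N."""
    n = len(frames)
    return weighted_stats(frames, [1.0 / n] * n)

def attentive_pooling(frames, scores):
    """Attentive pooling: frame weights are a softmax over attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return weighted_stats(frames, [e / total for e in exps])

# Toy frame-level features (assumed): 3 frames, 2 dimensions.
frames = [[1.0, 0.0], [3.0, 0.0], [5.0, 4.0]]
uniform = statistics_pooling(frames)
attended = attentive_pooling(frames, [0.0, 0.0, 0.0])  # equal scores
focused = attentive_pooling(frames, [0.0, 0.0, 10.0])  # favours the last frame
print(uniform, focused)
```

With equal scores the attentive version reduces to conventional statistics pooling; unequal scores shift the pooled statistics toward the frames the attention model favours.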
0:06:44the screenwriter right shows no situation of them are right
0:06:49the frame level features are divided into several parts
0:06:54each of which has its own weighted statistics pooling
0:07:01we chartered the attention models and the real number so that
0:07:06you notation mono form a true results are a continuation function
0:07:12a real mean concerns where it's a virus
0:07:16the results are shown in the table
0:07:19we can see the performance of the two attention models we used
0:07:23and we can also see the impact of different head numbers
0:07:28by the way
0:07:29these settings things instead of noise model number of its in the new and their
0:07:33work architecture we use
0:07:39this like shown the or iterative minus of mixed
0:07:45we also tested its performance in combination with real knows and energy
0:07:51they o'hare cooper for us
0:07:54and then circumventing used nine million queens the hyper parameters that need
0:08:06you use the screen incoming is out of some entire with
0:08:11only one t v news enables quality a sometimes used for this integration
0:08:18especially when the man architecture is to the relative knowledge
0:08:23our experiments and the minimization is very suitable for the anyone the what
0:08:30and it always improves the performance
0:08:35we also evaluated the selection of some hyperparameters
0:08:40it is also beneficial to increase the batch size and modify the learning rate
0:08:46you illustrated
0:08:50next
0:08:51i will briefly introduce our proposed system
0:08:57and move on to the final experiment
0:09:00the proposed system uses large scale training data
0:09:05data augmentation is also used to increase the diversity and quantity of the training data
0:09:11making the quantity several times larger
0:09:15in addition
0:09:17the neural network architecture is also larger
0:09:22we call so bucks at this we choose and keeping a more parameters as the
0:09:28neural network architecture
0:09:34this slide shows the changes in loss and accuracy during training
0:09:40one curve shows the results on the training set
0:09:44and the other shows the results on the validation set
0:09:48it can be found that when the system uses the previously mentioned techniques
0:09:54the loss and the accuracy on the two sets tend to be consistent
0:10:04training the final model costs a lot of computing resources
0:10:09different training strategies also need to be decided according to the application scenarios
0:10:17our fusion strategy is designed to take care of these two issues
0:10:23as shown in the figure
0:10:25in the final model
0:10:27we fuse subsystems with different training strategies
0:10:32finally
0:10:33and we should note that we propose salaries toolkit
0:10:39the results of the philistines instead of chips in coral right of by point one
0:10:45six presents a
0:10:47on the sre eighteen evaluation set
0:10:51to sum up
0:10:53there is a constant are able to between speaker recognition is instance
0:10:59we trained without for the j
0:11:02so this work some ran several more than two marriages where we believe that there
0:11:08quickly pack
0:11:10and problem right the adjustments trajectory so according to the experimental results
0:11:16besides
0:11:18the results of the fusing system we developed a are also at least square