0:00:13 | Hi everyone, my name is [inaudible]. |
---|
0:00:16 | I am from [inaudible] University. |
---|
0:00:20 | This work was done [inaudible] with my advisor [inaudible]. |
---|
0:00:25 | [inaudible] |
---|
0:00:28 | [inaudible] |
---|
0:00:35 | Our paper is about training embedding-based neural network speaker recognition. |
---|
0:00:42 | So, let's begin. |
---|
0:00:51 | This is the outline. |
---|
0:00:54 | First, |
---|
0:00:55 | I will introduce speaker recognition and the x-vector system, |
---|
0:01:01 | and then the pilot experiments. |
---|
0:01:04 | Then we describe [inaudible]. |
---|
0:01:10 | We also introduce the proposed systems |
---|
0:01:14 | and their results on the SRE evaluation set. |
---|
0:01:20 | Finally, we conclude. |
---|
0:01:25 | So, let's start with the task. |
---|
0:01:29 | Speaker recognition systems can be divided into speaker verification and speaker identification. |
---|
0:01:38 | As shown in the figure, |
---|
0:01:41 | when the system receives input speech, |
---|
0:01:44 | a speaker verification system verifies whether it is the target speaker's voice, |
---|
0:01:51 | and a speaker identification system finds out which speaker it is. |
---|
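To make the two tasks concrete, here is a small sketch that compares speaker embeddings with cosine similarity. Cosine scoring, the 0.5 threshold, the speaker names and the 3-dimensional vectors are all hypothetical; the pilot system described later uses a PLDA back-end instead.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(test_emb, target_emb, threshold=0.5):
    """Speaker verification: accept or reject one claimed identity."""
    return cosine(test_emb, target_emb) >= threshold

def identify(test_emb, enrolled):
    """Speaker identification: return the enrolled speaker with the highest score."""
    return max(enrolled, key=lambda name: cosine(test_emb, enrolled[name]))

# Hypothetical embeddings for illustration only
enrolled = {
    "alice": np.array([1.0, 0.1, 0.0]),
    "bob": np.array([0.0, 1.0, 0.2]),
}
test = np.array([0.9, 0.2, 0.1])
print(verify(test, enrolled["alice"]))  # True
print(verify(test, enrolled["bob"]))    # False
print(identify(test, enrolled))         # alice
```

Verification answers a yes/no question about one claimed speaker, while identification picks the best match among all enrolled speakers.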
0:02:02 | [inaudible] is the abbreviation of neural network speaker recognition. |
---|
0:02:08 | It is embedding-based and mainly consists of two parts: |
---|
0:02:13 | a front-end speaker embedding extractor and a back-end scoring component. |
---|
0:02:21 | Many studies show that such a system achieves better performance than traditional methods |
---|
0:02:28 | such as the i-vector. |
---|
0:02:30 | It is also more robust, for example across languages and [inaudible]. |
---|
0:02:38 | But it also has a significant shortcoming, |
---|
0:02:42 | that is, its high computing resource requirement. |
---|
0:02:48 | In this work, |
---|
0:02:50 | we focus on the front-end speaker embedding extractor, |
---|
0:02:54 | especially the neural network's training process. |
---|
0:02:59 | This is also because of its high computing resource requirements. |
---|
0:03:08 | We briefly introduce the neural network and the x-vector system. |
---|
0:03:13 | The figure shows the conventional neural network training process. |
---|
0:03:21 | The frame-level features are fed into the frame-level layers, |
---|
0:03:26 | then there is |
---|
0:03:27 | a pooling layer. |
---|
0:03:31 | The output of the first segment-level layer |
---|
0:03:34 | is taken as the speaker embedding. |
---|
0:03:38 | However, to get [inaudible] speaker embeddings, |
---|
0:03:41 | the neural network needs to be trained. |
---|
0:03:52 | The output of the second segment-level layer is fed into the classification head. |
---|
0:03:57 | The dimension of the classification layer is the number of speakers in the training data. |
---|
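A minimal numpy sketch of the forward pass just described: frame-level (TDNN) layers, a statistics pooling layer, two segment-level layers, and a classification head with one output per training speaker. Layer sizes, context offsets and the random weights are hypothetical; reading the embedding before the first segment-level layer's ReLU follows common x-vector practice and is an assumption here.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tdnn_layer(x, w, context):
    """Frame-level (TDNN) layer: splice frames at the given offsets, then affine + ReLU.

    x: (T, D_in) features; w: (len(context) * D_in, D_out).
    Only frames whose full context lies inside the utterance are kept.
    """
    left, right = -min(context), max(context)
    spliced = np.stack([
        np.concatenate([x[t + c] for c in context])
        for t in range(left, x.shape[0] - right)
    ])
    return relu(spliced @ w)

def stats_pooling(h):
    """Map any number of frames to one fixed-size vector: [mean, std]."""
    return np.concatenate([h.mean(axis=0), h.std(axis=0)])

def xvector_forward(feats, p):
    h = feats
    for w, context in p["frame_layers"]:
        h = tdnn_layer(h, w, context)
    pooled = stats_pooling(h)
    embedding = pooled @ p["seg1"]               # first segment-level layer: speaker embedding
    hidden = relu(relu(embedding) @ p["seg2"])   # second segment-level layer
    logits = hidden @ p["head"]                  # one output per training speaker
    return embedding, logits

# Hypothetical sizes: 20-dim features, 200 frames, 10 training speakers
rng = np.random.default_rng(0)

def init(*shape):
    return 0.1 * rng.standard_normal(shape)

params = {
    "frame_layers": [
        (init(5 * 20, 64), [-2, -1, 0, 1, 2]),
        (init(3 * 64, 64), [-2, 0, 2]),
        (init(64, 128), [0]),
    ],
    "seg1": init(256, 32),
    "seg2": init(32, 32),
    "head": init(32, 10),
}
emb, logits = xvector_forward(rng.standard_normal((200, 20)), params)
print(emb.shape, logits.shape)  # (32,) (10,)
```

The pooling layer is what lets utterances of different lengths produce embeddings of the same size; the classification head is only needed during training.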
0:04:04 | After that, |
---|
0:04:06 | the loss is calculated according to the loss function, |
---|
0:04:11 | and the parameters of the neural network are updated according to the loss. |
---|
0:04:17 | This process is repeated many times, which requires a lot of computing resources. |
---|
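The loss-and-update loop above can be sketched as follows. The talk does not name the loss; softmax cross-entropy over the training speakers (as in the standard x-vector recipe) is assumed, and only a linear classification head on fixed toy inputs is trained with plain gradient descent, so that the loss, the gradient step and the repetition are easy to see.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy loss and its gradient with respect to the logits.

    logits: (N, S) scores for S training speakers; labels: (N,) speaker indices.
    """
    z = logits - logits.max(axis=1, keepdims=True)   # subtract max for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    rows = np.arange(len(labels))
    loss = -log_probs[rows, labels].mean()
    grad = np.exp(log_probs)                          # softmax probabilities
    grad[rows, labels] -= 1.0                         # d(loss)/d(logits) = p - one_hot
    return loss, grad / len(labels)

def train_head(x, labels, num_speakers, lr=0.1, steps=100, seed=0):
    """Repeat: forward pass, loss, gradient step on a linear head."""
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.standard_normal((x.shape[1], num_speakers))
    losses = []
    for _ in range(steps):
        loss, dlogits = softmax_cross_entropy(x @ w, labels)
        w -= lr * (x.T @ dlogits)
        losses.append(loss)
    return w, losses

# Toy data: 3 hypothetical speakers, 20 samples each, well-separated clusters
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 20)
x = 2.0 * np.eye(3)[labels] + 0.1 * rng.standard_normal((60, 3))
w, losses = train_head(x, labels, num_speakers=3)
print(losses[0], losses[-1])
```

Each extra step costs one forward and one backward pass over the data, which is why more parameters, more data or more iterations translate directly into more compute.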
0:04:24 | When the number of model parameters or the size of the dataset is larger, |
---|
0:04:29 | more computing resources are required. |
---|
0:04:32 | This is the motivation for this paper. |
---|
0:04:38 | Now I am going to describe the pilot experiments and discuss the results. |
---|
0:04:45 | Here is the experimental setup. |
---|
0:04:48 | We tried to simplify the use of datasets. |
---|
0:04:52 | Only clean data was used to develop the model. |
---|
0:04:57 | We also avoided using data augmentation. |
---|
0:05:04 | And we used [inaudible]-dimensional mel-frequency cepstral coefficients (MFCCs), |
---|
0:05:09 | just trying to keep things simple. |
---|
0:05:12 | [inaudible] |
---|
0:05:14 | An energy-based VAD is applied |
---|
0:05:18 | to each utterance, |
---|
0:05:21 | and the remaining features are normalized. |
---|
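A sketch of the front-end steps just described: frames are kept by an energy threshold and the surviving MFCC frames are mean-normalized per utterance. The threshold rule, its default values, the assumption that column 0 holds C0/log-energy, and the choice of per-utterance mean normalization are illustrative assumptions, since the talk does not give these details.

```python
import numpy as np

def energy_vad_and_cmn(feats, energy_threshold=5.5, mean_scale=0.5):
    """Drop low-energy frames, then mean-normalize the remaining frames.

    feats: (T, D) MFCCs; column 0 is assumed to hold C0 / log-energy.
    Threshold = fixed offset + a fraction of the utterance's mean log-energy (assumed rule).
    """
    log_energy = feats[:, 0]
    threshold = energy_threshold + mean_scale * log_energy.mean()
    voiced = feats[log_energy > threshold]
    if len(voiced) == 0:            # nothing passed the VAD
        return voiced
    return voiced - voiced.mean(axis=0, keepdims=True)

# Toy utterance: 5 loud frames followed by 5 silent ones
feats = np.zeros((10, 3))
feats[:5, 0] = 20.0
feats[:5, 1] = np.arange(5)
out = energy_vad_and_cmn(feats)
print(out.shape)  # (5, 3)
```

Making the threshold depend on the utterance's own mean energy lets the same rule work across recordings with different gain.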
0:05:28 | In the experiments, |
---|
0:05:30 | we deliberately limited the training data and the number of neural network training epochs. |
---|
0:05:36 | This is due to the limitation of computing resources. |
---|
0:05:44 | In the back-end scoring component, |
---|
0:05:48 | we use Gaussian PLDA and do not use score normalization. |
---|
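For reference, here is a sketch of scoring one trial with Gaussian PLDA in its two-covariance form. This is one common formulation; the talk does not say which PLDA variant or training procedure was used, and the 2-dimensional parameters below are hypothetical (real ones would be estimated from training embeddings).

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian, computed without an explicit inverse."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

def plda_llr(enroll, test, mu, B, W):
    """Two-covariance PLDA log-likelihood ratio (same speaker vs. different speakers).

    Model: x = y + e, with speaker variable y ~ N(mu, B) and residual e ~ N(0, W).
    """
    total = B + W
    zeros = np.zeros_like(B)
    pair = np.concatenate([enroll, test])
    mean = np.concatenate([mu, mu])
    same = np.block([[total, B], [B, total]])          # shared y gives cross-covariance B
    diff = np.block([[total, zeros], [zeros, total]])  # independent speakers
    return gaussian_logpdf(pair, mean, same) - gaussian_logpdf(pair, mean, diff)

# Hypothetical 2-dimensional model parameters
mu, B, W = np.zeros(2), 4.0 * np.eye(2), 0.5 * np.eye(2)
enroll = np.array([1.0, 1.0])
print(plda_llr(enroll, np.array([1.1, 0.9]), mu, B, W))   # close vector: positive LLR
print(plda_llr(enroll, np.array([-2.0, 2.0]), mu, B, W))  # distant vector: negative LLR
```

Because no score normalization is applied, the raw log-likelihood ratio is used directly as the trial score.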
0:05:53 | and the experiments are evaluated on the SRE08 evaluation set. |
---|
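The talk does not say here which metric is reported on this set; NIST SRE results are commonly given as equal error rate (EER) alongside a detection cost. A minimal EER sketch over target and non-target trial scores, for illustration only:

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Equal error rate: the operating point where miss rate equals false-alarm rate.

    Sweeps the threshold over the sorted scores; ties between scores are not treated specially.
    """
    scores = np.concatenate([target_scores, nontarget_scores])
    is_target = np.concatenate([np.ones(len(target_scores)), np.zeros(len(nontarget_scores))])
    is_target = is_target[np.argsort(scores)]
    # Threshold placed after the k lowest scores: those k trials are rejected.
    miss = np.concatenate([[0.0], np.cumsum(is_target) / len(target_scores)])
    false_alarm = np.concatenate([[1.0], 1.0 - np.cumsum(1 - is_target) / len(nontarget_scores)])
    k = np.argmin(np.abs(miss - false_alarm))
    return (miss[k] + false_alarm[k]) / 2

print(compute_eer(np.array([2.0, 3.0]), np.array([0.0, 1.0])))  # 0.0 (perfectly separated)
print(compute_eer(np.array([0.5, 2.0]), np.array([0.0, 1.0])))  # 0.5
```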
0:06:03 | We focus on the neural network's training process |
---|
0:06:07 | and tried some training tricks. |
---|
0:06:11 | In the following, I will show these tricks and the corresponding results. |
---|
0:06:18 | The first experiments are on the pooling layer in the neural network. |
---|
0:06:24 | [inaudible] |
---|
0:06:29 | The conventional statistics pooling layer treats each frame equally. |
---|
0:06:35 | However, |
---|
0:06:36 | attentive statistics pooling uses an attention model to compute a weight for each frame. |
---|
0:06:44 | The figure on the right shows an illustration of it. |
---|
0:06:49 | The frame-level features are divided into several parts, |
---|
0:06:54 | each with its own weighted statistics. |
---|
0:07:01 | We compared two attention models and different head numbers. |
---|
0:07:06 | One attention model [inaudible] activation function, |
---|
0:07:12 | [inaudible] |
---|
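A sketch of multi-head attentive statistics pooling as outlined above: the feature dimension is split into heads, each head gets its own frame weights, and a weighted mean and standard deviation are computed per head. The scorer used here (one tanh hidden layer followed by a scoring vector, softmax over frames) is an assumption; the two attention models compared in the talk are not identifiable from the audio.

```python
import numpy as np

def softmax(e):
    e = e - e.max()
    p = np.exp(e)
    return p / p.sum()

def multi_head_attentive_stats_pooling(h, w_att, v_att):
    """h: (T, D) frame-level features, split along D into len(w_att) heads.

    Per head k: score_t = v_k . tanh(h_t^(k) @ W_k), weights = softmax over frames,
    then the weighted mean and weighted standard deviation of that head's features.
    """
    heads = np.split(h, len(w_att), axis=1)      # D must be divisible by the head count
    means, stds = [], []
    for chunk, w, v in zip(heads, w_att, v_att):
        alpha = softmax(np.tanh(chunk @ w) @ v)  # (T,) frame weights summing to 1
        mu = alpha @ chunk
        var = alpha @ (chunk ** 2) - mu ** 2
        means.append(mu)
        stds.append(np.sqrt(np.maximum(var, 1e-10)))
    return np.concatenate(means + stds)

# Hypothetical sizes: 50 frames, 8-dim features, 2 heads, 16 attention units
rng = np.random.default_rng(0)
h = rng.standard_normal((50, 8))
w_att = [rng.standard_normal((4, 16)) for _ in range(2)]
v_att = [rng.standard_normal(16) for _ in range(2)]
pooled = multi_head_attentive_stats_pooling(h, w_att, v_att)
print(pooled.shape)  # (16,)
```

With all scoring vectors set to zero every frame gets the same weight, and the output reduces to conventional statistics pooling; splitting the features across heads keeps the pooled dimension the same as in the conventional layer.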
0:07:16 | The results are shown in the table. |
---|
0:07:19 | We can see the performance of the two attention models we used, |
---|
0:07:23 | and we can also see the impact of different head numbers. |
---|
0:07:28 | By the way, |
---|
0:07:29 | these settings do not change the number of parameters in the neural network |
---|
0:07:33 | architecture we use. |
---|
0:07:39 | This slide shows the [inaudible]. |
---|
0:07:45 | We also tested its performance in combination with ReLU and [inaudible]. |
---|
0:07:51 | [inaudible] |
---|
0:07:54 | [inaudible] the hyperparameters that need [inaudible]. |
---|
0:08:06 | [inaudible] |
---|
0:08:11 | [inaudible] is sometimes used for this, |
---|
0:08:18 | especially when the network architecture is relatively [inaudible]. |
---|
0:08:23 | Our experiments show that normalization is very suitable for the neural network, |
---|
0:08:30 | and it always improves the performance. |
---|
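The recoverable audio does not say which normalization was used. Assuming batch normalization, the sketch below applies it after ReLU (an ordering used, for example, by Kaldi's `relu-batchnorm-layer` in x-vector configurations). It computes training-mode batch statistics only and omits the running averages used at test time.

```python
import numpy as np

def relu_batchnorm(x, gamma, beta, eps=1e-5):
    """ReLU followed by batch normalization (training-mode statistics).

    x: (N, D) pre-activations for a mini-batch of N frames or segments.
    """
    h = np.maximum(x, 0.0)
    mean = h.mean(axis=0)
    var = h.var(axis=0)
    return gamma * (h - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 4))
y = relu_batchnorm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.var(axis=0).round(3))
```

After normalization each feature has roughly zero mean and unit variance within the batch, and the learnable `gamma` and `beta` restore any scale and offset the network needs.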
0:08:35 | We also evaluated the selection of some hyperparameters. |
---|
0:08:40 | It is also beneficial to increase the batch size and [inaudible], |
---|
0:08:46 | as illustrated. |
---|
0:08:50 | Next, |
---|
0:08:51 | I will briefly introduce our proposed system |
---|
0:08:57 | and go on to the final experiments. |
---|
0:09:00 | The proposed system uses large-scale training data. |
---|
0:09:05 | Data augmentation is also used to increase the diversity and quantity of training data, |
---|
0:09:11 | multiplying the quantity [inaudible] times. |
---|
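A sketch of how augmentation multiplies the training data: each clean utterance is kept and also mixed with noise at several signal-to-noise ratios. Additive noise at fixed SNRs is an assumed example; the talk does not say which augmentation types were used (noise, music, babble and reverberation are common choices), and the random signals below merely stand in for real audio.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db, rng):
    """Mix a noise signal into speech at a target signal-to-noise ratio (in dB)."""
    if len(noise) < len(speech):                            # loop short noise clips
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    start = rng.integers(0, len(noise) - len(speech) + 1)   # random noise segment
    noise = noise[start:start + len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def augment(speech, noise, snrs_db, rng):
    """Return the clean utterance plus one noisy copy per SNR."""
    return [speech] + [add_noise_at_snr(speech, noise, snr, rng) for snr in snrs_db]

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # stand-in for 1 s of 16 kHz audio
noise = rng.standard_normal(5000)     # stand-in for a short noise clip
copies = augment(speech, noise, snrs_db=[0, 5, 10, 15], rng=rng)
print(len(copies))  # 5
```

With four SNRs, every utterance yields five training examples, which is one way the quantity of training data grows several times over.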
0:09:15 | In addition, |
---|
0:09:17 | the neural network architecture is also larger. |
---|
0:09:22 | We chose [inaudible] with more parameters as the |
---|
0:09:28 | neural network architecture. |
---|
0:09:34 | This figure shows the changes in loss and accuracy during training. |
---|
0:09:40 | One curve shows the results on the training set, |
---|
0:09:44 | and the other shows the results on the validation set. |
---|
0:09:48 | It can be seen that when the system uses the previously mentioned training tricks, |
---|
0:09:54 | both the loss and the accuracy on the two sets tend to be consistent. |
---|
0:10:04 | Training a large model costs a lot of computing resources, |
---|
0:10:09 | and different training strategies also need to be chosen according to the application scenario. |
---|
0:10:17 | We [inaudible] training strategies with these two issues in mind. |
---|
0:10:23 | As shown [inaudible], |
---|
0:10:25 | in the final model |
---|
0:10:27 | we use subsystems with different training strategies. |
---|
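The talk combines subsystems trained with different strategies but does not say how their scores are fused. Linear fusion trained by logistic regression on development trials is a common choice and is assumed in this sketch; the toy scores are synthetic.

```python
import numpy as np

def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))   # numerically stable logistic function

def train_linear_fusion(scores, labels, lr=0.1, steps=500):
    """Learn fused = scores @ w + b by logistic regression on development trials.

    scores: (N, K) scores from K subsystems; labels: (N,) 1 = target, 0 = non-target.
    """
    n, k = scores.shape
    w, b = np.zeros(k), 0.0
    for _ in range(steps):
        g = sigmoid(scores @ w + b) - labels    # gradient of the mean logistic loss
        w -= lr * scores.T @ g / n
        b -= lr * g.mean()
    return w, b

# Toy development trials: subsystem 0 is informative, subsystem 1 is pure noise
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400)
scores = np.column_stack([
    (2 * labels - 1) + 0.5 * rng.standard_normal(400),
    rng.standard_normal(400),
])
w, b = train_linear_fusion(scores, labels)
fused = scores @ w + b
print(w.round(2), ((fused > 0) == labels).mean())
```

The learned weights show how much each subsystem contributes: the informative subsystem receives a large weight and the noisy one stays near zero.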
0:10:32 | Finally, |
---|
0:10:33 | we should note that the proposed systems [inaudible] toolkit. |
---|
0:10:39 | Our systems achieve an equal error rate of [inaudible].16 percent |
---|
0:10:45 | on the |
---|
0:10:47 | SRE18 evaluation set. |
---|
0:10:51 | To sum up, |
---|
0:10:53 | there is a [inaudible] between speaker recognition [inaudible]. |
---|
0:10:59 | [inaudible] |
---|
0:11:02 | So in this work, we tried several model training tricks which we believe are |
---|
0:11:08 | [inaudible], |
---|
0:11:10 | and we provide adjustment strategies according to the experimental results. |
---|
0:11:16 | Besides, |
---|
0:11:18 | the results of the fused system we developed are also [inaudible]. |
---|