0:00:14 | i variable |
---|
0:00:16 | the to have we really fair use the i per speaker characterization using key and |
---|
0:00:22 | then |
---|
0:00:25 | sure there's |
---|
0:00:27 | speaker i four nist sre two so |
---|
0:00:31 | the right |
---|
0:00:33 | my and gently one |
---|
0:00:38 | basically nine |
---|
0:00:40 | first we like that you a large |
---|
0:00:45 | my boss range |
---|
0:00:47 | and that we use the used a |
---|
0:00:50 | five |
---|
0:00:51 | and we the tree |
---|
0:00:56 | about the punch |
---|
0:00:58 | the network based speaker dataset |
---|
0:01:02 | and three demonstrate very what also |
---|
0:01:06 | and |
---|
0:01:07 | because the mainstream mixture |
---|
0:01:10 | different |
---|
0:01:11 | i for one thing works structure what |
---|
0:01:14 | oops |
---|
0:01:15 | such as convolutional neural network |
---|
0:01:21 | i did you walk |
---|
0:01:23 | here |
---|
0:01:26 | the lowest eer |
---|
0:01:27 | a vectorized that's or |
---|
0:01:31 | in speaker baiting |
---|
0:01:34 | area cordoned sartre soccer but it to freeze |
---|
0:01:38 | a pension |
---|
0:01:39 | in that picture |
---|
0:01:41 | so |
---|
0:01:42 | we can is t |
---|
0:01:44 | use a better to better talk of these two to speaker recognition |
---|
0:01:52 | this paper is process speaker characterization |
---|
0:01:56 | using active they only work |
---|
0:01:58 | don't |
---|
0:01:59 | sure that |
---|
0:02:00 | then we don't work a protection a call at a robust protection |
---|
0:02:06 | the speaker |
---|
0:02:10 | and the |
---|
0:02:12 | well |
---|
0:02:13 | right dependability |
---|
0:02:15 | used |
---|
0:02:17 | is are |
---|
0:02:18 | the variation that that's the |
---|
0:02:21 | the next baseline if the park on speaker recognition evaluation |
---|
0:02:27 | conducted by the |
---|
0:02:29 | u.s. national institute of standards and technology |
---|
0:02:32 | and there are large |
---|
0:02:34 | since |
---|
0:02:35 | nineteen ninety six |
---|
0:02:40 | for real application different sure i'm sorry features |
---|
0:02:46 | but what |
---|
0:02:47 | right feature |
---|
0:02:48 | it makes the speech |
---|
0:02:51 | the nist sre ten show |
---|
0:02:58 | i will take years but wasn't makes the |
---|
0:03:03 | mastery power |
---|
0:03:05 | right proposed the first neural network based |
---|
0:03:09 | speaker weighting |
---|
0:03:11 | i also has brought before |
---|
0:03:15 | feature errors |
---|
0:03:17 | final by a couple of its the |
---|
0:03:24 | neural network based speaker embedding |
---|
0:03:28 | is the |
---|
0:03:29 | mainstream or coded |
---|
0:03:32 | speaker recognition |
---|
0:03:34 | and thus |
---|
0:03:36 | first speaker |
---|
0:03:37 | speaker mister a |
---|
0:03:40 | t you know based structure |
---|
0:03:43 | neural network structure |
---|
0:03:46 | for |
---|
0:03:47 | two part |
---|
0:03:48 | first |
---|
0:03:49 | the speech you will be cost |
---|
0:03:53 | for label |
---|
0:03:55 | representation |
---|
0:03:56 | followed by rocks the |
---|
0:03:58 | these tickle forty |
---|
0:04:02 | been |
---|
0:04:03 | there are two |
---|
0:04:04 | second but |
---|
0:04:05 | therefore |
---|
0:04:07 | tends to who |
---|
0:04:10 | and you're we |
---|
0:04:12 | is true first |
---|
0:04:13 | there |
---|
0:04:14 | the combined than for others |
---|
0:04:17 | speaker very |
---|
0:04:20 | in this study |
---|
0:04:22 | i for the |
---|
0:04:23 | well |
---|
0:04:25 | we praise |
---|
0:04:25 | the |
---|
0:04:26 | second it so there |
---|
0:04:28 | you can with their |
---|
0:04:30 | robust they're |
---|
0:04:31 | according to |
---|
0:04:36 | work |
---|
0:04:37 | structure |
---|
0:04:41 | in addition |
---|
0:04:43 | i also used |
---|
0:04:46 | attention there too |
---|
0:04:48 | you're right |
---|
0:04:50 | the statistical pooling layer |
---|
0:04:53 | accordingly |
---|
0:04:55 | what structure press at the receiver tension |
---|
0:05:00 | speaker but |
---|
0:05:07 | in this study |
---|
0:05:09 | but i australian feature extraction are |
---|
0:05:13 | based k to find a good features |
---|
0:05:15 | for speaker rate |
---|
0:05:18 | through acoustic features there are trendy for all go far |
---|
0:05:22 | the first, mel-frequency cepstral coefficient feature |
---|
0:05:27 | i cory and three |
---|
0:05:29 | basically |
---|
0:05:30 | okay recognition |
---|
0:05:33 | you know |
---|
0:05:36 | the service |
---|
0:05:37 | mel-scale filter be attach with each accordingly |
---|
0:05:42 | p |
---|
0:05:46 | to me |
---|
0:05:47 | could be well it backwards with your check |
---|
0:05:51 | four kinds of data augmentation |
---|
0:05:54 | are used |
---|
0:05:55 | took it seven |
---|
0:05:56 | you cultural for each of the top |
---|
0:06:00 | the you're saying and data points that if the |
---|
0:06:03 | current to wrap |
---|
0:06:05 | the original audio file |
---|
0:06:07 | which each but between |
---|
0:06:10 | no |
---|
0:06:12 | utterance |
---|
0:06:14 | no problems |
---|
0:06:18 | in this thing |
---|
0:06:20 | is the simulated impulse response |
---|
0:06:24 | i used to cover all reaching or |
---|
0:06:27 | right column |
---|
0:06:29 | okay |
---|
0:06:31 | right in aspects problems |
---|
0:06:34 | so it |
---|
0:06:35 | speech vision |
---|
0:06:38 | try to one for speech |
---|
0:06:40 | two |
---|
0:06:41 | like that's |
---|
0:06:44 | well just as a |
---|
0:06:46 | original reach |
---|
0:06:49 | the last |
---|
0:06:50 | the that you a patient |
---|
0:06:52 | original |
---|
0:06:53 | what if i |
---|
0:06:55 | gail |
---|
0:06:56 | which the training data |
---|
0:06:58 | very approach or four |
---|
0:07:00 | but you advantage future or right |
---|
0:07:04 | by using |
---|
0:07:06 | such for kernel in addition |
---|
0:07:10 | there are |
---|
0:07:11 | seven corpus |
---|
0:07:14 | origin |
---|
0:07:14 | that are it |
---|
0:07:22 | thus are train artificial |
---|
0:07:26 | instead |
---|
0:07:27 | nist sre |
---|
0:07:29 | switchboard |
---|
0:07:30 | bonastre |
---|
0:07:31 | it aspect |
---|
0:07:33 | that was therefore it after |
---|
0:07:35 | do correctly for |
---|
0:07:37 | q |
---|
0:07:38 | we should okay first and sit |
---|
0:07:42 | i for one clean speech |
---|
0:07:45 | for our molding |
---|
0:07:48 | one utterances from eighty six summon speaker |
---|
0:07:52 | but i |
---|
0:07:54 | it's a huge amount of it |
---|
0:07:59 | well you material should it also nist sre sound and eight |
---|
0:08:04 | it is i two so that night in a heartbeat |
---|
0:08:09 | the most |
---|
0:08:10 | available training data which |
---|
0:08:13 | because the state yes |
---|
0:08:16 | it can be expressed are all speech |
---|
0:08:18 | you know in speech |
---|
0:08:21 | only |
---|
0:08:21 | well do you or but to me but |
---|
0:08:26 | and i |
---|
0:08:27 | so |
---|
0:08:28 | it for me for feature extraction |
---|
0:08:34 | right we are sure |
---|
0:08:40 | a couple minutes the i it weights |
---|
0:08:43 | there |
---|
0:08:43 | national institute of standards |
---|
0:08:46 | and technology matched speaker recognition evaluation task |
---|
0:08:52 | sre it was sort of a start to that night |
---|
0:08:59 | experimental results showed that the cost structure their decision cost function |
---|
0:09:07 | well the |
---|
0:09:08 | going segment |
---|
0:09:09 | two |
---|
0:09:10 | and |
---|
0:09:10 | zero point |
---|
0:09:13 | see |
---|
0:09:13 | right |
---|
0:09:14 | two |
---|
0:09:17 | which the nist |
---|
0:09:18 | this idea to start it |
---|
0:09:21 | and decide to a nightly evaluation it has the respectively |
---|
0:09:30 | this figure this table |
---|
0:09:33 | chaudhari |
---|
0:09:36 | well allows you know that |
---|
0:09:39 | the best performance |
---|
0:09:42 | there are fixed |
---|
0:09:46 | i compare the first and second |
---|
0:09:50 | segment variable speed but it |
---|
0:09:53 | they also come |
---|
0:09:55 | see if a feature |
---|
0:09:58 | well all we can |
---|
0:10:00 | fun |
---|
0:10:02 | filled up in |
---|
0:10:03 | these each feature |
---|
0:10:06 | awful |
---|
0:10:06 | you know this the feature |
---|
0:10:11 | we also |
---|
0:10:14 | so i |
---|
0:10:15 | the first |
---|
0:10:16 | segment i |
---|
0:10:18 | speaker big be weighted a second |
---|
0:10:22 | the speaker but something the second their speaker at |
---|
0:10:29 | for the first their speaker |
---|
0:10:32 | i |
---|
0:10:34 | result |
---|
0:10:35 | so |
---|
0:10:36 | i think |
---|
0:10:37 | both the speaker |
---|
0:10:40 | first bears a bit |
---|
0:10:42 | they for dimension of the image |
---|
0:10:46 | we can use the score fusion |
---|
0:10:49 | okay vector itself |
---|
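
The score fusion mentioned above can be sketched as a weighted sum of the trial scores from two systems. This is only a minimal illustration, not the method used in the talk: the function names `znorm` and `fuse_scores`, the per-system normalization step, and the default equal weight are all assumptions made for the example.

```python
from statistics import mean, pstdev


def znorm(scores):
    """Shift and scale one system's scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: nothing to scale, return zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Fuse two systems' scores for the same trials by a weighted sum.

    weight_a is a hypothetical fusion weight; in practice it would be
    tuned on a development set.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same trials")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    norm_a = znorm(scores_a)
    norm_b = znorm(scores_b)
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(norm_a, norm_b)]


if __name__ == "__main__":
    # Two hypothetical systems scoring the same three trials.
    system_a = [1.0, 2.0, 3.0]
    system_b = [10.0, 20.0, 30.0]
    print(fuse_scores(system_a, system_b))
```

Normalizing each system before the sum keeps one system from dominating just because its scores live on a larger scale; other calibration schemes would work equally well here.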
0:10:58 | since |
---|
0:10:59 | i file |
---|
0:11:00 | filter bank feature was a feature vector function |
---|
0:11:07 | and also be noted that the cost fifty and draws attention c |
---|
0:11:12 | and eighty dollars |
---|
0:11:15 | we so what role |
---|
0:11:20 | extent also mention i'll for sure |
---|
0:11:23 | the next frame |
---|
0:11:24 | therefore it should |
---|
0:11:28 | what are trained based on |
---|
0:11:30 | the pen |
---|
0:11:32 | each feature |
---|
0:11:34 | this type of show |
---|
0:11:36 | we can find |
---|
0:11:37 | by using white |
---|
0:11:39 | role |
---|
0:11:40 | for in |
---|
0:11:41 | they will refer to ensure |
---|
0:11:43 | we can pick the performance |
---|
0:11:51 | finally |
---|
0:11:52 | well so that all call |
---|
0:11:55 | and ninety six and |
---|
0:11:57 | by using expensive but it is that file and feature and then it is the |
---|
0:12:02 | back-end scoring |
---|
0:12:05 | why final submission |
---|
0:12:08 | that is |
---|
0:12:10 | where it is |
---|
0:12:12 | much |
---|
0:12:14 | each year suspension |
---|
0:12:17 | bic we wish the |
---|
0:12:19 | so q two |
---|
0:12:21 | one two cards |
---|
0:12:24 | once your feet it's |
---|
0:12:28 | do you got but not for right |
---|
0:12:33 | for |
---|
0:12:35 | pretty much are you |
---|
0:12:38 | this table shows |
---|
0:12:40 | by the final file for this site tools on it |
---|
0:12:44 | it is i thought it right |
---|
0:12:48 | you deterioration |
---|
0:12:55 | that we show that a portion |
---|
0:13:01 | this paper to use that system so |
---|
0:13:04 | to a |
---|
0:13:05 | next slide so that night |
---|
0:13:08 | ct has task |
---|
0:13:09 | i'm scroll neural network |
---|
0:13:12 | structure |
---|
0:13:13 | which operates on india and at a at least |
---|
0:13:17 | and you know extra tight shot |
---|
0:13:20 | it showed up and have your |
---|
0:13:23 | and you may speak at |
---|
0:13:24 | there and sixty you know the lp and feature analysis |
---|
0:13:30 | i used |
---|
0:13:32 | channel that's k |
---|
0:13:33 | we did |
---|
0:13:34 | feature |
---|
0:13:36 | mixer six sre |
---|
0:13:38 | so which what a watch therefore |
---|
0:13:41 | that one |
---|
0:13:42 | be a huge |
---|
0:13:44 | six |
---|
0:13:46 | no prior for |
---|
0:13:48 | because our compensation is that what we |
---|
0:13:52 | be well in that the of available training there |
---|
0:13:57 | the proposed mixer shooter it should |
---|
0:14:01 | this year |
---|
0:14:02 | score |
---|
0:14:03 | you or initially suitable for |
---|
0:14:07 | to zero |
---|
0:14:09 | contrary nine five |
---|
0:14:12 | the |
---|
0:14:12 | next |
---|
0:14:13 | this idea to start at sre two thousand nine that the original dataset back |
---|
0:14:22 | thank you |
---|
0:14:23 | thank you very much |
---|