0:00:14 | and really everyone my name is strong and you from university was so |
---|
0:00:19 | today |
---|
0:00:19 | i will talk about |
---|
0:00:21 | phase spectrum of time-flipped speech signals for robust spoofing detection |
---|
0:00:28 | first |
---|
0:00:28 | let me introduce spoofing detection for automatic speaker verification |
---|
0:00:36 | asv |
---|
0:00:37 | which is short |
---|
0:00:38 | for |
---|
0:00:38 | automatic speaker verification |
---|
0:00:41 | has low reliability |
---|
0:00:44 | the low reliability of asv means |
---|
0:00:47 | it is |
---|
0:00:48 | vulnerable to spoofing attacks |
---|
0:00:52 | a spoofing attack is an attempt to deceive the asv system |
---|
0:00:56 | using spoofed utterances |
---|
0:01:00 | the spoofed utterances are utterances artificially produced to sound like the target speaker's |
---|
0:01:07 | utterances |
---|
0:01:09 | so |
---|
0:01:10 | the impostor speaker |
---|
0:01:11 | who attempts |
---|
0:01:13 | a spoofing attack |
---|
0:01:14 | can be verified as the target speaker |
---|
0:01:20 | there are some types of spoofing attacks |
---|
0:01:22 | each can be actually detected |
---|
0:01:26 | text-to-speech synthesis |
---|
0:01:28 | voice conversion |
---|
0:01:30 | and |
---|
0:01:31 | replay attack |
---|
0:01:35 | spoofing detection is a task to distinguish |
---|
0:01:39 | whether a given utterance is |
---|
0:01:41 | a genuine utterance |
---|
0:01:43 | or a spoofed utterance |
---|
0:01:46 | the identity claim |
---|
0:01:48 | with a spoofed utterance is rejected |
---|
0:01:52 | regardless of |
---|
0:01:54 | how similar the spoofed utterance is to the target speaker's utterance |
---|
0:02:00 | therefore |
---|
0:02:01 | spoofing detection can protect the asv system |
---|
0:02:05 | from |
---|
0:02:07 | various spoofing attacks |
---|
0:02:11 | for detecting spoofing attacks |
---|
0:02:13 | we should capture the differences of the frequency response well |
---|
0:02:17 | as shown in this figure |
---|
0:02:20 | the frequency responses |
---|
0:02:22 | between a genuine utterance and a spoofed utterance |
---|
0:02:27 | are different |
---|
0:02:28 | for example |
---|
0:02:30 | spoofed utterances produced by replay attack |
---|
0:02:33 | contain the attributes |
---|
0:02:35 | of the devices |
---|
0:02:36 | used for the replay attack |
---|
0:02:38 | such as the playback device |
---|
0:02:40 | and the recording device |
---|
0:02:42 | also the spoofed utterances |
---|
0:02:44 | produced by speech synthesis and voice conversion methods |
---|
0:02:48 | do not contain the proper dynamic information and the phase information of genuine utterances |
---|
0:02:56 | in many studies |
---|
0:02:58 | convolutional neural networks |
---|
0:03:00 | have been used to capture the variability of frequency responses |
---|
0:03:04 | in spectrum based acoustic features |
---|
0:03:11 | as a side note |
---|
0:03:13 | i will describe the spectrum of a speech signal briefly |
---|
0:03:18 | the spectrum of a speech signal |
---|
0:03:21 | is |
---|
0:03:23 | composed of |
---|
0:03:24 | two kinds of spectrum |
---|
0:03:26 | one is the magnitude spectrum |
---|
0:03:29 | and the other is the phase spectrum |
---|
0:03:33 | magnitude spectrum based features have been widely used for spoofing detection |
---|
0:03:40 | there are several kinds of magnitude spectrum based features |
---|
0:03:44 | such as log power spectrum |
---|
0:03:48 | constant q cepstral coefficients |
---|
0:03:51 | linear frequency cepstral coefficients |
---|
0:03:54 | and so on |
---|
0:03:57 | whereas |
---|
0:03:58 | phase spectrum based features are less used than |
---|
0:04:02 | magnitude spectrum based features |
---|
0:04:06 | however |
---|
0:04:07 | the phase spectrum based features |
---|
0:04:09 | contain |
---|
0:04:10 | useful information for spoofing detection |
---|
0:04:13 | that is not contained in the magnitude spectrum |
---|
0:04:17 | in our research |
---|
0:04:19 | we focused on phase spectrum |
---|
0:04:21 | especially |
---|
0:04:23 | we used |
---|
0:04:24 | group delay |
---|
0:04:26 | as a phase spectrum based feature |
---|
0:04:28 | the group delay is defined |
---|
0:04:31 | as |
---|
0:04:31 | this equation |
---|
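The equation shown on the slide is not captured in the transcript. As a hedged reconstruction, the standard textbook definition of group delay is the negative derivative of the unwrapped phase $\theta(\omega)$ of the spectrum $X(\omega)$ with respect to frequency. The second form below is a well-known equivalent that avoids phase unwrapping, where $Y(\omega)$ is the spectrum of $n\,x[n]$ and the subscripts $R$ and $I$ denote real and imaginary parts. Whether the speaker used exactly this form (or a modified variant) is an assumption.

```latex
\tau(\omega) = -\frac{d\,\theta(\omega)}{d\omega}
             = \frac{X_R(\omega)\,Y_R(\omega) + X_I(\omega)\,Y_I(\omega)}{\lvert X(\omega)\rvert^{2}}
```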
0:04:35 | in this section i will introduce our proposed method |
---|
0:04:40 | first i will explain |
---|
0:04:42 | time flipping for the phase spectrum |
---|
0:04:46 | the magnitude spectrum is not affected by the time order of the signal |
---|
0:04:52 | so |
---|
0:04:53 | the magnitude spectra of the |
---|
0:04:55 | original signal and the time-flipped signal |
---|
0:04:59 | are the same |
---|
0:05:00 | however |
---|
0:05:01 | the phase spectrum is changed |
---|
0:05:04 | when the time order of the signal is flipped |
---|
0:05:07 | it means that |
---|
0:05:09 | the attributes of the phase spectrum are changed |
---|
0:05:12 | when the time order of the signal is flipped |
---|
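The claim above can be checked numerically. The following is a minimal sketch, assuming a real-valued signal and using a random frame as a stand-in for speech; the variable names are illustrative only. It also checks the closed-form relation for a length-N real signal, where the flipped spectrum equals the complex conjugate of the original multiplied by a linear phase term.

```python
import numpy as np

def spectra(signal):
    """Return (magnitude, phase) of the one-sided DFT of a real signal."""
    spectrum = np.fft.rfft(signal)
    return np.abs(spectrum), np.angle(spectrum)

rng = np.random.default_rng(0)
n_samples = 512
x = rng.standard_normal(n_samples)   # stand-in for one frame of speech
x_flipped = x[::-1]                  # time-flipped copy of the same frame

mag, phase = spectra(x)
mag_flipped, phase_flipped = spectra(x_flipped)

# The magnitude spectrum is identical, the phase spectrum is not.
magnitude_unchanged = np.allclose(mag, mag_flipped)
phase_changed = not np.allclose(phase, phase_flipped)

# For a real signal: DFT(flipped)[k] = exp(j*2*pi*k/N) * conj(DFT(original)[k]).
k = np.arange(n_samples // 2 + 1)
expected = np.exp(2j * np.pi * k / n_samples) * np.conj(np.fft.rfft(x))
relation_holds = np.allclose(np.fft.rfft(x_flipped), expected)
```

So time flipping negates the phase (up to a linear term) while leaving the magnitude untouched, which is why only phase-based features gain anything from the flipped signal.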
0:05:16 | based on this fact |
---|
0:05:19 | we assume that when the time order of the signal is flipped |
---|
0:05:24 | the attributes that are not related to spoofing attacks |
---|
0:05:29 | such as language information and |
---|
0:05:32 | right information |
---|
0:05:35 | are changed |
---|
0:05:37 | in contrast |
---|
0:05:39 | the attributes |
---|
0:05:40 | that are related to spoofing attacks |
---|
0:05:43 | such as playback device information and recording device information |
---|
0:05:48 | are not changed |
---|
0:05:51 | motivated by this assumption |
---|
0:05:55 | we propose a method |
---|
0:05:57 | using |
---|
0:05:58 | two types of phase spectrum based features together |
---|
0:06:03 | until now |
---|
0:06:05 | conventional spoofing detection systems |
---|
0:06:07 | have used phase spectrum based features |
---|
0:06:10 | from the original signal only |
---|
0:06:13 | in our research we use |
---|
0:06:15 | not only a phase spectrum based feature from the original signal but also |
---|
0:06:21 | a feature |
---|
0:06:23 | from the time-flipped signal |
---|
0:06:28 | if a raw some holes |
---|
0:06:31 | we can generate |
---|
0:06:33 | new speech signals |
---|
0:06:35 | have on seen in fact live conditions |
---|
0:06:39 | by using the proposed method |
---|
0:06:42 | and |
---|
0:06:44 | use all both |
---|
0:06:45 | i think than others |
---|
0:06:46 | has the effect of reducing the impact of that variance more efficiently |
---|
0:06:53 | which results in |
---|
0:06:54 | performance improvements |
---|
0:06:58 | by using two types of features at one time |
---|
0:07:02 | we propose three kinds of feature combination methods |
---|
0:07:07 | before introducing the feature combination methods |
---|
0:07:11 | i will introduce our baseline |
---|
0:07:13 | the cnn based model first |
---|
0:07:19 | of course you can use any kind of cnn based model |
---|
0:07:23 | in our research |
---|
0:07:25 | we used |
---|
0:07:26 | se-resnet34 |
---|
0:07:28 | as the cnn based model |
---|
0:07:32 | se-resnet34 |
---|
0:07:33 | is a variant of resnet |
---|
0:07:35 | where |
---|
0:07:36 | se blocks are integrated into each residual block |
---|
0:07:41 | for recalibrating |
---|
0:07:43 | channel-wise responses |
---|
0:07:45 | and se-resnet34 was highly ranked in the asvspoof twenty nineteen |
---|
0:07:50 | challenge |
---|
0:07:55 | one combination method |
---|
0:07:57 | is |
---|
0:07:58 | two-channel input |
---|
0:08:00 | where |
---|
0:08:01 | two types of features |
---|
0:08:03 | compose |
---|
0:08:04 | one input |
---|
0:08:07 | another combination method is embedding-level combination |
---|
0:08:13 | the embedding |
---|
0:08:17 | corresponds to |
---|
0:08:17 | the output of the global average pooling |
---|
0:08:23 | this method can be divided into three methods |
---|
0:08:27 | the first method is |
---|
0:08:29 | to concatenate two embeddings |
---|
0:08:33 | to make up one embedding vector |
---|
0:08:36 | the second method is to compute the element-wise maximum of two embeddings |
---|
0:08:43 | the third method is to compute the element-wise average of two embeddings |
---|
0:08:51 | the other combination method is feature-map-level combination |
---|
0:08:56 | the feature map corresponds to |
---|
0:08:58 | the output of the cnn |
---|
0:09:01 | before computing embeddings |
---|
0:09:04 | we compute the element-wise |
---|
0:09:06 | maximum of two feature maps |
---|
0:09:10 | and then compute the embedding from the combined feature map |
---|
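The three kinds of combination described above (two-channel input, embedding-level, and feature-map-level) can be sketched with plain arrays. This is only an illustration under stated assumptions: the CNN itself is omitted, the feature maps are random stand-ins with hypothetical shapes, and all function names are invented here rather than taken from the talk.

```python
import numpy as np

def two_channel_input(feat_orig, feat_flip):
    """Stack two (frames, bins) features into one (2, frames, bins) input."""
    return np.stack([feat_orig, feat_flip], axis=0)

def global_average_pooling(feature_map):
    """Average a (channels, height, width) feature map over height and width."""
    return feature_map.mean(axis=(1, 2))

def combine_embeddings(emb_orig, emb_flip, mode):
    """Embedding-level combination: concatenation, element-wise max, or mean."""
    if mode == "concat":
        return np.concatenate([emb_orig, emb_flip])
    if mode == "max":
        return np.maximum(emb_orig, emb_flip)
    if mode == "mean":
        return (emb_orig + emb_flip) / 2.0
    raise ValueError(f"unknown mode: {mode}")

def combine_feature_maps(fmap_orig, fmap_flip):
    """Feature-map-level combination: element-wise max, then pooling."""
    return global_average_pooling(np.maximum(fmap_orig, fmap_flip))

# Random stand-ins for the CNN outputs of the two branches (hypothetical shapes).
rng = np.random.default_rng(0)
fmap_orig = rng.standard_normal((8, 5, 4))
fmap_flip = rng.standard_normal((8, 5, 4))
emb_orig = global_average_pooling(fmap_orig)
emb_flip = global_average_pooling(fmap_flip)

emb_concat = combine_embeddings(emb_orig, emb_flip, "concat")   # shape (16,)
emb_fmax = combine_feature_maps(fmap_orig, fmap_flip)           # shape (8,)
```

Note that concatenation doubles the embedding size, while the max, mean, and feature-map variants keep it unchanged, so the classifier after the embedding differs only in the concatenation case.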
0:09:16 | next |
---|
0:09:17 | i will describe the experiments and the results |
---|
0:09:23 | we used the asvspoof twenty nineteen |
---|
0:09:26 | logical |
---|
0:09:27 | and physical access scenario databases |
---|
0:09:32 | they are widely used |
---|
0:09:35 | databases in the field of spoofing detection |
---|
0:09:40 | logical access |
---|
0:09:43 | covers the detection of speech synthesis and voice conversion |
---|
0:09:47 | physical access |
---|
0:09:49 | covers the detection of replay attack |
---|
0:09:55 | we used an acoustic feature |
---|
0:09:58 | of |
---|
0:09:58 | two hundred fifty seven dimensional |
---|
0:10:01 | group |
---|
0:10:01 | delay |
---|
0:10:04 | as input for the cnn |
---|
0:10:07 | for each utterance |
---|
0:10:10 | we extract |
---|
0:10:12 | two types of group delay |
---|
0:10:16 | one is from the original utterance |
---|
0:10:19 | and the other is from |
---|
0:10:21 | the time-flipped utterance |
---|
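A rough sketch of this extraction step follows. The 257 dimensions are consistent with a 512-point FFT (512 / 2 + 1 bins), but the FFT size, the frame length, the hop size, the Hamming window, and the function name `group_delay` are all assumptions made for illustration; the talk does not state them. The group delay is computed with the standard identity that uses the spectrum of `n * x[n]`, so no phase unwrapping is needed.

```python
import numpy as np

def group_delay(signal, n_fft=512, frame_len=400, hop=160, eps=1e-8):
    """Frame-wise group delay of a real signal, shape (n_frames, n_fft // 2 + 1).

    Assumes len(signal) >= frame_len. All framing parameters are illustrative.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    n = np.arange(frame_len)
    feats = np.empty((n_frames, n_fft // 2 + 1))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        X = np.fft.rfft(frame, n_fft)        # spectrum of x[n]
        Y = np.fft.rfft(n * frame, n_fft)    # spectrum of n * x[n]
        # tau = (X_R * Y_R + X_I * Y_I) / |X|^2, with eps guarding small |X|.
        feats[i] = (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)
    return feats

rng = np.random.default_rng(0)
utterance = rng.standard_normal(16000)        # stand-in for a one-second utterance
gd_original = group_delay(utterance)          # feature from the original utterance
gd_flipped = group_delay(utterance[::-1])     # feature from the time-flipped utterance
```

As a sanity check, a single impulse at sample 100 of a frame yields a group delay of about 100 samples in every frequency bin.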
0:10:25 | after the feature extraction we divided each |
---|
0:10:29 | variable length feature |
---|
0:10:31 | into fixed length |
---|
0:10:32 | segments |
---|
0:10:34 | to handle |
---|
0:10:35 | variable lengths of utterances |
---|
0:10:38 | in our experiments we set the segment |
---|
0:10:41 | length to four hundred frames |
---|
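The segmentation step can be sketched as below. The talk only states the segment length of four hundred frames; how short utterances and leftover frames are handled is not mentioned, so repeating short features and dropping the remainder are assumptions made here, and the function name is hypothetical.

```python
import numpy as np

def split_into_segments(feature, seg_len=400):
    """Split a (frames, bins) feature into (n_segments, seg_len, bins).

    Assumptions: features shorter than seg_len are repeated until they fit,
    and trailing frames that do not fill a whole segment are dropped.
    """
    n_frames = feature.shape[0]
    if n_frames < seg_len:
        repeats = -(-seg_len // n_frames)            # ceiling division
        feature = np.tile(feature, (repeats, 1))[:seg_len]
        n_frames = seg_len
    n_segments = n_frames // seg_len
    return feature[: n_segments * seg_len].reshape(n_segments, seg_len, -1)

long_feature = np.zeros((950, 257))
short_feature = np.ones((100, 257))
long_segments = split_into_segments(long_feature)     # shape (2, 400, 257)
short_segments = split_into_segments(short_feature)   # shape (1, 400, 257)
```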
0:10:47 | we used two evaluation metrics |
---|
0:10:51 | one is |
---|
0:10:53 | eer |
---|
0:10:54 | and the other is |
---|
0:10:55 | t-dcf |
---|
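For reference, the equal error rate is the operating point where the false acceptance rate and the false rejection rate coincide. The sketch below is a simple threshold sweep, not the official challenge scoring tool, and it assumes that higher scores mean "more likely genuine". The t-DCF is not sketched here because it additionally depends on the error rates of an asv system and on cost parameters that the talk does not give.

```python
import numpy as np

def compute_eer(genuine_scores, spoof_scores):
    """Approximate EER by sweeping the threshold over all observed scores."""
    genuine_scores = np.asarray(genuine_scores, dtype=float)
    spoof_scores = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
    # False acceptance: spoofed trials scoring at or above the threshold.
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    # False rejection: genuine trials scoring below the threshold.
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

eer_separable = compute_eer([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
eer_overlap = compute_eer([0.9, 0.8, 0.3], [0.1, 0.2, 0.7])
```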
0:10:59 | this table shows the performances |
---|
0:11:02 | on the la trials |
---|
0:11:04 | we highlight the best performance in bold |
---|
0:11:08 | the mean method |
---|
0:11:10 | showed the best performance on evaluation trials |
---|
0:11:16 | and the fmax method |
---|
0:11:18 | showed |
---|
0:11:19 | the best performance on development trials |
---|
0:11:23 | you are don't mess source |
---|
0:11:25 | generally showed offers or promises then |
---|
0:11:28 | baseline |
---|
0:11:33 | this table shows |
---|
0:11:34 | the |
---|
0:11:36 | performances |
---|
0:11:38 | on the pa trials |
---|
0:11:41 | the proposed methods showed better performances than the baseline |
---|
0:11:47 | except the eer of |
---|
0:11:50 | the two channel missiles one people not tried |
---|
0:11:55 | the mean method |
---|
0:11:57 | showed the best performance on both development and evaluation trials |
---|
0:12:05 | in the beginning |
---|
0:12:06 | we mentioned that |
---|
0:12:09 | the magnitude spectrum and the phase spectrum contain different information |
---|
0:12:14 | so |
---|
0:12:16 | we also built |
---|
0:12:17 | the baseline systems that |
---|
0:12:19 | use |
---|
0:12:20 | a magnitude spectrum based feature |
---|
0:12:23 | in our research |
---|
0:12:26 | we used log power spectrum |
---|
0:12:29 | as the magnitude spectrum based feature |
---|
0:12:33 | these baseline systems |
---|
0:12:35 | are for fusion with the systems |
---|
0:12:38 | that use |
---|
0:12:39 | the phase spectrum based feature |
---|
0:12:43 | by fusion |
---|
0:12:45 | we can utilize information from |
---|
0:12:48 | both the magnitude and phase spectrum |
---|
0:12:52 | we used score-level fusion |
---|
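Score-level fusion combines the per-trial output scores of two systems rather than their features or embeddings. The talk does not say how the scores were weighted or normalized, so the weighted sum below, with an equal weight by default, is purely an assumption for illustration, and `fuse_scores` is a hypothetical name.

```python
import numpy as np

def fuse_scores(phase_scores, magnitude_scores, weight=0.5):
    """Weighted sum of per-trial scores from the two systems (assumed scheme)."""
    phase_scores = np.asarray(phase_scores, dtype=float)
    magnitude_scores = np.asarray(magnitude_scores, dtype=float)
    return weight * phase_scores + (1.0 - weight) * magnitude_scores

# One score per trial from each system, in the same trial order.
fused = fuse_scores([0.2, 0.9, -0.4], [0.6, 0.7, -0.8])
```

In practice the two systems' scores would need to be on comparable scales before averaging, which is a detail the transcript leaves open.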
0:12:58 | this table shows |
---|
0:12:59 | the performances |
---|
0:13:01 | of the baseline system that |
---|
0:13:04 | uses |
---|
0:13:04 | the magnitude spectrum based feature as input |
---|
0:13:11 | this table shows |
---|
0:13:12 | the performances of the fused systems |
---|
0:13:16 | on the la scenario |
---|
0:13:20 | all the systems |
---|
0:13:22 | showed better performances than before fusion |
---|
0:13:27 | the same trend can be seen |
---|
0:13:30 | in the results |
---|
0:13:32 | of the fused systems |
---|
0:13:34 | on the pa scenario |
---|
0:13:39 | finally conclusions |
---|
0:13:43 | the conventional method |
---|
0:13:44 | utilizes the phase spectrum |
---|
0:13:47 | from the original signal only |
---|
0:13:50 | in contrast the proposed method |
---|
0:13:53 | utilizes the phase spectrum |
---|
0:13:55 | from the original and the time-flipped signals together |
---|
0:14:02 | it has the effect of reducing the impact of the variance |
---|
0:14:08 | and |
---|
0:14:09 | shows robust performance |
---|
0:14:13 | additionally |
---|
0:14:15 | we can achieve |
---|
0:14:16 | better performances |
---|
0:14:17 | by fusion with the systems that use |
---|
0:14:21 | the magnitude spectrum based |
---|
0:14:23 | feature |
---|
0:14:26 | thank you for watching my presentation |
---|
0:14:29 | goodbye |
---|