0:00:14 Hello everyone. My name is [inaudible], from [inaudible] University.
0:00:19 Today I will talk about the phase spectrum of time-flipped speech signals for robust spoofing detection.
0:00:28 First, let me introduce spoofing detection for automatic speaker verification.
0:00:36 ASV, which is short for automatic speaker verification, has achieved high reliability.
0:00:44 However, ASV is vulnerable to spoofing attacks.
0:00:52 A spoofing attack is an attempt to deceive an ASV system with spoofed utterances.
0:01:00 Spoofed utterances are artificially produced to sound like the target speaker's utterances.
0:01:09 So an impostor who performs a spoofing attack can be accepted as the target speaker.
0:01:20 There are several types of spoofing attacks, such as text-to-speech synthesis, voice conversion, and replay.
0:01:35 Spoofing detection is the task of deciding whether a given utterance is a genuine utterance or a spoofed utterance, given an identity claim.
0:01:48 Spoofed utterances are detected regardless of how similar the given utterance is to the target speaker's utterances.
0:02:00 Therefore, spoofing detection can protect ASV systems from various spoofing attacks.
0:02:11 To detect spoofing attacks, we should capture the differences in frequency response.
0:02:17 As shown in this figure, the frequency responses of genuine utterances and spoofed utterances are different.
0:02:28 For example, spoofed utterances produced by replay contain the attributes of the devices used for the replay, such as the playback device and the recording device.
0:02:42 Also, spoofed utterances produced by speech synthesis and voice conversion systems do not contain the proper dynamic information and phase information of genuine utterances.
0:02:56 In many studies, convolutional neural networks (CNNs) have been used to capture the variability of frequency responses in spectrum-based acoustic features.
0:03:11 As a side note, let me briefly describe the spectrum of a speech signal.
0:03:18 The spectrum of a speech signal consists of two kinds of spectrum: one is the magnitude spectrum, and the other is the phase spectrum.
0:03:33 Magnitude spectrum-based features have been widely used for spoofing detection.
0:03:40 There are several kinds of magnitude spectrum-based features, such as the log power spectrum, constant Q cepstral coefficients (CQCC), linear frequency cepstral coefficients (LFCC), and so on.
0:03:57 On the other hand, phase spectrum-based features are less used than magnitude spectrum-based features.
0:04:06 However, phase spectrum-based features contain useful information for spoofing detection that is not contained in the magnitude spectrum.
0:04:17 In our research, we focused on the phase spectrum.
0:04:21 Specifically, we used group delay as a phase spectrum-based feature.
0:04:28 Group delay is defined as shown in this equation.
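The slide equation is not reproduced in the transcript. For reference, the standard definition of group delay is the negative frequency derivative of the unwrapped phase, τ(ω) = -dθ(ω)/dω. The sketch below computes it for one frame with NumPy using the well-known identity τ(ω) = Re(Y(ω) X*(ω)) / |X(ω)|², where Y is the DFT of n·x[n], which avoids phase unwrapping. The choice n_fft = 512 is an assumption (it yields the 257 bins mentioned later in the talk), and the eps guard is an illustrative choice, not necessarily the authors'.

```python
import numpy as np

def group_delay(frame: np.ndarray, n_fft: int = 512, eps: float = 1e-10) -> np.ndarray:
    """Group delay (in samples) of one real-valued frame at n_fft // 2 + 1 bins.

    Computes tau(w) = Re(Y(w) * conj(X(w))) / |X(w)|^2, where X is the DFT of x[n]
    and Y is the DFT of n * x[n]. This equals -d(phase)/dw without phase unwrapping.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n=n_fft)       # spectrum of x[n], zero-padded to n_fft
    Y = np.fft.rfft(n * frame, n=n_fft)   # spectrum of n * x[n]
    # eps guards against division by zero at spectral nulls
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

# Sanity check: a unit impulse delayed by 3 samples has a constant group delay of 3.
impulse = np.zeros(16)
impulse[3] = 1.0
gd = group_delay(impulse)
print(gd.shape, gd[:4])
```

Every one of the 257 bins of `gd` should be (almost exactly) 3, matching the 3-sample delay.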
0:04:35 In this section, I'll introduce our proposed method.
0:04:40 First, I'll explain time-flipping of the speech signal and its effect on the phase spectrum.
0:04:46 The magnitude spectrum is not affected by the time order of the signal, so the magnitude spectra of the original signal and the time-flipped signal are the same.
0:05:00 However, the phase spectrum changes when the time order of the signal is flipped.
0:05:07 This means that the attributes of the phase spectrum change when the time order of the signal is flipped.
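The claim about magnitude and phase can be checked numerically. This sketch uses a random real frame rather than speech and confirms that flipping leaves |X| unchanged but changes the phase. For a real frame of length N, the DFT of the flipped frame is exp(-j2πk(N-1)/N)·conj(X[k]), so the group delay becomes (N - 1) - τ, i.e. the group delay curve is mirrored around (N - 1)/2. The `group_delay` helper here is an illustrative one without zero-padding or an eps guard.

```python
import numpy as np

def group_delay(frame: np.ndarray) -> np.ndarray:
    """Group delay at the len(frame) DFT bins; assumes the spectrum has no exact zeros."""
    n = np.arange(len(frame))
    X = np.fft.fft(frame)
    Y = np.fft.fft(n * frame)
    return (X.real * Y.real + X.imag * Y.imag) / np.abs(X) ** 2

rng = np.random.default_rng(0)
x = rng.standard_normal(64)   # random stand-in for one frame of speech
x_flip = x[::-1]              # time-flipped copy
N = len(x)

X, X_flip = np.fft.fft(x), np.fft.fft(x_flip)
same_magnitude = np.allclose(np.abs(X), np.abs(X_flip))   # magnitude is unchanged
same_phase = np.allclose(np.angle(X), np.angle(X_flip))   # phase is not
# For real x of length N: X_flip[k] = exp(-j*2*pi*k*(N-1)/N) * conj(X[k]),
# so the group delay of the flipped frame is (N - 1) - tau.
mirrored_gd = np.allclose(group_delay(x_flip), (N - 1) - group_delay(x))
print(same_magnitude, same_phase, mirrored_gd)
```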
0:05:16 Based on this fact, we assume that when the time order of the signal is flipped, the attributes that are not related to spoofing attacks, such as linguistic information and [unclear] information, are changed.
0:05:37 In contrast, the attributes that are related to spoofing attacks, such as playback device information and recording device information, are not changed.
0:05:51 Motivated by this assumption, we propose a method that uses two types of phase spectrum-based features together.
0:06:03 Until now, conventional spoofing detection systems have used phase spectrum-based features from the original signal only.
0:06:13 In our research, we use not only a phase spectrum-based feature from the original signal but also a feature from the time-flipped signal.
0:06:28 By doing so, we can generate new speech signals with unseen nuisance conditions.
0:06:39 By using the proposed method, which uses both signals together, we expect to reduce the effect of nuisance variance more efficiently, resulting in promising improvements.
0:06:58 To use the two types of features at once, we propose three kinds of feature combination methods.
0:07:07 Before introducing the feature combination methods, I'll introduce our baseline CNN-based model.
0:07:19 Of course, you can use any kind of CNN-based model; in our research, we used SE-ResNet.
0:07:32 SE-ResNet is a variant of ResNet in which SE (squeeze-and-excitation) blocks are integrated into each residual block, recalibrating channel-wise responses.
0:07:45 SE-ResNet was highly ranked in the ASVspoof 2019 challenge.
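For readers unfamiliar with SE blocks, here is a minimal NumPy forward pass of a squeeze-and-excitation block as described by Hu et al.: global average pooling, a bottleneck of two fully connected layers with reduction ratio r, a sigmoid gate, and channel-wise rescaling. The weights are random placeholders and r = 4 is an assumption; this is not the authors' trained SE-ResNet. Inside SE-ResNet, the block is applied to the residual branch before the skip connection is added.

```python
import numpy as np

def se_block(feature_map, w1, b1, w2, b2):
    """Squeeze-and-excitation recalibration of a (C, H, W) feature map."""
    z = feature_map.mean(axis=(1, 2))                 # squeeze: global average pool -> (C,)
    s = np.maximum(0.0, w1 @ z + b1)                  # excitation FC 1 + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))          # excitation FC 2 + sigmoid -> (C,)
    return feature_map * s[:, None, None]             # rescale each channel by its gate

rng = np.random.default_rng(0)
C, r = 16, 4                                          # channels and reduction ratio (assumed)
fmap = rng.standard_normal((C, 8, 10))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
out = se_block(fmap, w1, b1, w2, b2)
```

Each channel of `out` is the corresponding channel of `fmap` scaled by a single gate value in (0, 1).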
0:07:55 One combination method is the two-channel input, where the two types of features are combined into one input.
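A sketch of the two-channel input, assuming each feature is a 257 × 400 segment (the dimensions given later in the talk). The function name is hypothetical. The talk does not say whether the frames of the flipped feature are re-reversed to align in time with the original, so here they are stacked as extracted.

```python
import numpy as np

def two_channel_input(gd_orig: np.ndarray, gd_flip: np.ndarray) -> np.ndarray:
    """Stack two (freq_bins, frames) features into one (2, freq_bins, frames) CNN input."""
    if gd_orig.shape != gd_flip.shape:
        raise ValueError("both features must have the same shape")
    # Channel 0: group delay of the original signal; channel 1: of the time-flipped signal.
    return np.stack([gd_orig, gd_flip], axis=0)

x = two_channel_input(np.zeros((257, 400)), np.ones((257, 400)))
print(x.shape)  # (2, 257, 400)
```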
0:08:07 Another combination method is embedding-level combination.
0:08:13 The embedding corresponds to the output of global average pooling.
0:08:23 This method can be divided into three variants.
0:08:27 The first variant concatenates the two embeddings to make one embedding vector.
0:08:36 The second variant computes the element-wise maximum over the two embeddings.
0:08:43 The third variant computes the element-wise average over the two embeddings.
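The three embedding-level variants reduce to simple vector operations on the two pooled embeddings. In this sketch, `combine_embeddings` and its method strings are hypothetical names, and the talk does not say whether the two CNN branches share weights. Note that concatenation doubles the embedding size, so the following layer needs twice as many inputs, whereas max and mean keep the size unchanged.

```python
import numpy as np

def combine_embeddings(e_orig: np.ndarray, e_flip: np.ndarray, method: str) -> np.ndarray:
    """Combine two embeddings (outputs of global average pooling) into one vector."""
    if method == "concat":
        return np.concatenate([e_orig, e_flip])   # length doubles
    if method == "max":
        return np.maximum(e_orig, e_flip)         # element-wise maximum
    if method == "mean":
        return (e_orig + e_flip) / 2.0            # element-wise average
    raise ValueError(f"unknown method: {method}")

e1 = np.array([0.2, 0.9, -0.5])   # embedding from the original signal (toy values)
e2 = np.array([0.4, 0.1, -0.1])   # embedding from the time-flipped signal (toy values)
print(combine_embeddings(e1, e2, "max"))
```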
0:08:51 The third combination method is feature map-level combination.
0:08:56 The feature map corresponds to the output of the CNN.
0:09:01 Before computing embeddings, we compute the element-wise maximum over the two feature maps, and then compute the embedding from the combined feature map.
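A sketch of feature map-level combination, assuming both feature maps come from segments of the same length so their (C, H, W) shapes match. The function name and the toy values are hypothetical.

```python
import numpy as np

def fmap_max_embedding(fmap_orig: np.ndarray, fmap_flip: np.ndarray) -> np.ndarray:
    """Element-wise max over two (C, H, W) feature maps, then global average pooling to (C,)."""
    if fmap_orig.shape != fmap_flip.shape:
        raise ValueError("feature maps must have the same shape")
    combined = np.maximum(fmap_orig, fmap_flip)   # feature map-level combination
    return combined.mean(axis=(1, 2))             # global average pooling -> embedding

f_orig = np.array([[[1.0, 4.0], [0.0, 2.0]]])     # C=1, H=2, W=2
f_flip = np.array([[[3.0, 1.0], [2.0, 2.0]]])
emb = fmap_max_embedding(f_orig, f_flip)
```

Combining before pooling is not the same as combining after it. For these toy maps, the feature map-level result is 2.75, whereas pooling first (1.75 and 2.0) and then taking the maximum would give 2.0.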
0:09:16 Next, I'll describe the experiments and their results.
0:09:23 We used the ASVspoof 2019 logical access (LA) and physical access (PA) scenario databases, which are widely used in the field of spoofing detection.
0:09:40 Logical access covers the detection of speech synthesis and voice conversion.
0:09:47 Physical access covers the detection of replay.
0:09:55 As the acoustic feature, we used 257-dimensional group delay as input to the CNN.
0:10:07 For each utterance, we extract two types of group delay: one from the original utterance and the other from the time-flipped utterance.
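The extraction step can be sketched as follows. The 400-sample Hamming window (25 ms at an assumed 16 kHz), the 160-sample hop, and n_fft = 512 (which gives the 257 dimensions mentioned) are assumptions, not settings stated in the talk, and the function names are hypothetical. Here the whole utterance is reversed before framing, which is one reading of "the time-flipped utterance"; the sketch also assumes the utterance is at least one frame long.

```python
import numpy as np

def framed_group_delay(signal, frame_len=400, hop=160, n_fft=512, eps=1e-10):
    """Frame-wise group delay features, shape (n_fft // 2 + 1, num_frames)."""
    window = np.hamming(frame_len)
    n = np.arange(frame_len)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        X = np.fft.rfft(frame, n=n_fft)
        Y = np.fft.rfft(n * frame, n=n_fft)
        feats.append((X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps))
    return np.stack(feats, axis=1)

def original_and_flipped_features(utterance):
    """Group delay of the original utterance and of the whole utterance reversed in time."""
    return framed_group_delay(utterance), framed_group_delay(utterance[::-1])

rng = np.random.default_rng(0)
utt = rng.standard_normal(16000)   # 1 s of noise at an assumed 16 kHz rate
gd_orig, gd_flip = original_and_flipped_features(utt)
```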
0:10:25 After feature extraction, we divided each variable-length feature into fixed-length segments to handle variable-length utterances.
0:10:38 In our experiments, we set the segment length to 400 frames.
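A sketch of the segmentation step with a 400-frame segment length. How the authors handled utterances shorter than one segment, or a leftover tail, is not stated in the talk; tiling short utterances and dropping the remainder are assumptions made here for illustration, and the function name is hypothetical.

```python
import numpy as np

def split_into_segments(feature: np.ndarray, seg_len: int = 400) -> list:
    """Split a (dims, frames) feature into fixed-length (dims, seg_len) segments.

    Assumptions: features shorter than seg_len are tiled (repeated) until long enough,
    and a trailing remainder shorter than seg_len is dropped.
    """
    dims, frames = feature.shape
    if frames < seg_len:
        reps = int(np.ceil(seg_len / frames))
        feature = np.tile(feature, (1, reps))[:, :seg_len]
        frames = seg_len
    n_segs = frames // seg_len
    return [feature[:, i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]

long_segs = split_into_segments(np.zeros((257, 1000)))   # 2 segments, last 200 frames dropped
short_segs = split_into_segments(np.zeros((257, 150)))   # 1 segment after tiling
```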
0:10:47 We used two evaluation metrics: one is the equal error rate (EER), and the other is the tandem detection cost function (t-DCF).
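The EER is the operating point where the false rejection rate of bona fide utterances equals the false acceptance rate of spoofed ones. Below is a minimal sketch, assuming higher scores mean "more likely bona fide" and ignoring score ties; it is not the official ASVspoof evaluation script, and the t-DCF, which also requires ASV system scores and cost parameters, is not shown.

```python
import numpy as np

def compute_eer(bona_fide_scores, spoof_scores):
    """Equal error rate, assuming higher scores mean 'more likely bona fide'."""
    scores = np.concatenate([bona_fide_scores, spoof_scores])
    labels = np.concatenate([np.ones(len(bona_fide_scores)), np.zeros(len(spoof_scores))])
    labels = labels[np.argsort(scores)]
    # Threshold i rejects the i lowest-scoring trials, for i = 0..n.
    frr = np.concatenate([[0.0], np.cumsum(labels) / labels.sum()])                # bona fide rejected
    far = np.concatenate([[1.0], 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()])  # spoof accepted
    idx = np.argmin(np.abs(frr - far))
    return (frr[idx] + far[idx]) / 2.0

bona_fide = np.array([0.1, 0.6, 0.7, 0.9])
spoof = np.array([0.2, 0.3, 0.4, 0.8])
eer = compute_eer(bona_fide, spoof)   # one bona fide and one spoof trial are misclassified
```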
0:10:59 This table shows the results on the LA scenario. We highlight the best performance in bold.
0:11:08 The mean method showed the best performance on evaluation trials,
0:11:16 and the [unclear] method showed the best performance on development trials.
0:11:23 Our proposed methods generally showed better performance than the baseline.
0:11:33 This table shows the results on the PA scenario.
0:11:41 The proposed methods showed better results than the baseline, except for the EER of the two-channel method on [unclear] trials.
0:11:55 The mean method showed the best performance on both development and evaluation trials.
0:12:05 At the beginning, we mentioned that the magnitude spectrum and the phase spectrum contain different information.
0:12:14 So we also built baseline systems that use a magnitude spectrum-based feature.
0:12:23 In our research, we used the log power spectrum as the magnitude spectrum-based feature.
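A sketch of the log power spectrum feature, assuming the same framing as a typical group delay front end (400-sample Hamming window, 160-sample hop, n_fft = 512 giving 257 bins). These values, the eps floor, and the function name are assumptions, not the authors' settings.

```python
import numpy as np

def log_power_spectrum(signal, frame_len=400, hop=160, n_fft=512, eps=1e-10):
    """Frame-wise log power spectrum, shape (n_fft // 2 + 1, num_frames)."""
    window = np.hamming(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])  # (num_frames, frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2              # (num_frames, 257)
    return np.log(power + eps).T

rng = np.random.default_rng(0)
lps = log_power_spectrum(rng.standard_normal(16000))   # 1 s of noise at an assumed 16 kHz
```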
0:12:33 These baseline systems are fused with the systems that use phase spectrum-based features.
0:12:43 Through score-level fusion, we can utilize information from both the magnitude and phase spectra.
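The talk does not specify the fusion rule. One simple option, assumed here, is to z-normalise each system's scores and take a weighted sum with equal weights. In practice the normalisation statistics and the weight (or a logistic-regression fusion) would be estimated on the development set. `fuse_scores` is a hypothetical name.

```python
import numpy as np

def fuse_scores(scores_a: np.ndarray, scores_b: np.ndarray, weight: float = 0.5) -> np.ndarray:
    """Weighted sum of two systems' scores after z-normalising each system's scores."""
    za = (scores_a - scores_a.mean()) / scores_a.std()
    zb = (scores_b - scores_b.mean()) / scores_b.std()
    return weight * za + (1.0 - weight) * zb

phase_scores = np.array([1.0, 2.0, 3.0])         # e.g. group delay system (toy values)
magnitude_scores = np.array([10.0, 20.0, 30.0])  # e.g. log power spectrum system (toy values)
fused = fuse_scores(phase_scores, magnitude_scores)
```

Z-normalisation puts systems with very different score ranges (here 1 to 3 versus 10 to 30) on a common scale before they are combined.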
0:12:58 This table shows the performance of the baseline system that uses a magnitude spectrum-based feature as input,
0:13:11 and this table shows the results of the fused systems on the LA scenario.
0:13:20 All the systems showed better results after fusion.
0:13:27 The same trend can be seen in the results of the fused systems on the PA scenario.
0:13:39 Finally, the conclusions.
0:13:43 The conventional methods use the phase spectrum from the original signal only.
0:13:50 In contrast, the proposed method uses the phase spectrum from the original and the time-flipped signals together.
0:14:02 This reduces the impact of nuisance variance and yields better performance.
0:14:13 Additionally, we can achieve even better results by fusion with systems that use magnitude spectrum-based features.
0:14:26 Thank you for watching my presentation.
0:14:29 Goodbye.