hello everyone my name is [inaudible] from [inaudible]
today
i will talk about
phase spectrum of time-flipped speech signals for robust spoofing detection
first
let me introduce spoofing detection for automatic speaker verification
ASV
which is short
for
automatic speaker verification
has low reliability
the low reliability of ASV means
that it is
vulnerable to spoofing attacks
a spoofing attack is an attempt to deceive an ASV system
with spoofed utterances
the spoofed utterances are voices artificially produced to sound like the target speaker's
utterances
so
the impostor speaker
who attempts
a spoofing attack
can be accepted as the target speaker
there are several types of spoofing attacks
that can actually be carried out
text-to-speech synthesis
voice conversion
and
replay attack
spoofing detection is a task to distinguish
whether a given utterance is
a genuine utterance
or a spoofed utterance
without an identity claim
the spoofed utterance is rejected
regardless of
how similar the spoofed utterance is to the target speaker's utterance
therefore
spoofing detection can protect the ASV system
against
various spoofing attacks
for detecting spoofing attacks
we should capture the differences in the frequency responses
as shown in this figure
the frequency responses
of a genuine utterance and a spoofed utterance
are different
for example
spoofed utterances produced by replay attack
contain the attributes
of the devices
used for the replay attack
such as the playback device
and the recording device
also spoofed utterances
produced by speech synthesis and voice conversion methods
do not contain the proper dynamic information and the phase information of genuine utterances
in many studies
convolutional neural networks
have been used to capture the variability of frequency responses
in spectrum-based acoustic features
as a side note
i will describe the spectrum of the speech signal briefly
the spectrum of the speech signal
consists of
two kinds of spectrum
one is the magnitude spectrum
and the other is the phase spectrum
magnitude spectrum-based features have been widely used for spoofing detection
there are several kinds of magnitude spectrum-based features
such as log power spectrum
constant-Q cepstral coefficients
linear frequency cepstral coefficients
and so on
whereas
phase spectrum-based features are less used than
magnitude spectrum-based features
however
the phase spectrum-based features
contain
useful information for spoofing detection
that is not contained in the magnitude spectrum
in our research
we focused on the phase spectrum
especially
we used
group delay
as a phase spectrum-based feature
the group delay is defined
as
this equation
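
The equation on the slide is not captured in the transcript. In its standard textbook form, group delay is the negative derivative of the phase spectrum with respect to frequency. The sketch below computes it for a single frame without phase unwrapping, using the well-known identity with the spectrum of n * x[n]. The function name `group_delay`, the `eps` guard, and the 512-point FFT (chosen only because it yields the 257 bins mentioned later in the talk) are assumptions, not details confirmed by the speaker.

```python
import numpy as np

def group_delay(frame, n_fft=512, eps=1e-8):
    """Group delay of one frame, tau(w) = -d(phase)/dw, in samples.

    Hypothetical helper: n_fft=512 is an assumption that gives 257 bins.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)       # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)   # spectrum of n * x[n]
    # Identity that avoids phase unwrapping:
    # tau(w) = (Re X * Re Y + Im X * Im Y) / |X|^2
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

# Sanity check: an impulse delayed by 5 samples has a flat group delay of 5.
impulse = np.zeros(64)
impulse[5] = 1.0
gd = group_delay(impulse)
```

A delayed impulse is a convenient check because its phase is exactly linear in frequency, so every bin should report the same delay.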
in this section i will introduce our proposed method
first i will explain
time flipping for the phase spectrum
the magnitude spectrum is not affected by the time order of the signal
so
the magnitude spectra of the
original signal and the time-flipped signal
are the same
however
the phase spectrum is changed
when the time order of the signal is flipped
it means that
the attributes of the phase spectrum are changed
when the time order of the signal is flipped
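
This property can be checked numerically. The sketch below uses a random real-valued signal as a stand-in for a speech frame (an assumption, since no audio accompanies the transcript), flips it in time, and compares magnitude and phase of the two spectra.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(512)   # stand-in for one real-valued speech frame
x_flip = x[::-1]               # time-flipped copy of the same frame

X = np.fft.rfft(x)
X_flip = np.fft.rfft(x_flip)

# For a real signal, flipping conjugates the spectrum and adds a linear
# phase term, so the magnitudes match while the phases differ.
same_magnitude = np.allclose(np.abs(X), np.abs(X_flip))
same_phase = np.allclose(np.angle(X), np.angle(X_flip))
print(same_magnitude, same_phase)
```

This is why a magnitude-based feature gains nothing from the flipped signal, while a phase-based feature such as group delay yields a second, different view of the same utterance.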
based on this fact
we assume that when the time order of the signal is flipped
the attributes that are not related to spoofing attacks
such as language information and
[unclear] information
are changed
in contrast
the attributes
that are related to spoofing attacks
such as playback device information and recording device information
are not changed
motivated by this assumption
we propose a method
using
two types of phase spectrum-based features together
up to now
conventional spoofing detection systems
have used phase spectrum-based features
from the original signal only
in our research we use
not only a phase spectrum-based feature from the original signal but also
a feature
from the time-flipped signal
in this way
we can generate
new speech signals
that have unseen intra-class conditions
by using the proposed method
and
use both
signals together
as an effect we reduce the intra-class variance more efficiently
which results in
performance improvements
by using two types of features at one time
we propose three kinds of feature combination methods
before introducing the feature combination methods
i will introduce our baseline
the CNN-based model
of course you can use any kind of CNN-based model
and in our research
we used
SE-ResNet34
as the CNN-based model
SE-ResNet34
is a variant of ResNet
where
SE blocks are integrated into each residual block
adaptively recalibrating
channel-wise responses
and SE-ResNet34 was highly ranked in the ASVspoof 2019
challenge
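
As a rough illustration of what an SE (squeeze-and-excitation) block does, the sketch below rescales each channel of a feature map by a learned gate. It follows the generic published SE design, not the speaker's exact implementation; the function name `se_block`, the weight shapes, the reduction ratio, and the omission of biases are all assumptions.

```python
import numpy as np

def se_block(feature_map, w1, w2):
    """Generic squeeze-and-excitation sketch (biases omitted for brevity).

    feature_map: (channels, freq, time)
    w1: (channels // r, channels), w2: (channels, channels // r)
    """
    squeezed = feature_map.mean(axis=(1, 2))        # squeeze: one value per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)         # bottleneck layer with ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gates in (0, 1)
    return feature_map * gates[:, None, None]       # rescale each channel

rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 8, 10))              # toy map with 4 channels
w1 = rng.standard_normal((2, 4))                    # reduction ratio r = 2 (assumed)
w2 = rng.standard_normal((4, 2))
recalibrated = se_block(fmap, w1, w2)
```

In a residual network this rescaling would be applied to the output of each residual block before the skip connection is added.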
one combination method
is
two-channel input
where
two types of features
are combined as
one input
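
A minimal sketch of the two-channel input, assuming each feature is a (frequency, frames) array and that the two features are stacked along a new leading channel axis. The function name and the axis order are assumptions.

```python
import numpy as np

def two_channel_input(gd_original, gd_flipped):
    """Stack the original and time-flipped features as two input channels."""
    if gd_original.shape != gd_flipped.shape:
        raise ValueError("both features must have the same shape")
    return np.stack([gd_original, gd_flipped], axis=0)   # (2, freq, frames)

# Toy features with the dimensions mentioned in the talk (257 bins, 400 frames).
gd_a = np.zeros((257, 400))
gd_b = np.ones((257, 400))
cnn_input = two_channel_input(gd_a, gd_b)
```

With this layout the network sees both views at the same time-frequency position from the first convolution onward.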
another combination method is embedding-level combination
the embedding
corresponds to
the output of the global average pooling
this method can be divided into three methods
the first method is
to concatenate two embeddings
to make up one embedding vector
the second method is to compute the element-wise maximum over two embeddings
the third method is to compute the element-wise average over two embeddings
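
The three embedding-level variants can be sketched as follows, assuming each embedding is a 1-D vector produced by the network for one of the two features. The function name and the `mode` strings are hypothetical labels.

```python
import numpy as np

def combine_embeddings(emb_original, emb_flipped, mode):
    """Combine two embedding vectors by concatenation, maximum, or mean."""
    if mode == "concat":
        return np.concatenate([emb_original, emb_flipped], axis=-1)
    if mode == "max":
        return np.maximum(emb_original, emb_flipped)     # element-wise maximum
    if mode == "mean":
        return (emb_original + emb_flipped) / 2.0        # element-wise average
    raise ValueError(f"unknown mode: {mode}")

e1 = np.array([1.0, 4.0, -2.0])
e2 = np.array([3.0, 0.0, -6.0])
combined = {m: combine_embeddings(e1, e2, m) for m in ("concat", "max", "mean")}
```

Concatenation doubles the embedding size, so the layer that follows needs twice as many inputs; maximum and mean keep the original size.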
the other combination method is feature-map-level combination
the feature map corresponds to
the output of the CNN
before computing embeddings
we compute the element-wise
maximum over two feature maps
and then compute the embedding from the combined feature map
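
A sketch of the feature-map-level variant, assuming each feature map has shape (channels, freq, time) and that the embedding is obtained by global average pooling as described for the embedding-level methods. The function name is hypothetical, and whether the two feature maps come from one shared CNN or two separate ones is not stated in the talk.

```python
import numpy as np

def combine_feature_maps(fmap_original, fmap_flipped):
    """Element-wise maximum over two feature maps, then global average pooling."""
    combined = np.maximum(fmap_original, fmap_flipped)   # (channels, freq, time)
    return combined.mean(axis=(1, 2))                    # embedding: (channels,)

f1 = np.zeros((3, 2, 2))
f2 = np.ones((3, 2, 2))
f2[0] = -1.0                     # channel 0 is smaller in the second map
embedding = combine_feature_maps(f1, f2)
```

Unlike the embedding-level maximum, the maximum here is taken at every time-frequency position before pooling, so the two variants generally give different embeddings.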
next
i will describe the experiments and results
we used the ASVspoof 2019
logical access
and physical access scenario databases
it is a widely used
database in the field of spoofing detection
logical access
covers the detection of speech synthesis and voice conversion
physical access
covers the detection of replay attack
we used an acoustic feature
of
257-dimensional
group
delay
as input for the CNN
for each utterance
we extract
two types of group delay
one is from the original utterance
and the other is from
the time-flipped utterance
after the feature extraction we divided each
variable-length feature
into fixed-length
segments
to handle
the variable lengths of utterances
in our experiments we set the segment
length to 400 frames
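
The segmentation step might look like the sketch below. The talk gives only the segment length of 400 frames; how the last, shorter segment is filled is not stated, so wrapping around to the start of the utterance is an assumption here, as are the function name and the output layout.

```python
import numpy as np

def split_into_segments(feature, seg_len=400):
    """Split a (freq, frames) feature into (n_segments, freq, seg_len).

    Assumption: the final segment is filled by wrapping around to the
    beginning of the utterance; the talk does not specify the padding.
    """
    n_frames = feature.shape[1]
    if n_frames == 0:
        raise ValueError("feature has no frames")
    n_segments = -(-n_frames // seg_len)            # ceiling division
    idx = np.arange(n_segments * seg_len) % n_frames
    padded = feature[:, idx]                        # (freq, n_segments * seg_len)
    segments = padded.reshape(feature.shape[0], n_segments, seg_len)
    return segments.transpose(1, 0, 2)

feature = np.arange(257 * 950, dtype=float).reshape(257, 950)
segments = split_into_segments(feature)
```

Each segment can then be scored separately, and the segment scores of one utterance combined (for example by averaging, which is also an assumption).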
we used two evaluation metrics
one is
EER
and the other is
t-DCF
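
For reference, the equal error rate (EER) is the operating point where the false acceptance rate and the false rejection rate are equal. The sketch below is a simple threshold sweep, assuming higher scores mean "more likely genuine"; it is not the official ASVspoof scoring tool, and t-DCF is not sketched because it also needs ASV system errors and cost parameters that the talk does not give.

```python
import numpy as np

def compute_eer(genuine_scores, spoof_scores):
    """Approximate EER by sweeping thresholds over all observed scores."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection: genuine trial scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance: spoofed trial scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))     # closest point to FAR == FRR
    return (far[i] + frr[i]) / 2.0

eer_separable = compute_eer([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
eer_overlap = compute_eer([0.0, 1.0], [0.0, 1.0])
```

Perfectly separated scores give an EER of zero, and identical score distributions give an EER of one half, which makes these two cases useful sanity checks.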
this table shows the performances
on the LA trials
we highlight the best performance in bold
the embedding-level mean method
showed the best performance on evaluation trials
and the feature-map-level max method
showed
the best performance on development trials
the proposed methods
generally showed better performances than
the baseline
this table shows
the
performances
on the PA trials
the proposed methods showed better performances than the baseline
except the EER of
the two-channel method on development trials
the embedding-level max method
showed the best performance on both development and evaluation trials
in the beginning
we mentioned that
the magnitude spectrum and the phase spectrum contain different information
so
we also built
baseline systems that
use
a magnitude spectrum-based feature
in our research
we used log power spectrum
as the magnitude spectrum-based feature
these baseline systems
are for fusion with the systems
that use
the phase spectrum-based feature
by fusion
we can utilize information of
both the magnitude and phase spectrum
we applied score-level fusion
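
Score-level fusion combines the output scores of two systems for the same trials. The talk does not state the fusion rule, so the weighted sum below, with equal weights by default, is only one common choice and an assumption; the function name and the `weight` parameter are hypothetical.

```python
import numpy as np

def fuse_scores(magnitude_scores, phase_scores, weight=0.5):
    """Weighted sum of two systems' scores (assumed fusion rule)."""
    mag = np.asarray(magnitude_scores, dtype=float)
    phase = np.asarray(phase_scores, dtype=float)
    if mag.shape != phase.shape:
        raise ValueError("both systems must score the same trials")
    return weight * mag + (1.0 - weight) * phase

fused = fuse_scores([1.0, 0.0], [3.0, 2.0])
```

If the two systems produce scores on different scales, normalizing each system's scores before the weighted sum is usually advisable.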
this table shows
the performances
of the baseline system that
uses
the magnitude spectrum-based feature as input
this table shows
the performances of the fused systems
on the LA scenario
all the systems
showed better performances after fusion
the same trend can be seen
in the results
of the fused systems
on the PA scenario
finally conclusions
the conventional methods
utilize the phase spectrum
from the original signal only
in contrast the proposed method
utilizes the phase spectrum
from the original and the time-flipped signals together
it has an effect on reducing the intra-class variance
and
shows better performance
additionally
we can achieve
better performances
by fusion with the systems that use
the magnitude spectrum-based
feature
thank you for watching my presentation
goodbye