hello everyone my name is [inaudible] from [inaudible]

today

i will talk about

phase spectrum of time flipped speech signals for robust spoofing detection

first

let me introduce spoofing detection for automatic speaker verification

asv

which is short

for

automatic speaker verification

has a vulnerability

the vulnerability of asv means

it is

vulnerable to spoofing attacks

a spoofing attack is an attempt to deceive the asv system

with spoofed utterances

spoofed utterances are utterances artificially produced to sound like the target speaker's

utterances

so

the impostor speaker

who attempts

a spoofing attack

can be verified as the target speaker

there are some types of spoofing attacks

which can be actually detected

text to speech synthesis

voice conversion

and

replay attack

spoofing detection is a task to distinguish

whether a given utterance is

a genuine utterance

or a spoofed utterance

without an identity claim

the spoofed utterance is detected

regardless of

how similar the spoofed utterance is to the target speaker's utterance

therefore

spoofing detection can protect the asv system

against

various spoofing attacks

for detecting spoofing attacks

we should capture the differences of the frequency response well

as shown in this figure

the frequency responses

of genuine utterances and spoofed utterances

are different

for example

spoofed utterances produced by replay attack

contain the attributes

of the devices

used for the replay attack

such as the playback device

and the recording device

also the spoofed utterances

produced by speech synthesis and voice conversion methods

do not contain the proper dynamic information and the phase information of genuine utterances

in many researches

convolutional neural networks

have been used to capture the variability of frequency responses

in spectrum based acoustic features

as a side note

i will describe the spectrum of the speech signal briefly

the spectrum of a speech signal

is

composed of

two kinds of spectrum

one is the magnitude spectrum

and the other is the phase spectrum

magnitude spectrum based features have been widely used for spoofing detection

there are several kinds of magnitude spectrum based features

such as log power spectrum

constant q cepstral coefficients

linear frequency cepstral coefficients

and so on

whereas

phase spectrum based features are less used than

magnitude spectrum based features

well

the phase spectrum based features

contain

useful information for spoofing detection

that is not contained in the magnitude spectrum

in our research

we focused on phase spectrum

especially

we used

group delay

as a phase spectrum based feature

the group delay is defined

as

this equation
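
The equation on the slide is not captured in this transcript. For reference, the standard textbook definition of group delay is sketched below. The symbols are assumptions and may differ from the speaker's notation: X(ω) is the Fourier transform of the frame x[n] with phase θ(ω), Y(ω) is the Fourier transform of n·x[n], and the subscripts R and I denote real and imaginary parts. The second form avoids explicit phase unwrapping.

```latex
\tau(\omega) \;=\; -\frac{d\,\theta(\omega)}{d\omega}
\;=\; \frac{X_R(\omega)\,Y_R(\omega) + X_I(\omega)\,Y_I(\omega)}{\lvert X(\omega)\rvert^{2}}
```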

in this section i will introduce our proposed method

first i will explain

time flipping for the phase spectrum

the magnitude spectrum is not affected by the time order of the signal

so

the magnitude spectra of

the original signal and the time flipped signal

are the same

however

the phase spectrum is changed

when the time order of the signal is flipped

it means that

the attributes of the phase spectrum are changed

when the time order of the signal is flipped
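
The property described above can be checked numerically. The following is a minimal sketch, assuming a random real-valued frame as a stand-in for speech; the frame length and variable names are illustrative and not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(512)   # stand-in for one frame of a speech signal
x_flip = x[::-1]               # the same frame with its time order flipped

X = np.fft.rfft(x)
X_flip = np.fft.rfft(x_flip)

# The magnitude spectrum ignores the time order, the phase spectrum does not.
same_magnitude = np.allclose(np.abs(X), np.abs(X_flip))
same_phase = np.allclose(np.angle(X), np.angle(X_flip))
print(same_magnitude, same_phase)
```

For this frame the first flag is true and the second is false, which matches the claim that only the phase spectrum carries the effect of time flipping.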

based on this fact

we assume that when the time order of the signal is flipped

the attributes that are not related to spoofing attacks

such as language information and

[inaudible] information

are changed

in contrast

the attributes

that are related to spoofing attacks

such as replay device information and the recording device information

are not changed

motivated by this assumption

we propose a method

using

two types of phase spectrum based features together

up to now

conventional spoofing detection systems

have used phase spectrum based features

from the original signal only

in our research we use

not only a phase spectrum based feature from the original signal but also

a feature

from the time flipped signal

in our assumption

we can generate

new speech signals

that have unseen conditions

by using the proposed method

and

use both

kinds of utterances

as an effect we can reduce the impact of variance more efficiently

which results in

performance improvements

to use two types of features at one time

we propose three kinds of feature combination methods

before introducing the feature combination methods

i will introduce our baseline

this is the cnn based model architecture

of course you can use any kind of cnn based model

and in our research

we used

se-resnet34

as the cnn based model

se-resnet34

is a variation of resnet

where

se blocks are integrated into each residual block

for recalibrating

channel-wise responses

and se-resnet34 was highly ranked in the asvspoof twenty nineteen

challenge
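
To make the channel-wise recalibration concrete, here is a minimal NumPy sketch of a generic squeeze-and-excitation (SE) step. It is not the speaker's implementation: the function name `se_block`, the weight matrices `w1` and `w2`, and the reduction ratio are hypothetical, and biases are left out for brevity.

```python
import numpy as np

def se_block(feature_map, w1, w2):
    """Rescale each channel of a (channels, freq, time) feature map.

    w1 has shape (channels // r, channels) and w2 has shape
    (channels, channels // r) for some reduction ratio r (assumed here).
    """
    squeeze = feature_map.mean(axis=(1, 2))          # global average per channel
    hidden = np.maximum(w1 @ squeeze, 0.0)           # bottleneck layer with ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid gate in (0, 1)
    return feature_map * scale[:, None, None]        # channel-wise recalibration

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 5, 7))
w1 = rng.standard_normal((2, 8))                     # reduction ratio r = 4
w2 = rng.standard_normal((8, 2))
recalibrated = se_block(fmap, w1, w2)
```

In an SE-ResNet, a step like this would sit at the end of each residual block, before the skip connection is added.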

one combination method

is

two channel input

where

two types of features

compose

one input
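
A two-channel input can be sketched as stacking the two feature matrices along a new channel axis. The shapes below (400 frames by 257 bins) follow the numbers given later in the talk; the axis order and variable names are assumptions.

```python
import numpy as np

# Stand-ins for one fixed-length segment of each feature: (frames, bins).
rng = np.random.default_rng(0)
gd_original = rng.standard_normal((400, 257))
gd_flipped = rng.standard_normal((400, 257))

# Stack along a new leading axis so the CNN sees a two-channel "image".
two_channel_input = np.stack([gd_original, gd_flipped], axis=0)
```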

another combination method is embedding level combination

the embedding

corresponds to

the result of global average pooling

this method can be divided into three methods

the first method is

to concatenate two embeddings

to make up one embedding vector

the second method is to compute the element-wise maximum of two embeddings

the third method is to compute the element-wise average of two embeddings
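
The three embedding-level combinations can be written in a few lines. This is a minimal sketch; the function name `combine_embeddings` and the mode strings are hypothetical labels, not names from the talk.

```python
import numpy as np

def combine_embeddings(emb_original, emb_flipped, mode):
    """Combine two embedding vectors of equal length."""
    if mode == "concat":   # one longer embedding vector
        return np.concatenate([emb_original, emb_flipped], axis=-1)
    if mode == "max":      # element-wise maximum
        return np.maximum(emb_original, emb_flipped)
    if mode == "mean":     # element-wise average
        return (emb_original + emb_flipped) / 2.0
    raise ValueError(f"unknown mode: {mode}")

emb_a = np.array([1.0, 4.0])
emb_b = np.array([3.0, 2.0])
combined = {m: combine_embeddings(emb_a, emb_b, m) for m in ("concat", "max", "mean")}
```

Note that concatenation doubles the embedding size, so the layer after it needs a wider input, while the maximum and the average keep the size unchanged.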

the other combination method is feature map level combination

the feature map corresponds to

the output of the cnn

before computing embeddings

we compute the element-wise

maximum of two feature maps

and then compute the embedding from the combined feature map
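
In contrast to the embedding-level variants, here the maximum is taken before pooling. A minimal sketch, assuming feature maps of shape (channels, freq, time) and global average pooling as the embedding step; the function name is hypothetical.

```python
import numpy as np

def feature_map_level_embedding(fmap_original, fmap_flipped):
    """Element-wise maximum of two feature maps, then global average pooling."""
    combined = np.maximum(fmap_original, fmap_flipped)   # (channels, freq, time)
    return combined.mean(axis=(1, 2))                    # one value per channel

fmap_a = np.zeros((2, 3, 4))
fmap_b = np.ones((2, 3, 4))
fmap_b[1] = -1.0                                         # channel 1 is below fmap_a
embedding = feature_map_level_embedding(fmap_a, fmap_b)
```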

next

i will describe the experiments and results

we used the asvspoof twenty nineteen

logical access

and physical access scenario databases

they are widely used

databases in the field of spoofing detection

logical access

covers the detection of speech synthesis and voice conversion

physical access

covers the detection of replay attack

we used an acoustic feature

of

two hundred fifty seven dimensional

group

delay

as the input for the cnn

for each utterance

we extract

two types of group delay

one is from the original utterance

and the other is from

the time flipped utterance
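
The extraction of the two features can be sketched as follows. Only the 257-dimensional output is taken from the talk, which is consistent with a 512-point FFT. The frame length, hop size, Hamming window, and the small constant `eps` are assumptions, and random noise stands in for an utterance.

```python
import numpy as np

def group_delay(signal, n_fft=512, frame_len=400, hop=160, eps=1e-8):
    """Frame-wise group delay with shape (n_frames, n_fft // 2 + 1)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    n = np.arange(frame_len)
    X = np.fft.rfft(frames, n_fft)        # spectrum of x[n]
    Y = np.fft.rfft(frames * n, n_fft)    # spectrum of n * x[n]
    # Group delay without phase unwrapping: Re(Y / X).
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

rng = np.random.default_rng(0)
utterance = rng.standard_normal(16000)         # stand-in for one second at 16 kHz
gd_original = group_delay(utterance)           # from the original utterance
gd_flipped = group_delay(utterance[::-1])      # from the time flipped utterance
```

Raw group delay can take very large values near spectral zeros, so a practical system may smooth or compress it; the talk does not say whether that was done.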

after the feature extraction we divided each

variable length feature

into fixed length

segments

to handle

variable lengths of utterances

in our experiments we set the segment

length to four hundred frames
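
One way to cut a variable-length feature into 400-frame segments is sketched below. The talk does not say how an incomplete last segment is handled; wrapping around to the start of the utterance is an assumption made here, and the function name is hypothetical.

```python
import numpy as np

def split_into_segments(feature, seg_len=400):
    """Split a (n_frames, n_bins) feature into (n_segments, seg_len, n_bins)."""
    n_frames, n_bins = feature.shape
    n_segments = int(np.ceil(n_frames / seg_len))
    # Assumption: fill the last segment by wrapping around to the first frames.
    idx = np.arange(n_segments * seg_len) % n_frames
    return feature[idx].reshape(n_segments, seg_len, n_bins)

rng = np.random.default_rng(0)
feature = rng.standard_normal((950, 257))
segments = split_into_segments(feature)
```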

we used two evaluation metrics

one is

eer

and the other is

t-dcf
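
For readers unfamiliar with the first metric, the equal error rate can be approximated with a simple threshold sweep. This is a minimal sketch, assuming that higher scores mean "more likely genuine"; it is not the official ASVspoof scoring script, and t-DCF is not sketched because it additionally depends on the errors of an ASV system and on cost parameters.

```python
import numpy as np

def compute_eer(genuine_scores, spoof_scores):
    """Approximate EER: the point where false acceptance equals false rejection."""
    genuine_scores = np.asarray(genuine_scores, dtype=float)
    spoof_scores = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
    # False rejection: genuine trials scored below the threshold.
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    # False acceptance: spoofed trials scored at or above the threshold.
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

eer_separable = compute_eer([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
eer_overlap = compute_eer([0.9, 0.8, 0.3], [0.1, 0.2, 0.85])
```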

this table shows the performances

on the la trials

we highlight the best performance in bold

the e-mean method

showed the best performance on evaluation trials

and the f-max method

showed

the best performance on development trials

the proposed methods

generally showed better performances than

the baseline

this table shows

the

performances

on the pa trials

the proposed methods showed better performances than the baseline

except the eer of

the two channel method on the [inaudible] trials

the e-max method

showed the best performance on both development and evaluation trials

in the beginning

we mentioned that

the magnitude spectrum and the phase spectrum contain different information

so

we also built

baseline systems that

use

a magnitude spectrum based feature

in our research

we used log power spectrum

as the magnitude spectrum based feature

these baseline systems

are for fusion with the systems

that use

phase spectrum based features

by fusion

we can utilize the information of both

the magnitude and phase spectrum

we used score level fusion
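
Score-level fusion combines the output scores of two systems trial by trial. The talk does not state the fusion rule; the weighted sum below, with an equal weight of 0.5, is an assumption for illustration, and the function name is hypothetical.

```python
import numpy as np

def fuse_scores(phase_scores, magnitude_scores, weight=0.5):
    """Weighted sum of per-trial scores from two systems (assumed fusion rule)."""
    phase_scores = np.asarray(phase_scores, dtype=float)
    magnitude_scores = np.asarray(magnitude_scores, dtype=float)
    return weight * phase_scores + (1.0 - weight) * magnitude_scores

fused = fuse_scores([1.0, 3.0], [3.0, 5.0])
```

If the two systems produce scores on different scales, normalizing each score set before fusion is a common precaution.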

this table shows

the performances

of the baseline system that

uses

the magnitude spectrum based feature as input

this table shows

the performances of the fused systems

on the la scenario

all the systems

showed better performances than before fusion

the same trend can be shown

in the results

of the fused systems

on the pa scenario

finally conclusions

the conventional method

utilizes the phase spectrum

from the original signal only

in contrast the proposed method

utilizes the phase spectrum

from the original and the time flipped signals together

it has an effect on reducing the impact of variance

and

shows robust performance

additionally

we can achieve

even better performances

by fusion with the systems that use

magnitude spectrum based

features

thank you for watching my presentation

goodbye