i mean system we don't and i don't we are guide for the next twenty

minutes if you have questions please press the power button and whatever you won't

meanwhile lists internet and actual three

okay

this one bound together with the shuffle file now so

Our work is on the effect of the waveform PMF on anti-spoofing detection, this time for replay data, that is, the physical condition.

It is a continuation of the work we did on the same challenge for the logical condition.

We will define the problem and give the motivation for looking at the waveform.

We will show several examples of waveform PMFs.

We will describe the genuinization process, which changes the PMF of the replay data, and show how to fix a problem that appears with it.

Then we will look at its effect on anti-spoofing recognition, show the results of the evaluation, and end with the conclusions.

So, we can define the problem. The task is to classify a speech segment as either genuine speech or spoofed speech.

In general, spoofed speech can be synthesized, replayed, or produced in any other way, but this work focuses on replay data.

The motivation for this work is that a lot of anti-spoofing work has been done in the frequency domain.

Many features were applied, like MFCC, CQCC, LFCC and more, but not much was done in the time domain.

We want to learn what happens with the time-domain statistics of the waveform, and see how we can find differences between the genuine speech and the spoofed speech.

So let's take an example of a speech segment and look at its waveform. Here we see a genuine speech segment.

Then we want to find the probability mass function (PMF) of the amplitudes.

This segment is sampled at 16 bits per sample, so we have 2^16 uniformly spaced levels between minus one and one.

We show here only a narrow range of amplitudes around zero.

It can be seen that the amplitude distribution is very similar to a Laplace distribution, which is well known in the literature, at least for speech.
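As a rough illustration of the PMF estimate described here, the following sketch counts how often each of the 2^16 amplitude levels occurs in a 16-bit signal. The function name `waveform_pmf` and the synthetic Laplace-distributed signal are assumptions made for this example; they are not taken from the talk.

```python
import numpy as np

def waveform_pmf(samples_int16):
    """Empirical PMF over all 2**16 amplitude levels of 16-bit PCM samples."""
    x = np.asarray(samples_int16, dtype=np.int64)
    # Shift the range [-32768, 32767] to [0, 65535] so bincount can index it.
    counts = np.bincount(x + 32768, minlength=65536)
    return counts / counts.sum()

# Synthetic stand-in for speech (assumption): Laplace-distributed amplitudes.
rng = np.random.default_rng(0)
synthetic = rng.laplace(loc=0.0, scale=0.03, size=50_000)
pcm = np.clip(np.round(synthetic * 32768), -32768, 32767).astype(np.int16)

pmf = waveform_pmf(pcm)
# Amplitude value of each level, normalized to [-1, 1).
levels = np.arange(-32768, 32768) / 32768.0
```

Plotting `pmf` against `levels` over a narrow window around zero would give the kind of picture discussed in the talk.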

Now let's take the samples from the evaluation set of ASVspoof 2019, physical condition.

We evaluated the PMF of the genuine speech, the graph above, and of the spoofed speech, the graph below.

We see that there is a big difference between them, especially around zero.

So it may be easy, even for a human looking only at the PMF, to distinguish between these two classes, genuine and replay data.

So if you want to make a good spoofing attack, of course it should not be so easy to distinguish between them, and we would like to have similar distributions for both classes.

So this process we named genuinization.

We will start by showing it for a continuous random variable, and then move to the discrete case to show how we genuinize our quantized samples.

So suppose we have a source PDF, and we want to make a transformation so that the result will have the PDF of the destination.

So we have two probability distribution functions, that of the source and that of the destination.

In our case the source is the spoofed speech while the destination is the genuine speech, since we want to convert the spoofed samples to have the same statistics as the genuine speech.

So first, for every sample from the spoofed speech, we find the value of the CDF.

Then we go to the genuine speech and find where its CDF has the same value, and the amplitude at that point will be the new value of the sample.

So, for example, a sample equal to zero in the spoofed speech will get a new value read from the genuine CDF.

This procedure we can do sample by sample, for all the samples in the spoofed speech.
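The continuous case described here is the usual CDF-matching idea: evaluate the source CDF at the sample, then invert the destination CDF at that value. The sketch below approximates both CDFs empirically from reference samples; the function name `match_distribution` and the uniform and Laplace toy data are assumptions for illustration only.

```python
import numpy as np

def match_distribution(x, source_ref, dest_ref):
    """Map x so that its distribution follows dest_ref instead of source_ref."""
    src_sorted = np.sort(source_ref)
    # Empirical source CDF evaluated at each sample: fraction of values <= x.
    u = np.searchsorted(src_sorted, x, side="right") / src_sorted.size
    # Inverse of the empirical destination CDF at the same probability.
    return np.quantile(dest_ref, u)

rng = np.random.default_rng(0)
source = rng.uniform(-1.0, 1.0, 20_000)   # toy "spoofed" amplitudes (assumption)
dest = rng.laplace(0.0, 0.1, 20_000)      # toy "genuine" amplitudes (assumption)
mapped = match_distribution(source, source, dest)
```

The mapping is monotonic, so the ordering of the samples is preserved while their statistics move toward those of the destination.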

Of course, in our case the distributions are not continuous but discrete, and the algorithm needs to be modified a little.

In the discrete case the line is not smooth but discontinuous, and it looks like steps.

So for each sample from the spoofed speech we look up the value of the CMF, the cumulative mass function.

Now we move to the genuine CMF, and there is not exactly the same value at the same place, so we decided to take the lower bound.

In this example the new value stays at the same level, but that is not true for every sample; it can change from sample to sample.

And of course we do it for all the samples, at the exact quantization resolution, in our case sixteen bits.
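One possible reading of the discrete procedure is sketched below on a toy alphabet of eight levels. "Lower bound" is interpreted here as the largest genuine level whose CMF does not exceed the CMF of the spoofed level; that interpretation, the function name `lower_bound_mapping`, and the toy PMFs are assumptions, not details confirmed by the talk.

```python
import numpy as np

def lower_bound_mapping(pmf_src, pmf_dst, eps=1e-12):
    """For each source level, pick the largest destination level whose CMF
    is not above the source CMF at that level (assumed 'lower bound' rule)."""
    cmf_src = np.cumsum(pmf_src)
    cmf_dst = np.cumsum(pmf_dst)
    # eps guards against floating-point ties between the two CMFs.
    idx = np.searchsorted(cmf_dst, cmf_src + eps, side="right") - 1
    return np.clip(idx, 0, len(pmf_dst) - 1)

# Toy PMFs over 8 levels (assumption): the spoofed one is peaked at level 3.
pmf_spoof = np.array([0.0, 0.0, 0.1, 0.6, 0.2, 0.1, 0.0, 0.0])
pmf_genuine = np.array([0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05, 0.0])

mapping = lower_bound_mapping(pmf_spoof, pmf_genuine)

# Apply the table sample by sample.
spoof_levels = np.array([2, 3, 3, 4, 3])
genuinized_levels = mapping[spoof_levels]   # levels 2, 3, 4 map to 0, 3, 4
```

Because the table is computed once per level, applying it to a whole file is a single indexing operation.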

So first, the logical condition, and we see the results.

The graph above is the PMF of the spoofed speech, in the middle is the PMF of the spoofed speech after the genuinization process, and below is the PMF of the genuine speech.

We can see that the algorithm works well and the genuinized speech is similar to the genuine speech.

However, when we try to apply the same algorithm to the physical condition, we get a phenomenon: in the genuinized speech, in the middle, we have a spike around zero.

Note that the y-axis of the PMF of the genuinized speech has a different maximum from the other graphs, in order to make it better visible.

But we see that the genuinized speech is far away from the genuine speech.

This phenomenon was strange and we wanted to understand what happened.

So we can see that around zero the CMF of the spoofed speech has a very big jump, which spans several levels of the CMF of the genuine speech.

So when we convert the spoofed speech with the genuinization process, several genuine levels, in this example levels four and five, are skipped in the genuinized speech.

So to overcome this problem, we increase the resolution of the spoofed speech.

Before the genuinization of the speech we add a small noise to it, and in this way we have more steps, more available values, in the spoofed speech.

In this example we added three bits of uniform noise, so we have eight times more discrete levels, and the jumps are much smaller. In this way we can now reach any level in the genuine speech.

In our case, in the real experiment, we added five bits of noise to the sixteen bits.

It means each level is now split into thirty-two levels.

When we apply this algorithm we can see the results: the PMF of the genuinized speech is very similar to that of the genuine speech.

So we overcame the problem we had previously.

Of course we also tried it on the logical condition, and the results were pretty much the same.

So it doesn't diminish the previous results for the logical condition, but it dramatically improves the results of the genuinization process for the physical condition.
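To make the effect of the added noise concrete, here is a small sketch that refines each 16-bit spoofed level into 2^b sub-levels with uniform noise before applying a lower-bound CMF lookup. The function `genuinize`, the helper `top_mass`, the synthetic signals, and the artificial spike at zero are all assumptions for illustration; the talk does not give implementation details.

```python
import numpy as np

def genuinize(spoof, genuine, extra_bits=5, rng=None):
    """CMF matching of 16-bit spoofed samples to the genuine PMF, after
    refining the spoofed levels with extra_bits of uniform noise (sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    spoof = np.asarray(spoof, dtype=np.int64)
    # 1) Each level is split into 2**extra_bits sub-levels by uniform noise.
    noise = rng.integers(0, 2 ** extra_bits, size=spoof.shape)
    fine = spoof * 2 ** extra_bits + noise
    # 2) Empirical source CMF evaluated at every refined sample.
    u = np.searchsorted(np.sort(fine), fine, side="right") / fine.size
    # 3) Genuine CMF on the levels that occur, then a lower-bound lookup.
    g_levels, g_counts = np.unique(genuine, return_counts=True)
    g_cmf = np.cumsum(g_counts) / g_counts.sum()
    idx = np.searchsorted(g_cmf, u + 1e-12, side="right") - 1
    return g_levels[np.clip(idx, 0, len(g_levels) - 1)].astype(np.int16)

def to_pcm(x):
    return np.clip(np.round(x * 32768), -32768, 32767).astype(np.int16)

def top_mass(x):
    """Probability mass of the most frequent amplitude level."""
    return np.unique(x, return_counts=True)[1].max() / x.size

rng = np.random.default_rng(0)
genuine = to_pcm(rng.laplace(0.0, 0.03, 50_000))
spoof = to_pcm(rng.laplace(0.0, 0.03, 50_000))
spoof[:5_000] = 0   # artificial spike at zero, mimicking the replay PMF

plain = genuinize(spoof, genuine, extra_bits=0, rng=rng)      # no added noise
dithered = genuinize(spoof, genuine, extra_bits=5, rng=rng)   # 5 bits of noise
```

Comparing `top_mass(plain)` with `top_mass(dithered)` shows the spike being spread over many genuine levels once the noise is added.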

Now we want to see what happens with an anti-spoofing system when we use the genuinization process.

We took the baseline system that was provided by the organisers.

It has two classes, one for genuine speech and one for spoofed speech, and each class is a GMM with 512 Gaussian mixtures.

There are two models, one for CQCC features and one for LFCC features.
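A two-class GMM countermeasure of this kind typically scores an utterance by the difference of the average log-likelihoods under the two models. The sketch below shows that scoring step only, with tiny hand-set diagonal-covariance GMMs and random vectors standing in for CQCC or LFCC frames; the diagonal covariance, the function names, and all numbers are assumptions, and no training is shown.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.
    Shapes: X (T, D), weights (K,), means (K, D), variances (K, D)."""
    X = np.asarray(X, dtype=float)[:, None, :]                        # (T, 1, D)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)   # (K,)
    quad = -0.5 * np.sum((X - means) ** 2 / variances, axis=2)        # (T, K)
    log_comp = np.log(weights) + log_norm + quad
    m = log_comp.max(axis=1, keepdims=True)                           # stable log-sum-exp
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]

def llr_score(X, genuine_gmm, spoof_gmm):
    """Positive score favours genuine speech, negative favours spoofed speech."""
    return gmm_loglik(X, *genuine_gmm).mean() - gmm_loglik(X, *spoof_gmm).mean()

# Toy 2-component, 2-dimensional models (assumption; the baseline uses 512).
genuine_gmm = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [1.0, 1.0]]), np.ones((2, 2)))
spoof_gmm = (np.array([0.5, 0.5]), np.array([[4.0, 4.0], [5.0, 5.0]]), np.ones((2, 2)))

rng = np.random.default_rng(0)
frames_a = rng.normal(0.5, 1.0, size=(200, 2))   # resembles the genuine model
frames_b = rng.normal(4.5, 1.0, size=(200, 2))   # resembles the spoofed model
score_a = llr_score(frames_a, genuine_gmm, spoof_gmm)
score_b = llr_score(frames_b, genuine_gmm, spoof_gmm)
```

With real features, the two GMMs would be trained on genuine and spoofed training data respectively, and the score compared against a threshold.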

The baseline results are shown in the column of the baseline.

In the next column we used the same original GMM models, but now tested on the data after genuinization, and obviously the results of the models drop.

In this step we stay with models trained on the real data, before genuinization, and with these models we test genuinized data. We see that the generalization ability is very poor: the errors are very big.

When we both train and test on genuinized speech, the results are very good.

One can say, okay, we trained with one kind of data and tested on the same kind of data, so logically the results are good.

But I think the task of a countermeasure is to be able to recognize new attacks that it was not trained on, because new spoofing algorithms appear all the time.

If the system is vulnerable to new algorithms, then it is not robust and it is not reliable, because we never know what the actual algorithm will be.

So, to summarize.

We showed that there is a big difference between the waveform distributions of the genuine speech and the spoofed speech, at least for replay, and that in fact it may be easy to recognise spoofed speech in the time domain.

We presented the genuinization process, which converts the spoofed speech to be statistically more similar to genuine speech, and we showed that it is better to first add noise to the samples, since this gives a better genuinization.

Then we tested the countermeasure and saw that the results can vary dramatically when we train with one kind of data and test with another kind of spoofing.

In our view it is not acceptable for an anti-spoofing system to behave like this, because it must have very good generalization ability.

Additional work will have to be done in this direction.

Thank you very much, and if you enjoyed the talk, you can press play and listen to it again and again.

Stay healthy. Bye.