0:00:13 | We will be your guides for the next twenty minutes. If you have questions, please press the pause button whenever you want. |
0:00:38 | Okay. In this work we study the effect the waveform may have on spoofing detection, this time for replay data, the physical access condition. It is a continuation of the work we did on the same challenge for the logical access condition. |
0:01:01 | We define the problem and the motivation for using the waveform, we will show several examples of how the waveform PMF behaves, and we will describe a genuinization process which changes the PMF of the replay data and show how it affects anti-spoofing recognition and other effects. After the examples we show the results of the evaluation and then the big picture. |
0:01:37 | So we define the problem as follows: given a speech segment, classify whether it is genuine speech or spoofed speech. Generally, spoofed speech can be synthesized, replayed over a loudspeaker, or produced in any other way, but this work will focus on the replay data. |
0:02:04 | The motivation for this work is the fact that a lot of work has been done on anti-spoofing in the frequency domain, where many features were applied, like MFCC, CQCC, LFCC, and more, but not much has been done in the time domain. We want to learn what happens with the time-domain statistics of the waveform and see how we can find changes between the genuine speech and the spoofed speech. |
0:02:42 | So let's take an example of a speech segment. If we look at the waveform of a genuine speech segment, we then want to find the probability mass function (PMF) of its samples. The segment was sampled and quantized at sixteen bits, so we have 2^16 uniform quantization levels between minus one and one. We show here only the levels in the range between minus 0.013 and 0.013. It can be seen that the PMF is very similar to the Laplace distribution, as is well known in the literature for speech. |
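The Laplacian shape of quantized speech amplitudes described above can be sketched in a few lines of Python. This is an illustrative simulation, not the talk's data: the Laplace scale `b = 0.02` and the sample count are assumptions, and the point is only that uniform 16-bit quantization of Laplacian amplitudes concentrates the PMF sharply around zero.

```python
import random
from collections import Counter

random.seed(0)

def laplace_sample(b=0.02):
    # An exponential magnitude with a random sign is Laplace-distributed.
    # The scale b = 0.02 is an illustrative assumption, not a measured value.
    mag = random.expovariate(1.0 / b)
    return mag if random.random() < 0.5 else -mag

def quantize(x, bits=16):
    # Map an amplitude in [-1, 1) to one of 2**bits uniform levels.
    half = 2 ** (bits - 1)
    return max(-half, min(half - 1, int(round(x * half))))

samples = [quantize(laplace_sample()) for _ in range(50_000)]
pmf = Counter(samples)

# Fraction of probability mass within +/-100 levels of zero (~0.003 amplitude);
# for a Laplacian source this is a substantial share of all the mass.
p_near_zero = sum(pmf[k] for k in range(-100, 101)) / len(samples)
```

Plotting `pmf` over a narrow range such as the talk's [-0.013, 0.013] would reproduce the characteristic sharp peak at zero.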
0:03:46 | Now let's take the samples for the evaluation of the ASVspoof 2019 physical access condition. We evaluated the PMF of the genuine speech (plotted above) and of the spoofed speech (plotted below), and we see that there is a big difference between them, especially around zero. |
0:04:19 | So it can be quite easy, maybe even for a human just looking at the PMF, to distinguish between these two classes, genuine and replay data. If you want to make a good spoof, it should of course not be so easy to distinguish between them, and we would like to have similar distributions for both classes. |
0:04:52 | So the process we define is a genuinization. We will start with the case of continuous random variables and use a toy example to show how we can equalize the distributions of the samples. Assume we have a source PDF, and we want a transformation such that the output will have the PDF of the destination. |
0:05:30 | So we have two probability distribution functions, one of the source and one of the destination. In our case the source is the spoofed speech while the destination is the genuine speech, since we want to convert the spoofed samples to have the same statistics as the genuine speech. |
0:05:54 | First, for every sample from the spoofed speech, we find the value of the CDF. Then we go to the CDF of the genuine speech and find where it has the same value. So, for example, for a sample x0 of the spoofed speech we obtain a new value y0. This procedure we can do sample by sample for all the samples in the spoofed speech. |
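The continuous matching step above is the classic CDF transform: pass each source sample through the source CDF, then through the inverse CDF of the destination. A minimal sketch, with an illustrative Gaussian source and Laplacian destination (the talk's actual source and destination distributions are the spoofed and genuine speech PMFs):

```python
import math

def gaussian_cdf(x, sigma=1.0):
    # CDF of a zero-mean Gaussian, via the error function.
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def laplace_inv_cdf(u, b=1.0):
    # Inverse CDF of a zero-mean Laplace distribution with scale b.
    return -b * math.copysign(1.0, u - 0.5) * math.log(1.0 - 2.0 * abs(u - 0.5))

def match(x):
    # y = F_dst^{-1}(F_src(x)): the output follows the destination PDF.
    return laplace_inv_cdf(gaussian_cdf(x))

y0 = match(0.0)  # the source median maps to the destination median
```

The mapping is monotone and symmetric here because both toy distributions are zero-mean and symmetric; with empirical speech distributions the same composition is applied, just with estimated CDFs.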
0:06:47 | Of course, in our case the distributions are not continuous but discrete, and the algorithm needs to be modified. In the discrete case the cumulative function is not continuous; it looks like steps. So for each sample x from the spoofed speech we find its value of the cumulative mass function (CMF), and then we move to the genuine CMF. There is not necessarily exactly the same value at the same place, so we decided to take the lower bound. In this case, instead of an exact match we have a less-than-or-equal match for the new value, and the error can change from sample to sample. Of course we do it for all the samples over the exact discrete grid, in our case sixteen bits. |
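The discrete lower-bound rule can be sketched on a toy five-level alphabet (real audio uses the full 16-bit grid; these tiny hand-made distributions are illustrative only). Each spoofed level maps to the genuine level whose CMF value is the largest one not exceeding the spoofed CMF value:

```python
import bisect
from collections import Counter

LEVELS = [-2, -1, 0, 1, 2]  # toy quantization alphabet

def cmf(samples):
    # Empirical cumulative mass function over the fixed level alphabet.
    counts = Counter(samples)
    total = len(samples)
    acc, out = 0, []
    for lvl in LEVELS:
        acc += counts.get(lvl, 0)
        out.append(acc / total)
    return out

spoof = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]   # spread-out spoofed PMF
genuine = [-1, -1, 0, 0, 0, 0, 0, 0, 1, 1]  # genuine mass piled near zero

spoof_cmf = cmf(spoof)      # [0.1, 0.3, 0.7, 0.9, 1.0]
genuine_cmf = cmf(genuine)  # [0.0, 0.2, 0.8, 1.0, 1.0]

def genuinize(level):
    u = spoof_cmf[LEVELS.index(level)]
    # Lower bound: index of the last genuine CMF value <= u.
    i = bisect.bisect_right(genuine_cmf, u) - 1
    return LEVELS[max(i, 0)]

mapped = [genuinize(x) for x in spoof]
```

Note how the lower-bound rule can land a level short (here the spoofed level 0 maps to -1, since the genuine CMF jumps over 0.7): several levels collapse onto one, which is precisely the level-merging problem that the dithering step later in the talk alleviates.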
0:08:02 | So we performed the genuinization on the logical access condition, and we see the results. The graph above is the PMF of the spoofed speech, in the middle is the PMF of the spoofed speech after the genuinization process, and below is the PMF of the original genuine speech. We can see that the algorithm works well and the PMF of the genuinized speech is similar to the genuine speech. |
0:08:37 | However, when we tried to apply the same algorithm to the physical access condition, we observed a phenomenon: in the genuinized speech, in the middle, we have a kind of notch around zero. Note that the y-axis scales of the PMF plots differ, in order to make it better visible, but we see that the genuinized speech is far from the genuine speech. This phenomenon was strange and we wanted to understand what happened. |
0:09:29 | Looking closer, around zero the spoofed speech has a level with a very big probability, which corresponds to several levels of the PMF of the genuine speech. So when we convert the spoofed speech with the genuinization process, all three levels in this example are merged and mapped onto only one level in the genuinized file. |
0:10:14 | To overcome this problem, we can increase the bit resolution of the speech. To the spoofed speech we add a small uniform noise, and in such a way we have more steps, more available levels in the spoofed speech. In this experiment we added three bits of uniform noise, so we have eight times more discrete levels, and that helps a lot: in this way we can now reach any level in the genuine speech. |
0:10:59 | In our case, in the real experiment, to the sixteen-bit samples we added noise of five bits. It means that each level is now split into thirty-two levels. |
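The dithering step above can be sketched directly on integer levels: five extra bits of uniform noise split every coarse 16-bit level into 2^5 = 32 finer sub-levels, so the spoofed CMF gains many more reachable steps. This is a schematic of the idea, not the talk's exact implementation:

```python
import random

random.seed(1)

EXTRA_BITS = 5
SUB_LEVELS = 2 ** EXTRA_BITS  # 32 sub-levels per original level

def dither(level):
    # Scale the coarse level up and add uniform noise inside the new gap,
    # so repeated coarse values spread over distinct fine values.
    return level * SUB_LEVELS + random.randrange(SUB_LEVELS)

coarse = [0, 0, 0, 0, 1]            # repeated 16-bit levels
fine = [dither(x) for x in coarse]  # spread across 21-bit sub-levels
```

After dithering, samples that shared one probability spike at a single level occupy up to 32 distinct sub-levels, so the lower-bound CMF matching no longer collapses them onto one genuine level.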
0:11:16 | When we apply this algorithm, we can see the results: the PMF of the genuinized speech is very similar to the genuine speech, so we have overcome the problem described before. Of course we also tried it on the logical access condition, and the results there stayed pretty much the same, so it does not diminish the previous results on the logical access condition, but it improved dramatically the results of the genuinization process on the physical access condition. |
0:11:56 | Now we want to see what happens with an anti-spoofing system when we use the genuinization process. We took the baseline system that was provided by the organisers. It trains two classes, one for genuine speech and one for spoofed speech; each class is a GMM with five hundred twelve Gaussian mixtures. There are two models, one for CQCC features and one for LFCC features. |
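The baseline's scoring scheme can be sketched as a log-likelihood ratio between two mixture models. The real baseline uses 512-component GMMs on CQCC or LFCC feature vectors; the 1-D two-component mixtures with hand-picked parameters below are purely illustrative:

```python
import math

def gauss_logpdf(x, mu, var):
    # Log-density of a 1-D Gaussian.
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def gmm_loglik(x, components):
    # components: list of (weight, mean, variance) tuples.
    return math.log(sum(w * math.exp(gauss_logpdf(x, m, v))
                        for w, m, v in components))

# Hypothetical toy models: genuine mass near the origin, spoofed mass farther out.
genuine_gmm = [(0.5, -1.0, 0.5), (0.5, 1.0, 0.5)]
spoof_gmm = [(0.5, -3.0, 0.5), (0.5, 3.0, 0.5)]

def llr(x):
    # Positive scores favour the genuine hypothesis.
    return gmm_loglik(x, genuine_gmm) - gmm_loglik(x, spoof_gmm)
```

In the real system the same ratio is computed per feature frame and averaged over the utterance, then thresholded to decide genuine versus spoofed.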
0:12:35 | The baseline results are shown in the first column of the table. In the next column we used the same original GMM models, but now tested them with the data after genuinization, and we can see that the results of all the models drop. |
0:13:01 | In the next step we stay with the real data, before genuinization, to train the GMMs, and test the models with the genuinized data. We see that the generalization is very poor: the errors are very big. When we both train and test with genuinized speech, the results are very good again. |
0:13:35 | One can say, okay, we trained with one kind of data and tested with the same kind of data, so logically the results are good. But the whole point of a countermeasure is to be able to recognize new instances of spoofing, because new spoofing algorithms appear all the time, and if the system only works well on what it has seen, it is vulnerable to new algorithms. It is not robust, and that matters, because we never know in advance what the attack algorithm will be. |
0:14:23 | So, to summarize: we showed that there is a big difference between the waveform distributions of the real genuine speech and the speech replayed through loudspeakers, and that it would effectively be easy to recognize the spoofed speech in the time domain. |
0:14:56 | We then presented a genuinization process showing how we can convert the spoofed speech to be statistically more similar to genuine speech, and we showed that it is better to first add a few bits of uniform noise to the samples, as it results in a better genuinization. |
0:15:25 | Then we tried the countermeasure, and we showed that the results can vary dramatically when we train with one kind of data and test with another kind of spoofing. We do not want an anti-spoofing system to behave like this: it must have very good generalization, otherwise it will be fooled by unseen attacks. More work will have to be done in this direction to make such systems much more robust. |
0:16:14 | Thank you very much, and if you enjoyed it, you can press play and listen to it again and again. Stay healthy. Bye. |