0:00:14 | and only one |
---|
0:00:15 | i am whether a student formula can |
---|
0:00:18 | one and one causing a lot of a banana split in previous to deal was |
---|
0:00:22 | interested recognition is performed with a probabilistic verification |
---|
0:00:28 | so yes ones everybody's causing you was having we present the motivation |
---|
0:00:33 | there have been introduced in this is hypothesis |
---|
0:00:36 | after that i will talk about constant q cepstral coefficients |
---|
0:00:39 | then i describe what is one percent |
---|
0:00:43 | and the discipline analysis of the i'm about this program or resolution and baseline completely |
---|
0:00:52 | and i don't think will addition model and i'm in every and example |
---|
0:01:00 | so next time maybe more relation between the thing dysphonia to be found okay |
---|
0:01:05 | there is a really useful was not i'll find out what i also be included |
---|
0:01:09 | by different from confirmation it means |
---|
0:01:12 | what i had to use |
---|
0:01:15 | for the one of my speech |
---|
0:01:17 | so we don't |
---|
0:01:19 | and is the more |
---|
0:01:21 | i per cent signal or this might help us to discriminate |
---|
0:01:26 | the more not quite useful for speech |
---|
0:01:28 | and those kind of understanding |
---|
0:01:31 | i was to design a better and more reliable numbers of detection |
---|
0:01:37 | we get it was taken motivation and he went down with |
---|
0:01:41 | a visual within need to different |
---|
0:01:43 | a front end performance on is rather than continuing this so it can see everything |
---|
0:01:49 | considered |
---|
0:01:51 | cqccs are especially effective in detecting some attacks |
---|
0:01:55 | whereas they are less effective in detecting others |
---|
0:02:00 | is eight |
---|
0:02:01 | you money or |
---|
0:02:04 | sequences as well collect is equal error |
---|
0:02:07 | there are very seriously |
---|
0:02:10 | so |
---|
0:02:11 | similar kind of observations regarding both my differences people associated with it is in is |
---|
0:02:17 | greater than i can challenge |
---|
0:02:19 | an external data sequences in front end of all right |
---|
0:02:23 | and the case |
---|
0:02:25 | for the finals |
---|
0:02:27 | so can be okay |
---|
0:02:29 | why sequence is different from the positive and six |
---|
0:02:35 | no less |
---|
0:02:36 | but whatever this is how this is so |
---|
0:02:38 | is we know is finally this can utilise in the spectrum for example in high |
---|
0:02:43 | hiding behind |
---|
0:02:45 | the mailman or indian or whatever |
---|
0:02:47 | so the use of investments analysis that would be information across different manner |
---|
0:02:53 | and i will be localised information |
---|
0:02:56 | and there is no degraded performance |
---|
0:02:59 | so we can see |
---|
0:03:01 | more reliable detection with the features that precise information is available |
---|
0:03:09 | no discuss the differences they have anything can be so as you know in the |
---|
0:03:15 | mean and they were available in the early nativized the and then window is for |
---|
0:03:20 | this gaussian means |
---|
0:03:21 | they're exactly once |
---|
0:03:24 | and temporal resolution |
---|
0:03:26 | and again singing |
---|
0:03:27 | the inference is quite well |
---|
0:03:30 | in contrast the constant q transform has a variable spectral and temporal resolution |
---|
0:03:37 | so i think in seen once and you press continues better really know what that |
---|
0:03:43 | was then |
---|
0:03:44 | the |
---|
0:03:45 | high resolution in the lower frequency and the higher than within the temporal resolution with |
---|
0:03:54 | the |
---|
0:03:54 | that means |
---|
0:03:56 | the synchrony with late fusion one solution more realistic than fifty |
---|
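A minimal sketch of the constant Q idea referred to in this part of the talk, under the standard definition in which centre frequencies are geometrically spaced and the ratio of centre frequency to bandwidth stays fixed. The function names and the parameter values (`f_min`, `bins_per_octave`) are illustrative assumptions and are not taken from the talk.

```python
def cqt_center_frequencies(f_min, bins_per_octave, num_bins):
    """Geometrically spaced centre frequencies: f_k = f_min * 2^(k / B)."""
    return [f_min * 2 ** (k / bins_per_octave) for k in range(num_bins)]


def q_factor(bins_per_octave):
    """Constant ratio of centre frequency to bandwidth for B bins per octave."""
    return 1.0 / (2 ** (1.0 / bins_per_octave) - 1.0)


def bandwidths(freqs, bins_per_octave):
    """Bandwidth of each bin; it grows in proportion to the centre frequency."""
    q = q_factor(bins_per_octave)
    return [f / q for f in freqs]


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    freqs = cqt_center_frequencies(f_min=100.0, bins_per_octave=12, num_bins=25)
    for f, bw in list(zip(freqs, bandwidths(freqs, 12)))[::12]:
        print(f"centre {f:8.2f} Hz   bandwidth {bw:6.2f} Hz")
```

Narrow bands at low frequencies give finer frequency resolution there, while the wider bands at high frequencies correspond to shorter analysis windows and so finer temporal resolution.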
0:04:04 | no |
---|
0:04:05 | in this i in this line we will it's pretty |
---|
0:04:09 | considering use the solution using the cost of possible |
---|
0:04:12 | so |
---|
0:04:13 | given a within the by imposing is as a feature extraction the we live the |
---|
0:04:19 | constraint on the human audio file |
---|
0:04:22 | to illustrate the audio file is basically is you in one you for the spectral |
---|
0:04:28 | resolution of this paper |
---|
0:04:30 | so do not need to adapt and in the frequency domain |
---|
0:04:34 | we live in form with something clusters or endorsement power spectral density |
---|
0:04:39 | which can be no performance is good of you can be |
---|
0:04:43 | to it you know what is good |
---|
0:04:46 | like giving infinitely information across the voice vector |
---|
0:04:50 | and finally to obtain the cepstral coefficients |
---|
0:04:54 | we apply the discrete cosine transform after uniform resampling |
---|
0:04:58 | this is what gives the constant q cepstral coefficient features |
---|
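The cepstral step mentioned here (a discrete cosine transform applied to a log power spectrum) can be sketched as follows. This is a generic illustration, not the speaker's implementation: it assumes the spectrum has already been computed and resampled onto a uniform grid, and the function names, the flooring constant and the unnormalised DCT-II are all assumptions.

```python
import math


def log_power_spectrum(magnitudes, floor=1e-10):
    """Log of the squared magnitude per bin; the floor avoids log(0)."""
    return [math.log(max(m * m, floor)) for m in magnitudes]


def dct_ii(values, num_coeffs):
    """Unnormalised type-II DCT, returning the first num_coeffs outputs."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def cepstral_coefficients(magnitudes, num_coeffs):
    """Cepstral coefficients of one frame: DCT of its log power spectrum."""
    return dct_ii(log_power_spectrum(magnitudes), num_coeffs)


if __name__ == "__main__":
    # Hypothetical 8-bin magnitude spectrum of a single frame.
    frame = [1.0, 2.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.5]
    print(cepstral_coefficients(frame, num_coeffs=4))
```

Because each coefficient sums over every bin, information localised in one part of the spectrum is spread across all cepstral coefficients.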
0:05:06 | no i don't want it is mainly focus on those of police is the result |
---|
0:05:10 | asvspoof 2019 challenge |
---|
0:05:12 | and we use the standard problem or |
---|
0:05:15 | for a policeman he was relatively is applied mimicry really implement |
---|
0:05:22 | the difference of automation |
---|
0:05:25 | and |
---|
0:05:25 | in the following experiment |
---|
0:05:27 | we used |
---|
0:05:27 | the standard asvspoof 2019 baseline systems |
---|
0:05:31 | so these are the cqcc gmm based system and the lfcc gmm based system |
---|
0:05:36 | so |
---|
0:05:37 | for one point is exactly the database and the baseline system description you can therefore |
---|
0:05:42 | before and references |
---|
0:05:47 | now here are the baseline results on the asvspoof 2019 database |
---|
0:05:54 | there is substantial variation in the performance of the baseline systems |
---|
0:06:00 | yes we can see in the human eye as an additional the for so no |
---|
0:06:05 | for example for attacks a16 and a19 |
---|
0:06:09 | where |
---|
0:06:10 | the cqcc gmm based system |
---|
0:06:12 | gives better performance than |
---|
0:06:15 | the lfcc gmm based system |
---|
0:06:18 | whereas |
---|
0:06:19 | for other attacks the lfcc gmm based system gives |
---|
0:06:25 | better performance |
---|
0:06:27 | so while it is in difference in performance |
---|
0:06:30 | using more differently |
---|
0:06:32 | because |
---|
0:06:33 | the difference in this paper or solution |
---|
0:06:37 | insecurity which might suggest that be i think that it is to use this one |
---|
0:06:43 | hundred |
---|
0:06:44 | my representing the specifics right and then people |
---|
0:06:49 | so that a nine |
---|
0:06:50 | where the difference it is something you would basically the difference in the performance using |
---|
0:06:55 | c and the mfcc representation |
---|
0:06:59 | so we use sub-band analysis |
---|
0:07:03 | so in this little someone analysis we propose in will be emailing representation then nutritional |
---|
0:07:10 | i five tokenizer present in the spectral this domain representation you realise |
---|
0:07:17 | e |
---|
0:07:17 | what i think it'd implement the information they represent different are scored |
---|
0:07:23 | so in this time |
---|
0:07:24 | the thing i don't be many presentation of a specific is something i |
---|
0:07:29 | you |
---|
0:07:30 | okay you didn't seem to me |
---|
0:07:33 | genetics within got frequency mean and the lexus there was because can see it makes |
---|
0:07:41 | and in the and the leftmost autonomy human that it was in the market is |
---|
0:07:46 | a localising the low frequency of this is what was in there was a localiser |
---|
0:07:53 | five was spectrum and |
---|
0:07:55 | that i in the weighted within the i-vectors are presented for the signal |
---|
0:08:00 | and in my eyes in the email that imposing that are compared to a single |
---|
0:08:05 | band-pass filter |
---|
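One way to picture a sub-band analysis built from single band-pass filters is to enumerate every band between a lower and an upper cut-off and keep only the spectral bins inside it. The sketch below is an assumption about the general technique, not a description of the experiment in the talk: the cut-off grid, the function names and the bin frequencies are all hypothetical.

```python
from itertools import combinations


def sub_bands(cutoffs_hz):
    """All (f_min, f_max) pairs with f_min < f_max from a grid of cut-offs."""
    return list(combinations(sorted(cutoffs_hz), 2))


def band_limit(spectrum, bin_freqs_hz, f_min, f_max):
    """Keep only the spectral values whose bin frequency lies in [f_min, f_max]."""
    return [v for v, f in zip(spectrum, bin_freqs_hz) if f_min <= f <= f_max]


if __name__ == "__main__":
    # Hypothetical cut-off grid; each pair would be one cell of a 2D map.
    for f_min, f_max in sub_bands([0, 2000, 4000, 8000]):
        print(f"band {f_min:5d} to {f_max:5d} Hz")
```

Scoring a detector separately on each such band is one way to see which part of the spectrum carries the information that separates spoofed from genuine speech.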
0:08:07 | so |
---|
0:08:08 | for sub-band analysis |
---|
0:08:09 | the remaining where can you denies gaussian with the ones |
---|
0:08:15 | i by integrating |
---|
0:08:18 | existing using the specific content of imposing |
---|
0:08:22 | so |
---|
0:08:24 | definitely representation signifying the performance of a different is performed combination in the damsel lately |
---|
0:08:35 | no |
---|
0:08:37 | something so i in this line mean within a do you might representation all |
---|
0:08:43 | of six different specific is performed okay well i roll single within the represent the |
---|
0:08:51 | representational |
---|
0:08:52 | and using the cqcc gmm based system |
---|
0:08:57 | leaving i think is a c and d processing be a more general representing |
---|
0:09:03 | heatmap representation using the lfcc gmm based system |
---|
0:09:09 | so you can see that |
---|
0:09:11 | for specific attacks for example a16 and a19 |
---|
0:09:17 | where |
---|
0:09:18 | the cqcc gmm based system |
---|
0:09:19 | the estimated performance and then |
---|
0:09:22 | well i yes for identity in country and fourteen nist nineteen |
---|
0:09:27 | when extending this is not use the better performance and on the gmm based |
---|
0:09:34 | so probably you when he was addition we can see that for those or three |
---|
0:09:39 | is the main sixty and may nineteen |
---|
0:09:42 | i think that legalising d i i think of this better where is the thing |
---|
0:09:49 | 'cause it is important to the data |
---|
0:09:52 | but details are really and you the better performance |
---|
0:09:55 | where is it is still |
---|
0:09:57 | it could mean and importantly where i think so localising the |
---|
0:10:02 | i don't with the presidential elections seems to have that the and you the better |
---|
0:10:08 | performance |
---|
0:10:10 | no i guess of is defined in a day where is the performance work because |
---|
0:10:15 | the i for initial immunity the i-vectors are not explicitly localised spectrum so for example |
---|
0:10:22 | in the need a sequence is he or no ellipses in front end |
---|
0:10:31 | no that don't temporal resolution and maybe of wasn't feasible fishing |
---|
0:10:38 | so in this light we will explain why i think is the same front-end format |
---|
0:10:43 | for |
---|
0:10:44 | so i x |
---|
0:10:49 | so |
---|
0:10:49 | in this data is shown on the classical be split off and highly speech frame |
---|
0:10:54 | which represent the new nine |
---|
0:10:56 | and use it in this city |
---|
0:10:58 | taking this is obviously that was a good lately |
---|
0:11:01 | please remember that the other thing in a possible solution is represented by the area |
---|
0:11:08 | defined by |
---|
0:11:09 | what we are looking like |
---|
0:11:12 | so |
---|
0:11:12 | probably one finger against and now we can see that |
---|
0:11:15 | if they are compressed using d i i was in part of the spectral then |
---|
0:11:22 | how this particular |
---|
0:11:24 | it means that only invading is also this area is contaminated |
---|
0:11:30 | two additional cepstral coefficient |
---|
0:11:33 | that is only bring reading the |
---|
0:11:36 | and then it is okay in these diversity in the women |
---|
0:11:41 | s a single in the windows |
---|
0:11:44 | we aim to the |
---|
0:11:45 | investigating more contribution to the computational distribution which means |
---|
0:11:50 | they're eating i is to deal with one second only |
---|
0:11:54 | and you have one single |
---|
0:11:57 | no |
---|
0:11:58 | this control which is if it is forcing frame when using the uniform recently all |
---|
0:12:03 | be uniform resembling ones not seem to be |
---|
0:12:05 | it is normally using the |
---|
0:12:07 | sequences of feature extraction |
---|
0:12:09 | so no hannah |
---|
0:12:11 | well |
---|
0:12:12 | we don't know why the within the how |
---|
0:12:16 | exactly |
---|
0:12:17 | and unionise |
---|
0:12:19 | needed something in the frequency domain |
---|
0:12:21 | so in this in this problem can see that |
---|
0:12:25 | e |
---|
0:12:27 | it was it is before |
---|
0:12:28 | no there is no what you contribution we got stuck a traditional cepstral coefficient daisy |
---|
0:12:35 | higher |
---|
0:12:36 | usually motivation the cepstral a |
---|
0:12:39 | a computational cepstral coefficient which means |
---|
0:12:42 | i don't information in government and giving more if the size of one second |
---|
0:12:48 | is known to be consistently for women is different for the first low frequency scale |
---|
0:12:54 | is with me |
---|
0:12:55 | and for the signal treatment is gonna union |
---|
0:12:59 | and lastly we show that |
---|
0:13:01 | us spend a of them you need i spent on the |
---|
0:13:05 | they |
---|
0:13:06 | this shows because motion is sixteen cepstral coefficient is uniform |
---|
0:13:11 | i don't is better which means |
---|
0:13:13 | when i first was in any way to spend on |
---|
0:13:16 | then it would be better to use i |
---|
0:13:19 | localised there are a total successfully |
---|
0:13:24 | and |
---|
0:13:25 | that you can use the cost in order to a constant is a solution was |
---|
0:13:30 | different spectral |
---|
0:13:31 | no only one thing that is k |
---|
0:13:34 | when i based on the polite the women |
---|
0:13:37 | then he was also can be |
---|
0:13:39 | using the challenge is good |
---|
0:13:41 | given the one of the size of the lower bound and able to capture the |
---|
0:13:45 | a difference when the other realising over right |
---|
0:13:47 | where s |
---|
0:13:48 | when i based on lies in the way the use of sequences using that is |
---|
0:13:52 | thinking |
---|
0:13:53 | the engine |
---|
0:13:54 | the okay to those are persuaded and you the better performance and when i was |
---|
0:13:59 | i wasn't anything the spectrum that and it is different then |
---|
0:14:04 | it will look at those i think |
---|
0:14:06 | then and you get better performance |
---|
0:14:08 | then the secrecy using a recently |
---|
0:14:17 | now |
---|
0:14:19 | it's just an no mandatory have in the i-vector nonetheless global warning behind |
---|
0:14:26 | use of sequences the using the dramatically scale and he news good if the performance |
---|
0:14:32 | based on maybe i'll fix the log spectrum |
---|
0:14:35 | so in this thing here in this role |
---|
0:14:39 | all singing voices to represent the |
---|
0:14:41 | there will be made within twenty minutes ambition using the or something custody |
---|
0:14:46 | that means because he's using that exactly |
---|
0:14:49 | where is it wouldn't the closing we didn't show that it was in there |
---|
0:14:55 | they do not even representation using the gmm based system |
---|
0:14:59 | where |
---|
0:15:00 | beginning |
---|
0:15:01 | that's true within their |
---|
0:15:03 | the only male presentation using the efficiency with the german task which is a anything |
---|
0:15:11 | so we you remain |
---|
0:15:13 | s goes we are using the original signal being systems are statistically or we will |
---|
0:15:19 | go in it was it would be nice to |
---|
0:15:22 | no |
---|
0:15:23 | again in there |
---|
0:15:24 | if we can say they are specific what they contain in and fourteen |
---|
0:15:28 | where we can see that you are used a specially localising be |
---|
0:15:32 | logan |
---|
0:15:33 | and now you know there |
---|
0:15:36 | in our previous presenting overdemand use of sequences the user dramatically scale |
---|
0:15:42 | hindi better performance and table two shows are explained that she |
---|
0:15:47 | think this is in here |
---|
0:15:49 | this is because as in business |
---|
0:15:51 | it's a question you know right |
---|
0:15:57 | no only in thinking one representation |
---|
0:16:00 | this can afford it is from the decision a fourteen thirteen and fourteen and be |
---|
0:16:05 | used to think is residual material based front end |
---|
0:16:09 | a the idea that multiplying being by giving substantially nobody they were using it |
---|
0:16:18 | it is really is |
---|
0:16:22 | so no i |
---|
0:16:24 | i that imposing a when the i-vectors analysing the woodbury then you also decreases the |
---|
0:16:29 | original article is good this one day they're having the size of those of us |
---|
0:16:36 | and those are frequently but the woman |
---|
0:16:42 | no |
---|
0:16:42 | the conditional condition |
---|
0:16:45 | so |
---|
0:16:46 | if you already |
---|
0:16:47 | seen a linguistically and presentation you might hear i you might the idea would be |
---|
0:16:53 | presentation |
---|
0:16:54 | originally proposed in this |
---|
0:16:56 | well |
---|
0:16:57 | for something analysis to identify localiser representing this problem |
---|
0:17:03 | we define the also find that the different exactly the i think within the different |
---|
0:17:09 | something and |
---|
0:17:11 | but it was activated in front end which imprecise information relayed consuming |
---|
0:17:18 | and |
---|
0:17:19 | it was also they're using the front end and vocal qualities of the database |
---|
0:17:25 | so this finding explain why |
---|
0:17:29 | that is simply a back to estimate the solution is |
---|
0:17:33 | so what i in this thing |
---|
0:17:39 | thank you |
---|
0:17:40 | and if you have any questions i will be happy to answer them |
---|