0:00:14 | but the way as in addition on a big the or at the moment variable |
---|
0:00:19 | and together energy profiles for this assumption |
---|
0:00:23 | please call per week the nist model or something |
---|
0:00:29 | presentational by a discussion of speech data or regression |
---|
0:00:34 | then twenty thousand eleven a short almost is ugly but also a unity it will |
---|
0:00:41 | be really didn't you feel |
---|
0:00:44 | and that in addition to |
---|
0:00:45 | no two thousand three |
---|
0:00:48 | in order to discuss different all come from each at least one but you |
---|
0:00:53 | estimate of the system finally |
---|
0:00:56 | so in this work we will be |
---|
0:01:00 | the baseline system and the other site by d |
---|
0:01:05 | but is if only a little training |
---|
0:01:09 | and then vol challenge in |
---|
0:01:11 | one seventeen leading system and i just |
---|
0:01:15 | so we will be |
---|
0:01:17 | formulated as stand alone and we consider a |
---|
0:01:21 | and are the natural just you know speech |
---|
0:01:26 | and then otherwise you will be system which is there |
---|
0:01:29 | is a story or something what are you hungry if it is only genuine coming |
---|
0:01:34 | as an actual speech |
---|
0:01:36 | comparison |
---|
0:01:39 | in this will be a excluding concentrating on the speech production |
---|
0:01:46 | so that there is a little or lossless wealthy and single where they not be |
---|
0:01:51 | honest you know |
---|
0:01:53 | a little variations the |
---|
0:01:56 | has basically three aspects lately ready one |
---|
0:02:00 | environment according to one |
---|
0:02:02 | so when we also for smart phones five or not using my |
---|
0:02:08 | i quality that all speaker b |
---|
0:02:10 | one and conical |
---|
0:02:13 | the different |
---|
0:02:14 | experiments i well for these recordings which is |
---|
0:02:19 | well only is also one in which all |
---|
0:02:23 | and this is a |
---|
0:02:27 | no |
---|
0:02:28 | we will discuss the modelling all a list of |
---|
0:02:31 | it in the list movies model you one is a new one next even that's |
---|
0:02:36 | a strange |
---|
0:02:37 | during this whole signal s b is not an estimation of the impulse response |
---|
0:02:43 | of the |
---|
0:02:45 | the recording device |
---|
0:02:47 | microphone |
---|
0:02:48 | i mean what |
---|
0:02:49 | plus the known model |
---|
0:02:51 | you don't |
---|
0:02:52 | so this imposes was a copy of this series convolutional we build upon so on |
---|
0:03:00 | then the recording device and as a speakers characteristics |
---|
0:03:04 | that is multimedia speakers and the and |
---|
0:03:08 | so relating the blissful means also in signal characteristics differences of iterations models |
---|
0:03:16 | there were involved only in the |
---|
0:03:18 | jane speech |
---|
0:03:21 | to his audibly distorted data so that sources speech you can see that |
---|
0:03:27 | this is a |
---|
0:03:29 | actually lower than the worst |
---|
0:03:31 | this is an actual speech and e g |
---|
0:03:35 | or |
---|
0:03:37 | so then what went on according will only |
---|
0:03:40 | especially |
---|
0:03:45 | and then only isn't getting or |
---|
0:03:51 | or a roller or |
---|
0:03:54 | and that was one instead of just one here is the presentation |
---|
0:04:00 | so one of the important characteristic that is here is that it should also gonna |
---|
0:04:05 | some |
---|
0:04:05 | in the differences between genuine and it |
---|
0:04:11 | speech |
---|
0:04:11 | and are the ones in a distribution because we can see there |
---|
0:04:16 | most of the day |
---|
0:04:18 | these changes are data |
---|
0:04:21 | it is the distortion that is being nervous in the high frequency regions |
---|
0:04:26 | because of illegal immigrants |
---|
0:04:28 | and of just like expected because the their transmissions at their sticks on the acoustic |
---|
0:04:34 | characteristics of gonna the phone |
---|
0:04:36 | and women |
---|
0:04:37 | is expected to be bandpass region because |
---|
0:04:40 | only one can be is the error or because you becomes a |
---|
0:04:44 | we'll have more stands opinion is |
---|
0:04:47 | i don't have been system is responsible must therefore |
---|
0:04:51 | okay was characteristics in the in the previously |
---|
0:04:55 | in your dynamo in but it just a model of speech function |
---|
0:05:00 | a we can see that maybe a speech has basically by first order statistics on |
---|
0:05:07 | the right you can is on which involves only |
---|
0:05:12 | these concealment speech coding speech |
---|
0:05:16 | no in this work addressing the stars in addition there me |
---|
0:05:22 | concentrate on a weekly |
---|
0:05:25 | on a new data |
---|
0:05:27 | additional industry on |
---|
0:05:31 | nolan available in you |
---|
0:05:33 | the idea is there you know a companion but we discuss the fundamentals of do |
---|
0:05:38 | you wanna do not you so well before that |
---|
0:05:43 | the initial requires the basis accent but instead of discontent signal x and then it's |
---|
0:05:49 | previous that unveiling something minus one |
---|
0:05:52 | an additional next congestion something that is |
---|
0:05:56 | we find that it is your data in the next experiment and minus one in |
---|
0:06:03 | so that you and then because of this on the amount of structured utilizes the |
---|
0:06:07 | desire was |
---|
0:06:08 | but their own meeting the |
---|
0:06:10 | but |
---|
0:06:11 | in boston immediate future in the presence |
---|
0:06:14 | however within a is actually an actual speech signal catches the dependencies |
---|
0:06:20 | in the signal is also has a signal and these different independent signal is not |
---|
0:06:24 | a lie i can be your like having something minus one or the u v |
---|
0:06:30 | mobile |
---|
0:06:32 | well in the context of speech production and perception we know little or no control |
---|
0:06:37 | recognition and i for this |
---|
0:06:40 | no one she |
---|
0:06:41 | so mostly for something or an introduction cognition perception |
---|
0:06:45 | well whatever you want to thank you speech |
---|
0:06:50 | so well motivated by this kind of |
---|
0:06:53 | okay statistics of the natural speech we also exploit a pu you know if available |
---|
0:06:59 | in one |
---|
0:07:01 | while i mean |
---|
0:07:02 | then we consider only the a initial clustering just and we consider |
---|
0:07:08 | the i se but in the past and i in which |
---|
0:07:12 | sure |
---|
0:07:12 | and as this one and it is not really meant that is a mathematical |
---|
0:07:17 | in addition |
---|
0:07:18 | all the sinusoidal previous i and the previous section seven |
---|
0:07:22 | and a we can see that use basically the new location as explained minus plus |
---|
0:07:28 | one |
---|
0:07:29 | excluding and less time based on and |
---|
0:07:32 | but i score as defined in |
---|
0:07:35 | one is in this because it captures of bananas yours was |
---|
0:07:39 | based on the |
---|
0:07:40 | reynolds |
---|
0:07:42 | so this is and everything the |
---|
0:07:43 | we begin this is the pu and this is that it wouldn't exist |
---|
0:07:47 | consider |
---|
0:07:48 | there is only just where you're being nor do i even it is |
---|
0:07:52 | in this isn't the video games or two |
---|
0:07:56 | it is because there isn't dependency structure because of the pu for these kinds of |
---|
0:08:00 | and in this case |
---|
0:08:02 | you the minutes in this is |
---|
0:08:05 | i feel that it was a good why don't we discussed in |
---|
0:08:10 | we can see that you also used the described next sure you domain and then |
---|
0:08:16 | a justice of you in the netherlands |
---|
0:08:21 | not in this but are we extend our recently proposed remote the actual and b |
---|
0:08:26 | c basically women because not more |
---|
0:08:30 | that is used in these easy we have an input speech |
---|
0:08:35 | preemphasis problem and yet been investigated the and then be cleaning everything more than fifty |
---|
0:08:42 | one from a nation |
---|
0:08:45 | so miserably you'll be explained remedial the reason is there anyone you know |
---|
0:08:50 | well actually better sticks all basically and dependencies and sequence of both genders speech and |
---|
0:08:57 | then how did better than the |
---|
0:08:59 | there is a question that only |
---|
0:09:01 | a in this is a screen |
---|
0:09:04 | so for example this is the |
---|
0:09:06 | two can assume one all basically the speech |
---|
0:09:10 | you know various acoustic and one that we discuss they can control spending analysis and |
---|
0:09:15 | can see that the view point has got the ones which we discuss not be |
---|
0:09:20 | a and b |
---|
0:09:22 | their ability to just really the speech forty one |
---|
0:09:26 | the final and one |
---|
0:09:29 | no this is that he's for the initial clusters of similar to speech that we |
---|
0:09:33 | 'cause there's not as is that in the component other |
---|
0:09:36 | was is trained using that as convolutional physically |
---|
0:09:41 | impulse response will be these the resulting was it is obvious cases of this |
---|
0:09:48 | all signals are inverse discrete cosine and sinus |
---|
0:09:50 | that is themselves layers |
---|
0:09:52 | we examine the impostors |
---|
0:09:55 | the man digging out of the impostors and |
---|
0:09:58 | and weddings an option |
---|
0:10:00 | we can see that the pu provider maintains the high energy pulses and an additional |
---|
0:10:07 | okay |
---|
0:10:08 | the there's characteristics within that will use in which their children adaptation transforms used by |
---|
0:10:15 | one morning or anything they're also that also for the natural speech in the u |
---|
0:10:19 | a visiting then more only |
---|
0:10:23 | so that it is which means you think that in cases in a considerable so |
---|
0:10:27 | that it almost |
---|
0:10:29 | with a single moment |
---|
0:10:31 | earlier |
---|
0:10:32 | which is basically in the next to the model mice is channel factors something more |
---|
0:10:38 | than one indicating in the morning shows that are running and you changed |
---|
0:10:43 | speech production that which should also direct relation |
---|
0:10:47 | i in this study we are really |
---|
0:10:51 | this |
---|
0:10:52 | characteristics |
---|
0:10:53 | this decoder just basically |
---|
0:10:57 | speech |
---|
0:10:57 | only the achievable when the anyone who have a variable are you gonna show a |
---|
0:11:03 | well fine |
---|
0:11:05 | for actually than that of speech shown basically well why a beautiful place |
---|
0:11:11 | corresponding this is a to be |
---|
0:11:13 | but in this feature a fight for |
---|
0:11:17 | the weather the rest of your own et al |
---|
0:11:21 | i just t |
---|
0:11:22 | and the different for different values of the difference in the next layer for example |
---|
0:11:27 | for this |
---|
0:11:29 | and allows |
---|
0:11:30 | and the elements in this is one is to show all three and one as |
---|
0:11:34 | well you know that for different is the next |
---|
0:11:38 | five |
---|
0:11:40 | when using basically here with additional features and then there's each one woman and an |
---|
0:11:45 | actual speech and hence we consider this s |
---|
0:11:48 | secondly |
---|
0:11:50 | better a discriminative you |
---|
0:11:52 | for using the |
---|
0:11:54 | and b a value in the pca projection |
---|
0:11:58 | so these differences are also clearly better for the prior knowledge of multiple files are |
---|
0:12:03 | used for the natural and you than the one of the financial speech and |
---|
0:12:08 | but in order to utilize probability speech and that or distinctions this |
---|
0:12:13 | not be quite and we plan and text i is differences |
---|
0:12:18 | for doing that she |
---|
0:12:21 | this is innovation |
---|
0:12:23 | slightly distribution it's a function |
---|
0:12:26 | for |
---|
0:12:27 | okay well i mean i speech and the speech bin z there and the standard |
---|
0:12:31 | batteries but this in figure you're to be figure in the world melodies |
---|
0:12:37 | for spectral |
---|
0:12:38 | i suspect and this is because |
---|
0:12:41 | ten miliseconds each other ones for an s and h |
---|
0:12:46 | we can see clearly there at the start with just here in both cases are |
---|
0:12:51 | lower there are one get better result shows you are working together and it is |
---|
0:12:55 | features really able to see what |
---|
0:12:58 | focused |
---|
0:12:59 | and high resolution of formant structure an overall distortion |
---|
0:13:04 | one an active speech |
---|
0:13:05 | and the signal doesn't features which are no |
---|
0:13:08 | can be captured but only |
---|
0:13:10 | well known as features in the residual so which is being unity |
---|
0:13:15 | namely |
---|
0:13:16 | during speech |
---|
0:13:19 | and this is in profile the textual this profile always be there for the various |
---|
0:13:25 | values of an index that is |
---|
0:13:27 | well human speech |
---|
0:13:28 | we used to using only the energy based vad point five we used and is |
---|
0:13:33 | a ribbons in next |
---|
0:13:34 | thus the phone recognition as well |
---|
0:13:37 | and then be seen that are really |
---|
0:13:40 | we see their four one and can see that for the various different as in |
---|
0:13:46 | this distribution as producing features and testing a each altogether as it is there is |
---|
0:13:54 | measured for different values of you |
---|
0:13:56 | a one different messages consider |
---|
0:13:59 | most of the five one solution to capture some features general capturing the traditional table |
---|
0:14:06 | for that this is the t i one |
---|
0:14:10 | in this thing using the standard statistically meaningful |
---|
0:14:13 | is longer than surrounding wasn't two database |
---|
0:14:16 | and it is that you one i in this work you the initial search |
---|
0:14:22 | and in experiment is a little difference you the |
---|
0:14:24 | these are not i logistic thing and assisting different no matter just like |
---|
0:14:30 | for each of these features are going fourteen on the cross gender engine is varying |
---|
0:14:35 | from one twenty thirty nine ninety |
---|
0:14:37 | the motivation z-norm mixture component gmms okay and ninety five one |
---|
0:14:43 | and we use basically different ones |
---|
0:14:45 | in this work is in gmm simple gmms |
---|
0:14:50 | this is a for successful results using a it is interesting |
---|
0:14:57 | with that is for refinement is a dependence you next |
---|
0:14:59 | you can see that basically |
---|
0:15:02 | and anything that's |
---|
0:15:04 | forty eight was the one they but it is my final and basically we consider |
---|
0:15:09 | forty eight to five they are used for six point five significant and is a |
---|
0:15:15 | twenty five percent |
---|
0:15:16 | or represented as |
---|
0:15:19 | which in the usual significant improvement in |
---|
0:15:23 | and has fewer can be a to find an optimal choice of measurements index for |
---|
0:15:27 | this experiment |
---|
0:15:29 | and this is basically the |
---|
0:15:31 | locatable score retire fungus you gmm and was you sure well based on all distributions |
---|
0:15:38 | of the solutions |
---|
0:15:40 | all sequences e |
---|
0:15:43 | and this is an analysis e |
---|
0:15:44 | and is mfcc and matrix |
---|
0:15:47 | you can see that for you just distribution has to be well signal estimation whereas |
---|
0:15:52 | for residual different for a gmm |
---|
0:15:59 | this on the development |
---|
0:16:01 | no you're not experiments for basically these features are like a combination of them but |
---|
0:16:06 | you forty eight one |
---|
0:16:08 | well and then you mfcc and the n six z |
---|
0:16:12 | if it also there is used a list of the unlike this is from not |
---|
0:16:17 | just like mfcc |
---|
0:16:20 | i this is e |
---|
0:16:21 | and six is significant performance improvement then |
---|
0:16:24 | we both models going able to model well basically smooth and phone it is easy |
---|
0:16:29 | features we can |
---|
0:16:30 | so it is easy |
---|
0:16:31 | then we'll is used in the ecstasy |
---|
0:16:33 | and m c and we also there is really can strategy |
---|
0:16:37 | this result is as you we just use this almost indicating that |
---|
0:16:42 | but with features it also captures complementary information |
---|
0:16:46 | then the baseline of the challenge can and |
---|
0:16:50 | is systems on t |
---|
0:16:52 | wasn't retirees wasn't one iteration |
---|
0:16:55 | so |
---|
0:16:56 | this is the and already you know |
---|
0:16:59 | and then we also show the performance using a detection error tradeoff curve so we |
---|
0:17:03 | can also see the performance of the det curves for a way to one this |
---|
0:17:08 | is one is basically |
---|
0:17:10 | mfcc then security |
---|
0:17:12 | this is one |
---|
0:17:13 | and this is an existing data for me from clean the proposed features and screams |
---|
0:17:19 | and |
---|
0:17:19 | and similar training actually almost or with |
---|
0:17:24 | also features are only one |
---|
0:17:27 | so |
---|
0:17:28 | however the fuses well formed elements are defined in and function indicating there |
---|
0:17:34 | you models there is resigning |
---|
0:17:38 | but are trained using the that the justice to perform better than the engine just |
---|
0:17:43 | the decision features |
---|
0:17:45 | i don't use and |
---|
0:17:49 | here is an analysis or physically and one or more efficient well money well mauritius |
---|
0:17:55 | physically model issues |
---|
0:17:57 | and i saw in one additional from the perspective |
---|
0:18:02 | so that reducing the problem first final one is okay |
---|
0:18:06 | new the bar e |
---|
0:18:09 | e here is for the natural and this a three different this from the different |
---|
0:18:13 | characteristics like benefit |
---|
0:18:15 | a high quality classes |
---|
0:18:18 | three one and you'll be playing on the only problem |
---|
0:18:22 | a message in which |
---|
0:18:24 | so we can see there this is the sum of implementation and fast implementation and |
---|
0:18:29 | they are very real distinct and is a weighting |
---|
0:18:32 | involve a harmonic structure is or |
---|
0:18:34 | in an actual speech but there is no result obviously |
---|
0:18:38 | you need for |
---|
0:18:39 | this is definitely a cost you difference between the natural and the |
---|
0:18:43 | i |
---|
0:18:46 | finally we evaluated the sickly |
---|
0:18:48 | using this costings you can see that |
---|
0:18:53 | the views different contributions like environment acoustic environment that voice recording ways |
---|
0:18:59 | and we can see their own but for the proposed features to meet is even |
---|
0:19:03 | da was to find the list equal and |
---|
0:19:06 | existing we just like dimensions using consisting and |
---|
0:19:09 | so this is showing me for an answer was you just on different conditions |
---|
0:19:15 | find it was always in this work we take your batteries exploiting question |
---|
0:19:20 | the idea was features to d c you know okay the menu |
---|
0:19:26 | movies easy and everything but |
---|
0:19:29 | and this is only on better for different decomposition of a controversial but was not |
---|
0:19:35 | affected by the owners of different one |
---|
0:19:37 | number of channels but she was adamant use this for most beneficial for the two |
---|
0:19:43 | streams |
---|
0:19:43 | this forms as a |
---|
0:19:45 | well |
---|
0:19:46 | on the final experimentation |
---|
0:19:48 | i don't know we need was actually impulse response of random should be my acoustic |
---|
0:19:53 | environment |
---|
0:19:53 | we should definitely a landing on the nist is immensely challenging |
---|
0:19:58 | things as well |
---|
0:19:59 | with this knowledge yet results using line data and in as you gonna condition a |
---|
0:20:05 | one time someone colours audio research |
---|
0:20:09 | we also thank the organisers of recognition workshop twenty and challenges of this is |
---|
0:20:15 | what we also want to challenge |
---|
0:20:17 | really and also |
---|
0:20:19 | indeed it was made available but not from in this experiment be |
---|
0:20:23 | statistically meaningful system not |
---|
0:20:26 | and finally the citizens just |
---|
0:20:29 | and we i |
---|
0:20:30 | on the phone and h |
---|