0:00:06 | Hello everyone. Thanks for joining this tutorial session on |
---|
0:00:11 | neural automatic speech recognition. I am from Google Research. |
---|
0:00:17 | Let's get started. |
---|
0:00:21 | This sixty-minute tutorial will be organized into two parts. The first part will be |
---|
0:00:26 | presented by me, explaining basic formulations and some algorithms for neural speech recognition, |
---|
0:00:33 | and the second part will cover software and implementations for neural speech recognition; |
---|
0:00:39 | that part will be presented by my coworker, Shigeki Karita. |
---|
0:00:43 | Let's go to the first part. |
---|
0:00:47 | First of all, I want to define what neural end-to-end speech recognition is. |
---|
0:00:51 | In this session, I use this term for techniques for realizing end-to-end |
---|
0:00:57 | speech recognition, but those techniques sometimes can also be applied to non-end-to-end |
---|
0:01:04 | speech recognition systems. |
---|
0:01:07 | End-to-end speech recognition is a term for speech recognition that involves neural networks |
---|
0:01:12 | converting acoustic features directly into words. |
---|
0:01:17 | As you may already know, a conventional speech recognizer consists of three parts: |
---|
0:01:23 | an acoustic model, a pronunciation model, and a language model. |
---|
0:01:29 | Each model represents a probability distribution, |
---|
0:01:33 | and a search algorithm finds the best possible hypothesis from those models. |
---|
0:01:43 | The end-to-end approach replaces these systems with |
---|
0:01:47 | a single neural network: |
---|
0:01:49 | a network that directly maps feature vector sequences into word sequences is used |
---|
0:01:57 | for speech recognition. |
---|
0:01:59 | One obvious advantage of this approach is the |
---|
0:02:03 | simplicity of the system. |
---|
0:02:05 | Conventional recognizers that combine several models with a search algorithm can |
---|
0:02:11 | be very complicated to develop and maintain. |
---|
0:02:14 | Recently, the end-to-end approach has even been extended to directly handle raw waveform |
---|
0:02:20 | signals instead of pre-computed feature vectors. |
---|
0:02:25 | This session explains how to design those neural networks that directly output words from feature vectors |
---|
0:02:32 | or raw waveform signals. |
---|
0:02:45 | In this first part, I will explain three approaches for end-to-end speech |
---|
0:02:51 | recognition, |
---|
0:02:53 | and also recent advances over those three. |
---|
0:02:58 | Let's go to the first section. |
---|
0:03:02 | Most classical speech recognition models are based on the decomposition shown here. |
---|
0:03:08 | It describes the generative story of a feature vector sequence X |
---|
0:03:13 | and a word sequence W, |
---|
0:03:15 | and it models the joint distribution of those two variables by introducing |
---|
0:03:20 | additional latent variables: |
---|
0:03:24 | the phoneme sequence L here, and the related HMM state sequence S. |
---|
0:03:32 | The joint probability is usually decomposed by assuming that the phoneme sequence |
---|
0:03:36 | is generated depending on the word sequence, |
---|
0:03:39 | that the HMM states are generated depending on the phoneme sequence, |
---|
0:03:45 | and that the feature vector sequence |
---|
0:03:47 | is generated depending on the HMM states. |
---|
0:03:50 | Here we typically assume conditional independence between the introduced variables. |
---|
0:03:58 | Each assumption looks okay in isolation, but in fact each results in some weaknesses. |
---|
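As a worked form of the factorization just described (a sketch using the symbols X, W, L, S from the talk; the exact conditioning on the original slide may differ slightly):

```latex
p(X, W) \;=\; p(W) \sum_{L} p(L \mid W) \sum_{S} p(S \mid L)\, p(X \mid S),
\qquad
p(X \mid S) \;\approx\; \prod_{t} p(x_t \mid s_t)
```

The three inner factors correspond to the pronunciation model, the HMM transition model, and the frame-wise emission model mentioned below.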
0:04:06 | In the conventional approach, deep learning techniques are introduced in each component of this decomposition. |
---|
0:04:14 | For example, for the language model we often use |
---|
0:04:17 | an RNN-based model here for getting better predictions of word sequences. |
---|
0:04:23 | As for acoustic modeling, people often use a deep neural network or |
---|
0:04:28 | a recurrent neural network for |
---|
0:04:31 | modeling this emission probability of feature sequences. |
---|
0:04:35 | In the next slides, I review those approaches used to enhance components with deep |
---|
0:04:41 | learning techniques. |
---|
0:04:46 | DNN-HMM hybrid approaches are a very famous way to enhance |
---|
0:04:51 | conventional acoustic models. |
---|
0:04:54 | In this approach, the definition of the emission probability used as the acoustic model of |
---|
0:04:59 | conventional speech recognition is modified. |
---|
0:05:02 | Here, the probability of the features given the HMM state is transformed into |
---|
0:05:08 | a quantity that is proportional to this ratio: |
---|
0:05:12 | the ratio between the predictive probability of the HMM state |
---|
0:05:16 | given the feature vector, and the marginal probability of the HMM state. |
---|
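In equation form, this is the standard "scaled likelihood" trick of hybrid systems (a sketch; notation assumed from the preceding description):

```latex
p(x_t \mid s_t) \;\propto\; \frac{p(s_t \mid x_t)}{p(s_t)}
```

Here the numerator is the neural classifier's posterior over HMM states and the denominator is a state prior, typically estimated from alignment counts.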
0:05:23 | The predictive distribution is modeled by a neural net, and the marginal distribution is modeled |
---|
0:05:29 | by a separate categorical distribution. |
---|
0:05:33 | This is a convenient way to bring the expressive power of neural nets into |
---|
0:05:39 | conventional speech recognizers. However, |
---|
0:05:43 | this actually has several problems. |
---|
0:05:47 | First, |
---|
0:05:48 | the predictive distribution given by the neural net and the marginal distribution are |
---|
0:05:53 | independently parameterized by different parameters. |
---|
0:05:57 | So the Bayes' rule used here is just an approximation, because different model parameters are used for |
---|
0:06:04 | the marginal probability and the predictive probability. |
---|
0:06:09 | Secondly, |
---|
0:06:11 | it is known that an HMM state label is a very difficult target |
---|
0:06:15 | to estimate |
---|
0:06:16 | with a classifier; |
---|
0:06:20 | by classifiers, I mean neural networks here. |
---|
0:06:24 | For example, for some stationary vowels, |
---|
0:06:29 | it is very difficult to classify whether an acoustic feature vector belongs to the |
---|
0:06:35 | first half of the phoneme segment or the second half of the phoneme segment. |
---|
0:06:40 | This fact makes training and prediction of the classifier more confusing, or unstable in |
---|
0:06:47 | other words. |
---|
0:06:51 | Connectionist temporal classification (CTC) can be regarded as a remedy for |
---|
0:06:56 | that problem. |
---|
0:06:58 | In CTC, each label is represented only by a few points |
---|
0:07:03 | in the sequence. |
---|
0:07:05 | This is done by introducing a dummy label, here called "blank", |
---|
0:07:11 | and associating most of the input vectors with the blank. |
---|
0:07:16 | Only a few input frames that are around the center of a phoneme contribute to the final |
---|
0:07:22 | output. |
---|
0:07:25 | This diagram shows |
---|
0:07:27 | a speech-to-text neural network with the CTC approach, in this case |
---|
0:07:34 | where we have an input sequence with eight elements. |
---|
0:07:39 | Each input vector is classified into phonemes augmented with the blank |
---|
0:07:44 | symbol, |
---|
0:07:45 | and the final result is obtained by removing blank outputs from the output sequence. |
---|
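A minimal sketch of the collapsing rule just described (merge repeated labels, then drop blanks; the function and symbol names are my own illustration, not from the talk):

```python
def ctc_collapse(frame_labels, blank="<b>"):
    """Map a frame-level CTC label sequence to the final output sequence.

    Repeated consecutive labels are merged first, then blanks are dropped.
    """
    merged = []
    for label in frame_labels:
        if not merged or label != merged[-1]:
            merged.append(label)
    return [label for label in merged if label != blank]

print(ctc_collapse(["<b>", "h", "h", "<b>", "e", "e", "<b>", "e"]))
# ['h', 'e', 'e']  -- the blank between the e's keeps them distinct
```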
0:07:52 | One advantage of this formulation is that we no longer need to |
---|
0:07:57 | estimate HMM state labels with an existing conventional speech recognition system. |
---|
0:08:04 | So it is possible to train the neural network from scratch. |
---|
0:08:08 | Also, CTC is generic in the sense |
---|
0:08:11 | that we can use it for arbitrary sequence-to-sequence tasks, including |
---|
0:08:19 | end-to-end speech recognition. |
---|
0:08:21 | So it can be used either to estimate phonemes, as in conventional recognizers, |
---|
0:08:26 | or to estimate word or grapheme sequences directly, as in end-to-end approaches. |
---|
0:08:31 | However, each label here is estimated independently, so the model is not able to capture dependencies between output labels. |
---|
0:08:44 | Let me elaborate on the independence assumptions induced by CTC. |
---|
0:08:50 | It is known that the relation between alignments and label sequences in CTC |
---|
0:08:54 | can essentially be represented by a finite-state transducer. |
---|
0:09:00 | If we represent them as transducers, we can see that conventional left-to-right HMMs and |
---|
0:09:07 | CTC neural nets |
---|
0:09:08 | have quite similar underlying transition structures. |
---|
0:09:12 | So using only CTC for speech recognition is |
---|
0:09:17 | in fact very similar to doing speech recognition without using language models. |
---|
0:09:23 | However, CTC still has some good properties. |
---|
0:09:28 | The first advantage is |
---|
0:09:31 | that |
---|
0:09:32 | it can form a good combination with down-sampling approaches in neural networks. |
---|
0:09:38 | As commonly noted, |
---|
0:09:40 | HMM-based alignment doesn't work very well with down-sampled features. |
---|
0:09:46 | Even after obtaining HMM state alignments, the conventional alignment tries to associate a single |
---|
0:09:52 | label with each time step, |
---|
0:09:55 | and that makes the label information ambiguous around the original phoneme boundaries. |
---|
0:10:02 | This ambiguity becomes more severe if the features are down-sampled. |
---|
0:10:08 | CTC only classifies |
---|
0:10:10 | some kind of center of each segment, so it avoids this ambiguity. |
---|
0:10:16 | Related to that, the second advantage is that we don't need to classify sub-phoneme |
---|
0:10:21 | structure, |
---|
0:10:23 | like the first and second half of a vowel. |
---|
0:10:26 | Classifying such structure makes training less stable and prediction more complicated. |
---|
0:10:32 | That means that CTC combined with a search algorithm over the neural-net |
---|
0:10:36 | outputs tends to produce better-defined scores for each hypothesis. |
---|
0:10:45 | So using CTC even for classical speech recognition is a good idea, because |
---|
0:10:51 | it can be trained with only the label sequences, without frame-level alignments. |
---|
0:10:55 | That is, |
---|
0:10:56 | even if CTC is used as just a part of the system, we still |
---|
0:11:00 | have the advantages described before: |
---|
0:11:03 | down-sampling |
---|
0:11:04 | can be applied, and it also forms a good combination |
---|
0:11:10 | with search algorithms. |
---|
0:11:14 | This is illustrated by this stack of components: |
---|
0:11:17 | there are a lot of variants of conventional hybrid approaches using CTC, |
---|
0:11:23 | where |
---|
0:11:26 | CTC is used as a drop-in replacement of the acoustic model in conventional |
---|
0:11:30 | ASR systems. |
---|
0:11:34 | Let's move to the next component. |
---|
0:11:37 | Language models can also be enhanced by introducing recurrent neural nets, or |
---|
0:11:43 | LSTMs, |
---|
0:11:44 | long short-term memory neural nets, as autoregressive predictors. |
---|
0:11:49 | An RNN language model predicts |
---|
0:11:51 | the distribution over the next word with an RNN |
---|
0:11:56 | that ingests all previously guessed words. |
---|
0:12:00 | Unlike previous n-gram language model approaches, an RNN LM summarizes a word |
---|
0:12:05 | and its context in a continuous vector |
---|
0:12:09 | and uses it to make a prediction of the next word. |
---|
0:12:14 | Since an RNN is used for making this continuous context representation, |
---|
0:12:20 | RNN language models can in theory handle |
---|
0:12:24 | an infinite length of word history. |
---|
0:12:28 | Even though in practice it is often very difficult to optimize such a model, it is very |
---|
0:12:33 | nice to see significant improvements over n-gram language models. |
---|
0:12:38 | A downside is the context representation in RNN-based models. |
---|
0:12:43 | In n-gram approaches, |
---|
0:12:46 | the number of possible contexts is bounded by the number of different word histories, which |
---|
0:12:52 | is finite. |
---|
0:12:55 | However, RNN LMs do not |
---|
0:12:59 | place any limit on the extent of the context to be used, |
---|
0:13:03 | so each different word history has a different context representation. |
---|
0:13:08 | One can say |
---|
0:13:09 | this is a downside for computation, |
---|
0:13:14 | but in fact it's not that inefficient. |
---|
0:13:17 | In n-gram models, |
---|
0:13:21 | the context representation typically requires a huge space to store in memory, whereas RNN |
---|
0:13:26 | LMs are comparatively compact. |
---|
0:13:28 | If we compare the size of speech recognition systems built with the conventional approach and the fully neural |
---|
0:13:34 | network approach, the neural nets are actually more compact than an n-gram language |
---|
0:13:38 | model expanded into |
---|
0:13:41 | a weighted finite-state transducer. |
---|
0:13:46 | So it might be a bit counterintuitive, but neural net approaches actually fit very |
---|
0:13:52 | well with |
---|
0:13:53 | mobile devices, too, |
---|
0:13:56 | especially if the device has some accelerator for matrix multiplication, for example. |
---|
0:14:06 | Another important property that influences the computational efficiency is tokenization. |
---|
0:14:12 | Since the n-gram models used in conventional approaches |
---|
0:14:16 | take only a limited context for making a prediction, each token unit used to be |
---|
0:14:20 | long enough, typically a word, for making accurate predictions. |
---|
0:14:24 | However, RNNs do not limit the context. |
---|
0:14:28 | That means that we can use finer tokenization than word tokens; |
---|
0:14:34 | for example, we can use grapheme-based or sub-word tokens. |
---|
0:14:38 | Two common tokenizers used with recent neural language models |
---|
0:14:42 | are very similar in the sense that both tokenize the data |
---|
0:14:47 | by matching existing tokens against the training text, |
---|
0:14:53 | and they gradually merge tokens: |
---|
0:14:56 | both select a pair of tokens that maximizes some criterion. |
---|
0:15:02 | Byte-pair encoding (BPE) uses |
---|
0:15:05 | the number of adjacent occurrences of tokens in the dataset, whereas |
---|
0:15:11 | the word-piece approach evaluates the likelihood of the dataset with a simple language model |
---|
0:15:17 | over the defined tokens. |
---|
0:15:19 | Using those learned vocabularies, encoding text results in a smaller number of tokens, |
---|
0:15:25 | and the number of different tokens |
---|
0:15:27 | in the system often corresponds to the size of the output layer of the neural networks. |
---|
0:15:33 | Thus, |
---|
0:15:34 | it also contributes to the computational efficiency of neural nets. |
---|
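A minimal sketch of the BPE merge loop just described (count adjacent token pairs, merge the most frequent pair, repeat; all names here are my own illustration, not a library API):

```python
from collections import Counter

def bpe_merges(corpus, num_merges):
    """Learn BPE merges over a toy corpus of words split into character tokens."""
    words = [list(w) for w in corpus]
    merges = []
    for _ in range(num_merges):
        # count adjacent token pairs over the whole corpus
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        # apply the merge everywhere
        for w in words:
            i = 0
            while i < len(w) - 1:
                if w[i] == a and w[i + 1] == b:
                    w[i:i + 2] = [a + b]
                else:
                    i += 1
    return merges

print(bpe_merges(["low", "lower", "lowest"], 3))
# e.g. [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```

The word-piece variant replaces the pair-frequency criterion with the likelihood gain of a simple language model over the candidate vocabulary.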
0:15:40 | So far, we introduced CTC and the advantages of RNN LMs. |
---|
0:15:47 | The next section is about RNN transducers, which combine the strengths of both models. |
---|
0:15:57 | As I mentioned, CTC turned out to be insensitive to dependencies |
---|
0:16:00 | between output tokens. |
---|
0:16:02 | An RNN LM can be used as a component that injects such dependencies, |
---|
0:16:08 | and so, |
---|
0:16:09 | by combining CTC-based prediction with an RNN-based context handler, we get |
---|
0:16:14 | the RNN transducer. |
---|
0:16:17 | This diagram shows |
---|
0:16:19 | the architecture of RNN transducers. |
---|
0:16:24 | This part of the architecture |
---|
0:16:27 | corresponds to the CTC predictor: |
---|
0:16:30 | it computes a distribution over the next tokens, |
---|
0:16:34 | where the tokens are augmented with the blank |
---|
0:16:39 | symbol. |
---|
0:16:43 | And this part corresponds to the RNN LM. |
---|
0:16:46 | This feedback loop makes the prediction dependent on the previous words; it |
---|
0:16:52 | actually injects the dependency on the previous output tokens. |
---|
0:17:00 | CTC and RNN-T share a common structure that uses |
---|
0:17:05 | blanks to align the input and output elements. |
---|
0:17:09 | As I showed for CTC, the alignment sequence is somewhat analogous to the |
---|
0:17:15 | HMM state sequence in the conventional acoustic model, |
---|
0:17:18 | and similarly to the HMM states, it is handled as a latent variable in |
---|
0:17:23 | the likelihood function. |
---|
0:17:26 | As usual, |
---|
0:17:28 | this latent variable is marginalized out |
---|
0:17:31 | to define the likelihood function used as the training objective. |
---|
0:17:35 | Here, |
---|
0:17:36 | both CTC and RNN-T models with the blank symbol use this |
---|
0:17:42 | simple handcrafted model for the probability of the word sequence given the alignment sequence. |
---|
0:17:50 | Due to this simple definition of the probability of the words |
---|
0:17:55 | given the alignment, |
---|
0:17:57 | the likelihood function can be simplified in this way. |
---|
0:18:05 | The difference between CTC and RNN-T appears in the second component: |
---|
0:18:09 | the probability of the alignment |
---|
0:18:12 | given the input feature vectors, here X. |
---|
0:18:17 | CTC introduces frame-wise independence here, whereas RNN-T makes the alignment predictions |
---|
0:18:25 | depend on the previous alignment variables. |
---|
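In formulas, with A an alignment over the blank-augmented symbols and B the collapsing map, the contrast just described is roughly (a sketch; notation assumed from the talk):

```latex
p(W \mid X) \;=\; \sum_{A \in B^{-1}(W)} p(A \mid X),
\qquad
\text{CTC: } p(A \mid X) = \prod_{t} p(a_t \mid X),
\qquad
\text{RNN-T: } p(A \mid X) = \prod_{t} p(a_t \mid a_{1:t-1}, X)
```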
0:18:33 | To explain how the alignment is modeled in RNN-T, this slide shows the |
---|
0:18:39 | case where we have four input vectors, |
---|
0:18:42 | e1, e2, e3, and e4, and where the reference output |
---|
0:18:48 | sequence |
---|
0:18:51 | is "hello world". |
---|
0:18:55 | We show the case where the reference is fixed, as in the training phase. |
---|
0:19:02 | The joint network, denoted as f here, |
---|
0:19:08 | takes the encoder output corresponding to a time step |
---|
0:19:12 | and |
---|
0:19:13 | the context vector computed by the prediction network. |
---|
0:19:18 | The first estimate is given by feeding the first encoder output |
---|
0:19:22 | e1 and the initial context, here c0, to the joint network. |
---|
0:19:29 | If we choose the first output of the model to be blank, |
---|
0:19:33 | it means we have finished reading from the current encoder output, |
---|
0:19:38 | so the model switches to reading e2. |
---|
0:19:44 | If the second element of the alignment sequence is chosen to be the first token |
---|
0:19:48 | in the reference, |
---|
0:19:51 | that is, "hello", |
---|
0:19:54 | it changes the context vector from c0 to c1, |
---|
0:19:59 | and |
---|
0:20:00 | the model continues to predict whether the next output should be blank or should |
---|
0:20:04 | be some other word. |
---|
0:20:08 | For example, |
---|
0:20:09 | if the next token, "world", is chosen as the output, |
---|
0:20:14 | the context vector will be changed from c1 to c2. |
---|
0:20:19 | By repeating the same process until we reach the final step here, |
---|
0:20:24 | we get the probability of a single alignment path. |
---|
0:20:33 | For training neural networks with these latent alignment variables, |
---|
0:20:39 | we need to compute the expectation of the gradient vectors with respect to |
---|
0:20:45 | the posterior distribution of the alignment variables. |
---|
0:20:50 | Usually, |
---|
0:20:51 | forward-backward algorithms are |
---|
0:20:54 | used for this purpose. |
---|
0:20:56 | However, a forward algorithm over a general graph is not computationally |
---|
0:21:00 | efficient; |
---|
0:21:02 | that is to say, it's not |
---|
0:21:05 | GPU or TPU friendly. |
---|
0:21:09 | However, the alignments defined in RNN-T and related models have a grid-shaped |
---|
0:21:15 | lattice structure. |
---|
0:21:16 | For this kind of structure, the forward algorithm can be made sufficiently |
---|
0:21:20 | fast on TPU or GPU accelerators. |
---|
0:21:26 | In this case, we need to compute the sum of probabilities over all the |
---|
0:21:30 | paths, |
---|
0:21:31 | generally denoted as alpha variables, |
---|
0:21:35 | and each forward probability is a sum of probabilities |
---|
0:21:39 | of incoming paths computed from the previous alphas. |
---|
0:21:44 | Since the summation term can be written as |
---|
0:21:50 | operations of matrix shifting and summation, it can be efficiently implemented on a TPU, |
---|
0:21:57 | for example. |
---|
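A NumPy sketch of this grid-shaped forward recursion (an illustration with my own function and variable names, not the talk's actual implementation; toy random inputs stand in for real log-probabilities):

```python
import numpy as np

NEG = -1e30  # stands in for log(0), keeping the arithmetic simple

def rnnt_log_likelihood(log_blank, log_label):
    """Forward algorithm over the RNN-T alignment grid.

    log_blank[t, u]: log-prob of emitting blank at grid point (t, u),
                     moving to (t + 1, u); shape (T, U + 1).
    log_label[t, u]: log-prob of emitting reference token u + 1 at (t, u),
                     moving to (t, u + 1); shape (T, U).
    """
    T, U = log_label.shape
    alpha = np.full((T, U + 1), NEG)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            from_blank = alpha[t - 1, u] + log_blank[t - 1, u] if t > 0 else NEG
            from_label = alpha[t, u - 1] + log_label[t, u - 1] if u > 0 else NEG
            # cells on an anti-diagonal (t + u = const) depend only on the previous
            # diagonal, so an accelerator implementation replaces this double loop
            # with shift-and-logaddexp operations over whole arrays
            alpha[t, u] = np.logaddexp(from_blank, from_label)
    # consume all T frames and all U tokens, then emit the final blank
    return alpha[T - 1, U] + log_blank[T - 1, U]

T, U = 4, 2  # four encoder frames, two reference tokens, as in the example
rng = np.random.default_rng(0)
print(rnnt_log_likelihood(rng.normal(size=(T, U + 1)), rng.normal(size=(T, U))))
```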
0:22:02 | Now I'll introduce encoder-decoder neural networks enhanced with an attention mechanism. |
---|
0:22:10 | CTC and RNN-T have alignment variables to explicitly decide which |
---|
0:22:15 | encoder output should be used for making the prediction of the |
---|
0:22:19 | next token. |
---|
0:22:21 | This kind of information is often formalized as attention. |
---|
0:22:27 | The point is estimating, instead of a hard alignment, |
---|
0:22:31 | a probability |
---|
0:22:32 | distribution over the time steps, denoted here as a variable |
---|
0:22:40 | a_i: the time step we should regard for making the prediction of the i-th word. |
---|
0:22:47 | We can construct this by using a softmax over scores computed from |
---|
0:22:52 | the input sequence X and the previous output words y_1 through y_{i-1}. |
---|
0:23:01 | We combine this attention probability with a simple RNN-based encoder and an RNN-based |
---|
0:23:08 | decoder. |
---|
0:23:10 | This slide shows the neural network defined this way. |
---|
0:23:16 | We introduce an attention module |
---|
0:23:19 | that takes the information from all encoder outputs and the decoder |
---|
0:23:24 | state of the previous time step. |
---|
0:23:28 | This module internally computes the |
---|
0:23:31 | attention probability |
---|
0:23:33 | I mentioned before, the probability of a_i |
---|
0:23:38 | given the context and the encoder outputs, |
---|
0:23:42 | and the module outputs a summary vector by computing this expectation. |
---|
0:23:49 | The attention probability introduced here is typically defined |
---|
0:23:53 | by introducing a function that computes a matching score, or similarity, between the decoder |
---|
0:23:59 | context information and the encoder outputs; |
---|
0:24:03 | that is denoted as A here. |
---|
0:24:08 | If this function A is represented by a neural net, |
---|
0:24:13 | all the components, including the computation of the expectation over this probability distribution, can be optimized |
---|
0:24:20 | by simple backpropagation minimizing the cross-entropy criterion. |
---|
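A sketch of one step of this scheme with an additive (MLP-scored) attention function, a common instance of the score function A described above; the weight names are my own illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(enc_outputs, dec_state, W_e, W_d, v):
    """One step of additive soft attention.

    enc_outputs: (T, d_enc) encoder outputs
    dec_state:   (d_dec,)   previous decoder state (the "context information")
    Returns the attention distribution over time and the summary vector.
    """
    # matching score between the decoder context and each encoder output
    scores = np.tanh(enc_outputs @ W_e + dec_state @ W_d) @ v  # (T,)
    probs = softmax(scores)                # attention distribution over time
    summary = probs @ enc_outputs          # expectation over encoder outputs
    return probs, summary

rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 8, 16, 16, 32
probs, summary = attend(
    rng.normal(size=(T, d_enc)), rng.normal(size=(d_dec,)),
    rng.normal(size=(d_enc, d_att)), rng.normal(size=(d_dec, d_att)),
    rng.normal(size=(d_att,)))
print(probs.shape, summary.shape)  # (8,) (16,)
```

Because every operation here is differentiable, the whole pipeline trains end-to-end with backpropagation, as the talk notes.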
0:24:28 | Compared to RNN-T, the alignment here is internally represented in the neural net, whereas RNN-T |
---|
0:24:35 | handles it as a latent variable in the likelihood function, which is the actual objective function. |
---|
0:24:42 | This class of attention is called soft attention, since we use the encoder |
---|
0:24:47 | outputs via an expectation; in hard alignment, the prediction is made after deciding which |
---|
0:24:54 | encoder output is used. |
---|
0:24:57 | Soft attention is better in terms of simplicity of implementation and also |
---|
0:25:02 | optimization, |
---|
0:25:04 | and it is also a benefit that it has |
---|
0:25:07 | only a few modeling assumptions. |
---|
0:25:11 | However, compared to RNN-T, it's harder to enforce monotonicity of the alignment. |
---|
0:25:18 | In speech recognition, |
---|
0:25:20 | since the word sequence and the corresponding acoustic features are assumed to be in the same order, |
---|
0:25:26 | we assume that attention should be |
---|
0:25:29 | monotonic. |
---|
0:25:31 | If we plot the attention probability like this, where the y-axis is the position |
---|
0:25:37 | in the output token sequence and the x-axis is the position in the encoded feature |
---|
0:25:41 | sequence, |
---|
0:25:43 | most of the probability mass should be on the diagonal region. |
---|
0:25:50 | However, since soft attention is too flexible, we sometimes see off-diagonal peaks like |
---|
0:25:55 | these. |
---|
0:25:57 | Constrained decoding methods are often used for resolving such problems. |
---|
0:26:06 | A well-known extension of soft attention is self-attention, used in Transformers. |
---|
0:26:12 | An attention mechanism can be viewed as a key-value store, where the query is computed from |
---|
0:26:17 | the decoder state, and the keys and values |
---|
0:26:21 | are computed from the encoder outputs. |
---|
0:26:25 | In self-attention, the attention components are computed differently: the queries, keys, and |
---|
0:26:31 | values all come from the previous layer's output. |
---|
0:26:34 | Roughly speaking, this corresponds to paying attention to the inputs from other time |
---|
0:26:40 | stamps, |
---|
0:26:42 | and the degree of weight given to those positions in the attention is also computed based on the |
---|
0:26:47 | previous layer's output. |
---|
0:26:50 | A Transformer is a neural net that applies this self-attention operation multiple times to integrate |
---|
0:26:56 | information from |
---|
0:26:58 | inputs at other timestamps. |
---|
0:27:01 | We can construct |
---|
0:27:02 | both the encoder and the decoder based on this Transformer. |
---|
0:27:07 | Transformers are nowadays used as a drop-in |
---|
0:27:12 | replacement for RNNs. |
---|
0:27:14 | So we can use them for constructing acoustic models for HMM-hybrid speech recognizers, |
---|
0:27:19 | or we can define a Transformer transducer, where a Transformer is used instead of |
---|
0:27:24 | RNNs in RNN transducers. |
---|
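A sketch of the single-head scaled dot-product self-attention just described, where queries, keys, and values all come from the same layer input (names and shapes are my own illustration):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X: (T, d_model) previous layer's output; W_* are (d_model, d) projections.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # (T, T) pairwise similarities
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)           # softmax over source positions
    return probs @ V                                # each position summarizes the others

rng = np.random.default_rng(0)
T, d = 10, 8
X = rng.normal(size=(T, d))
out = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (10, 8)
```

Stacking several such layers (with feed-forward blocks, residual connections, and normalization) gives the Transformer encoder or decoder mentioned above.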
0:27:30 | The last section of this part introduces recent developments in neural |
---|
0:27:35 | speech recognition. |
---|
0:27:36 | Even though end-to-end speech recognition and its related technologies are developing rapidly, |
---|
0:27:42 | there are still some disadvantages compared to conventional speech recognizers. |
---|
0:27:49 | I will focus on the following disadvantages. |
---|
0:27:53 | The first one is that with a conventional system it is very easy to integrate side |
---|
0:27:58 | information to bias the recognition result, |
---|
0:28:03 | whereas in an end-to-end architecture it is not trivial to do so. |
---|
0:28:08 | The second point is that end-to-end speech recognizers in general require a huge amount of |
---|
0:28:13 | training data to make them work, |
---|
0:28:16 | so methods to overcome the data sparsity issue are also important. |
---|
0:28:23 | The third point is that in a conventional system it's relatively easy to use unpaired |
---|
0:28:29 | data, such as text data or untranscribed audio data. |
---|
0:28:34 | In this section, I review some example studies |
---|
0:28:38 | for overcoming those limitations. |
---|
0:28:42 | The first topic is about biasing results. |
---|
0:28:45 | Biasing is particularly important for real applications. |
---|
0:28:50 | Speech recognition is often used to find something in a database; for example, if we want |
---|
0:28:55 | to build a system to make a phone call, |
---|
0:28:58 | the speech recognizer should favor the person names in the user's contact list. |
---|
0:29:04 | The same kind of behavior is needed for various kinds of entities, |
---|
0:29:11 | like song or app names. |
---|
0:29:15 | In conventional ASR, biasing the speech recognizer is very easy: it can be done just by |
---|
0:29:20 | integrating an additional language model that has enhanced |
---|
0:29:25 | probability for such entities. |
---|
0:29:29 | One solution for end-to-end models is introducing another attention mechanism that |
---|
0:29:35 | focuses on |
---|
0:29:37 | a predefined set of context vectors. |
---|
0:29:41 | Here I explain a method that adds such contextual attention to the utterance |
---|
0:29:46 | decoder. |
---|
0:29:48 | In this method, context phrases such as names or song titles are each encoded into |
---|
0:29:53 | a single vector. |
---|
0:29:55 | An attention mechanism detects which context vectors should be activated by the |
---|
0:30:00 | decoder to estimate the next word. |
---|
0:30:04 | This is an example of how the prediction probabilities change |
---|
0:30:08 | when the utterance so far is |
---|
0:30:11 | "talk to". |
---|
0:30:14 | The attention mechanism starts to attend to some bias phrases, which here |
---|
0:30:17 | actually correspond to some names. |
---|
0:30:25 | This additional input vector representing the context |
---|
0:30:28 | is expected to help the rest of the decoding process. |
---|
0:30:32 | So after the user says "talk to", it is expected that some |
---|
0:30:38 | name will follow, |
---|
0:30:41 | and with this context, the attention mechanism can |
---|
0:30:46 | bias the decoder by adding additional probability to those name |
---|
0:30:53 | candidates. |
---|
0:31:00 | The next topic is about multi-dialect models for overcoming data sparsity. I |
---|
0:31:06 | will introduce a simple method proposed in prior work. |
---|
0:31:10 | The method is simple: |
---|
0:31:12 | it just adds a one-hot vector representing the dialect as an input, |
---|
0:31:19 | and uses a dataset constructed by pooling the data from all the dialects. |
---|
0:31:25 | If we keep the dialect ID input consistent during training and decoding, |
---|
0:31:31 | a speech recognizer trained in this way can switch its behavior |
---|
0:31:35 | depending on the input dialect. |
---|
0:31:41 | Here are the results. |
---|
0:31:43 | From this row showing the baseline results, |
---|
0:31:46 | we see that just training end-to-end speech recognizers on dialect-wise datasets |
---|
0:31:51 | is not a good idea: the performance is significantly worse in dialects with smaller |
---|
0:31:57 | datasets. |
---|
0:32:02 | This row shows the results with transfer learning. Here, transfer learning refers |
---|
0:32:07 | to the method that first pre-trains with the pooled dataset |
---|
0:32:11 | and then applies additional training on the matched dataset. |
---|
0:32:17 | Transfer learning can actually improve the results. |
---|
0:32:21 | However, we can obtain further improvements just by introducing the dialect |
---|
0:32:26 | ID input. |
---|
0:32:29 | Including the previous method I explained |
---|
0:32:33 | before, contextual ASR, having additional metadata inputs can be helpful for overcoming |
---|
0:32:39 | the lack of data. |
---|
0:32:42 | So designing a neural architecture that can properly handle such additional metadata inputs is |
---|
0:32:47 | quite important nowadays. |
---|
0:32:54 | The last topic is about the use of unlabeled data. |
---|
0:32:58 | As I have already mentioned, end-to-end speech recognition requires a huge amount of training data, |
---|
0:33:04 | and it is even worse because it's not trivial how to use unpaired data. |
---|
0:33:10 | Conventional speech recognition can at least leverage text-only data for language modeling, |
---|
0:33:16 | and it's also relatively easy to make use of |
---|
0:33:21 | audio-only data. |
---|
0:33:26 | For overcoming these issues, unsupervised pre-training is now gaining attention. |
---|
0:33:33 | Here, |
---|
0:33:35 | we want to optimize the encoder of a speech recognizer only by using non-transcribed data. |
---|
0:33:41 | Of course, it is not possible to perform cross-entropy training over the labels |
---|
0:33:46 | if the data is not transcribed. |
---|
0:33:50 | Inspired by methods developed in the image processing field, |
---|
0:33:55 | recent methods use mutual information between context information and the instantaneous information. |
---|
0:34:03 | Mutual information is in general very difficult to optimize, but recent methods address it |
---|
0:34:08 | with |
---|
0:34:10 | a method called noise contrastive estimation. |
---|
0:34:18 | Here I want to explain the famous network called wav2vec |
---|
0:34:22 | 2.0. |
---|
0:34:23 | This is a diagram of the wav2vec 2.0 architecture. |
---|
0:34:28 | This method aims at pre-training a CNN-based encoder by maximizing the mutual |
---|
0:34:33 | information between the encoder outputs |
---|
0:34:36 | and their surrounding context. |
---|
0:34:41 | The surrounding context is actually summarized by a Transformer. |
---|
0:34:48 | The objective we want to maximize, |
---|
0:34:51 | in the formulation of InfoNCE, is the similarity between the projected encoder output |
---|
0:34:56 | and the context vector. |
---|
0:34:59 | However, there is a pitfall: if we only maximize the similarity between |
---|
0:35:04 | the encoder output and the context vector, |
---|
0:35:08 | the similarity becomes maximal when the encoder maps all the data points |
---|
0:35:13 | into a single constant output, such as the zero vector. |
---|
0:35:19 | InfoNCE therefore introduces negative samples here: encoder outputs from random |
---|
0:35:24 | time steps. |
---|
0:35:26 | InfoNCE tries to minimize the similarity between the context and the randomly resampled encoder outputs. |
---|
0:35:34 | So this loss is designed so that we maximize the similarity between the context and the time-aligned |
---|
0:35:41 | encoder output, |
---|
0:35:42 | but |
---|
0:35:43 | minimize the similarity between the |
---|
0:35:47 | context and the randomly sampled encoder outputs. |
---|
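A minimal InfoNCE-style contrastive loss illustrating this push-pull structure (a sketch, not the exact wav2vec 2.0 objective; all names are my own):

```python
import numpy as np

def info_nce_loss(context, positive, negatives, temperature=0.1):
    """Pull the context toward the time-aligned encoder output ("positive"),
    push it away from encoder outputs resampled from other time steps.

    context:   (d,)   context vector from the summarizing network
    positive:  (d,)   encoder output at the same time step
    negatives: (K, d) encoder outputs from random other time steps
    """
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    candidates = np.vstack([positive, negatives])   # positive sample is index 0
    sims = np.array([cosine(context, c) for c in candidates]) / temperature
    # log-softmax over candidates; the loss is the cross-entropy of
    # picking the true (time-aligned) sample
    m = sims.max()
    log_probs = sims - m - np.log(np.exp(sims - m).sum())
    return -log_probs[0]

rng = np.random.default_rng(0)
d, K = 16, 8
loss = info_nce_loss(rng.normal(size=d), rng.normal(size=d), rng.normal(size=(K, d)))
print(float(loss))
```

The constant-encoder collapse mentioned above is exactly what the negative samples prevent: a collapsed encoder would make the positive and negatives indistinguishable, yielding a high loss.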
0:35:51 | wav2vec 2.0 is very famous because of its surprising performance on speech |
---|
0:35:56 | recognition problems. |
---|
0:35:58 | It is reported that only a few minutes of transcribed training data is sufficient for obtaining |
---|
0:36:03 | a usable end-to-end speech recognizer, if the encoder is pre-trained with |
---|
0:36:08 | more than fifty thousand hours of audio using contrastive training. |
---|
0:36:15 | This amount of transcribed training data is surprisingly small; it is tiny, for example, |
---|
0:36:20 | compared to the amount of speech a three-year-old child has heard. |
---|
0:36:31 | OK, thank you for watching this; that is it for my part. |
---|
0:36:36 | Next, I will hand over to Shigeki, and he will tell you |
---|
0:36:40 | about |
---|
0:36:41 | the software aspects of neural end-to-end speech recognition. |
---|
0:36:47 | Hello, this is Shigeki from Google Research. I will talk about the software implementation for |
---|
0:36:53 | this tutorial on neural speech recognition. |
---|
0:36:57 | First, I will talk about toolkits overall in about five minutes, |
---|
0:37:03 | and then |
---|
0:37:04 | we will try a pre-trained model in the toolkit, |
---|
0:37:08 | introducing its prediction interface. |
---|
0:37:10 | After that, we'll train a new |
---|
0:37:13 | neural speech recognition model from scratch in about ten minutes. |
---|
0:37:19 | Finally, we will show how to extend the models and tasks |
---|
0:37:26 | introduced in the previous section, for example how to try the Transformer, known as |
---|
0:37:32 | state of the art, or something like that. |
---|
0:37:38 | So first of all, I show a toolkit overview. |
---|
0:37:43 | This table is |
---|
0:37:48 | from Inaguma et al.'s |
---|
0:37:49 | ACL paper. |
---|
0:37:52 | It briefly summarizes a |
---|
0:37:55 | comparison between the various toolkits. |
---|
0:38:01 | In this table, all the |
---|
0:38:03 | listed toolkits support |
---|
0:38:06 | automatic speech recognition tasks, |
---|
0:38:11 | and some of them |
---|
0:38:12 | also support |
---|
0:38:14 | different tasks, like speech translation |
---|
0:38:20 | and text-to-speech tasks. |
---|
0:38:25 | Also note that |
---|
0:38:29 | pre-trained models are available in several toolkits. |
---|
0:38:37 | In this tutorial, |
---|
0:38:39 | we will focus on ESPnet, |
---|
0:38:41 | because it covers many |
---|
0:38:43 | tasks |
---|
0:38:44 | for end-to-end modeling, |
---|
0:38:47 | and it also supports training the models, |
---|
0:38:50 | so I think it is easy to |
---|
0:38:53 | try. |
---|
0:38:58 | Its implementation is hosted on GitHub, |
---|
0:39:03 | and if you want to know more detailed results, |
---|
0:39:07 | they are described in these papers. The papers cover |
---|
0:39:11 | speech recognition and text-to-speech; |
---|
0:39:15 | papers on the speech translation part and the |
---|
0:39:19 | new speech enhancement |
---|
0:39:22 | features will be coming soon, so please look out for them. |
---|
0:39:28 | In this tutorial, |
---|
0:39:30 | we will try ESPnet2. |
---|
0:39:34 | It is a kind of major update from the ESPnet1 toolkit. |
---|
0:39:40 | There are differences |
---|
0:39:42 | between them; the major ones are as follows. |
---|
0:39:45 | For example, |
---|
0:39:46 | ESPnet1 depends on many external libraries, for example Kaldi. |
---|
0:39:53 | However, |
---|
0:39:56 | ESPnet2 takes a minimalist approach: |
---|
0:39:59 | it mainly depends on PyTorch alone, so we can use and integrate it |
---|
0:40:05 | easily. |
---|
0:40:08 | The available models are |
---|
0:40:10 | almost the same, |
---|
0:40:12 | especially the TTS models, which are commonly used in both. |
---|
0:40:17 | The porting of tasks to ESPnet2 is |
---|
0:40:21 | still work in progress; |
---|
0:40:23 | however, |
---|
0:40:25 | the ASR and TTS parts are already available, so it is nice to try if |
---|
0:40:30 | you're interested in ASR and TTS, |
---|
0:40:32 | and speech enhancement is also coming. |
---|
0:40:36 | If you are interested in ESPnet1, please visit this URL. |
---|
0:40:42 | It shows the usage of ESPnet1; |
---|
0:40:46 | there was a previous tutorial on how to use it. |
---|
0:40:52 | This tutorial also has a notebook example hosted on Google Colab. |
---|
0:41:01 | Colab provides a free Python interpreter in a web page, |
---|
0:41:06 | and you can just run the code samples there one after another. |
---|
0:41:11 | But please make sure that you are using a GPU runtime in Colab |
---|
0:41:16 | via this setting |
---|
0:41:17 | when you visit this web page, because the GPU will be |
---|
0:41:23 | used in this tutorial. |
---|
0:41:27 | This slide introduces |
---|
0:41:30 | pre-trained models; |
---|
0:41:31 | that means |
---|
0:41:33 | models already trained by |
---|
0:41:37 | someone on some tasks and datasets. |
---|
0:41:43 | ESPnet installs |
---|
0:41:45 | such models |
---|
0:41:48 | from |
---|
0:41:49 | the espnet_model_zoo repository, |
---|
0:41:52 | hosted at Zenodo. |
---|
0:41:55 | For example, for the ASR task, there are |
---|
0:41:59 | already models for English speech recognition trained on LibriSpeech, |
---|
0:42:04 | CSJ for Japanese, |
---|
0:42:07 | a corpus for Korean, and so on, |
---|
0:42:11 | and TTS |
---|
0:42:13 | also has pre-trained models |
---|
0:42:16 | there. |
---|
0:42:17 | If you want to |
---|
0:42:18 | see the full list of the available models, |
---|
0:42:22 | please see this URL. |
---|
0:42:29 | This snippet shows how to use them |
---|
0:42:32 | in Python. |
---|
0:42:35 | When you try it, |
---|
0:42:37 | the downloader |
---|
0:42:40 | first fetches the checkpoint from Zenodo |
---|
0:42:43 | and unpacks it into the model object. |
---|
0:42:47 | After that, |
---|
0:42:49 | you can load |
---|
0:42:52 | some waveform |
---|
0:42:54 | in your local environment, and it transcribes the audio |
---|
0:43:00 | into this result. |
---|
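A minimal sketch of this usage, based on the espnet_model_zoo README-style API; the model name is a placeholder, and the exact tuple layout of the n-best output may differ by version:

```python
import soundfile
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.asr_inference import Speech2Text

# download a pre-trained ASR model from Zenodo and build the inference object
downloader = ModelDownloader()
speech2text = Speech2Text(
    **downloader.download_and_unpack("<model-name-from-the-zoo>"),  # placeholder
    device="cuda",  # or "cpu"
)

# read a waveform and transcribe it; the result is an n-best list
speech, rate = soundfile.read("speech.wav")
nbests = speech2text(speech)
text, tokens, token_ids, hypothesis = nbests[0]
print(text)
```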
0:43:02 | Now, |
---|
0:43:05 | let's get started in Colab. |
---|
0:43:09 | Basically, at the URL on page eight, |
---|
0:43:14 | you will find |
---|
0:43:18 | a notebook |
---|
0:43:19 | like this. |
---|
0:43:21 | For trying it, we will |
---|
0:43:24 | first install ESPnet. |
---|
0:43:28 | Before |
---|
0:43:30 | running it, please make sure you are connecting to |
---|
0:43:34 | a GPU runtime. |
---|
0:43:37 | The setting is |
---|
0:43:38 | available |
---|
0:43:40 | at the |
---|
0:43:42 | top-right corner: |
---|
0:43:44 | please select "Change runtime type" and |
---|
0:43:51 | check that GPU is selected. |
---|
0:43:55 | Note that the TPU is not |
---|
0:43:57 | supported here, and the CPU might be |
---|
0:44:01 | too slow if you want to run the training. |
---|
0:44:04 | So first we install ESPnet, because it is not installed by default; just |
---|
0:44:10 | run pip |
---|
0:44:11 | in a single cell. |
---|
0:44:14 | As you can see, it pulls in many dependencies, |
---|
0:44:19 | because here we install both ESPnet1 and ESPnet2. |
---|
0:44:26 | As I said, ESPnet |
---|
0:44:28 | provides pre-trained models. |
---|
0:44:33 | First, |
---|
0:44:35 | I download a waveform file from |
---|
0:44:40 | the LibriSpeech dataset, |
---|
0:44:43 | and I try to |
---|
0:44:46 | perform recognition |
---|
0:44:49 | on the |
---|
0:44:50 | downloaded wave. |
---|
0:44:53 | Before this, |
---|
0:44:57 | first, |
---|
0:44:58 | you download the pre-trained model. |
---|
0:45:02 | For example, this model is trained by Shinji Watanabe |
---|
0:45:07 | on the LibriSpeech |
---|
0:45:09 | ASR task, |
---|
0:45:10 | and it seems to use the Transformer architecture |
---|
0:45:15 | for the neural networks. |
---|
0:45:22 | Then I load the waveform here and feed it into the model object, |
---|
0:45:30 | and, as shown there, |
---|
0:45:32 | the output is an n-best result list, so I selected the best one |
---|
0:45:38 | to see how it looks. |
---|
0:45:40 | So this is the result from the LibriSpeech model. |
---|
0:45:44 | Let's check |
---|
0:45:46 | by playing the wave and reading its transcription side by side. |
---|
0:45:58 | It |
---|
0:46:00 | performs pretty well. So let's go back to the slides. |
---|
0:46:07 | Next, I will show you how to run training for the predefined tasks. |
---|
0:46:13 | In the ESPnet repository, there is a directory called egs2 |
---|
0:46:17 | that contains recipes for all the supported datasets, |
---|
0:46:21 | and you will find that each recipe has the same files and directory structure. |
---|
0:46:27 | Basically, you run this asr.sh script from the shell, |
---|
0:46:36 | and it reproduces the results reported in the README file. |
---|
0:46:41 | Let me show you |
---|
0:46:45 | what kind of stages are inside the shell script. You can start from and stop at |
---|
0:46:50 | arbitrary stages, |
---|
0:46:54 | which are |
---|
0:46:56 | specified by command-line flags. |
---|
0:47:00 | Stages 1 to 5 |
---|
0:47:04 | perform data preparation, 6 to 8 perform language model training, 9 and |
---|
0:47:10 | 10 perform ASR training, and after that, the ASR evaluation is performed. |
---|
0:47:16 | Finally, you can pack and upload your trained model for others to use. |
---|
0:47:23 | Let's look at the details of the data preparation stages. |
---|
0:47:27 | In this tutorial, we focus on the AN4 task; |
---|
0:47:31 | that is a very small dataset from CMU, nice |
---|
0:47:36 | for a fast experiment. |
---|
0:47:38 | The very first stage downloads the dataset, and then the data preparation |
---|
0:47:43 | stage formats everything in AN4 |
---|
0:47:49 | into the Kaldi-style data directory, and after that we perform some preprocessing |
---|
0:47:56 | on the speech and text data. |
---|
0:48:00 | In |
---|
0:48:02 | this case, we use the sentencepiece library to build |
---|
0:48:09 | the token representation of the text. |
---|
0:48:13 | That representation is used in the training and evaluation stages. |
---|
0:48:18 | In stages 6 to 8, we perform language model training and intermediate |
---|
0:48:25 | evaluation, like perplexity, and after that, the ASR training, decoding, and |
---|
0:48:31 | evaluation are performed. |
---|
0:48:34 | You can |
---|
0:48:35 | monitor the training |
---|
0:48:39 | with TensorBoard, even inside Google Colab. |
---|
0:48:44 | For example, the loss on the softmax output over the vocabulary, |
---|
0:48:49 | or the CTC output, can be monitored |
---|
0:48:52 | during training. |
---|
0:48:55 | And this is an example of what the evaluation and scoring results look like. |
---|
0:49:06 | The evaluation tool reformats the results in Markdown, because that's more readable, |
---|
0:49:12 | and as you can see here, for each test set it reports the word error rate, and also |
---|
0:49:18 | both the character error rate and the token error rate. |
---|
0:49:22 | Finally, we have the trained model, and you can use exactly |
---|
0:49:26 | the same Python API to run inference with this model, |
---|
0:49:33 | like the LibriSpeech demo in the beginning, |
---|
0:49:38 | if you specify |
---|
0:49:40 | which configuration and checkpoint to use. |
---|
0:49:43 | So now let's go to the Colab. |
---|
0:49:48 | Now we move to the notebook. |
---|
0:49:53 | Let's see what the egs2 example directory looks like. |
---|
0:50:00 | You can |
---|
0:50:02 | use |
---|
0:50:06 | command lines like in a usual notebook, and you can also use the file explorer from |
---|
0:50:13 | this icon. |
---|
0:50:15 | You will find that many |
---|
0:50:17 | datasets are available in egs2, and |
---|
0:50:22 | in this tutorial we focus on the an4 recipe and its asr1 task. |
---|
0:50:28 | And |
---|
0:50:30 | for now, we run run.sh |
---|
0:50:33 | in this recipe |
---|
0:50:34 | as it is. |
---|
0:50:36 | So |
---|
0:50:37 | before running it, we need to install a few more dependencies |
---|
0:50:44 | to run training: |
---|
0:50:46 | a few Kaldi tools |
---|
0:50:48 | are unfortunately still required, so we need to get the prebuilt binaries |
---|
0:50:55 | to use, and we also need some |
---|
0:50:58 | binaries from GitHub. After installing everything, you run the run.sh |
---|
0:51:04 | cell. |
---|
0:51:06 | Here, |
---|
0:51:10 | first, |
---|
0:51:12 | the recipe downloads AN4 from the CMU server, because it is freely |
---|
0:51:19 | available, and after the download is finished, the data preparation will begin. |
---|
0:51:27 | And |
---|
0:51:28 | you can see here that the data preparation is |
---|
0:51:34 | performed, and in stage 5, the tokenization |
---|
0:51:39 | and text statistics collection will take place. |
---|
0:51:44 | This file is the result from the sentencepiece |
---|
0:51:49 | training; |
---|
0:51:51 | for AN4, we use sentencepiece as the tokenization. |
---|
0:51:56 | After the sentencepiece |
---|
0:51:59 | training is finished, |
---|
0:52:00 | the language model will be trained; |
---|
0:52:03 | let's see here. |
---|
0:52:04 | And after that, the ASR training starts here. |
---|
0:52:08 | I fast-forwarded the recording here, because training takes a while; |
---|
0:52:14 | I finished this |
---|
0:52:16 | training in ten minutes, and I think that is reasonable. |
---|
0:52:22 | So let's see what the prepared data looks like; the data directory is stored down |
---|
0:52:29 | here, and we can find the |
---|
0:52:32 | prepared files; |
---|
0:52:33 | for example, the transcriptions are in the text |
---|
0:52:39 | file here. |
---|
0:52:42 | Each entry starts with an utterance ID, and you can find |
---|
0:52:47 | the corresponding speech in the wav.scp file, so if |
---|
0:52:53 | you search for that ID here, you will see that we have the matching |
---|
0:52:58 | wav entry. |
---|
0:53:00 | After the |
---|
0:53:02 | training, let's look at the experiment directory that is used for |
---|
0:53:10 | logging everything in the training phase. |
---|
0:53:13 | It stores many things, for example pickle |
---|
0:53:18 | files of the model checkpoints |
---|
0:53:20 | here, |
---|
0:53:21 | and also the attention visualizations we will see here, and the |
---|
0:53:26 | configuration can be confirmed by looking at the YAML file. |
---|
0:53:30 | Let's see what the configuration YAML looks like. |
---|
0:53:36 | The configuration YAML records everything, |
---|
0:53:39 | every piece of information used during training. |
---|
0:53:43 | Here you can see, for example, the |
---|
0:53:46 | results of the data preparation choices, the tokenizer |
---|
0:53:51 | settings, |
---|
0:53:56 | and |
---|
0:54:01 | the neural network |
---|
0:54:06 | structure used in the experiment. |
---|
0:54:10 | OK. |
---|
0:54:11 | And |
---|
0:54:13 | during training, you can also monitor with TensorBoard, |
---|
0:54:16 | inside Google Colab |
---|
0:54:18 | or in your own environment. |
---|
0:54:20 | And after the training, we can run inference exactly as before, and it |
---|
0:54:27 | generates |
---|
0:54:29 | the transcription. |
---|
0:54:32 | Here |
---|
0:54:33 | is the output. |
---|
0:54:35 | Let's also see the other information; there is |
---|
0:54:42 | an attention visualization like this. |
---|
0:54:46 | Since this is a very short utterance, the attention does not |
---|
0:54:52 | really form |
---|
0:54:53 | a clear diagonal alignment, but I think |
---|
0:54:57 | it's okay. |
---|
0:55:00 | And there is the evaluation result. |
---|
0:55:04 | As I said, |
---|
0:55:05 | the last stage reports the detailed results, so I just pasted them into the |
---|
0:55:11 | notebook cell, and |
---|
0:55:14 | you can see here the final result of the word error rate, around thirty point |
---|
0:55:18 | five on the test set, I think; |
---|
0:55:21 | I mean, the sentence error rate is sixty-four point nine, and the token error |
---|
0:55:26 | rate is six point five. |
---|
0:55:28 | OK, so next |
---|
0:55:31 | let's use |
---|
0:55:34 | this trained model for inference as before. |
---|
0:55:38 | So |
---|
0:55:39 | first of all, we need to specify the checkpoint to use. I recommend |
---|
0:55:45 | using this |
---|
0:55:46 | best-validation-accuracy checkpoint, because it seems to be the best |
---|
0:55:50 | one, |
---|
0:55:52 | and then we see that the recognition result agrees |
---|
0:55:56 | with the reference text for this speech. |
---|
0:56:04 | OK, so |
---|
0:56:06 | that's it for this part. |
---|
0:56:10 | In the last part, I will explain how to extend the models and add tasks. |
---|
0:56:16 | So, |
---|
0:56:18 | in the tutorial section, |
---|
0:56:20 | we introduced |
---|
0:56:22 | the encoder-decoder architecture, the Transformer, and the RNN transducer, |
---|
0:56:27 | and you might wonder |
---|
0:56:28 | how to use them here. |
---|
0:56:33 | This is the answer. |
---|
0:56:35 | Often, |
---|
0:56:36 | tasks like AN4 already have a set of predefined |
---|
0:56:42 | configurations in YAML format, so you can just |
---|
0:56:47 | specify which configuration to use, take a look at the YAML, and |
---|
0:56:52 | tune the values, like the number of units, |
---|
0:56:56 | inside the YAML file. |
---|
0:56:59 | I think the goal of these files is that ESPnet has been testing |
---|
0:57:03 | many things, like activation functions or |
---|
0:57:06 | weight tying, so |
---|
0:57:08 | you can tweak things like that. |
---|
0:57:11 | However, if you |
---|
0:57:13 | cannot find what you need there, you can extend the models, as I said. |
---|
0:57:18 | For example, |
---|
0:57:20 | the |
---|
0:57:22 | RNN, the Transformer, and the transducer encoder and decoder networks implement these |
---|
0:57:27 | interfaces |
---|
0:57:29 | to ease the swapping and |
---|
0:57:33 | to keep the compatibility between those variants, including your own |
---|
0:57:39 | implementations. |
---|
0:57:41 | So |
---|
0:57:43 | this |
---|
0:57:44 | ESPnet |
---|
0:57:46 | ASR model class |
---|
0:57:48 | combines these two: |
---|
0:57:53 | the encoder implementation |
---|
0:57:58 | processing the speech input, and the decoder handling the text targets, |
---|
0:58:04 | similar to |
---|
0:58:06 | what was |
---|
0:58:08 | explained in |
---|
0:58:09 | the earlier |
---|
0:58:11 | figures. |
---|
0:58:14 | And you can select them from a command-line argument if you register your implementation in |
---|
0:58:19 | this table in the |
---|
0:58:23 | source code. |
---|
0:58:27 | And |
---|
0:58:29 | if you want to define your own task, like you want to |
---|
0:58:35 | try some task other than ASR and TTS, that is also possible: |
---|
0:58:41 | you extend the AbsTask class. |
---|
0:58:44 | The existing ASR and TTS tasks implement this |
---|
0:58:49 | class, |
---|
0:58:50 | and |
---|
0:58:52 | they get the |
---|
0:58:55 | task-independent features for free, |
---|
0:58:57 | like distributed training, batch sampling, and checkpoint resuming, things like that. |
---|
0:59:04 | In the last section, I will show you how ESPnet implements |
---|
0:59:10 | these |
---|
0:59:12 | models. |
---|
0:59:13 | So let's go to the GitHub repository |
---|
0:59:17 | and check the ESPnet2 implementation. |
---|
0:59:26 | OK, |
---|
0:59:27 | here is the espnet2 ASR directory, |
---|
0:59:30 | and there is |
---|
0:59:32 | the model definition here. |
---|
0:59:35 | As I said, it is based on the abstract ESPnet model |
---|
0:59:41 | interface; it implements that interface here, |
---|
0:59:46 | and the forward method actually simply calls the forward methods of |
---|
0:59:52 | the submodules. |
---|
0:59:54 | Let's see the forward method; |
---|
0:59:56 | it's here. |
---|
0:59:58 | It receives the speech input batch and the text output batch as its arguments, |
---|
1:00:04 | and then it propagates them through |
---|
1:00:09 | the neural network modules. |
---|
1:00:12 | So, what's in there: |
---|
1:00:15 | first, the encoder network computes the hidden outputs from the speech input |
---|
1:00:21 | of the network, |
---|
1:00:22 | with some regularization applied, and |
---|
1:00:25 | then we use that output: |
---|
1:00:30 | it is fed to the decoder as input, and, |
---|
1:00:33 | together with the |
---|
1:00:36 | text targets, |
---|
1:00:37 | the loss function is calculated here; a similar thing happens in the |
---|
1:00:44 | CTC branch. |
---|
1:00:51 | It takes exactly the same inputs and targets; the CTC branch and the attention decoder |
---|
1:00:54 | do the same kind of thing |
---|
1:00:57 | with exactly the same arguments, |
---|
1:01:01 | and then we combine those loss values, as you can see here. It's quite easy, and |
---|
1:01:06 | it's the same as the formulation introduced in the first part. |
---|
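A toy sketch of the forward logic just described, i.e., the hybrid CTC/attention loss combination (module and weight names are my own simplification, not the exact ESPnet code; the attention decoder is grossly reduced to a linear stand-in):

```python
import torch
import torch.nn as nn

class ToyHybridASR(nn.Module):
    """Minimal hybrid CTC/attention-style model: one encoder, two branches."""

    def __init__(self, feat_dim=80, hidden=128, vocab=50, ctc_weight=0.3):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.ctc_proj = nn.Linear(hidden, vocab)      # CTC branch
        self.att_decoder = nn.Linear(hidden, vocab)   # stand-in for the attention decoder
        self.ctc_loss = nn.CTCLoss(blank=0)
        self.ce_loss = nn.CrossEntropyLoss()
        self.ctc_weight = ctc_weight

    def forward(self, speech, speech_lens, text, text_lens):
        enc, _ = self.encoder(speech)                 # (B, T, H) encoder outputs
        # CTC branch: frame-wise log-probs over the blank-augmented vocabulary
        log_probs = self.ctc_proj(enc).log_softmax(-1).transpose(0, 1)  # (T, B, V)
        loss_ctc = self.ctc_loss(log_probs, text, speech_lens, text_lens)
        # attention branch (simplified): predict each target token from the encoder
        att_out = self.att_decoder(enc[:, : text.shape[1]])             # (B, U, V)
        loss_att = self.ce_loss(att_out.reshape(-1, att_out.shape[-1]), text.reshape(-1))
        # interpolate the two objectives, as in the hybrid formulation
        return self.ctc_weight * loss_ctc + (1 - self.ctc_weight) * loss_att
```

Setting the interpolation weight to 0 or 1 recovers a pure attention or a pure CTC model, matching the formulation from the first part of the tutorial.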
1:01:11 | So, |
---|
1:01:12 | thanks for watching this tutorial. |
---|