0:00:15 | Right, |
---|
0:00:16 | it is my great pleasure |
---|
0:00:18 | to present right after the two best paper nominees, |
---|
0:00:23 | so I hope you will also like this talk. |
---|
0:00:27 | Alright, so, |
---|
0:00:29 | this work is about |
---|
0:00:31 | joint online spoken language understanding and language modeling |
---|
0:00:35 | with recurrent neural networks. |
---|
0:00:37 | My name is Bing Liu, |
---|
0:00:38 | and this is joint work with my advisor Ian Lane. |
---|
0:00:42 | We are from Carnegie Mellon University. |
---|
0:00:46 | This is the outline of the talk. |
---|
0:00:48 | First, I will introduce the background and the motivation of our work. |
---|
0:00:52 | Following that, we will explain in detail our proposed method, |
---|
0:00:57 | and then comes the experiment setup and the result analysis, and finally |
---|
0:01:03 | conclusions will be presented. |
---|
0:01:06 | First, the background. |
---|
0:01:09 | Spoken language understanding is one of the important components in spoken dialogue systems. |
---|
0:01:15 | In SLU, |
---|
0:01:16 | there are two major tasks: |
---|
0:01:18 | intent detection and slot filling. |
---|
0:01:20 | Given a user query, we want the SLU system to identify the user's intent |
---|
0:01:26 | and also to extract |
---|
0:01:28 | useful semantic constituents from the user query. |
---|
0:01:32 | Given an |
---|
0:01:33 | example query like |
---|
0:01:35 | "show me the flights from Seattle to ...", |
---|
0:01:38 | we want the SLU system |
---|
0:01:41 | to identify that |
---|
0:01:43 | the user is looking for flight information; that is the intent. |
---|
0:01:47 | And we also want to |
---|
0:01:49 | extract useful information such as the from location, |
---|
0:01:53 | the to location, |
---|
0:01:54 | and the departure time; this is the task of slot filling. |
---|
0:02:00 | Intent detection |
---|
0:02:02 | can be treated as a sequence classification problem, |
---|
0:02:05 | so standard classifiers |
---|
0:02:07 | like |
---|
0:02:08 | support vector machines with n-gram features, |
---|
0:02:11 | or convolutional neural networks |
---|
0:02:12 | and recursive neural networks, can be applied. |
---|
0:02:16 | On the other hand, slot filling |
---|
0:02:19 | can be treated as a sequence labeling problem, |
---|
0:02:21 | so sequence models like maximum entropy Markov models, |
---|
0:02:26 | conditional random fields, |
---|
0:02:27 | and recurrent neural networks |
---|
0:02:29 | are good candidates for sequence labeling. |
---|
0:02:34 | Intent detection and slot filling are typically processed separately |
---|
0:02:38 | in spoken language understanding systems. |
---|
0:02:41 | A joint model |
---|
0:02:42 | that can perform the two tasks |
---|
0:02:44 | at the same time simplifies |
---|
0:02:46 | the SLU system, |
---|
0:02:48 | as only one model needs to be trained and deployed. |
---|
0:02:52 | Also, |
---|
0:02:53 | by training |
---|
0:02:55 | two related tasks together, |
---|
0:02:57 | it is likely that |
---|
0:02:59 | we can improve the generalization performance of one task |
---|
0:03:02 | using the other related task. |
---|
0:03:05 | Joint models for slot filling and intent detection have been proposed in the literature, |
---|
0:03:10 | using convolutional neural networks |
---|
0:03:12 | and recursive neural networks. |
---|
0:03:17 | The limitation of these previously proposed SLU models |
---|
0:03:22 | is that the output of these models |
---|
0:03:24 | is typically conditioned |
---|
0:03:29 | on the entire word sequence, |
---|
0:03:31 | which makes those models not very suitable for online tasks. |
---|
0:03:35 | For example, in speech recognition, |
---|
0:03:37 | instead of receiving the transcribed text |
---|
0:03:40 | at the end of the speech, |
---|
0:03:42 | users typically prefer to see the ongoing live transcription |
---|
0:03:45 | while the user speaks. |
---|
0:03:47 | Similarly, in spoken language understanding, |
---|
0:03:50 | with real-time intent detection and slot filling, |
---|
0:03:53 | the downstream system will be able to start processing the query |
---|
0:03:57 | while the user is still speaking. |
---|
0:04:01 | So in this work, |
---|
0:04:02 | we want to develop a model that can perform online spoken language understanding |
---|
0:04:08 | as new words arrive from the ASR engine. |
---|
0:04:12 | Moreover, |
---|
0:04:13 | we suggest that |
---|
0:04:15 | the SLU output |
---|
0:04:16 | can provide additional context for the next word prediction |
---|
0:04:20 | in the ASR online decoding. |
---|
0:04:24 | So we want to build a model that can perform online SLU |
---|
0:04:28 | and language modeling jointly. |
---|
0:04:33 | Here is a simple visualization of our proposed idea. |
---|
0:04:37 | So given a user query like "I want a first class flight from |
---|
0:04:41 | Phoenix to Seattle", |
---|
0:04:43 | we push this query to the ASR engine for online decoding. |
---|
0:04:48 | With the arrival of the first few |
---|
0:04:50 | words, |
---|
0:04:51 | our intent model, |
---|
0:04:53 | based on the available information, |
---|
0:04:55 | provides an estimation of the user intent. |
---|
0:04:58 | And |
---|
0:04:59 | the |
---|
0:05:00 | intent model gives a very high confidence score |
---|
0:05:03 | to |
---|
0:05:04 | the intent class "airfare", and lower |
---|
0:05:07 | confidence scores for the other intent classes. |
---|
0:05:10 | Conditioned on this intent estimation, |
---|
0:05:14 | the language model |
---|
0:05:15 | adjusts its next word |
---|
0:05:17 | prediction probabilities. |
---|
0:05:19 | So here we see that |
---|
0:05:21 | the probability of "price" being the next word is pretty high, because |
---|
0:05:26 | "price" |
---|
0:05:27 | is closely related |
---|
0:05:29 | to the intent of "airfare". |
---|
0:05:32 | Then, with the arrival of another word, "flight", from the ASR engine, |
---|
0:05:37 | the intent model updates its intent estimation |
---|
0:05:41 | and increases |
---|
0:05:43 | the confidence score for the intent class "flight", |
---|
0:05:45 | and |
---|
0:05:47 | reduces the |
---|
0:05:49 | confidence score for "airfare". |
---|
0:05:51 | Accordingly, |
---|
0:05:52 | the language model |
---|
0:05:54 | adjusts its |
---|
0:05:56 | next word prediction probabilities. |
---|
0:06:00 | So here, |
---|
0:06:01 | location-related words such as Pittsburgh and Phoenix |
---|
0:06:06 | receive higher probabilities, |
---|
0:06:07 | and the probability of "price" |
---|
0:06:10 | is reduced. |
---|
0:06:13 | And with |
---|
0:06:14 | additional input from the |
---|
0:06:16 | ASR |
---|
0:06:17 | of more words, |
---|
0:06:19 | our intent model becomes more confident that what the user is looking for is |
---|
0:06:24 | flight information, |
---|
0:06:25 | and accordingly the language model |
---|
0:06:27 | adjusts the next word probabilities |
---|
0:06:30 | conditioned on the intent estimation, |
---|
0:06:35 | and |
---|
0:06:36 | so on until we complete the processing |
---|
0:06:39 | of the entire decoding. |
---|
0:06:41 | So this is a simple visualization of our |
---|
0:06:45 | proposed idea for joint online spoken language understanding and language |
---|
0:06:50 | modeling. |
---|
0:06:52 | Okay, next, |
---|
0:06:53 | our proposed method. |
---|
0:06:57 | Okay, here are the RNN, |
---|
0:07:00 | recurrent neural network, models |
---|
0:07:01 | for the three different tasks |
---|
0:07:03 | that we want to model in our work. I believe |
---|
0:07:08 | these three models are very familiar to most of us. The first one is the |
---|
0:07:12 | standard recurrent |
---|
0:07:14 | neural network language model. |
---|
0:07:16 | The second one is the RNN model for intent detection, |
---|
0:07:20 | where |
---|
0:07:20 | the last hidden state output |
---|
0:07:23 | is used to produce the intent estimation. |
---|
0:07:27 | And the third model uses a recurrent neural network for slot filling. |
---|
0:07:31 | Here, different from the RNN language model, |
---|
0:07:34 | the |
---|
0:07:36 | label output is connected back to the hidden state, so that the slot |
---|
0:07:41 | label dependencies can also be modeled |
---|
0:07:44 | in the RNN. |
---|
0:07:48 | And here is our proposed joint model. |
---|
0:07:52 | So similar to the independent training models, the input to the model |
---|
0:07:56 | is the word sequence in the given utterance. |
---|
0:08:01 | Okay, |
---|
0:08:02 | so we have the words as input, |
---|
0:08:05 | and the hidden layer output is used for the three different tasks. |
---|
0:08:10 | So here, c represents the intent class, |
---|
0:08:12 | s represents the slot label, |
---|
0:08:14 | and |
---|
0:08:15 | w represents the next word. |
---|
0:08:17 | So the output from the RNN hidden state is first |
---|
0:08:22 | used to generate |
---|
0:08:24 | the |
---|
0:08:24 | intent estimation. |
---|
0:08:26 | Once we obtain the intent, |
---|
0:08:29 | the intent class probability distribution, we draw a sample from this probability distribution |
---|
0:08:34 | as the |
---|
0:08:36 | estimated intent class at that point. |
---|
0:08:39 | Similarly, we do the same thing for the slot label. |
---|
0:08:42 | Once we have these two vectors, we concatenate these two vectors into a single |
---|
0:08:46 | one, |
---|
0:08:47 | and use this context vector |
---|
0:08:49 | for the next word prediction. |
---|
0:08:51 | Also, we connect this context vector |
---|
0:08:54 | back |
---|
0:08:55 | to the RNN hidden state, |
---|
0:08:57 | such that the intent variations along the sequence |
---|
0:09:01 | as well as the slot label dependencies can be modeled |
---|
0:09:05 | in the recurrent neural network. |
---|
0:09:09 | Well, basically, |
---|
0:09:10 | the task outputs |
---|
0:09:12 | at each time step depend on the task outputs from previous time steps, |
---|
0:09:16 | so by using the chain rule, the three |
---|
0:09:19 | models, intent detection, slot filling, and language modeling, can be factorized accordingly. |
---|
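As a rough sketch of the factorization described here (the notation is ours, not taken verbatim from the paper), with c_t the intent class, s_t the slot label, and w_t the word at step t:

```latex
% Chain-rule factorization sketch: each step's task outputs are conditioned
% on the word history and the task outputs from previous steps.
% w_{T+1} can be taken as an end-of-utterance token.
P(c_{1:T},\, s_{1:T},\, w_{2:T+1} \mid w_1)
  \;=\; \prod_{t=1}^{T} P(c_t, s_t, w_{t+1} \mid w_{1:t},\, c_{1:t-1},\, s_{1:t-1})
```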
0:09:26 | A closer look at our model: |
---|
0:09:29 | at each time step, the word input goes into the RNN hidden state, |
---|
0:09:33 | and |
---|
0:09:33 | the inputs to the hidden state |
---|
0:09:36 | are the hidden state from the previous time step, |
---|
0:09:40 | the intent class and slot label from the previous time step, |
---|
0:09:44 | and the word input from the current time step. |
---|
0:09:47 | And |
---|
0:09:48 | once we have this RNN hidden state output, |
---|
0:09:50 | we perform |
---|
0:09:52 | intent classification, |
---|
0:09:53 | slot filling, and next word prediction |
---|
0:09:57 | in sequence. |
---|
0:09:59 | So here, |
---|
0:10:00 | the intent distribution, slot label distribution, and word distribution |
---|
0:10:04 | represent the |
---|
0:10:05 | multilayer perceptrons for each of the different tasks. |
---|
0:10:09 | The reason why we apply a |
---|
0:10:10 | multilayer perceptron for each task is because |
---|
0:10:14 | we are using a shared representation, |
---|
0:10:16 | which is the RNN hidden state output, for the three different tasks. |
---|
0:10:21 | In order to further |
---|
0:10:24 | introduce additional discriminative power |
---|
0:10:27 | for the joint model, |
---|
0:10:28 | we use a multilayer perceptron |
---|
0:10:31 | for each task, |
---|
0:10:33 | instead of using a simple linear transformation. |
---|
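To make the per-step computation concrete, here is a minimal numpy sketch of one step of the joint model as described above; all dimensions, parameter names, and the plain tanh recurrence are hypothetical stand-ins (the actual model uses LSTM cells), but it shows the shared hidden state, one small MLP head per task, and the sampled intent/slot context vector that conditions next-word prediction and is fed back into the recurrent state.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical sizes: vocabulary, hidden units, intent classes, slot labels.
V, H, NI, NS = 10000, 200, 18, 127
rng = np.random.default_rng(0)
p = {  # randomly initialized parameters, for illustration only
    "W_h": rng.normal(0, 0.1, (H, H)), "W_x": rng.normal(0, 0.1, (H, H)),
    "W_c": rng.normal(0, 0.1, (H, 2 * H)),
    "E_w": rng.normal(0, 0.1, (V, H)),                                   # word embeddings
    "E_i": rng.normal(0, 0.1, (NI, H)), "E_s": rng.normal(0, 0.1, (NS, H)),
    "W_i1": rng.normal(0, 0.1, (H, H)), "W_i2": rng.normal(0, 0.1, (NI, H)),
    "W_s1": rng.normal(0, 0.1, (H, H)), "W_s2": rng.normal(0, 0.1, (NS, H)),
    "W_w1": rng.normal(0, 0.1, (H, 3 * H)), "W_w2": rng.normal(0, 0.1, (V, H)),
}

def joint_step(h_prev, ctx_prev, word_id):
    """One step: shared recurrent state, an MLP head per task, sampled context."""
    # Recurrent update conditioned on the previous hidden state, the previous
    # intent/slot context vector, and the current word embedding.
    h = np.tanh(p["W_h"] @ h_prev + p["W_c"] @ ctx_prev + p["W_x"] @ p["E_w"][word_id])
    # Separate MLP output layer for each task on the shared hidden state.
    intent_dist = softmax(p["W_i2"] @ np.tanh(p["W_i1"] @ h))
    slot_dist = softmax(p["W_s2"] @ np.tanh(p["W_s1"] @ h))
    # Sample an intent class and a slot label; their embeddings form the
    # context vector used both locally and recurrently.
    ctx = np.concatenate([p["E_i"][rng.choice(NI, p=intent_dist)],
                          p["E_s"][rng.choice(NS, p=slot_dist)]])
    # Next-word prediction conditioned on the hidden state and the context.
    word_dist = softmax(p["W_w2"] @ np.tanh(p["W_w1"] @ np.concatenate([h, ctx])))
    return h, ctx, intent_dist, slot_dist, word_dist

# Example: process one word starting from a zero state and zero context.
h0, c0 = np.zeros(H), np.zeros(2 * H)
h1, c1, intent_dist, slot_dist, word_dist = joint_step(h0, c0, word_id=42)
```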
0:10:40 | Okay, this part is about model training. |
---|
0:10:44 | As we have seen, what we do is we |
---|
0:10:48 | model the three different tasks jointly. |
---|
0:10:50 | So |
---|
0:10:52 | during model training, the errors from the three given tasks |
---|
0:10:55 | are all back-propagated |
---|
0:10:57 | to the beginning of the input sequence, |
---|
0:11:00 | and we perform a linear interpolation of the costs of the tasks. |
---|
0:11:04 | So as |
---|
0:11:06 | in this objective function, |
---|
0:11:08 | we can see that we interpolate |
---|
0:11:10 | the costs from intent classification, |
---|
0:11:14 | slot filling, and language modeling linearly, |
---|
0:11:17 | and in addition we add an L2 regularization term |
---|
0:11:23 | to this objective function. |
---|
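A minimal sketch of this kind of interpolated training objective; the weight names and values are hypothetical hyperparameters, and the per-task costs are assumed to be the usual negative log-likelihood terms over the sequence.

```python
import numpy as np

def joint_objective(intent_nll, slot_nlls, word_nlls, params,
                    alpha_intent=1.0, alpha_slot=1.0, alpha_word=1.0, l2=1e-5):
    """Linear interpolation of the three task costs plus an L2 penalty (sketch).

    intent_nll: negative log-likelihood of the true intent class.
    slot_nlls / word_nlls: per-step negative log-likelihoods for the true
    slot labels and next words. params: dict of weight matrices."""
    task_cost = (alpha_intent * intent_nll
                 + alpha_slot * float(np.sum(slot_nlls))
                 + alpha_word * float(np.sum(word_nlls)))
    l2_penalty = l2 * sum(float(np.sum(W ** 2)) for W in params.values())
    return task_cost + l2_penalty
```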
0:11:28 | As we have noticed in the previous example, |
---|
0:11:32 | the intent estimation at the beginning of the sequence |
---|
0:11:36 | may not be very stable and accurate, |
---|
0:11:39 | so |
---|
0:11:41 | when we do next word prediction, |
---|
0:11:43 | conditioning on the wrong intent class |
---|
0:11:46 | may not be desirable. |
---|
0:11:47 | To mitigate this effect, |
---|
0:11:50 | we propose a scheduled approach |
---|
0:11:52 | for adjusting the intent contribution to the context. |
---|
0:11:57 | So to be specific, |
---|
0:11:58 | during the first k steps, |
---|
0:12:01 | we disable |
---|
0:12:02 | the intent contribution to the context vector |
---|
0:12:06 | entirely, |
---|
0:12:07 | and after the k-th step, |
---|
0:12:09 | we gradually |
---|
0:12:10 | increase |
---|
0:12:11 | the intent contribution to the context vector |
---|
0:12:15 | until the end of the sequence. |
---|
0:12:17 | So here we |
---|
0:12:19 | propose to use a linear increasing function after the k-th step, and other |
---|
0:12:22 | types of increasing functions, like log functions, can also be explored. |
---|
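A minimal sketch of the scheduled intent contribution just described, assuming a linear ramp that starts after step k and reaches full strength at the end of the sequence; the function name is ours, and a log or other increasing schedule could be dropped in instead.

```python
def intent_context_weight(t, k, seq_len):
    """Weight on the intent part of the context vector at step t (0-indexed).

    The intent contribution is disabled for the first k steps, then increased
    linearly so that it reaches full strength at the end of the sequence."""
    if t < k:
        return 0.0
    if seq_len <= k + 1:
        return 1.0
    return min(1.0, (t - k) / (seq_len - 1 - k))
```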
0:12:31 | Okay, so these are some model variations of the joint model that we |
---|
0:12:36 | introduced just now. |
---|
0:12:39 | The first one is what we call |
---|
0:12:40 | the basic joint model. |
---|
0:12:42 | So here, |
---|
0:12:44 | the same shared representation from the RNN hidden state |
---|
0:12:48 | is used for the three different tasks, |
---|
0:12:50 | and there are no conditional dependencies |
---|
0:12:54 | among these three different tasks; this is what we call the basic joint |
---|
0:12:57 | model. |
---|
0:12:58 | In the second one, |
---|
0:13:01 | once we produce the |
---|
0:13:03 | intent estimation, |
---|
0:13:04 | the intent sample is connected |
---|
0:13:07 | locally |
---|
0:13:08 | to the next word prediction, |
---|
0:13:10 | without connecting it back to the RNN state. |
---|
0:13:14 | So we call this model |
---|
0:13:16 | the |
---|
0:13:17 | model with local context. |
---|
0:13:19 | In the third one, |
---|
0:13:21 | the |
---|
0:13:22 | context vector is not connected to the local next word prediction; |
---|
0:13:26 | instead, it is connected back to the RNN hidden state, |
---|
0:13:30 | so we call this model |
---|
0:13:32 | the model with recurrent context. |
---|
0:13:35 | The last variation |
---|
0:13:37 | is the one with both local and recurrent context, |
---|
0:13:40 | and this is the full model |
---|
0:13:41 | as we have seen just now. |
---|
0:13:46 | Okay, next are the experiment setup and results. |
---|
0:13:52 | So in the experiments, the dataset that we used |
---|
0:13:54 | is the Airline Travel Information System (ATIS) dataset, and in this dataset in total we have |
---|
0:13:59 | eighteen intent classes and a hundred and twenty-seven slot labels. |
---|
0:14:04 | For intent detection, we evaluate |
---|
0:14:08 | the intent model on intent classification error rate; for slot filling, |
---|
0:14:12 | we evaluate F1 score. |
---|
0:14:16 | Some details about our RNN model |
---|
0:14:20 | configuration: |
---|
0:14:21 | we use LSTM cells as the basic RNN unit for their |
---|
0:14:25 | stronger capability in terms of modeling longer-term dependencies. |
---|
0:14:29 | We perform mini-batch training using the Adam optimization method, |
---|
0:14:33 | and to improve the generalization capability of the proposed model, |
---|
0:14:38 | we use dropout and L2 regularization. |
---|
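A hypothetical PyTorch-style rendering of that training configuration (LSTM cells, mini-batch training with Adam, dropout, and L2 regularization via weight decay); the sizes, learning rate, and regularization strengths below are illustrative and not the paper's reported values.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the actual hyperparameters are not stated in the talk.
vocab_size, embed_dim, hidden_dim = 10000, 200, 200

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # LSTM cell as the RNN unit
dropout = nn.Dropout(p=0.5)                              # dropout for regularization

params = list(embedding.parameters()) + list(lstm.parameters())
# Adam with weight decay plays the role of the L2 regularization term.
optimizer = torch.optim.Adam(params, lr=1e-3, weight_decay=1e-5)
```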
0:14:43 | In order |
---|
0:14:45 | to evaluate the robustness of our proposed model, |
---|
0:14:49 | we not only experiment with the true text input, |
---|
0:14:53 | but also with |
---|
0:14:54 | noisy speech input. |
---|
0:14:55 | So |
---|
0:14:58 | here |
---|
0:14:59 | we use these two types of input, and there are some details on |
---|
0:15:03 | our ASR model setting, which we will see |
---|
0:15:06 | in a moment. |
---|
0:15:08 | Basically, in these experiments we report performance |
---|
0:15:12 | using these two types of input, the true text input and the speech input with |
---|
0:15:16 | simulated noise. |
---|
0:15:18 | We compare the performance of five different types of models |
---|
0:15:22 | on the three different tasks: |
---|
0:15:24 | intent detection, slot filling, and language modeling. |
---|
0:15:31 | And |
---|
0:15:32 | here is the |
---|
0:15:34 | intent detection performance |
---|
0:15:37 | using true text input. |
---|
0:15:40 | The five models, from left to right, |
---|
0:15:42 | are the independent training model for intent detection, the basic joint model |
---|
0:15:48 | as we have seen just now in the model variations, |
---|
0:15:52 | the third one is the joint model with intent context, |
---|
0:15:56 | the fourth one is the joint model with slot label context, |
---|
0:15:59 | and the last one is the joint model |
---|
0:16:02 | with both types of context. |
---|
0:16:04 | So as we can see, the joint model with both types of |
---|
0:16:08 | context |
---|
0:16:09 | performs the best, and it achieves a 26.3 percent relative error reduction |
---|
0:16:16 | over the independent training intent model. |
---|
0:16:18 | So |
---|
0:16:21 | next is the slot filling performance |
---|
0:16:25 | using the true text input. |
---|
0:16:27 | As we can see, |
---|
0:16:30 | our proposed joint model shows a slight degradation on the slot filling F1 score |
---|
0:16:36 | compared to the independent training models. |
---|
0:16:39 | This might be due to the fact that |
---|
0:16:42 | the proposed joint model |
---|
0:16:45 | lacks certain discriminative power |
---|
0:16:48 | for the multiple tasks, because we are using the shared |
---|
0:16:52 | representation from the |
---|
0:16:53 | RNN hidden state output. |
---|
0:16:56 | But this |
---|
0:16:57 | is just one aspect that can be improved further in our future work on |
---|
0:17:01 | the joint modeling. |
---|
0:17:04 | This one is the language modeling performance |
---|
0:17:07 | using the true text input. |
---|
0:17:09 | As we can see, |
---|
0:17:11 | the best performing model is the joint model with intent and slot label |
---|
0:17:15 | context, |
---|
0:17:16 | and this model achieves eleven percent relative error |
---|
0:17:20 | reduction, sorry, |
---|
0:17:21 | relative reduction of perplexity |
---|
0:17:24 | compared to the independent training language model. |
---|
0:17:27 | So one thing that we can notice from this result is that |
---|
0:17:32 | the intent context |
---|
0:17:35 | is very important |
---|
0:17:37 | in terms of producing |
---|
0:17:39 | good language modeling performance. |
---|
0:17:41 | Without intent context, |
---|
0:17:43 | the joint model with slot label context itself |
---|
0:17:46 | produces very similar performance |
---|
0:17:48 | in terms of perplexity compared to the independent training models. |
---|
0:17:53 | So |
---|
0:17:54 | here we show that intent |
---|
0:17:57 | information in the context is very important for language modeling. |
---|
0:18:04 | And lastly, some results |
---|
0:18:07 | using the speech input |
---|
0:18:08 | and ASR output to our model. |
---|
0:18:11 | These are the four ASR model settings. |
---|
0:18:13 | The first one just uses the output directly from the decoding, |
---|
0:18:17 | and the second one uses, |
---|
0:18:19 | after decoding, rescoring with a 5-gram |
---|
0:18:22 | language model. |
---|
0:18:23 | The third one uses rescoring with the independently trained RNN language model, |
---|
0:18:29 | and the last one is |
---|
0:18:30 | the model with rescoring |
---|
0:18:32 | using our proposed jointly trained model. |
---|
0:18:36 | As we can see from these results, |
---|
0:18:39 | the joint modeling, the joint training |
---|
0:18:42 | approach, |
---|
0:18:44 | produces the |
---|
0:18:45 | best performance |
---|
0:18:46 | across all three evaluation criteria here, |
---|
0:18:50 | basically the word error rate for |
---|
0:18:52 | speech recognition, intent error rate, and F1 score. |
---|
0:18:56 | So basically, this result shows that |
---|
0:18:58 | even at the word error rate introduced |
---|
0:19:03 | by the noisy speech input, |
---|
0:19:04 | our intent model and slot filling model can still produce |
---|
0:19:10 | competitive performance in intent detection and slot filling. |
---|
0:19:13 | These numbers are slightly worse than the experiments |
---|
0:19:18 | with true text input, |
---|
0:19:19 | but these results also show the robustness |
---|
0:19:23 | of our proposed |
---|
0:19:25 | model. |
---|
0:19:27 | Okay, lastly, the conclusions. |
---|
0:19:30 | In this work, |
---|
0:19:31 | we proposed an RNN model for joint online |
---|
0:19:35 | spoken language understanding and language modeling, |
---|
0:19:38 | and by modeling the three tasks jointly, |
---|
0:19:43 | our model is able to |
---|
0:19:45 | achieve improved performance on intent detection and language modeling, |
---|
0:19:50 | with slight degradation |
---|
0:19:51 | in slot filling performance. |
---|
0:19:54 | In order to show the robustness of our model, |
---|
0:19:56 | we applied our model |
---|
0:19:59 | on the ASR output of the noisy speech input, |
---|
0:20:03 | and we also observed consistent performance gains |
---|
0:20:07 | over the independent training models |
---|
0:20:10 | by using our joint model. |
---|
0:20:13 | So this is the end of the talk. |
---|
0:20:16 | Right, okay. |
---|
0:20:22 | Okay, |
---|
0:20:23 | time for a few questions. |
---|
0:20:25 | Thanks. |
---|
0:21:00 | Okay, so the question is, if I had the chance to define the corpus, what |
---|
0:21:05 | are the criteria that I would be looking for |
---|
0:21:09 | in the corpus. Yes. |
---|
0:21:10 | Right, so, |
---|
0:21:13 | basically, the thing here is, |
---|
0:21:14 | we can see that we are using recurrent neural network models, |
---|
0:21:17 | and |
---|
0:21:19 | typically such models on NLP tasks require |
---|
0:21:22 | very large datasets to show stable and robust performance. |
---|
0:21:27 | So the first criterion is, of course, if we can have a lot of data, |
---|
0:21:31 | that would be the best; |
---|
0:21:33 | the bigger the better, I would assume. |
---|
0:21:35 | And the second thing I can think of is that, |
---|
0:21:39 | for ATIS, |
---|
0:21:40 | the reason why this is a rather simple dataset is because it is very |
---|
0:21:46 | domain limited, so most of the training utterances |
---|
0:21:51 | are closely related to flights, |
---|
0:21:54 | airline travel information. |
---|
0:21:56 | So if I could, |
---|
0:21:57 | you know, redefine the corpus, |
---|
0:21:59 | I would explore the, |
---|
0:22:01 | a multi-domain |
---|
0:22:04 | scenario, |
---|
0:22:05 | to see whether our model is able to handle, |
---|
0:22:08 | you know, perform |
---|
0:22:09 | really well, not only in the domain-limited case but also in the generalized, |
---|
0:22:13 | broader multi-domain cases. |
---|
0:22:15 | So that is |
---|
0:22:17 | what I would really care about in the corpus definition. |
---|
0:22:47 | Right, I completely agree with you. I think this is |
---|
0:22:51 | a very good suggestion, because here we are doing joint modeling of SLU |
---|
0:22:56 | and language modeling, |
---|
0:22:57 | and typically language modeling, you know, helps us to make a prediction of what |
---|
0:23:02 | the user might say at the next step, and |
---|
0:23:21 | I think that would be very nice; that is a good point. |
---|
0:23:21 | If an utterance has, say, five words, maybe |
---|
0:23:23 | this is just one single training instance. |
---|
0:23:43 | So in our experiments, for the |
---|
0:23:46 | true text input, we don't have the situation |
---|
0:23:50 | that in the ASR output we may see partial, |
---|
0:23:55 | partial phrases or corrections. |
---|
0:23:59 | We |
---|
0:24:00 | did not look into this particularly in this work, |
---|
0:24:02 | but that is something |
---|
0:24:04 | we can look into in future work. |
---|
0:24:35 | Alright, okay, thanks. |
---|
0:24:39 | Just a quick question: you jointly train the language model with the SLU model, |
---|
0:24:44 | you know, but the main problem is that the corpus we have for training |
---|
0:24:48 | the SLU model is usually very small, while for training a language model you use a big |
---|
0:24:52 | corpus. But by training jointly, you know, you are essentially saying that you |
---|
0:24:57 | have to have, |
---|
0:24:59 | you know, that small corpus to determine your, |
---|
0:25:02 | to train your language model. |
---|
0:25:04 | Right, I think, |
---|
0:25:06 | I believe in this domain, |
---|
0:25:08 | data, |
---|
0:25:09 | well-labeled data, is really a limitation, because we don't have very large manually |
---|
0:25:15 | labeled datasets for these SLU tasks. So |
---|
0:25:18 | I think if we can put more effort into generating, |
---|
0:25:21 | you know, |
---|
0:25:22 | better quality corpora, that will |
---|
0:25:24 | help a lot with this SLU research. |
---|
0:25:27 | Next question? |
---|
0:25:44 | Yes, I did. |
---|
0:25:56 | Okay, so I think that is a very good question. So we have a |
---|
0:26:00 | chart in the paper, but not here in the presentation. |
---|
0:26:03 | Basically, we evaluated a number of different sizes of k. |
---|
0:26:08 | Basically, the way we use it is, |
---|
0:26:09 | starting from the k-th step, |
---|
0:26:11 | we start gradually increasing the intent contribution, |
---|
0:26:14 | and we evaluate; we show the training curve and validation curve |
---|
0:26:18 | for different k values. |
---|
0:26:20 | But basically, these values are set |
---|
0:26:23 | manually in the experiments; they are not learned |
---|
0:26:26 | in an automatic way. |
---|
0:26:32 | I think, |
---|
0:26:33 | definitely, I think this is |
---|
0:26:35 | one of the hyperparameters that can be |
---|
0:26:38 | learned through a purely data-driven approach. |
---|
0:26:41 | It is just that in the current work, we |
---|
0:26:43 | manually selected a few k values |
---|
0:26:45 | and evaluated which is the |
---|
0:26:48 | best k value. |
---|
0:26:50 | Okay, so let's thank the speaker again. Okay. |
---|