0:00:19 | okay |
---|
0:00:22 | so then we move on to the next speaker |
---|
0:00:27 | so the paper is unsupervised dialogue spectrum generation for log dialogue ranking |
---|
0:00:34 | well, as usual |
---|
0:00:48 | however and signal |
---|
0:00:50 | and this work was finished together with Yizhe Zhang, Lars Liden and Sungjin Lee from |
---|
0:00:55 | microsoft research; by the way, i'm from heriot-watt university |
---|
0:00:58 | and let's start |
---|
0:01:01 | so the aim for this paper is that we would use a ranker |
---|
0:01:06 | to detect the problematic dialogues |
---|
0:01:08 | from the normal ones without any labeled data |
---|
0:01:12 | we use the existing seed dialogues as the normal dialogues |
---|
0:01:16 | and then learn a generative user simulator via GAN setups |
---|
0:01:21 | and have it talk with the bot in different training steps |
---|
0:01:26 | and we get all the conversations from the different training steps |
---|
0:01:29 | and take them as the problematic dialogues; we call this method StepGAN |
---|
0:01:36 | and the experiment results show that StepGAN compares favorably with a ranker |
---|
0:01:41 | trained on the manually labeled datasets |
---|
0:01:46 | okay, so what is the log dialogue ranking |
---|
0:01:49 | so the log dialogues are conversations that happen between the real users |
---|
0:01:54 | and the dialogue system |
---|
0:01:56 | and the dialogue ranking aims to identify the problematic dialogues from the normal ones |
---|
0:02:03 | here are two examples of the normal dialogues and problematic dialogues |
---|
0:02:08 | here is the first one |
---|
0:02:10 | the first one is a normal dialogue |
---|
0:02:13 | the dialogue is in the restaurant searching domain |
---|
0:02:18 | firstly the system says hello and then the user |
---|
0:02:22 | is asking for a european restaurant |
---|
0:02:25 | and then the system asks what part of town they have in mind |
---|
0:02:29 | and the user says the centre |
---|
0:02:32 | and after that the system was asking for |
---|
0:02:36 | the price range |
---|
0:02:37 | and the user says an expensive one |
---|
0:02:41 | and after getting all the information the system said, i suggest this restaurant |
---|
0:02:45 | the michaelhouse cafe |
---|
0:02:47 | and then repeats all the requirements of the user |
---|
0:02:51 | and after that the user asks for the address of this cafe and the system |
---|
0:02:56 | gives the correct information |
---|
0:02:58 | and they thank each other and the dialogue finishes |
---|
0:03:06 | so we define the normal dialogues as |
---|
0:03:09 | dialogues without any contextually unnatural turns |
---|
0:03:13 | and that also achieve all the requirements |
---|
0:03:16 | asked by the user |
---|
0:03:19 | and here is the problematic dialogue |
---|
0:03:23 | so here the error is very apparent |
---|
0:03:27 | so the system cannot understand the user utterance |
---|
0:03:30 | and the conversation is going in the wrong direction |
---|
0:03:32 | for example |
---|
0:03:33 | the user says, i would really like some european food that's cheap |
---|
0:03:37 | and the system has some problems with understanding this utterance |
---|
0:03:41 | by suggesting one restaurant |
---|
0:03:44 | which is in the east of town |
---|
0:03:46 | however the user was asking for the centre |
---|
0:03:49 | and after that the user says |
---|
0:03:54 | i want to eat at this restaurant, have you got the address |
---|
0:03:59 | and the system didn't understand this utterance and asks what part of town they |
---|
0:04:03 | have in mind again |
---|
0:04:05 | so we define the problematic dialogues as |
---|
0:04:08 | the dialogues with either contextually unnatural turns |
---|
0:04:12 | or some unachieved requirements, or both |
---|
0:04:16 | so the goal for this ranker |
---|
0:04:22 | so the goal for the ranker is to pick up this type of problematic |
---|
0:04:26 | dialogues from the normal ones |
---|
0:04:30 | so why do we need a ranker |
---|
0:04:33 | in a typical development loop of the data-driven dialogue systems |
---|
0:04:38 | the developers would bootstrap their dialogue system |
---|
0:04:42 | using some domain-specific seed dialogues |
---|
0:04:47 | and then the dialogue system will be deployed |
---|
0:04:53 | and released to the customers |
---|
0:04:54 | and then the log conversations can be collected |
---|
0:05:02 | and then the developers can improve the performance of the system |
---|
0:05:06 | by correcting some mistakes the system made in the log dialogues and then retraining the |
---|
0:05:12 | dialogue system model |
---|
0:05:15 | however |
---|
0:05:16 | going through all these dialogues is time-consuming |
---|
0:05:22 | so we hope |
---|
0:05:23 | that this manual checking process can be replaced by a dialogue ranker |
---|
0:05:29 | that can detect dialogues with lower quality automatically |
---|
0:05:33 | to make this dialogue learning process with human in the loop more efficient |
---|
0:05:40 | so here is the structure of the ranker |
---|
0:05:44 | the input for the ranker is just the dialogue |
---|
0:05:46 | and the output is a score |
---|
0:05:49 | between zero and one, where zero means the normal dialogues and one |
---|
0:05:53 | means the problematic dialogues |
---|
0:05:57 | so firstly |
---|
0:05:58 | we get the sentence embeddings by a sentence encoder |
---|
0:06:03 | and then feed them into this multi-head self-attention |
---|
0:06:07 | to capture the meaning of the dialogue context |
---|
0:06:13 | and then we have this turn-level classifier |
---|
0:06:16 | to identify the quality of each turn |
---|
0:06:18 | for example |
---|
0:06:20 | for this very smooth turn the score should be zero point one |
---|
0:06:27 | and for this problematic turn the score should be zero point nine |
---|
0:06:33 | and then there is a dialogue-level ranker on top |
---|
0:06:38 | of these turn-level qualities |
---|
0:06:40 | and for this dialogue, some parts of it are smooth |
---|
0:06:44 | and some of them are problematic |
---|
0:06:46 | so probably the score will be like zero point eight or something as the predicted |
---|
0:06:49 | score |
---|
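To make the architecture just described concrete, here is a minimal, hedged PyTorch-style sketch: a sentence encoder per turn (here just mean-pooled embeddings), multi-head self-attention over the turns, a turn-level classifier, and a dialogue-level score between zero and one. All module choices, sizes, and names are illustrative assumptions, not the authors' released model.

```python
# Illustrative sketch of the ranker described in the talk (assumptions, not the authors' code).
import torch
import torch.nn as nn

class DialogueRanker(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)              # token embeddings
        self.self_attn = nn.MultiheadAttention(emb_dim, n_heads, batch_first=True)
        self.turn_clf = nn.Linear(emb_dim, 1)                       # turn-level quality
        self.dial_clf = nn.Linear(emb_dim, 1)                       # dialogue-level score

    def encode_turn(self, token_ids):
        # crude "sentence encoder": mean-pooled token embeddings for one turn
        return self.embed(token_ids).mean(dim=0)                    # (emb_dim,)

    def forward(self, turns):
        # turns: list of 1-D LongTensors, one per turn of the dialogue
        sent = torch.stack([self.encode_turn(t) for t in turns]).unsqueeze(0)   # (1, n_turns, emb_dim)
        ctx, _ = self.self_attn(sent, sent, sent)                   # contextualise turns in the dialogue
        turn_scores = torch.sigmoid(self.turn_clf(ctx)).squeeze(-1) # per-turn quality in [0, 1]
        dial_score = torch.sigmoid(self.dial_clf(ctx.mean(dim=1)))  # 0 = normal, 1 = problematic
        return turn_scores, dial_score
```

The dialogue-level score is the value used to sort log dialogues, exactly as in the slide example (a mixed dialogue scoring around 0.8).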
0:06:52 | so for the training of the ranker, normally labeled data is needed |
---|
0:06:55 | and gathering all the data for the training of this ranker |
---|
0:06:59 | is very time-consuming |
---|
0:07:01 | so imagine that human-in-the-loop process in the development: whenever |
---|
0:07:06 | a significant change is made to the system, a new labeled dataset for the |
---|
0:07:11 | ranker is required |
---|
0:07:13 | this is not feasible for most of the developers |
---|
0:07:16 | and that |
---|
0:07:17 | motivates us to explore this StepGAN approach |
---|
0:07:23 | the general idea for this StepGAN is that |
---|
0:07:25 | we take the seed dialogues as the normal dialogues |
---|
0:07:27 | and at the same time we use the StepGAN to simulate the |
---|
0:07:31 | problematic dialogues |
---|
0:07:33 | and train the ranker on top of this data |
---|
0:07:39 | so here is the structure of the StepGAN setup |
---|
0:07:43 | we have this dialogue generator and we also have this dialogue discriminator, and inside |
---|
0:07:47 | the dialogue generator we have the restaurant-searching dialogue system |
---|
0:07:51 | and an RNN-based user simulator |
---|
0:07:56 | firstly we start with the pre-training process |
---|
0:08:00 | in this process we pre-train our user simulator with the utterances of |
---|
0:08:04 | multi-domain dialogues |
---|
0:08:08 | for example, for the multi-domain dialogues, this can be for example |
---|
0:08:12 | the pizza ordering domain |
---|
0:08:13 | in which the user is asking for a large pineapple pizza |
---|
0:08:17 | and this can be the temperature setting domain, in which the user |
---|
0:08:22 | is asking for setting the temperature of the room to seventy-two degrees |
---|
0:08:30 | and then we |
---|
0:08:32 | ask the user simulator to simulate some dialogues |
---|
0:08:36 | together with the restaurant searching bot |
---|
0:08:40 | and here is an example of a simulated dialogue after pre-training; as we can see the |
---|
0:08:45 | user simulator has some basic language abilities |
---|
0:08:49 | but it doesn't know how to talk with this restaurant searching bot |
---|
0:08:54 | so when the system is asking for |
---|
0:08:57 | some restaurant searching requirement, the user says "management home" or something like that |
---|
0:09:02 | and of course |
---|
0:09:03 | the dialogue is not going in the right direction |
---|
0:09:10 | so |
---|
0:09:11 | after we get the simulated problematic dialogues, we train |
---|
0:09:17 | the discriminator on them together with the seed dialogues |
---|
0:09:20 | which is the pre-training of the discriminator |
---|
0:09:24 | so after the pre-training process we move on to the first step of |
---|
0:09:29 | the StepGAN training |
---|
0:09:32 | firstly we just initialize our user simulator and the discriminator |
---|
0:09:37 | with the pre-trained models |
---|
0:09:40 | separately |
---|
0:09:43 | and then, in the |
---|
0:09:45 | GAN setup |
---|
0:09:46 | for the training of the discriminator we ask the dialogue generator to |
---|
0:09:51 | simulate some dialogues with only one turn |
---|
0:09:54 | and take them as the problematic dialogues |
---|
0:09:57 | and then we take the seed dialogues and truncate them up to the first turn |
---|
0:10:03 | and take them as the normal dialogues and feed them into the |
---|
0:10:06 | discriminator |
---|
0:10:10 | and for the training of the simulator in step one we also |
---|
0:10:14 | use these one-turn truncated seed dialogues |
---|
0:10:20 | after that we start our GAN training, alternating the training of the |
---|
0:10:24 | generator and of the discriminator |
---|
0:10:27 | after the model converges |
---|
0:10:30 | we ask |
---|
0:10:31 | the model to simulate full-length dialogues |
---|
0:10:34 | and put them into the simulated problematic dialogues |
---|
0:10:40 | bucket |
---|
0:10:41 | as we can see the first turn of this simulated dialogue is very smooth |
---|
0:10:45 | but after that when the system |
---|
0:10:48 | is asking what part of town you have in mind |
---|
0:10:52 | the user says something which the system cannot understand |
---|
0:10:57 | and the dialogue goes wrong |
---|
0:11:00 | and after the first step we're coming to the second step |
---|
0:11:05 | and firstly we also initialize our user simulator and the discriminator |
---|
0:11:10 | we initialize the user simulator with the one which was trained in |
---|
0:11:14 | step one |
---|
0:11:14 | and we |
---|
0:11:17 | initialize the discriminator with the pre-trained model |
---|
0:11:22 | and the only difference between step one and step two is that |
---|
0:11:27 | we are asking the dialogue generator |
---|
0:11:34 | to simulate dialogues with two turns |
---|
0:11:36 | and at the same time we truncate our seed dialogues into two turns and |
---|
0:11:40 | then train the discriminator and the user simulator at the same time |
---|
0:11:47 | after the model converges |
---|
0:11:49 | we ask the user simulator to simulate full-length dialogues |
---|
0:11:53 | and then put them into the simulated problematic dialogues bucket |
---|
0:11:57 | so as we can see the first two turns are smooth |
---|
0:12:00 | and from the third turn there's something wrong |
---|
0:12:06 | okay and then |
---|
0:12:08 | we just repeat this step for, like, n steps |
---|
0:12:13 | and after the n steps of training |
---|
0:12:15 | we get |
---|
0:12:18 | n buckets of the simulated problematic dialogues |
---|
0:12:22 | and together with the seed dialogues |
---|
0:12:24 | we train our dialogue ranker |
---|
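As a rough illustration of the stepwise procedure described above — pre-train, then at step i generate i-turn dialogues as "problematic" examples, truncate the seed dialogues to i turns as "normal" examples, run adversarial updates until convergence, and finally dump full-length simulations into bucket i — here is a hedged pseudocode-style sketch. Every helper passed in (simulate, truncate, gan_update, converged) is a placeholder, not the authors' code; only the control flow mirrors the talk.

```python
from copy import deepcopy

def step_gan(bot, seed_dialogues, pretrained_simulator, pretrained_discriminator,
             simulate, truncate, gan_update, converged, n_steps):
    """Stepwise GAN loop as described in the talk. `simulate`, `truncate`,
    `gan_update` and `converged` are caller-supplied placeholders sketching
    the idea, not the authors' implementation."""
    buckets = []                                       # one bucket of problematic dialogues per step
    prev_simulator = pretrained_simulator
    for i in range(1, n_steps + 1):
        gen = deepcopy(prev_simulator)                 # step 1: pre-trained simulator; step i: step i-1's
        disc = deepcopy(pretrained_discriminator)      # discriminator restarts from the pre-trained one
        while not converged(gen, disc):
            fake = simulate(gen, bot, max_turns=i)                   # i-turn simulated dialogues ("problematic")
            real = [truncate(d, n_turns=i) for d in seed_dialogues]  # seed dialogues cut to i turns ("normal")
            gan_update(gen, disc, real, fake)          # adversarial update of generator and discriminator
        buckets.append(simulate(gen, bot, max_turns=None))           # dump full-length simulations into bucket i
        prev_simulator = gen
    return buckets                                     # train the ranker on these plus the seed dialogues
```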
0:12:29 | so here are the datasets we are using in this paper |
---|
0:12:32 | basically we are using three datasets |
---|
0:12:34 | the first one is the multi-domain dialogues |
---|
0:12:38 | that is for the pre-training of the StepGAN user simulator and its discriminator |
---|
0:12:44 | and we are using the MetaLWOz dataset |
---|
0:12:47 | which is task-oriented conversations with around thirty-seven thousand dialogues |
---|
0:12:53 | over fifty-one domains |
---|
0:12:56 | and each dialogue in this dataset is a task-oriented conversational interaction |
---|
0:13:01 | between two real speakers, one of them simulating the user and the |
---|
0:13:06 | other one simulating the bot |
---|
0:13:10 | and the second part is the seed dialogues |
---|
0:13:14 | the seed dialogues are for the training of the GAN structure |
---|
0:13:19 | and normally the seed dialogues are human-written dialogues that would be offered to the |
---|
0:13:23 | developers before the active development of the dialogue system |
---|
0:13:27 | however we don't have these human-written dialogues |
---|
0:13:30 | so we create this seed dialogue set |
---|
0:13:37 | by having our restaurant searching bot |
---|
0:13:41 | talk to a rule-based |
---|
0:13:43 | user simulator that we also built |
---|
0:13:48 | and the third one is the manually labeled log dialogues, which is for the evaluation |
---|
0:13:53 | of this task |
---|
0:13:56 | to collect this labeled data we deployed our |
---|
0:14:02 | restaurant searching bot via the amazon mechanical turk platform |
---|
0:14:06 | firstly we automatically generate some requirements for the users, for example |
---|
0:14:13 | food type and also |
---|
0:14:15 | locations and price range |
---|
0:14:17 | and then |
---|
0:14:18 | we ask turkers to find the restaurant |
---|
0:14:21 | that satisfies those requirements |
---|
0:14:24 | by chatting with our restaurant bot |
---|
0:14:30 | and at the end of each task |
---|
0:14:33 | the users are asked two questions |
---|
0:14:37 | the first one is whether they found a restaurant |
---|
0:14:40 | meeting all the requirements given, and in the second one we ask the user |
---|
0:14:44 | to label the contextually unnatural turns |
---|
0:14:48 | during the conversation |
---|
0:14:53 | in total we collected one thousand six hundred normal dialogues and |
---|
0:14:58 | one thousand three hundred problematic dialogues |
---|
0:15:05 | here are some experiment results; basically we did four experiments |
---|
0:15:11 | to justify the performance of this |
---|
0:15:13 | StepGAN |
---|
0:15:16 | so in the first one we investigate how |
---|
0:15:21 | the generated dialogues move towards the normal dialogues |
---|
0:15:25 | basically we examine the dialogues generated at each |
---|
0:15:30 | step of the StepGAN |
---|
0:15:32 | in terms of three metrics |
---|
0:15:34 | here are two of them |
---|
0:15:35 | the first one, the upper one, is the ranking score and the second one, the lower |
---|
0:15:40 | one, is the success rate |
---|
0:15:43 | and the yellow dashed line and the green dashed line |
---|
0:15:49 | stand for the average performance of the labeled |
---|
0:15:53 | normal dialogues and the labeled |
---|
0:15:55 | problematic dialogues |
---|
0:16:02 | so as we can see, after the first step of training |
---|
0:16:07 | the performance of the generated dialogues |
---|
0:16:10 | is much worse than the |
---|
0:16:16 | labeled problematic dialogues |
---|
0:16:22 | okay |
---|
0:16:23 | after three steps of training |
---|
0:16:25 | both metrics start growing and are better than the average performance of the |
---|
0:16:30 | labeled problematic dialogues |
---|
0:16:35 | and as we can see, after the n steps of training |
---|
0:16:39 | the success rate |
---|
0:16:41 | is almost as high as the |
---|
0:16:44 | labeled normal dialogues |
---|
0:16:47 | and also we can see the dialogues are becoming |
---|
0:16:51 | very smooth and very |
---|
0:16:54 | natural |
---|
0:16:58 | here is the |
---|
0:17:02 | second experiment |
---|
0:17:03 | so in the second experiment we compare the StepGAN with |
---|
0:17:09 | a ranker trained on the labeled dataset |
---|
0:17:13 | so firstly we divide the AMT-labeled data into three parts: |
---|
0:17:17 | two thousand training examples |
---|
0:17:20 | two hundred dev examples and four hundred testing examples |
---|
0:17:25 | and then we train this dialogue ranker |
---|
0:17:28 | which we call supervised-2000, on this labeled training dataset |
---|
0:17:33 | and evaluate its performance |
---|
0:17:35 | and by the way, we evaluate this problem by precision at k |
---|
0:17:39 | and recall at k |
---|
0:17:44 | so for the training of the |
---|
0:17:46 | sorry, for this StepGAN |
---|
0:17:47 | we simulated basically three thousand problematic dialogues |
---|
0:17:52 | and all of the |
---|
0:17:56 | datasets above are set up as balanced datasets, with equal numbers of positive |
---|
0:18:00 | and negative examples |
---|
0:18:01 | and because the number of seed dialogues is only one hundred |
---|
0:18:05 | we just duplicate them thirty times to make this dataset balanced |
---|
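A minimal sketch of how such a balanced ranker training set could be assembled from the two sources described (seed dialogues labeled as normal and duplicated to match the number of simulated problematic dialogues); the function name and the exact counts are illustrative assumptions.

```python
import random

def build_ranker_training_set(seed_dialogues, simulated_problematic):
    """Balanced training set: seed dialogues get label 0 (normal), simulated
    dialogues get label 1 (problematic). Seed dialogues are duplicated so both
    classes have roughly the same count (e.g. 100 seed dialogues duplicated
    30 times against the simulated ones, as in the talk)."""
    n_copies = max(1, len(simulated_problematic) // len(seed_dialogues))
    normal = [(d, 0) for d in seed_dialogues] * n_copies
    problematic = [(d, 1) for d in simulated_problematic]
    data = normal + problematic
    random.shuffle(data)
    return data
```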
0:18:11 | and then we train our StepGAN ranker on this dataset |
---|
0:18:15 | so here is the performance |
---|
0:18:18 | and as we can see, the StepGAN performs even better than the supervised |
---|
0:18:22 | approach |
---|
0:18:23 | when k is lower than fifty |
---|
0:18:26 | even though the supervised-2000 has higher performance |
---|
0:18:31 | when k is getting larger |
---|
0:18:32 | the StepGAN still compares favorably |
---|
0:18:37 | and here is the third experiment |
---|
0:18:40 | we just basically add the |
---|
0:18:43 | simulated data |
---|
0:18:46 | into the labeled data |
---|
0:18:49 | and try to compare the performance of this combined dataset with the labeled |
---|
0:18:52 | dataset |
---|
0:18:53 | and |
---|
0:18:56 | here is the result |
---|
0:18:58 | so basically the experiment shows that our StepGAN |
---|
0:19:02 | approach can bring some additional |
---|
0:19:04 | generalization by simulating |
---|
0:19:07 | a wide range of dialogues |
---|
0:19:09 | that are not covered by the labeled data |
---|
0:19:14 | so in the last experiment we are comparing the StepGAN with other types |
---|
0:19:19 | of user simulators |
---|
0:19:21 | and the first one is |
---|
0:19:22 | basically what we call multi-domain |
---|
0:19:25 | what it is doing is just, like, we train this user simulator with the multi- |
---|
0:19:29 | domain dialogues |
---|
0:19:30 | and simulate one thousand problematic dialogues |
---|
0:19:32 | and then, together with the seed dialogues, we train the dialogue |
---|
0:19:37 | ranker |
---|
0:19:38 | and check the performance |
---|
0:19:41 | and the second one is the fine-tuned model |
---|
0:19:44 | so basically we pre-train the user simulator with the multi-domain dialogues |
---|
0:19:48 | and then fine-tune it on the seed dialogues |
---|
0:19:52 | and then we generate |
---|
0:19:54 | one thousand problematic dialogues and train the ranker together with the seed dialogues |
---|
0:19:58 | and check the performance |
---|
0:20:01 | and the last one is what we call the stepwise fine-tune |
---|
0:20:05 | so basically we just |
---|
0:20:07 | replace this fine-tuning: instead of fine-tuning on the full length of |
---|
0:20:12 | the seed dialogues, we just |
---|
0:20:14 | fine-tune in the stepwise fashion which has been introduced in the StepGAN |
---|
0:20:18 | just without the GAN structure |
---|
0:20:21 | and |
---|
0:20:22 | here are the results |
---|
0:20:23 | and we also train our StepGAN on the same size of dataset |
---|
0:20:27 | which is still one thousand simulated dialogues and |
---|
0:20:31 | also |
---|
0:20:33 | the seed dialogues; so as we can see, the StepGAN also |
---|
0:20:38 | outperforms all the other user simulators |
---|
0:20:42 | so the conclusion is: StepGAN can generate dialogues with a wide range of |
---|
0:20:47 | qualities |
---|
0:20:48 | and it compares favorably with the ranker trained on the labeled dataset |
---|
0:20:54 | and it brings additional generalization by simulating |
---|
0:20:57 | a wide range of dialogues |
---|
0:20:59 | that are not covered by the |
---|
0:21:02 | labeled data |
---|
0:21:04 | and lastly |
---|
0:21:05 | it also outperforms other user simulators |
---|
0:21:15 | thank you very much; any questions? |
---|
0:21:22 | hi, i actually have two questions, let's see |
---|
0:21:25 | the first one is |
---|
0:21:28 | of course you are starting with a binary classification, problematic versus non-problematic, but of course |
---|
0:21:34 | there are |
---|
0:21:35 | more degrees of problematic dialogues than that |
---|
0:21:38 | and you address some of that via the turns, however in the end it is |
---|
0:21:43 | still a binary classification, right? -- yep |
---|
0:21:46 | then my second question is: because it's a binary classification, what does precision |
---|
0:21:51 | at k mean in this case? -- so basically precision at k is like |
---|
0:21:56 | a ranking metric, which is pretty relevant for evaluating the ranking process |
---|
0:22:03 | so basically what we are doing is like |
---|
0:22:05 | we for example have four hundred testing dialogues, and then we just |
---|
0:22:10 | use our dialogue ranker to give a score to each dialogue and we |
---|
0:22:15 | rank them from |
---|
0:22:18 | top to bottom |
---|
0:22:19 | and then that means like |
---|
0:22:21 | we suppose that |
---|
0:22:23 | at the top of this ranked list -- the dialogues given a higher score -- |
---|
0:22:27 | these dialogues are the problematic dialogues |
---|
0:22:30 | so in that case we truncate this ranked list at |
---|
0:22:34 | for example the first ten dialogues |
---|
0:22:37 | and then we calculate how many of them are the problematic dialogues |
---|
0:22:42 | and divide by ten |
---|
0:22:43 | and we can compute more, like maybe top fifty and |
---|
0:22:48 | top one hundred |
---|
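The procedure described in this answer is ordinary precision at k over the ranker's scores; a minimal sketch, with illustrative names only:

```python
def precision_at_k(scores, labels, k):
    """scores: ranker outputs in [0, 1]; labels: 1 = problematic, 0 = normal.
    Rank dialogues by score (highest first), keep the top k, and measure the
    fraction of them that are truly problematic."""
    ranked = sorted(zip(scores, labels), key=lambda x: x[0], reverse=True)
    top_k = ranked[:k]
    return sum(label for _, label in top_k) / k

# e.g. precision_at_k(scores, labels, 10), precision_at_k(scores, labels, 50), ...
```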
0:22:57 | you generate these problematic dialogues, so, sort of -- let me ask |
---|
0:23:02 | so we generate these problematic dialogues in this fashion where the beginning is all |
---|
0:23:08 | smooth and then the end is kind of rubbish |
---|
0:23:12 | does this also come from the data, or is this separate -- is there |
---|
0:23:16 | like something that breaks in the middle of the dialogue? -- so for the |
---|
0:23:19 | test set, that is basically human-labeled data: it is not only labeled, but |
---|
0:23:25 | it comes from humans talking with our system, so the error can be like in |
---|
0:23:28 | the middle of the dialogue or at the end, so it's like |
---|
0:23:32 | so you don't rank it turn by turn, it's like the whole dialogue? |
---|
0:23:35 | yes, we don't rank turn by turn, we just rank the whole dialogue |
---|
0:23:40 | not each single turn |
---|
0:23:45 | any other questions? |
---|
0:23:53 | hi, i have a question about how you |
---|
0:23:57 | define the problematic dialogue as a whole; i mean, there can be some |
---|
0:24:01 | errors in the middle that the system can repair, so what do you mean exactly |
---|
0:24:07 | by a problematic dialogue? -- so we define the problematic dialogues as |
---|
0:24:12 | -- there are like two -- no, actually three types |
---|
0:24:17 | of problematic dialogues |
---|
0:24:21 | and the first type is, like, they have some unnatural turns |
---|
0:24:24 | so basically |
---|
0:24:25 | they achieve their goal |
---|
0:24:28 | but the communication is not smooth |
---|
0:24:30 | so that's the first type |
---|
0:24:31 | and the second type is, like, the communication is not smooth and at the same |
---|
0:24:36 | time they didn't achieve their goal |
---|
0:24:37 | and actually there is potentially a third one, which is that the communication is |
---|
0:24:42 | smooth but they didn't achieve their goal |
---|
0:24:44 | we just define the problematic dialogues in this way |
---|
0:24:47 | -- in terms of the ones where the interaction is not smooth but the task |
---|
0:24:51 | is successful, do you have such data, and |
---|
0:24:55 | sorry, did you calculate the annotator agreement? |
---|
0:25:00 | -- we didn't specifically define this type of data, but because |
---|
0:25:06 | of the way we gathered the data, i think this type of example is in the testing dataset |
---|
0:25:12 | alright, thank you |
---|
0:25:15 | another question over there |
---|
0:25:19 | right |
---|
0:25:23 | because, like, the ranker outputs |
---|
0:25:26 | a continuous score |
---|
0:25:28 | and you |
---|
0:25:32 | -- no, so the output of the ranker is continuous between zero and |
---|
0:25:38 | one, so it can be like zero point eight or zero point five or something |
---|
0:25:41 | and when it is close to one that means it is a problematic one, and when |
---|
0:25:45 | it is close to zero that means it is normal; so it is |
---|
0:25:49 | normalized between zero and one |
---|
0:25:52 | -- okay, but what is the loss function then? -- so |
---|
0:25:57 | so the loss function is basically the |
---|
0:26:01 | score that the ranker gives compared with the label: so |
---|
0:26:07 | we label the problematic dialogues as one |
---|
0:26:09 | and the normal dialogues as zero, and the loss is just, like, the score given by |
---|
0:26:14 | the ranker |
---|
0:26:15 | compared with this label, so for example a binary cross-entropy |
---|
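A minimal sketch of the objective as described in this answer, assuming a binary cross-entropy between the ranker's score in [0, 1] and the 0/1 label (an illustration, not necessarily the authors' exact loss):

```python
import torch
import torch.nn.functional as F

def ranker_loss(predicted_scores, labels):
    """predicted_scores: ranker outputs in (0, 1); labels: 1 = problematic, 0 = normal.
    Binary cross-entropy pulls the score towards the label, as described in the answer."""
    return F.binary_cross_entropy(predicted_scores, labels.float())

# e.g. loss = ranker_loss(torch.tensor([0.8, 0.2]), torch.tensor([1, 0]))
```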
0:26:24 | one more question: so these generated |
---|
0:26:28 | problematic dialogues |
---|
0:26:31 | how do you know that they also correspond to actual problematic ones, that |
---|
0:26:35 | they are of course problematic in this corpus? |
---|
0:26:38 | -- so we have three metrics to evaluate that |
---|
0:26:42 | the first one is, like, the length |
---|
0:26:44 | so normally if there is something wrong in the dialogue |
---|
0:26:48 | or the user didn't achieve their goal, normally the dialogue is longer |
---|
0:26:51 | so that's one metric |
---|
0:26:53 | and the other one is the success rate, which determines whether the user achieved their goal |
---|
0:26:59 | and the third one is the |
---|
0:27:01 | score given by the ranker which is trained on the |
---|
0:27:05 | labeled data, so basically it's like |
---|
0:27:07 | a proper ranker, like, giving the score |
---|
0:27:09 | so we just compare, like |
---|
0:27:12 | so basically that is this one |
---|
0:27:16 | here |
---|
0:27:27 | so basically on this slide |
---|
0:27:30 | we just compare it with the average |
---|
0:27:33 | for example the average ranking score of the labeled problematic dialogues, which is the |
---|
0:27:38 | green dashed line |
---|
0:27:38 | and we also compare it with the yellow dashed line |
---|
0:27:42 | that means the average performance of the labeled |
---|
0:27:46 | normal dialogues; so we can see, like, at the beginning |
---|
0:27:50 | all these evaluation metrics are very low, and after that they are getting higher, so that |
---|
0:27:54 | means, like, at the beginning the generated dialogues are very |
---|
0:27:58 | problematic, and towards the end they are getting closer to the normal ones |
---|
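A hedged sketch of how the three diagnostics just listed could be computed for the dialogues produced at each StepGAN step; the dialogue representation (d.turns) and the task_success/ranker callables are assumptions for illustration, not the authors' code:

```python
def evaluate_generated(dialogues, ranker, task_success):
    """Average the three diagnostics from the answer above over a batch of
    simulated dialogues: dialogue length, task success rate, and the score
    of a ranker trained on labeled data. `ranker` and `task_success` are
    caller-supplied callables; `d.turns` is an assumed dialogue attribute."""
    n = len(dialogues)
    avg_len = sum(len(d.turns) for d in dialogues) / n
    success_rate = sum(task_success(d) for d in dialogues) / n
    avg_ranker_score = sum(ranker(d) for d in dialogues) / n
    return avg_len, success_rate, avg_ranker_score
```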
0:28:01 | -- but if you read this example, it seems like the user utterances |
---|
0:28:05 | look very unlikely to happen with a real user |
---|
0:28:10 | the system asks what part of town you have in mind |
---|
0:28:13 | and the user goes off to something completely unrelated, and it's like the user is derailing the |
---|
0:28:18 | system here |
---|
0:28:20 | and the system is reacting, yes, but without it introducing a problem |
---|
0:28:27 | or whatever -- but this one is, like, only after one step of training, and after -- |
---|
0:28:31 | so you can see, like, after three |
---|
0:28:35 | after three steps of training the user is |
---|
0:28:40 | saying something like a plausible example -- "i'm not looking for this place, please change" -- so |
---|
0:28:44 | this is also related to the restaurant domain |
---|
0:28:47 | but |
---|
0:28:48 | that is an utterance whose contents the system cannot understand, so that |
---|
0:28:52 | causes the failure of the dialogue; so probably at the beginning, well |
---|
0:28:56 | we want to generate the problematic dialogues in, like, multiple very varied ways, but |
---|
0:29:02 | after -- so during this StepGAN training process the dialogue is getting |
---|
0:29:07 | into this restaurant search domain; it is just, like, the way the |
---|
0:29:11 | user is describing their requirements that is not accepted by the system, so the |
---|
0:29:19 | generated dialogue is getting closer to the domain and is getting less |
---|
0:29:23 | random |
---|
0:29:25 | okay, thank you |
---|
0:29:27 | we have room for one final question |
---|
0:29:34 | so as you go along the steps of the StepGAN, it looks like |
---|
0:29:39 | the problems |
---|
0:29:40 | look like they are occurring at the end |
---|
0:29:44 | like after a certain turn -- is that the case? i'm just asking whether, |
---|
0:29:48 | like, the generator |
---|
0:29:51 | generates the low-quality |
---|
0:29:53 | problems only at the end |
---|
0:29:55 | -- so actually, you know |
---|
0:29:58 | so most of the problems do indeed appear at |
---|
0:30:03 | the end, but it is not guaranteed, because in the generation process we |
---|
0:30:08 | have some, like, random seed or something |
---|
0:30:09 | and there are some problems that can |
---|
0:30:13 | appear in between, but these are much less than the ones that appear at the end |
---|
0:30:20 | -- i see, okay, i mean, so then could there be |
---|
0:30:24 | something, because we are doing -- |
---|
0:30:30 | problems in the middle or at the beginning? -- i see, so basically, actually |
---|
0:30:40 | ideally in this paper we want to have the errors in, like, all kinds of |
---|
0:30:44 | places |
---|
0:30:45 | and |
---|
0:30:47 | indeed, like, in some of the generated dialogues, even after maybe six turns or |
---|
0:30:52 | seven turns, there are still some problems that appear in the middle, but it is |
---|
0:30:56 | much less |
---|
0:30:57 | -- i think maybe as a sort of future work, i guess it would maybe be |
---|
0:31:01 | helpful to combine different dialogues from different steps if, say |
---|
0:31:07 | you want to train the ranker |
---|
0:31:10 | -- you mean, like, to collect the data from the different training steps? we are |
---|
0:31:14 | doing that: we, like |
---|
0:31:16 | combine all these dialogues into this training set -- okay |
---|
0:31:22 | okay, i think that's it for the questions, so let's thank the speaker again |
---|