0:00:15 | Okay, hi, my name is Lena, |
0:00:18 | and I'm from the Natural Language and Dialogue Systems lab at |
0:00:22 | UC Santa Cruz, presenting the paper "Controlling Personality-Based Stylistic Variation |
0:00:28 | with Neural Natural Language Generators." |
0:00:34 | So, |
0:00:37 | the problem is that work on task-oriented neural NLG from structured data has focused |
0:00:44 | on avoiding semantic errors, which has resulted in |
0:00:48 | stylistically uninteresting outputs. |
0:00:50 | So, for example, |
0:00:52 | here are two references generated for the same restaurant. |
0:01:04 | Both realize |
0:01:06 | all of the attributes in the MR, but that's really all that they do. |
0:01:10 | So our goal is to train a neural NLG to use semantic and stylistic |
0:01:16 | variation by controlling the input data and the amount of supervision available to the model. |
0:01:27 | You really need lots of training data to learn a style, |
0:01:31 | so we use a statistical generator, PERSONAGE, which is able to generate data exhibiting |
0:01:38 | the Big Five personality traits, to create stylistic variation. |
0:01:43 | We use |
0:01:44 | five personalities: agreeable, conscientious, disagreeable, extravert, and unconscientious, |
0:01:51 | to |
0:01:53 | generate |
0:01:54 | data using the train and dev MRs from the E2E |
0:01:58 | challenge. With PERSONAGE you can systematically control |
0:02:02 | the types of stylistic variation produced, so we know which types of |
0:02:06 | stylistic variation our model should be reproducing. |
0:02:08 | So there are two examples on |
0:02:11 | the screen, one for the agreeable personality and one for the disagreeable personality. |
0:02:16 | The agreeable personality uses |
0:02:19 | pragmatic markers like hedges, |
0:02:21 | and the disagreeable one uses emphasizers like "actually" |
0:02:25 | and |
0:02:27 | expletives. |
0:02:29 | And the disagreeable one is broken up into five sentences, whereas the agreeable |
0:02:35 | one is all in one sentence. |
0:02:44 | For our data distribution: we have 88,855 total utterances |
0:02:49 | generated from 3,784 unique MRs, with 17,771 |
0:02:55 | references per personality. For testing we generate 1,390 total utterances |
0:03:03 | from 278 unique MRs; you get one |
0:03:06 | reference per personality from each of the five personalities. |
0:03:10 | So with this data, the MRs are |
0:03:13 | taken from the E2E challenge, and the attribute values are taken directly |
0:03:19 | from the text |
0:03:20 | of the E2E challenge. |
0:03:22 | The distribution of this data is different from the E2E challenge, though. |
0:03:27 | The training data's number of attributes per MR is a bit more balanced, |
0:03:33 | mostly four or five attributes |
0:03:36 | per MR, while the test data |
0:03:39 | has |
0:03:40 | quite a bit more attributes per MR, mostly seven or eight. Actually, |
0:03:46 | we think this makes the test a little harder than the training. |
0:03:52 | So there are five types of aggregation operations that PERSONAGE can use |
0:03:57 | to combine the attributes in the MR. There's the period operation, "X. Y."; |
0:04:03 | the "with" cue, "X, with Y"; |
0:04:07 | the conjunction operation, "X and Y"; |
0:04:12 | merge, where X and Y are fused into a single clause with a shared subject; and |
0:04:19 | the also-cue, which |
0:04:22 | is "X, also Y." |
0:04:26 | Aggregation operations are necessary to combine |
0:04:30 | attributes together. Looking at the distribution, |
0:04:33 | most of the personalities use most of the aggregation |
0:04:37 | operations, but there is still some |
0:04:40 | variety. So |
0:04:41 | the disagreeable voice is |
0:04:44 | using the period operation a lot more than all of the other ones, |
0:04:48 | and extravert |
0:04:50 | is a lot more likely to use the conjunction operator than the other |
0:04:56 | voices, so we can still see that they're different. |
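As a rough illustration of the five operations just described, each one can be written as a template over attribute realizations. This is a hypothetical sketch, not the actual PERSONAGE implementation; the function names and example strings are mine.

```python
# Hypothetical templates for the five PERSONAGE-style aggregation
# operations; illustrative only, not the real PERSONAGE rules.
def period(x, y):
    return f"{x}. {y}."

def with_cue(x, y):
    return f"{x}, with {y}."

def conjunction(x, y):
    return f"{x} and {y}."

def also_cue(x, y):
    return f"{x}. Also, {y}."

def merge(subject, pred1, pred2):
    # merge fuses two predicates under one shared subject
    return f"{subject} is {pred1} and {pred2}."

print(period("It is in riverside", "It is moderately priced"))
print(merge("X", "an Italian place", "kid friendly"))
```

The point of the talk's distribution plot is then just how often each personality's output exercises each of these templates.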
0:05:02 | Here is a sample of the pragmatic |
0:05:05 | markers |
0:05:06 | that PERSONAGE can |
0:05:08 | use. |
0:05:09 | All together, we have about 31 binary parameters. |
0:05:15 | So some of these are the request confirmation, "let's see what we can |
0:05:19 | find on |
0:05:21 | X"; the emphasizers, |
0:05:26 | like "really," "basically," "actually," "just"; competence mitigation, |
0:05:30 | "come on," "obviously," "everybody knows that"; |
0:05:34 | and contrast markers such as |
0:05:36 | "however." |
0:05:39 | No pragmatic |
0:05:41 | markers are necessary |
0:05:43 | for a grammatically correct |
0:05:45 | sentence or a complete utterance. |
0:05:48 | You can see that not all of the personalities |
0:05:52 | use every pragmatic |
0:05:54 | marker, and when that happens |
0:05:58 | you end up with some, like the tag question, that are really only used by agreeable. |
0:06:05 | Many of them are used by multiple personalities, though. |
0:06:08 | One is pretty much equally used by disagreeable and conscientious, and some are a |
0:06:14 | little bit less balanced: the "you know" marker is mostly used by extravert, but agreeable |
0:06:19 | will also |
0:06:21 | use |
0:06:22 | the "you know" marker. |
0:06:27 | So we begin with the TGen system from Dušek et al., and we have |
0:06:32 | three different models with varying levels of supervision. |
0:06:36 | First there's the NoSup model; this model directly follows the baseline model and has no |
0:06:42 | supervision. The Token model adds a single token that |
0:06:46 | specifies the personality, |
0:06:48 | similar to multilingual machine translation approaches. |
0:06:51 | And our Context model directly encodes the 36 style parameters, the pragmatic markers and aggregation |
0:06:57 | operations |
0:06:58 | from PERSONAGE, as context through a feed-forward network. |
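The context conditioning can be pictured roughly as follows. This is a minimal sketch with made-up sizes and random weights; the actual hidden size, training, and how the resulting vector is wired into the seq2seq decoder are not specified in the talk and are assumptions here.

```python
# Hypothetical sketch: a binary vector of PERSONAGE style parameters
# (pragmatic markers + aggregation operations) is projected through a
# small feed-forward layer; the real model would feed this vector to
# the decoder alongside the MR encoding.
import math
import random

N_PARAMS = 36   # number of binary style parameters, per the talk
HIDDEN = 8      # hypothetical hidden size for this sketch

random.seed(0)
W = [[random.uniform(-0.1, 0.1) for _ in range(N_PARAMS)] for _ in range(HIDDEN)]
b = [0.0] * HIDDEN

def encode_context(params):
    """Feed-forward projection of the binary style parameters."""
    assert len(params) == N_PARAMS
    return [math.tanh(sum(w * p for w, p in zip(row, params)) + bi)
            for row, bi in zip(W, b)]

# toy choice of three "active" markers/operations
style = [1 if i in (0, 5, 17) else 0 for i in range(N_PARAMS)]
ctx = encode_context(style)
```

The key design point is that, unlike the single token of the Token model, every one of the 36 parameters is individually controllable at generation time.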
0:07:06 | Here's an example output |
0:07:08 | from our context model: |
0:07:12 | a |
0:07:13 | realization that had no aggregation and no pragmatic markers. So |
0:07:19 | each attribute is realized in its own sentence, and |
0:07:22 | there's no variety; it's just realizing the attributes. |
0:07:27 | So |
0:07:28 | then I have three examples from the personalities. The first is agreeable. |
0:07:33 | It opens with "let's see what we can find on X," and says it is an Italian |
0:07:38 | restaurant in riverside, moderately priced, and |
0:07:42 | also kid friendly. |
0:07:45 | So it has a request confirmation at the beginning, and acknowledgments and justifications, and |
0:07:53 | then it has a tag question at the end, and it also uses the |
0:07:59 | also-cue for aggregation. |
0:08:01 | The second one |
0:08:03 | is in the disagreeable voice. |
0:08:06 | It starts "god, I don't know," then says it is a moderately priced restaurant, |
0:08:11 | an Italian place in riverside, and kid friendly. |
0:08:16 | So there's the expletive, "god," |
0:08:18 | and an initial rejection with the "I don't know," and this one |
0:08:23 | still uses the also-cue for aggregation. |
0:08:27 | The final one is |
0:08:29 | an extravert. |
0:08:30 | "Basically," it's really an Italian place in riverside that's "actually" moderately priced, with |
0:08:36 | a decent rating, and it ends with a "you know." |
0:08:40 | So it's got a couple of emphasizers, |
0:08:45 | "basically" and "actually," and the "you know" marker, and it only uses merge and conjunction, |
0:08:52 | so it's all just one sentence; there is no use of the period operation. |
0:09:00 | So, |
0:09:02 | automatic metrics |
0:09:05 | really aren't |
0:09:07 | great |
0:09:10 | here, and you can see why: |
0:09:12 | they reward systems whose output is as similar |
0:09:18 | to the training data as possible, so they are inherently bad |
0:09:23 | for stylistic variation. |
0:09:26 | Our context model does perform the best, but the numbers aren't great; |
0:09:32 | we are mostly showing these for completeness. |
0:09:35 | Instead we propose new metrics for evaluating semantic quality and stylistic variation. |
0:09:43 | So first we evaluate the semantic quality |
0:09:46 | using four types of errors, comparing the attributes in the MR to |
0:09:52 | the realizations. |
0:09:55 | The first is deletions, which is when |
0:09:57 | an attribute in the MR is not realized in the output; |
0:10:02 | repetitions, which is where an attribute |
0:10:05 | is realized in the reference multiple times; |
0:10:08 | substitutions, which is where an |
0:10:11 | attribute that is in the MR appears in the reference with an incorrect value. |
0:10:17 | So, for example, if the MR said it was an Italian restaurant and the reference said a French |
0:10:23 | restaurant, |
0:10:24 | that would be a substitution. |
0:10:26 | And then hallucinations, which is when an attribute is realized in the reference that was not in the original MR. |
0:10:32 | So we have a table here that has |
0:10:35 | the values for each model and each personality for each error type. Everything is very |
0:10:42 | stable, and it is hard to tell which model |
0:10:45 | is doing the best overall, so we |
0:10:48 | simplified it and computed a slot error rate, |
0:10:52 | which is the sum of those four semantic errors over the number of slots, |
0:10:59 | or attributes. |
0:11:00 | This is modeled after the word error rate. |
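The slot error rate just described can be sketched directly; the function and argument names below are mine, not from the talk.

```python
def slot_error_rate(deletions, repetitions, substitutions, hallucinations, n_slots):
    """Sum of the four semantic error counts over the number of
    slots (attributes) in the MR, modeled after word error rate."""
    return (deletions + repetitions + substitutions + hallucinations) / n_slots

# e.g. an MR with 8 attributes where one attribute was dropped and
# one was realized with the wrong value:
print(slot_error_rate(deletions=1, repetitions=0, substitutions=1,
                      hallucinations=0, n_slots=8))  # 0.25
```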
0:11:03 | Now we have a similar table where you can |
0:11:07 | actually see the difference between the models, and you can see that NoSup |
0:11:12 | performs the best; but note that this is |
0:11:14 | because it pays no cost for stylistic variation, and that |
0:11:18 | context really isn't |
0:11:19 | that much worse. |
0:11:24 | So |
0:11:26 | that was rating the semantic quality, and now we want to measure stylistic variation. |
0:11:31 | First we take the Shannon entropy of the generated text to see how varied |
0:11:36 | the results are. |
0:11:37 | The context model performs the best of the three models and is closest to the original |
0:11:44 | PERSONAGE training data, so it is nearly as varied as the original data. |
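Token-level Shannon entropy of this kind can be computed as follows; the exact tokenization behind the talk's numbers is an assumption here, and the two toy corpora are invented for illustration.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the token distribution;
    higher entropy means more varied text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A repetitive output scores lower than a varied one.
flat = "the place is nice the place is nice".split()
varied = "god i do not know but the rating is decent".split()
print(shannon_entropy(flat))    # 2.0
print(shannon_entropy(varied))  # ~3.32
```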
0:11:53 | We also want to measure whether the models are faithfully reproducing the pragmatic markers |
0:12:00 | that each personality uses. |
0:12:03 | So we |
0:12:06 | calculated, for each pragmatic marker, its count of occurrences, |
0:12:10 | and then we get the Pearson correlation between |
0:12:15 | the PERSONAGE training data and the output for each |
0:12:20 | model and each personality. |
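This faithfulness check boils down to Pearson's r between two per-marker count vectors. A self-contained sketch (the toy counts below are invented for illustration):

```python
import math

def pearson(xs, ys):
    """Pearson's r between per-marker counts in the PERSONAGE
    training data (xs) and in a model's output (ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# toy per-marker counts: this model's output tracks the training
# data fairly closely, so r should be near +1
train_counts = [40, 3, 12, 0, 25]
model_counts = [35, 5, 10, 1, 22]
r = pearson(train_counts, model_counts)
```

A model that overuses markers the personality rarely uses (and vice versa) would drive r toward zero or negative, which is exactly the pattern reported for the less supervised models.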
0:12:22 | So the context model does best for most of the personalities; with one exception, it |
0:12:29 | performs better than the others. |
0:12:31 | NoSup |
0:12:32 | only has positive values for two of them; the rest are actually negatively correlated. |
0:12:39 | We think this is because conscientious |
0:12:42 | uses easy-to-reproduce pragmatic markers, |
0:12:45 | mostly the request confirmation and the initial rejection, which generally appear at the |
0:12:50 | very beginning or the very end of the sentence, which makes them easier |
0:12:55 | to reproduce; and unconscientious pretty much exclusively uses just one marker, |
0:13:00 | so it's very similar to conscientious. |
0:13:06 | So we did pretty much the same thing for the aggregation |
0:13:09 | operations: we count occurrences of each operation |
0:13:14 | and compute the Pearson correlation between each model and the test data. |
0:13:19 | Again, context performs better than |
0:13:23 | the others, |
0:13:24 | except for one case, this time disagreeable. |
0:13:28 | And |
0:13:29 | you see that the token model actually does pretty well here; |
0:13:33 | it does better than it did on the markers in a couple of instances. We think this is |
0:13:37 | because |
0:13:40 | aggregation operations, unlike the markers, are required: you can't have a sentence |
0:13:45 | without one. |
0:13:46 | And so the token model sees fewer examples of the pragmatic markers but |
0:13:51 | far more of the aggregation operations, so it has more opportunity to do better with the aggregation than with |
0:13:58 | the pragmatic markers. |
0:14:00 | Overall, our context model |
0:14:03 | gives us the best mix of semantic quality and stylistic variation. |
0:14:11 | So we also evaluated the quality of the outputs with a Mechanical Turk |
0:14:17 | study. |
0:14:18 | We took our best performing model, |
0:14:20 | the context model, and tested whether people |
0:14:24 | can recognize the personality. |
0:14:27 | As a baseline, we randomly selected a set of ten unique MRs from training |
0:14:32 | and their references. We gave the Turk workers items from the Ten |
0:14:41 | Item Personality |
0:14:43 | Inventory |
0:14:45 | (TIPI), and we also asked |
0:14:48 | them to rate how natural the utterance sounds. |
0:14:57 | So we evaluated the same ten unique MRs |
0:15:00 | generated from the context model on test. |
0:15:04 | We had five Turkers per HIT, and we measured how |
0:15:09 | frequently the majority selected the correct TIPI item |
0:15:12 | versus the opposite item |
0:15:14 | to get a ratio, which is what I've highlighted here. |
0:15:20 | PERSONAGE |
0:15:20 | is at over fifty percent for |
0:15:23 | all of the TIPI items. |
0:15:25 | The context model |
0:15:27 | is right over fifty percent on everything except agreeable and unconscientious, |
0:15:32 | and even there |
0:15:33 | the lowest percentages do seem to follow the trend of |
0:15:37 | PERSONAGE, just a little bit lower. |
0:15:42 | We also got a rating on a one-to-seven scale from the TIPI |
0:15:48 | items, and we basically took the average rating of the |
0:15:52 | matching item; so for agreeable, it's the average rating for the agreeable |
0:15:58 | item. |
0:16:00 | Here's |
0:16:03 | the average for all the items for PERSONAGE; for most of them the context model is |
0:16:08 | rated |
0:16:16 | about |
0:16:16 | the same, and then for unconscientious, |
0:16:21 | it actually does a little better than the original PERSONAGE. |
0:16:29 | We also got the naturalness rating, again one to seven. |
0:16:35 | The |
0:16:35 | context model again has a couple of instances where it actually sounds a little more natural |
0:16:41 | than the original data. So for disagreeable |
0:16:44 | and unconscientious, |
0:16:47 | people rated our model's outputs |
0:16:51 | as more natural in the overall results. |
0:16:58 | So we also tested our model for generalizability, |
0:17:04 | and we tried to generate output that matches the characteristics of |
0:17:09 | multiple personalities. So, for example, we took |
0:17:12 | the disagreeable voice and the conscientious voice, |
0:17:16 | and we combined them and generated new sentences. |
0:17:20 | Here's an example where |
0:17:23 | our model output follows both a disagreeable and a conscientious personality. |
0:17:29 | To |
0:17:30 | evaluate it, we look at |
0:17:32 | the average occurrence of the different features. |
0:17:37 | Here are two examples |
0:17:38 | of features that are pretty different. The period aggregation is a lot more common |
0:17:42 | in disagreeable |
0:17:43 | than in conscientious, |
0:17:44 | and when we combine them, the result is sort of in the middle. Same with |
0:17:50 | the expletive pragmatic marker: it's much more common in disagreeable |
0:17:54 | than conscientious, |
0:17:55 | and you can see the combined result is, again, in between. So we really think this indicates |
0:18:00 | that the model is not sticking to |
0:18:03 | one voice or the other; it is |
0:18:07 | sort of averaging them and generating novel data. |
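One way to picture this mixing, consistent with the "averaging" behavior just described, is an element-wise average of two personalities' style parameter vectors. This is a hypothetical sketch; the talk does not specify the exact combination mechanism, and the 4-dimensional vectors stand in for the full 36 parameters.

```python
def combine_styles(params_a, params_b):
    """Element-wise average of two personalities' binary style
    vectors; hypothetical sketch of mixing two voices."""
    return [(a + b) / 2 for a, b in zip(params_a, params_b)]

disagreeable  = [1, 0, 1, 0]  # toy stand-in for the 36 parameters
conscientious = [0, 0, 1, 1]
mixed = combine_styles(disagreeable, conscientious)  # [0.5, 0.0, 1.0, 0.5]
```

Features active in only one voice end up at intermediate values, matching the observation that the combined output's feature counts fall between the two personalities'.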
0:18:12 | And this is from a model that we only trained on single personalities; we never trained |
0:18:17 | it on mixed personalities. So it tells us, as we say in the paper, that |
0:18:23 | a neural model's voice can model the expression of a novel personality. |
0:18:31 | So, in conclusion, we showed |
0:18:34 | that neural models can be used to generate data that is both syntactically and semantically |
0:18:38 | correct, |
0:18:39 | based on the E2E generation challenge, |
0:18:42 | and that neural models are able to use stylistic variation in a controlled |
0:18:47 | setting, |
0:18:48 | based on the type of data they are trained on and the amount of supervision |
0:18:52 | they are given in training. We're currently |
0:18:55 | focusing on new forms of stylistic variation. |
0:19:00 | Our dataset is available at that link. |
0:19:37 | [audience question, inaudible] |
0:19:41 | Well, |
0:19:43 | so all these results are actually the human evaluation on the test data; we did it with PERSONAGE first, and |
0:19:52 | we got around the same results. We really just wanted to show |
0:19:56 | that |
0:19:57 | the neural model, the context model, is |
0:20:01 | still producing these personalities in a way that is recognizable, so |
0:20:07 | people can still tell the conscientious voice |
0:20:10 | is conscientious, and |
0:20:15 | it's not just that we're looking at these pragmatic markers and counting that it repeated them; it |
0:20:19 | is actually still the same personality as in the training data. |