0:00:15 Okay, my name is Lena, and I'm from the Natural Language and Dialogue Systems Lab at UC Santa Cruz, presenting the paper "Controlling Personality-Based Stylistic Variation with Neural Natural Language Generators".
0:00:34 So the problem is that work on task-oriented neural NLG from structured data has focused on avoiding semantic errors, which has resulted in stylistically uninteresting outputs.
0:00:50 So for example, here are two references generated for the same MR for the restaurant Cocum: both realize all of the attributes in the MR, but that's really all they do.
0:01:10 So our goal is to train a neural NLG that is both semantically correct and stylistically varied, by controlling the input data and the amount of supervision available to the model. But we really need lots of training data to learn these styles.
0:01:31 So we use a statistical generator, PERSONAGE, which is able to generate data based on the Big Five personality traits, to create stylistic variation. We use five personalities, agreeable, conscientious, disagreeable, extravert, and unconscientious, to generate data using the train and dev MRs from the E2E generation challenge. With PERSONAGE we can systematically control the types of stylistic variation produced, and we know which kinds of stylistic variation our models should be reproducing.
0:02:08 So there are two examples on the screen, one for the agreeable personality and one for the disagreeable personality. The agreeable one uses pragmatic markers like "I see", and the disagreeable one uses emphasizers like "actually". And the disagreeable output is broken up into five sentences, whereas the agreeable one is all in one sentence.
0:02:44 For our data distribution, we have 88,855 total utterances in training, generated from 3,784 unique MRs, with 17,771 references per personality. For testing, we generate 1,390 total utterances; from each unique MR we get one reference per personality, for each of the five personalities.
0:03:10 So with this data, the MRs are not our own; they are taken directly from the E2E generation challenge.
0:03:22 So the distribution of this data follows the E2E challenge. In the training data, the number of attributes per MR is a bit more balanced, mostly four or five attributes per MR, while the test data has quite a bit more attributes per MR, mostly seven or eight actually. We think this makes the test set a little harder than the training set.
0:03:52 So there are five types of aggregation operations that PERSONAGE can use to combine the attributes in the MR. There's the period operation, "X serves Y. It is Z."; the "with" cue, "X is Y, with Z"; the conjunction operation, "X is Y and it is Z"; the merge operation, which combines attributes that share the same verb; and the "also" cue, "X has Y, also it has Z."
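To make the operations concrete, here is a minimal sketch of how such aggregation templates could combine attribute realizations; the function names and template strings are illustrative approximations, not PERSONAGE's actual implementation.

```python
# Hypothetical templates approximating the five aggregation operations
# described above; PERSONAGE's real templates are more sophisticated.

def period(x, y):
    return f"{x}. {y}."                  # "X serves Y. It is Z."

def with_cue(x, y):
    return f"{x}, with {y}."             # "X is Y, with Z."

def conjunction(x, y):
    return f"{x} and {y}."               # "X is Y and it is Z."

def merge(subject, verb, y, z):
    return f"{subject} {verb} {y} and {z}."  # two attributes share one verb

def also_cue(x, y):
    return f"{x}, also {y}."             # "X has Y, also it has Z."

print(period("Cocum serves Italian food", "it is in riverside"))
```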
0:04:26 Aggregation operations are necessary to combine the attributes together. In terms of the distribution, most of the personalities use most of the aggregation operations, but there is still some variety. So the disagreeable voice uses the period operation a lot more than all of the other ones, and extravert is a lot more likely to use the conjunction operation than the others, so we can still see that the voices are different.
0:05:02 Here is a sample of the pragmatic markers that PERSONAGE can use; in total we have about thirty-one pragmatic operators.
0:05:15 So some of these are the request confirmation, e.g., "let's see what we can find on X"; the emphasizers, like "really", "basically", "actually", "just"; competence mitigation, e.g., "come on", "obviously"; and in-group markers. Unlike the aggregation operations, none of the pragmatic markers are necessary for a grammatically correct sentence or a well-formed utterance.
0:05:48 But you can see that not all of the personalities use every pragmatic operator, and that they occur at different rates. You end up with some, like the tag question, that are really only used by agreeable, while many of them are used by multiple personalities: some are used pretty much equally by disagreeable and conscientious, and some a little bit less evenly. The "you know" marker is mostly used by extravert, but agreeable will also use it.
0:06:27 So we begin with the TGen system from Dušek et al., and we have three different models with varying levels of supervision. First there's the NoSup model, which directly follows the baseline model and has no supervision. The Token model adds a single token to the MR that specifies the personality, similar to multilingual machine translation approaches. And our Context model directly encodes the thirty-six style parameters, the pragmatic markers and aggregation operations from PERSONAGE, as context in a feed-forward network.
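As a rough illustration of that last model, here is a minimal PyTorch sketch, assuming the thirty-six PERSONAGE parameters arrive as a count vector whose feed-forward encoding conditions the decoder; the layer sizes and names here are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn

class StyleContext(nn.Module):
    """Feed-forward encoder for the 36 PERSONAGE style parameters
    (pragmatic marker and aggregation operation counts)."""

    def __init__(self, n_params=36, hidden=128):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(n_params, hidden), nn.Tanh())

    def forward(self, style_params):   # (batch, 36)
        return self.ff(style_params)   # (batch, hidden) context vector

# The context vector would then be combined with the MR encoder's state,
# e.g. concatenated before initializing the decoder.
style = torch.zeros(1, 36)
style[0, 3] = 2.0                      # e.g. two occurrences of one marker
context = StyleContext()(style)
```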
0:07:06 Here's an example of output from our Context model. The first realization had no aggregation and no pragmatic markers, so each attribute is in its own sentence and there is no variety; it's just realizing the attributes.
0:07:28 And I have three examples from the personalities. The first is agreeable: "Let's see what we can find on X. I see, well, it is a moderately priced Italian restaurant in Riverside with a decent rating, also it is kid friendly, alright?" So it had a request confirmation at the start, an acknowledgment justification, "I see, well", then it has a tag question at the end, and it also uses the "also" cue for aggregation.
0:08:01 The second one is the disagreeable voice: "Oh God, I don't know. X is a moderately priced restaurant, also it is an Italian place in Riverside and it is kid friendly." It has the expletive, "oh God", and an initial rejection with the "I don't know", and it still uses the "also" cue.
0:08:27 The final one is the extravert: "Basically, X is an Italian place, alright? And it's actually moderately priced, in Riverside, with a decent rating, and it's kid friendly, you know." So it has a tag question, the emphasizers "basically" and "actually", and the "you know" marker, and it only uses merge and conjunction; it's all one sentence, so there is no use of the period operation.
0:09:00 So, on automatic metrics: metrics like BLEU really only reward systems whose outputs are similar to the training data, and they are inherently bad at measuring stylistic variation. Our Context model does perform the best, but the numbers aren't great; we are mostly showing BLEU for completeness. Instead, we propose new metrics for evaluating semantic quality and stylistic variation.
0:09:43 So first we evaluate the semantic quality using four types of errors, comparing the attributes in the original MR to the realizations. The first is deletions, which is when an attribute in the MR is not realized in the output. Repetitions is where an attribute in the MR is realized in the reference multiple times. Substitutions is where an attribute in the MR is realized in the reference with a different value; so for example, if the MR said it was an Italian restaurant and the reference says a French restaurant, it was realized with the wrong value. And then hallucinations, which is when an attribute is realized in the reference that was not in the original MR.
0:10:32 So we have a table here with the values for each model and each personality for deletions, repetitions, substitutions, and hallucinations. Everything is very small and it is hard to tell which model is doing the best overall, so we simplified it and computed a slot error rate, which is the sum of those four semantic errors over the number of slots, that is, attributes, in the MR. This is modelled after the word error rate.
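In code form, the slot error rate works out to something like this minimal sketch; counting each error type against the MR is assumed to happen upstream.

```python
def slot_error_rate(deletions, repetitions, substitutions, hallucinations, n_slots):
    """Sum of the four semantic error counts over the number of slots
    (attributes) in the MR, by analogy to word error rate."""
    return (deletions + repetitions + substitutions + hallucinations) / n_slots

# e.g. an 8-attribute MR with one deletion and one substitution:
assert slot_error_rate(1, 0, 1, 0, 8) == 0.25
```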
0:11:03 And now we have a much simpler table where you can actually see the differences between the models, and you can see that NoSup performed the best, but also that this comes at a cost in stylistic variation, and that Context is not really that much worse.
0:11:24 So that was evaluating the semantic quality, and now we want to measure the stylistic variation. So first we take the Shannon entropy of the generated text to see how varied the outputs are. The Context model performs the best of the three models and is closest to the original PERSONAGE training data, so it is almost as varied as the original data.
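For reference, here is a minimal sketch of Shannon entropy over a text's token distribution; the talk does not say whether the paper computes it over words or some other unit, so the word-level version is an assumption.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Entropy (in bits) of the empirical token distribution;
    more varied text gives higher values."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(shannon_entropy("it is a restaurant , also it is in riverside".split()))
```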
0:11:53 We also want to measure whether the models are faithfully reproducing the pragmatic markers that each personality uses. So we calculated the number of occurrences of each pragmatic marker, and then we computed the Pearson correlation between the PERSONAGE training data and the output, for each model and each personality.
0:12:22 So the Context model performs better for most of the personalities, except, importantly, for conscientious. NoSup only has positive values for two of them; the rest are actually negatively correlated. I think this is because conscientious happens to use easy-to-reproduce markers, mostly the request confirmation and the initial rejection, which generally sit at the very beginning or the very end of the sentence, which makes them easier to reproduce; and NoSup pretty much exclusively produces just one voice, so its output is very similar to conscientious.
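A minimal sketch of that correlation step, with hypothetical per-marker counts; SciPy's pearsonr is one standard way to compute it, though the authors' exact tooling is not stated in the talk.

```python
from scipy.stats import pearsonr

# Hypothetical occurrence counts for five pragmatic markers, in the
# PERSONAGE training data vs. one model's output for one personality.
personage_counts = [120, 45, 0, 30, 8]
model_counts = [98, 40, 2, 22, 5]

r, p = pearsonr(personage_counts, model_counts)
print(f"Pearson r = {r:.2f}")  # near 1.0 means marker usage is faithful
```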
0:13:06 So we did pretty much the same thing for the aggregation operations: we count the occurrences of each aggregation operation and compute the Pearson correlation between each model's output and the test data. Again, Context is performing better than the other models, except for one case, this time disagreeable. And you see that NoSup actually does pretty well here; it does better than Token in a couple of instances. We think this is because aggregation operations, unlike pragmatic markers, are necessary; you can't have a sentence without them. So while NoSup produces hardly any pragmatic markers, it still has to produce aggregation operations, which gives it more of an opportunity to do well on the aggregation than on the pragmatic markers.
0:14:00 But overall, our Context model gives us the best mix of semantic quality and stylistic variation.
0:14:11 So we also evaluated the quality of the output with a Mechanical Turk study, using our best performing model, the Context model, and tested whether people can recognize the personality. As a baseline, we randomly selected a set of ten unique MRs from training along with their references. We gave the Turkers the Ten-Item Personality Inventory (TIPI), and we also had the Turkers rate how natural the utterance sounded.
0:14:57 So we evaluated the same unique MRs with references we generated from the Context model on the test set. We had five Turkers per HIT, and we measured how frequently the majority selected the correct TIPI item versus the opposite item, to get a ratio, which is what I have highlighted.
0:15:20 PERSONAGE's ratio is over fifty percent for all of the TIPI items. The Context model is right over fifty percent on everything except agreeable. Conscientious has the lowest percentage, but it does seem to follow the trend of PERSONAGE, just a little bit lower.
0:15:42 We also took the TIPI rating, on a one-to-seven scale, from the Turkers, and we basically took the average rating for the matching personality; so for agreeable, the average rating on the agreeable item. The averages are fairly close for most of them: for the Context model, agreeable is about the same as the original, and then for unconscientious and conscientious it actually does a little better than the original PERSONAGE.
0:16:29 We also took the naturalness rating, again on a one-to-seven scale. The Context model again has a couple of instances where it actually sounds a little more natural than the original data: for disagreeable and conscientious, people rated our model's outputs as more natural. So overall, good results.
0:16:58 So we also tested our model for generalizability, and we tried to generate output that matches the characteristics of two personalities. So for example, we take the disagreeable voice and the conscientious voice and we combine them, and that brings us to an example: our model's output for a blended disagreeable and conscientious personality.
0:17:30 To evaluate it, we look at the average occurrence of the different features. Here are two examples where the difference is pretty clear: the period aggregation is a lot more common in disagreeable than in conscientious, and when we combine them the result is sort of in the middle. And the same with the expletive pragmatic marker: it's much more common in disagreeable than in conscientious, and again the combined result lands in between. So it really does indicate that the model is not sticking to one voice or the other; it is sort of averaging them and handling the new data well.
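A minimal sketch of what that blending could look like, under the assumption that combining two voices means averaging their style-parameter vectors; the talk only says the parameters are combined, and all values here are hypothetical.

```python
import numpy as np

# Hypothetical 36-dimensional style vectors of marker/aggregation counts.
disagreeable = np.zeros(36)
disagreeable[0] = 4.0        # e.g. heavy use of the period operation
conscientious = np.zeros(36)
conscientious[0] = 1.0

blended = (disagreeable + conscientious) / 2.0
print(blended[0])            # 2.5: the blended voice lands in between
```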
0:18:12 And this is from a model that we only trained on single personalities; we never trained it on mixed personalities, so our neural model is able to voice the expression of a novel personality.
0:18:31 So in conclusion, we showed that neural models can be used to generate output that is both syntactically and semantically correct, based on the E2E generation challenge, and that neural models are able to use stylistic variation in a controlled setting, based on the type of data that they are trained on and the amount of supervision they are given in training. We are currently focusing on other forms of stylistic variation. Our dataset is available at that link.
0:19:37 [Audience question, inaudible]
0:19:43 Well, so all of these results are on the human test, and we got around about the same results. We really just wanted to show that the neural model, the Context model, is still producing these personalities in a way that is recognizable, so people can still tell that the conscientious voice is conscientious. It's not just that we're looking at these pragmatic markers and assuming that if the model repeats them it is actually still the same personality as in training.