0:00:09 | good morning everyone |
---|
0:00:13 | our speaker this morning is a senior researcher at the french national centre |
---|
0:00:20 | for scientific research working in the area of computer science |
---|
0:00:26 | her interests include databases and statistical models of |
---|
0:00:31 | natural language the acquisition of lexical resources for nlp |
---|
0:00:37 | syntactic and semantic parsing and technology for language learning |
---|
0:00:41 | she works on natural language generation from syntactic input and has led |
---|
0:00:48 | a shared task on generating text from sets of triples |
---|
0:00:53 | her talk this morning will be about planning-based and neural models for natural |
---|
0:00:59 | language generation over a variety of different types of nlg tasks |
---|
0:01:04 | okay |
---|
0:01:13 | can you hear me |
---|
0:01:15 | can you hear me at the back |
---|
0:01:17 | okay so good morning |
---|
0:01:19 | thank you for being here after last night |
---|
0:01:24 | so when i was invited to this workshop i was i was thrilled |
---|
0:01:27 | of course you know coming here and giving a talk |
---|
0:01:31 | but then i was a bit worried about the |
---|
0:01:33 | title of this workshop synthesis no |
---|
0:01:37 | i think that the introduction showed that i don't do synthesis i don't |
---|
0:01:41 | do speech in fact i just work on text |
---|
0:01:45 | but of course you know there is a link between |
---|
0:01:48 | text-to-speech synthesis and generation which is what i've been working on in recent years |
---|
0:01:54 | which is that natural language generation is the task of producing text so you can |
---|
0:01:59 | see natural language generation as a |
---|
0:02:02 | pre-step to text-to-speech synthesis |
---|
0:02:05 | and that is what i'm going to talk about today i'm going to talk about |
---|
0:02:09 | different types of natural language generation tasks |
---|
0:02:14 | and we'll start with |
---|
0:02:17 | let's see |
---|
0:02:20 | okay |
---|
0:02:21 | so i will start with an introduction to you know how generation was |
---|
0:02:26 | done before deep learning and then i will show how |
---|
0:02:31 | you know the deep learning |
---|
0:02:32 | paradigm completely changed the approach to natural language generation |
---|
0:02:37 | and i will talk about some issues with current neural approaches to |
---|
0:02:42 | text generation |
---|
0:02:45 | so the work i present here is joint work with phd students and colleagues |
---|
0:02:49 | whom i want to name here so one of them is a phd student |
---|
0:02:53 | at loria where i am based |
---|
0:02:55 | another colleague is from bar-ilan university and there is also a phd student |
---|
0:02:59 | doing a joint phd between fair in paris and |
---|
0:03:03 | loria |
---|
0:03:04 | there are also colleagues in the group of iryna gurevych at darmstadt university |
---|
0:03:08 | and two other phd students who are under joint supervision with |
---|
0:03:14 | them |
---|
0:03:15 | and then finally another phd student working with |
---|
0:03:19 | me |
---|
0:03:21 | okay so first what is natural language generation well it's the task of producing text |
---|
0:03:26 | but it's very different from natural language understanding because of the input so in natural |
---|
0:03:32 | language understanding the input is a text it is well defined everybody agrees on that and |
---|
0:03:37 | you know large quantities of text are available |
---|
0:03:41 | in natural language generation it is very different in that the input can be many |
---|
0:03:46 | different things |
---|
0:03:48 | and this is actually one of the reasons why the natural language generation community was very |
---|
0:03:52 | small for a very long time so compared to nlu you know the number |
---|
0:03:56 | of papers on nlg was very small |
---|
0:04:00 | and so what are the types of inputs there are three basic |
---|
0:04:03 | types of input data meaning representations |
---|
0:04:07 | or text |
---|
0:04:08 | so data that would be data from databases or knowledge bases |
---|
0:04:14 | structured data then you have meaning representations that are devised by linguists and can be |
---|
0:04:18 | produced by computational linguistics tools and that are basically devised to represent the meaning of a |
---|
0:04:24 | sentence |
---|
0:04:25 | sometimes of a text more generally of a sentence or a dialogue turn |
---|
0:04:30 | so sometimes you want to generate from this meaning representation for example |
---|
0:04:34 | in the context of a dialogue system |
---|
0:04:36 | you might want the you know the system the dialogue manager would produce |
---|
0:04:40 | a meaning representation which encodes a dialogue turn |
---|
0:04:43 | and then the generation task is to generate the text of |
---|
0:04:46 | a turn so the system turn in response to |
---|
0:04:50 | to this meaning representation |
---|
0:04:52 | and finally you can generate from text and that would be in applications such |
---|
0:04:56 | as text summarization text simplification sentence compression |
---|
0:05:03 | so those are the main types of input another complicating factor is that the |
---|
0:05:07 | what we call the communicative goal can be very different sometimes you want to verbalise |
---|
0:05:12 | so for instance if you have a knowledge base you might want the system to |
---|
0:05:16 | just verbalise the content of the knowledge base |
---|
0:05:18 | so it is |
---|
0:05:19 | you know readable by human users but you know other goals would be |
---|
0:05:23 | like to respond to a dialogue turn |
---|
0:05:25 | to summarize a text or to simplify a text or even to summarize the content of |
---|
0:05:30 | a knowledge base |
---|
0:05:31 | so those two factors together |
---|
0:05:33 | meant that |
---|
0:05:34 | before the neural era |
---|
0:05:36 | natural language generation was divided into many different subfields which didn't help |
---|
0:05:41 | given that the community was already pretty small |
---|
0:05:44 | and there wasn't much |
---|
0:05:45 | you know communication between those subfields |
---|
0:05:48 | so why did we have these different subfields because essentially the problems are very different |
---|
0:05:56 | so when you are generating from data there is a big gap between |
---|
0:06:01 | the input and the output since the input is a data structure the data |
---|
0:06:05 | does not look like the text at all it can even you know result from |
---|
0:06:09 | signal processing it can be numbers from weather |
---|
0:06:14 | forecasts numbers whatever |
---|
0:06:16 | so |
---|
0:06:16 | the input data is very different from text and so to bridge this gap |
---|
0:06:20 | you have to do many things and essentially what you have to do is to |
---|
0:06:25 | decide what to say |
---|
0:06:26 | and how to say it |
---|
0:06:28 | what to say is more an ai problem so deciding you know what parts |
---|
0:06:32 | of the data you want to select to |
---|
0:06:34 | actually verbalise because if you were you know if you verbalised all the numbers |
---|
0:06:37 | in the you know given by a sensor it would just make no sense at |
---|
0:06:41 | all |
---|
0:06:42 | so the output text would make no sense at all |
---|
0:06:44 | and so you have a content selection problem then you usually have to structure |
---|
0:06:49 | the content that you've selected into a structure that resembles the text |
---|
0:06:55 | so this would be more like an ai planning problem and in fact |
---|
0:06:59 | this was often handled with planning techniques |
---|
0:07:02 | and then it's more linguistics once you have the text structure how do you convert it |
---|
0:07:08 | into well-formed text |
---|
0:07:09 | and there you have to make many choices so generation is really a choice problem |
---|
0:07:15 | because there are many different ways of realising things |
---|
0:07:18 | so you have problems such as you know |
---|
0:07:21 | how to choose to lexicalise a given symbol which referring expression to use to describe an entity |
---|
0:07:27 | whether you are going to |
---|
0:07:29 | use a pronoun |
---|
0:07:31 | a proper name |
---|
0:07:32 | aggregation how to avoid repetitions this is basically about deciding when to use devices such |
---|
0:07:38 | as ellipsis or coordination to avoid redundancy in the output text |
---|
0:07:42 | even if there is redundancy in your input |
---|
0:07:45 | because repetition is frequent in a knowledge base |
---|
0:07:49 | and things like that so |
---|
0:07:50 | basically for generating from data the consensus was that there was this big nlg |
---|
0:07:54 | pipeline where you had to model all of these |
---|
0:07:57 | subproblems |
---|
0:08:00 | if you generate from a meaning representation the task was seen as completely different |
---|
0:08:04 | partly i mean mainly because the gap between the meaning representation and the sentence |
---|
0:08:10 | is much smaller in fact and in fact these meaning representations were devised |
---|
0:08:15 | by linguists |
---|
0:08:16 | so the |
---|
0:08:17 | consensus here was that |
---|
0:08:20 | if you can have a grammar that describes you can you know |
---|
0:08:25 | you can develop grammars that describe basically the mapping between text and meaning |
---|
0:08:31 | and because it's a grammar it also includes this notion of syntax so |
---|
0:08:34 | it ensures that the text will be well-formed |
---|
0:08:38 | so the idea was you have a grammar that defines this mapping this association |
---|
0:08:41 | between text and meaning |
---|
0:08:44 | you could use it in both directions either you have a text and you use |
---|
0:08:47 | the grammar to derive its meaning |
---|
0:08:49 | or you can use it for generation you start from the meaning |
---|
0:08:52 | and then you use the grammar to decide you know what is the corresponding sentence |
---|
0:08:57 | given by the grammar |
---|
0:08:59 | and of course with these grammars as soon as you have large coverage they become very |
---|
0:09:02 | ambiguous so there's a huge ambiguity problem |
---|
0:09:06 | it's not tractable basically you get you know thousands of intermediate results thousands of |
---|
0:09:12 | outputs |
---|
0:09:12 | and the initial search space is huge so you combine usually you combine this grammar |
---|
0:09:18 | with some statistical modules that are basically designed to reduce the search space |
---|
0:09:23 | and to limit the output to one or a few outputs |
---|
0:09:30 | and finally generating from text here again a very different approach the main point the |
---|
0:09:36 | consensus again was that when you generate from text there are basically four main operations you want |
---|
0:09:41 | to model i mean all or some of them depending on the application which are |
---|
0:09:46 | split rewrite reorder and delete |
---|
0:09:49 | split is about learning when to split a long sentence into several sentences |
---|
0:09:54 | for example in simplification where you want to simplify text |
---|
0:09:59 | reordering is just moving constituents around or the words around |
---|
0:10:04 | again because maybe you want to simplify or to paraphrase which is another text-to-text |
---|
0:10:09 | generation application |
---|
0:10:12 | you want to rewrite again maybe to simplify or to paraphrase so you rewrite a |
---|
0:10:16 | word or rewrite a phrase |
---|
0:10:18 | and you want to decide what you can delete in particular if |
---|
0:10:21 | you're doing simplification |
---|
0:10:24 | so in general three very different approaches to those three tasks depending on |
---|
0:10:30 | what the input is |
---|
0:10:32 | and this completely changed with the neural approach so what the neural approach did |
---|
0:10:37 | is it really completely changed the field so now whereas before generation was a |
---|
0:10:42 | very small field now at acl so the main computational linguistics conference |
---|
0:10:48 | generation is one of the topics |
---|
0:10:51 | you know that gets the top number of submissions i think the second ranking for |
---|
0:10:54 | number of submissions in the field |
---|
0:10:57 | so it's really |
---|
0:10:58 | changed completely |
---|
0:10:59 | and why did it change because the encoder-decoder framework really allows you to model all three tasks |
---|
0:11:07 | in the same way |
---|
0:11:09 | so all the techniques the |
---|
0:11:10 | methods that you can develop to improve the encoder-decoder framework |
---|
0:11:14 | they will be novel but you know |
---|
0:11:17 | it is a common framework which makes it much easier to |
---|
0:11:20 | take ideas from one field from one subfield to another |
---|
0:11:23 | so the encoder-decoder framework is it's very simple you have your input and it can |
---|
0:11:28 | be data |
---|
0:11:29 | text or meaning representation |
---|
0:11:31 | you encode it into a vector representation and then you use the power of the |
---|
0:11:37 | neural language model to decode so that the decoder is going to produce |
---|
0:11:41 | the text |
---|
0:11:41 | one word at a time using a recurrent network |
---|
0:11:44 | and we know you know that |
---|
0:11:46 | neural language models are much more powerful than previous |
---|
0:11:49 | language models because they take |
---|
0:11:52 | an unlimited amount of context into account |
---|
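As a rough illustration of the encoder-decoder framework described here, the following minimal sketch encodes an input sequence into a vector and decodes one word at a time; the GRU choice, the sizes and the greedy decoding loop are illustrative assumptions, not the specific models discussed in the talk.

```python
# Minimal encoder-decoder sketch: the input (linearised data, meaning
# representation or text) is encoded into a vector, and a recurrent decoder
# emits the output one word at a time.
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, max_len=30, bos_id=1):
        # Encode the whole input sequence into a single hidden state.
        _, hidden = self.encoder(self.src_emb(src_ids))
        # Decode greedily, one token at a time, feeding back the previous word.
        token = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            step, hidden = self.decoder(self.tgt_emb(token), hidden)
            token = self.out(step).argmax(-1)         # most probable next word
            outputs.append(token)
        return torch.cat(outputs, dim=1)

model = EncoderDecoder(src_vocab=1000, tgt_vocab=1000)
generated = model(torch.randint(0, 1000, (2, 12)))    # two toy inputs of length 12
print(generated.shape)                                # (2, 30)
```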
0:11:56 | okay so we have this unifying framework but what i want to show in this talk |
---|
0:12:01 | is that |
---|
0:12:03 | of course you know the problems still remain the tasks are different and |
---|
0:12:07 | you still have to handle them somehow |
---|
0:12:09 | so what i will do in this talk based on some work we've been |
---|
0:12:13 | doing is focus on two main points how to improve encoding or how to adapt encoding |
---|
0:12:19 | to various nlg tasks |
---|
0:12:21 | and if i have time to talk a little bit about training data again often |
---|
0:12:25 | you know there is this problem of data sparsity since it's all |
---|
0:12:29 | a supervised approach usually and supervised means you have training data and in this case the |
---|
0:12:34 | training data has to be |
---|
0:12:36 | the text and the input |
---|
0:12:38 | but these inputs can be already hard to get so these meaning representations you know |
---|
0:12:42 | where do you get them from or even you know getting an alignment a |
---|
0:12:47 | parallel corpus between |
---|
0:12:49 | database fragments and the corresponding text is also very difficult to get right |
---|
0:12:52 | so often you have you don't have much data |
---|
0:12:55 | and of course these neural networks they want a lot of data so often |
---|
0:12:58 | you have to be clever about what you do with the training data |
---|
0:13:09 | okay so on encoding we'll talk about three different points modeling graph structured input |
---|
0:13:15 | so we'll see that |
---|
0:13:18 | in the encoder-decoder framework initially at least in the first steps |
---|
0:13:22 | the encoder was usually always a recurrent network so |
---|
0:13:27 | no matter whether the input was a text or a meaning representation or a graph |
---|
0:13:32 | you know a knowledge base |
---|
0:13:33 | people were using this recurrent network i think partly because you know the |
---|
0:13:38 | encoder-decoder framework |
---|
0:13:40 | was very successful in machine translation and people were all building on this for doing |
---|
0:13:44 | this |
---|
0:13:45 | but of course you know after a while people thought well some of the |
---|
0:13:49 | input is graphs so maybe it's not such a good idea to model it as |
---|
0:13:52 | a sequence |
---|
0:13:53 | so let's do something else so we'll talk about |
---|
0:13:55 | how to model |
---|
0:13:57 | graph structured input |
---|
0:13:59 | then i will talk about generating from text where here i will focus on |
---|
0:14:05 | an application where the input is a very large quantity of text and the problem |
---|
0:14:10 | is that you know neural networks are only so good at encoding |
---|
0:14:14 | large quantities of text it's a known thing in fact for machine translation that |
---|
0:14:18 | you know the longer the input is |
---|
0:14:20 | the worse the performance |
---|
0:14:21 | and here we're not talking about long sentences we're talking about single long |
---|
0:14:25 | texts that have two hundred thousand tokens or something so what do you do in |
---|
0:14:29 | that case if you still want to do text-to-text generation |
---|
0:14:32 | and i will talk a little bit about delexicalisation so some device that can |
---|
0:14:36 | be used in some applications |
---|
0:14:39 | again because the data is not so big how can you |
---|
0:14:43 | improve it so you can generalise better |
---|
0:14:46 | okay so first encoding graphs |
---|
0:14:49 | so as i said the inputs are graphs |
---|
0:14:53 | they occur for example if you have an amr as the input or meaning representation |
---|
0:14:58 | so here you have an example from the amr two thousand and seventeen challenge |
---|
0:15:04 | where the task was |
---|
0:15:06 | given |
---|
0:15:07 | meaning representations so this amr amr means abstract meaning representation |
---|
0:15:13 | basically it's it's a meaning representation |
---|
0:15:17 | which can be written like it's written on the right |
---|
0:15:20 | but basically you can see it as a graph where the nodes are the concepts and |
---|
0:15:23 | the edges are the relations between the concepts |
---|
0:15:26 | right so here the |
---|
0:15:28 | this meaning representation here would correspond to the sentence here |
---|
0:15:31 | us officials held an expert group meeting in january two thousand two in new |
---|
0:15:36 | york and then you see that you know at the top of |
---|
0:15:39 | the tree you have the hold concept and then the arg zero is the person |
---|
0:15:46 | and then the country you can read it's the country basically so united states and |
---|
0:15:50 | then there are some other concepts |
---|
0:15:53 | so the task was to generate from these from this amr and the amr can |
---|
0:15:58 | be seen you know as a graph |
---|
0:15:59 | there was another challenge in two thousand and seventeen which is how to generate from sets of |
---|
0:16:04 | rdf triples |
---|
0:16:06 | and so here the |
---|
0:16:07 | what we did is we extracted the sets of rdf triples from |
---|
0:16:11 | dbpedia and |
---|
0:16:12 | we had a method to ensure that these sets of rdf triples |
---|
0:16:18 | could be matched into a meaningful natural text |
---|
0:16:22 | and then we had crowdsourcing people associating the sets of triples with the corresponding text |
---|
0:16:27 | so the dataset in this case was a parallel dataset where the input |
---|
0:16:30 | was a set of triples and the output was a text that verbalised |
---|
0:16:33 | the content of these triples |
---|
0:16:35 | so you probably can't see it here |
---|
0:16:37 | but for example the exact |
---|
0:16:39 | example i show here is like you have three triples where each triple is a |
---|
0:16:42 | subject property object |
---|
0:16:45 | the first one says john blaha birth date and then the date john blaha |
---|
0:16:49 | birth place and then the place and then john blaha occupation fighter pilot so |
---|
0:16:53 | for example you have these three triples |
---|
0:16:55 | and then the task would be to generate something like john blaha born in |
---|
0:16:59 | san antonio on nineteen forty two oh eight twenty six worked as a fighter pilot |
---|
0:17:04 | so this was the task |
---|
0:17:06 | and the point again here is that |
---|
0:17:09 | when you are generating from data like we're doing here then this data can |
---|
0:17:13 | be seen as a graph where the |
---|
0:17:15 | well it's a graph where the subjects and the objects the |
---|
0:17:19 | entities in your triples are the nodes and the edges are the relations between the triples |
---|
0:17:27 | okay so as i said initially people you know applied for these two |
---|
0:17:31 | tasks initially people were |
---|
0:17:33 | simply using recurrent networks so what they did is linearise the graph |
---|
0:17:37 | just do a traversal of the graph using |
---|
0:17:40 | you know some kind of traversal method |
---|
0:17:42 | and then they have a sequence of tokens |
---|
0:17:46 | and then they just encode it using a recurrent network |
---|
0:17:50 | and so here you see an example where you know the tokens |
---|
0:17:54 | input to the rnn are basically the concepts and the relations |
---|
0:18:00 | that are present in the meaning representation |
---|
0:18:03 | and then you decode from that |
---|
0:18:05 | okay |
---|
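The linearisation step described above can be pictured with a small toy script; the traversal function, the triples and the concept names below are made up for illustration and are not the challenge data.

```python
# Toy illustration of linearisation: a graph given as (subject, relation, object)
# triples is flattened into a token sequence by a depth-first traversal, so that
# a sequence encoder (RNN) can read it.
def linearise(triples, root):
    children = {}
    for s, r, o in triples:
        children.setdefault(s, []).append((r, o))
    tokens = []

    def visit(node):
        tokens.append(node)
        for rel, child in children.get(node, []):
            tokens.append(rel)
            visit(child)

    visit(root)
    return tokens

triples = [("hold-04", "ARG0", "person"), ("person", "mod", "official"),
           ("hold-04", "ARG1", "meet-03"), ("meet-03", "time", "january_2002")]
print(linearise(triples, "hold-04"))
# ['hold-04', 'ARG0', 'person', 'mod', 'official', 'ARG1', 'meet-03', 'time', 'january_2002']
```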
0:18:06 | so |
---|
0:18:07 | of course there are problems |
---|
0:18:08 | intuitively it's not very nice you know you're modeling a graph as a sequence well |
---|
0:18:14 | and then also there are technically some problems that occur in that |
---|
0:18:20 | now |
---|
0:18:21 | local dependencies that are local in the graph can become long range |
---|
0:18:27 | so |
---|
0:18:29 | okay so these two edges here they are at the same distance from the root in |
---|
0:18:35 | the initial graph but now when it's linearised you see that the crew |
---|
0:18:39 | members edge the first edge |
---|
0:18:40 | is much closer to the root node than |
---|
0:18:45 | this one right so you really |
---|
0:18:47 | the linearization is creating those long range dependencies and then again we know that lstms |
---|
0:18:52 | are not very good at dealing with long-range dependencies |
---|
0:18:55 | so also you know technically you think well maybe it's not such a great idea |
---|
0:19:00 | okay so people have been looking at this and they proposed various graph |
---|
0:19:04 | encoders so the idea is now instead of using an lstm to encode your linearised |
---|
0:19:09 | graph you propose to just use a graph encoder which is going to |
---|
0:19:14 | model the relations between the nodes inside the graph |
---|
0:19:19 | and then you decode from the output of the graph encoder so |
---|
0:19:24 | there were several proposals |
---|
0:19:26 | which i won't go into in detail but basically damonte and cohen proposed a graph |
---|
0:19:30 | convolutional network |
---|
0:19:31 | and the other two approaches use graph recurrent networks |
---|
0:19:37 | okay so we built on this idea here |
---|
0:19:40 | we |
---|
0:19:41 | and this is you know why i started with this introduction on pre-neural nlg because |
---|
0:19:46 | i think it's quite important to know about the history of nlg |
---|
0:19:51 | to have ideas about how to improve the neural approach and here this |
---|
0:19:55 | proposal was really based on the previous approach the previous work on grammar based |
---|
0:20:01 | grammar based generation so this idea that you have a grammar that |
---|
0:20:06 | you can use to produce a text |
---|
0:20:08 | so in this pre-neural work |
---|
0:20:12 | what people showed is okay you have a grammar and you have a meaning representation |
---|
0:20:16 | then you use the grammar to decide to tell you |
---|
0:20:20 | which sentences the grammar associates |
---|
0:20:24 | with this meaning representation |
---|
0:20:26 | so you |
---|
0:20:27 | see it's like it's you know it's a sort of reversed parsing problem |
---|
0:20:30 | if i say you know you have a sentence you have a grammar |
---|
0:20:32 | and then you want to decide what are the meaning representations or the syntactic trees |
---|
0:20:36 | associated by this grammar with the sentence |
---|
0:20:39 | it's a parsing problem |
---|
0:20:40 | so all i'm saying what i'm doing here is reversing the problem |
---|
0:20:44 | instead of starting from the text i start from the meaning representation and i say what |
---|
0:20:47 | you know what does the grammar tell me are the okay sentences that |
---|
0:20:51 | express this meaning |
---|
0:20:52 | so |
---|
0:20:54 | it was a parsing problem and then people started working on this reversed parsing |
---|
0:20:59 | problem to generate sentences |
---|
0:21:01 | and they found it was a very hard problem because of all this ambiguity |
---|
0:21:04 | and they had like two types of algorithms bottom-up and top-down |
---|
0:21:07 | either you start from the from the meaning representation and then you |
---|
0:21:11 | try to build the |
---|
0:21:12 | the syntactic tree that is allowed by the grammar and out |
---|
0:21:16 | of that you get the sentence or you go top-down so you just use the grammar |
---|
0:21:20 | and try to build the derivation that is going to map you onto the |
---|
0:21:23 | meaning representation |
---|
0:21:25 | so there were these two approaches and they both had problems and what people did in |
---|
0:21:28 | the end is |
---|
0:21:29 | they combined both approaches so they used both top-down and bottom-up they had some |
---|
0:21:34 | hybrid algorithm which was using both top-down and bottom-up |
---|
0:21:37 | information |
---|
0:21:39 | so here this is what we did more or less we |
---|
0:21:42 | the idea was okay those graph encoders they have a unique |
---|
0:21:48 | representation a graph encoding of the input graph of the input meaning representation |
---|
0:21:52 | what we want to do is to |
---|
0:21:53 | well there is this idea that both bottom-up and top-down information are important |
---|
0:21:58 | so we are going to encode each node in the graph using two encoders |
---|
0:22:03 | one that is that goes |
---|
0:22:04 | basically top-down through the graph and the other that goes bottom-up through the graph so |
---|
0:22:10 | what this gives us is that each node in the graph is going to |
---|
0:22:13 | have |
---|
0:22:14 | two encodings two embeddings one that reflects the top-down view of the graph and the |
---|
0:22:18 | other |
---|
0:22:19 | the bottom-up view of the graph |
---|
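A rough sketch of this dual-encoding idea is below; it uses a single toy message-passing layer per direction, which is an assumption made for illustration rather than the architecture of the published model.

```python
# Dual encoding sketch: every node gets one representation computed over the
# top-down edges and one over the reversed (bottom-up) edges, and the two are
# concatenated into the final node encoding.
import torch
import torch.nn as nn

class DualGraphEncoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.down = nn.Linear(2 * dim, dim)   # aggregates parent -> child messages
        self.up = nn.Linear(2 * dim, dim)     # aggregates child -> parent messages

    def encode(self, node_emb, edges, layer):
        agg = torch.zeros_like(node_emb)
        for src, tgt in edges:                # sum incoming messages per node
            agg[tgt] += node_emb[src]
        return torch.relu(layer(torch.cat([node_emb, agg], dim=-1)))

    def forward(self, node_emb, edges):
        top_down = self.encode(node_emb, edges, self.down)
        bottom_up = self.encode(node_emb, [(t, s) for s, t in edges], self.up)
        return torch.cat([top_down, bottom_up], dim=-1)   # one vector per node

enc = DualGraphEncoder(dim=64)
nodes = torch.randn(5, 64)                    # 5 node label embeddings
edges = [(0, 1), (0, 2), (2, 3), (2, 4)]      # parent -> child pairs
print(enc(nodes, edges).shape)                # (5, 128)
```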
0:22:21 | and so in terms of numbers |
---|
0:22:24 | we could show of course you know that with this dual encoding |
---|
0:22:26 | you know we could |
---|
0:22:30 | outperform the state of the art so those were the state of the art well this is a more recent |
---|
0:22:33 | one so of course we are no longer state of the art |
---|
0:22:38 | but at the time we were right so we were improving a little bit over |
---|
0:22:44 | the previous approaches |
---|
0:22:46 | more importantly with all those numbers what i wanted to point out here is |
---|
0:22:51 | bleu so of course there's always bleu but it is always very difficult to evaluate |
---|
0:22:56 | those |
---|
0:22:57 | generated texts |
---|
0:22:58 | because you don't want to look at them one by one well you can |
---|
0:23:02 | you have to do human evaluation in fact but if you have large quantities |
---|
0:23:06 | and if you want to compare many systems you have to have an automatic metric |
---|
0:23:09 | so what people use they learned from machine translation and there are well known problems |
---|
0:23:13 | which is you know you can generate a perfectly correct sentence that matches the input |
---|
0:23:17 | perfectly |
---|
0:23:18 | but if it does not look like the reference sentence which is what you compute |
---|
0:23:22 | your bleu against then it will get a very low score |
---|
0:23:25 | so you have to have some other evaluation or at least try to |
---|
0:23:30 | so what we did is one known problem with neural networks is semantic adequacy |
---|
0:23:35 | they |
---|
0:23:36 | generate very nice looking texts right because these |
---|
0:23:38 | you know language models are very powerful but often |
---|
0:23:42 | they don't really match the input so it's a bit problematic you know because |
---|
0:23:47 | when you want to have a generation application |
---|
0:23:49 | it has to match the input otherwise |
---|
0:23:51 | it's very dangerous in a way |
---|
0:23:54 | so |
---|
0:23:55 | what we tried to do here is we wanted to measure the semantic adequacy because |
---|
0:23:59 | the semantic adequacy of a generator |
---|
0:24:01 | meaning |
---|
0:24:02 | how much does it match you know how much the generated text |
---|
0:24:06 | matches the input |
---|
0:24:08 | and then what we did is we used a |
---|
0:24:11 | textual entailment system that basically given two sentences tells you whether the |
---|
0:24:16 | first one entails the other |
---|
0:24:18 | so is the first |
---|
0:24:19 | you know is the second sentence implied so entailed by the first sentence |
---|
0:24:23 | and then if you do it both ways |
---|
0:24:25 | on the |
---|
0:24:26 | two sentences so does t entail q and does q entail t |
---|
0:24:30 | then you know that t and q are semantically equivalent right |
---|
0:24:33 | logically that would be the thing |
---|
0:24:36 | so we did something similar we wanted to check semantic equivalence on text |
---|
0:24:41 | we used these tools that have been developed in computational linguistics to determine whether |
---|
0:24:45 | two sentences are in a relation of entailment |
---|
0:24:47 | and we looked at both directions so we're comparing the reference and the generated |
---|
0:24:53 | sentence and what you see here is that the always the graph approach is much |
---|
0:24:59 | better |
---|
0:25:00 | at that i mean at producing sentences that are entailed by the reference |
---|
0:25:04 | and also much better |
---|
0:25:06 | at producing |
---|
0:25:08 | sentences that entail the reference |
---|
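The bidirectional entailment check can be approximated with an off-the-shelf NLI model; the sketch below uses the Hugging Face `roberta-large-mnli` checkpoint as an illustrative stand-in for the textual entailment system mentioned in the talk, and it assumes a recent version of the transformers library.

```python
# Hedged sketch of the bidirectional entailment check used as a proxy for
# semantic equivalence between the generated text and the reference.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def entails(premise, hypothesis):
    result = nli([{"text": premise, "text_pair": hypothesis}])[0]
    return result["label"] == "ENTAILMENT", result["score"]

def mutually_entail(reference, generated):
    fwd, _ = entails(reference, generated)   # reference -> generated
    bwd, _ = entails(generated, reference)   # generated -> reference
    return fwd and bwd                       # both directions ~ semantic equivalence

print(mutually_entail("US officials held an expert group meeting in January 2002.",
                      "An expert group meeting was held by US officials in January 2002."))
```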
0:25:14 | we also did a human evaluation |
---|
0:25:17 | where basically we gave two questions to the human evaluators |
---|
0:25:21 | is it semantically adequate does the output text match the input |
---|
0:25:25 | and is it readable and then again you see so in orange |
---|
0:25:30 | this is our system and the rest are the sequential systems and you see that there is |
---|
0:25:35 | a large improvement |
---|
0:25:36 | so this you know this all points to a direction where you know using a graph |
---|
0:25:41 | encoder when you have a graph at least |
---|
0:25:43 | a meaning representation that is a graph is a good idea |
---|
0:25:47 | okay another thing we found is |
---|
0:25:49 | it's also valuable it's often useful to combine local and global information local information |
---|
0:25:57 | meaning local to the node in the graph |
---|
0:26:00 | and global sort of giving information about the structure |
---|
0:26:03 | of the surrounding graph |
---|
0:26:06 | so in this so this is still the same |
---|
0:26:11 | dual bottom-up so this is the top-down bottom-up setup |
---|
0:26:14 | this is a picture of the system |
---|
0:26:15 | we have this graph encoder that encodes the top-down view of the of the |
---|
0:26:21 | of the graph and the bottom-up view and then you |
---|
0:26:24 | so you have |
---|
0:26:27 | these are then the encodings of the nodes |
---|
0:26:32 | and then what you do is you |
---|
0:26:36 | okay so you end up with three embeddings for each node one |
---|
0:26:40 | embedding is basically the embedding of the label |
---|
0:26:43 | the content of the node so the concept |
---|
0:26:46 | so it's a word embedding basically |
---|
0:26:48 | and the other two are the bottom-up and top-down embeddings of the node |
---|
0:26:52 | and what we do is we also run an lstm |
---|
0:26:55 | so we have a notion of context for each node |
---|
0:26:58 | which is given by you know the preceding nodes in the graph and we found |
---|
0:27:02 | that this also improves our results |
---|
0:27:05 | and we also applied this idea so this local plus global information |
---|
0:27:10 | idea to another task here the task was the surface realization |
---|
0:27:16 | shared task on generating from unordered dependency trees |
---|
0:27:20 | so the idea is the input meaning representation in this case is an |
---|
0:27:24 | unordered dependency tree |
---|
0:27:27 | where the where the nodes are decorated with lemmas so it looks |
---|
0:27:31 | something like this |
---|
0:27:33 | and then what you have to do is to generate a sentence from it so |
---|
0:27:37 | basically this task has two subtasks one of them is how to reorder those |
---|
0:27:41 | lemmas into a correct sentence |
---|
0:27:44 | and then when you have the correct order |
---|
0:27:47 | how to inflect the words so you want |
---|
0:27:50 | for example you want apple to become apples |
---|
0:27:53 | in this case |
---|
0:27:57 | so we worked on that so this was also some work we did with |
---|
0:28:01 | colleagues and a phd student |
---|
0:28:05 | so what we did again is so then we transform basically what happened so |
---|
0:28:10 | where we handle this |
---|
0:28:16 | the way we handle this was |
---|
0:28:19 | as follows so here i'm just i'm just focusing here on the word ordering |
---|
0:28:22 | problem how to |
---|
0:28:24 | map this unordered tree |
---|
0:28:26 | to a sequence of elements |
---|
0:28:28 | so i'm not talking about the word inflection problem |
---|
0:28:31 | so what we do is we basically binarize the tree first |
---|
0:28:36 | so everything becomes binary |
---|
0:28:38 | and then we use a multilayer perceptron to decide on the order |
---|
0:28:43 | of each child with respect to the head so here we're going to say that |
---|
0:28:48 | we have a |
---|
0:28:49 | we have a |
---|
0:28:51 | we build a training corpus where we say okay i know you know from the |
---|
0:28:54 | corpus from the from the reference that i precedes likes |
---|
0:28:59 | et cetera and then you so this is a training corpus and the task is |
---|
0:29:02 | basically given two |
---|
0:29:03 | given the child and the head |
---|
0:29:06 | the parent |
---|
0:29:06 | how do you order them is the parent first or is the parent second |
---|
0:29:11 | so we were doing this and then we found again that combining local with |
---|
0:29:16 | global information helps |
---|
0:29:18 | and this is the this is a picture of the of the model |
---|
0:29:22 | you have the embeddings of your two nodes |
---|
0:29:25 | and you concatenate them so you build a new representation |
---|
0:29:28 | and then you have |
---|
0:29:30 | the embedding of all of the nodes that are below the parent node |
---|
0:29:35 | so the subtree that is dominated by the by the parent node |
---|
0:29:38 | and again we found that you know if you combine these two pieces of information |
---|
0:29:42 | you get much better results in the word ordering task |
---|
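A simplified version of this ordering classifier might look as follows; the embedding dimensions and the way the subtree embedding is obtained are toy assumptions.

```python
# Word-ordering classifier sketch: given the embeddings of a head and of one of
# its dependents (plus a "global" embedding of the subtree below the dependent),
# a small MLP predicts whether the dependent is realised before or after the head.
import torch
import torch.nn as nn

class OrderClassifier(nn.Module):
    def __init__(self, dim=100):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 2),               # 0 = child before head, 1 = child after head
        )

    def forward(self, head_emb, child_emb, subtree_emb):
        return self.mlp(torch.cat([head_emb, child_emb, subtree_emb], dim=-1))

clf = OrderClassifier()
head, child, subtree = torch.randn(1, 100), torch.randn(1, 100), torch.randn(1, 100)
logits = clf(head, child, subtree)
print(logits.argmax(-1))   # predicted relative order for this (head, child) pair
```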
0:29:45 | so what this shows is that |
---|
0:29:47 | taking into account in this case the subtree |
---|
0:29:51 | you know the |
---|
0:29:51 | top down view of the node of that node |
---|
0:29:54 | really helps |
---|
0:29:57 | so here you see you again have this bleu score this is the basic version |
---|
0:30:00 | this is the one where you do data expansion i'll talk about it later and this is the |
---|
0:30:06 | one with the new encoder so when you when you do take into account this |
---|
0:30:09 | additional global information so you see there's quite a big improvement |
---|
0:30:16 | okay so this was about encoding graphs |
---|
0:30:19 | what i want to talk about now is about |
---|
0:30:21 | what you do with what you what can you do when you know the |
---|
0:30:25 | the input text that you have to handle is very large |
---|
0:30:32 | so in particular we looked at two different tasks |
---|
0:30:42 | we looked at two different tasks one of them is free-form question answering |
---|
0:30:45 | from web text and the other is multi document summarization |
---|
0:30:50 | the first task is you have you have a query |
---|
0:30:56 | and basically what you're going to do is you're going to retrieve a lot of |
---|
0:30:59 | information from the web |
---|
0:31:01 | meaning a lot meaning something like two hundred thousand tokens |
---|
0:31:05 | so basically the first one hundred or so hits from the web |
---|
0:31:12 | and then you are going to use all this text as input plus the question |
---|
0:31:15 | as input to generation |
---|
0:31:18 | and the task is to generate a summary that answers the question |
---|
0:31:22 | so it's quite difficult this is a text-to-text generation task |
---|
0:31:26 | and the other one is multi document summarization |
---|
0:31:29 | we use the wikisum dataset |
---|
0:31:32 | what |
---|
0:31:33 | you do is you keep the article title so you have the title of a wikipedia |
---|
0:31:37 | article |
---|
0:31:38 | then you retrieve |
---|
0:31:41 | information again some data from the web you know using this title as the |
---|
0:31:46 | query |
---|
0:31:47 | and the goal is to generate the wikipedia paragraph the paragraph that talks |
---|
0:31:53 | about this title |
---|
0:31:54 | so basically the first paragraph of the wikipedia page you have to generate the first paragraph |
---|
0:32:00 | of that wikipedia page |
---|
0:32:03 | so here's an example you have the question so this is the eli5 |
---|
0:32:07 | dataset eli5 is explain like i'm five so |
---|
0:32:11 | the answer is supposed to be in simple language |
---|
0:32:14 | so the question would be why are consumers still terrified of genetically modified |
---|
0:32:19 | organisms |
---|
0:32:21 | though there is little debate in the scientific community over whether they are safe or not |
---|
0:32:25 | and then you retrieve some documents through web search |
---|
0:32:28 | from the web and then this is this would be in this in this case |
---|
0:32:33 | the |
---|
0:32:33 | the target answer so |
---|
0:32:35 | so not only is the input text very long but the output text is also |
---|
0:32:39 | not short it is not a single sentence it's really a paragraph |
---|
0:32:44 | so the question is how to encode two hundred thousand words and then generate from that |
---|
0:32:49 | previous work you know took a way out in a way they basically used |
---|
0:32:54 | tf-idf to select the most relevant |
---|
0:32:58 | web hits or even sentences |
---|
0:33:01 | so they are not taking the whole results of the web search they |
---|
0:33:06 | just take you know they limit the results to a few thousand words using |
---|
0:33:10 | basically a tf-idf score |
---|
0:33:13 | okay so what we wanted to do was to see whether there |
---|
0:33:18 | was a way that we could encode all |
---|
0:33:20 | all these two hundred thousand words that were retrieved from the web |
---|
0:33:25 | and encode it and use it for generation and the idea was |
---|
0:33:29 | to convert the text into a graph |
---|
0:33:31 | and in this case not |
---|
0:33:34 | not a new model not a graph encoding but rather a knowledge graph like |
---|
0:33:38 | we used to do in |
---|
0:33:40 | information extraction and see whether that helps |
---|
0:33:45 | so let's see how do we do this |
---|
0:33:49 | and so no so here is an example |
---|
0:33:51 | the query is explain the theory of relativity |
---|
0:33:54 | and here's a toy example right we have those two documents the idea is |
---|
0:33:59 | that building this graph allows us to reduce |
---|
0:34:04 | to reduce the size of the input drastically |
---|
0:34:07 | we'll see why later |
---|
0:34:09 | so the idea is we use |
---|
0:34:12 | two existing tools from computational linguistics coreference resolution and information extraction tools |
---|
0:34:20 | what coreference resolution does is it gives us is that it tells us what are |
---|
0:34:24 | the mentions in the text that talk about the same entity |
---|
0:34:27 | and then once we know this we group them into a single node in |
---|
0:34:30 | the knowledge graph |
---|
0:34:31 | and the triples the information extraction transforms the text into sets of triples |
---|
0:34:36 | basically binary relations between entities and so these relations |
---|
0:34:41 | are used as the edges in the graph and the entities are the |
---|
0:34:44 | nodes |
---|
0:34:45 | so here's an example those two documents and then you have like in blue you |
---|
0:34:51 | have those four mentions of albert einstein they will all be combined into one node |
---|
0:34:56 | and then the information extraction |
---|
0:34:58 | will tell us that there is this triple that you can extract from |
---|
0:35:02 | from this sentence here albert einstein a german theoretical physicist published the theory |
---|
0:35:07 | of relativity you can |
---|
0:35:09 | the open ie tool will tell you |
---|
0:35:12 | that you can transform the sentence into these two triples here |
---|
0:35:17 | the german |
---|
0:35:22 | the german part so this part here maps into this triple |
---|
0:35:28 | and similarly it takes this one here developed the theory of relativity and gives |
---|
0:35:31 | you this triple |
---|
0:35:36 | and so this is and that's how you build the graph so basically by using |
---|
0:35:39 | coreference resolution and |
---|
0:35:41 | information extraction |
---|
0:35:44 | and another thing that is that was important was that building this graph |
---|
0:35:48 | sort of |
---|
0:35:50 | gives us a notion of |
---|
0:35:53 | the important information or the information that is repeated in the input |
---|
0:35:58 | because every time we are going to have |
---|
0:36:00 | you know a different mention of the same entity |
---|
0:36:02 | we will we will keep score of how many times this entity is mentioned |
---|
0:36:07 | and we will use this in the graph representation to give a score |
---|
0:36:10 | a weight |
---|
0:36:11 | to each node and to each edge in the graph |
---|
0:36:15 | so here if i go a little bit in more detail so you |
---|
0:36:18 | you |
---|
0:36:20 | construct the graph incrementally by going through your sentences so you first add the sentence |
---|
0:36:23 | here |
---|
0:36:24 | add it to the graph or you add the corresponding triples to the |
---|
0:36:27 | graph |
---|
0:36:28 | and now you add this one here |
---|
0:36:30 | and you see the theory of relativity was already mentioned |
---|
0:36:33 | so now the corresponding node the weight of this corresponding node |
---|
0:36:39 | is incremented by one and you go on like this |
---|
0:36:42 | we also have a filter operation that says you know if |
---|
0:36:45 | if a sentence has |
---|
0:36:46 | nothing to do with the query we don't include it |
---|
0:36:49 | right so and we do this using tf-idf |
---|
0:36:52 | to avoid including in the graph information that is totally irrelevant |
---|
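The incremental, weighted graph construction can be sketched as below; triple extraction and coreference resolution are assumed to have already been done by external tools, and `is_relevant` is a placeholder standing in for the TF-IDF filter against the query.

```python
# Toy sketch of local knowledge-graph construction: per-sentence triples (already
# coreference-resolved) are merged into one graph, identical entities collapse into
# a single node, and node/edge weights count how often each one is mentioned.
from collections import defaultdict

def build_graph(sentence_triples, query, is_relevant):
    node_weight = defaultdict(int)
    edge_weight = defaultdict(int)
    for sentence, triples in sentence_triples:
        if not is_relevant(sentence, query):        # drop off-topic sentences
            continue
        for subj, rel, obj in triples:
            node_weight[subj] += 1                  # repeated mentions raise the weight
            node_weight[obj] += 1
            edge_weight[(subj, rel, obj)] += 1
    return node_weight, edge_weight

docs = [
    ("Albert Einstein published the theory of relativity.",
     [("Albert Einstein", "published", "theory of relativity")]),
    ("Albert Einstein was a German theoretical physicist.",
     [("Albert Einstein", "was", "German theoretical physicist")]),
]
nodes, edges = build_graph(docs, "theory of relativity", lambda s, q: True)
print(nodes["Albert Einstein"])   # 2 -- the two mentions were merged into one node
```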
0:36:57 | okay so we built this graph and then we are going to linearise the graph |
---|
0:37:02 | right |
---|
0:37:03 | so it's different from the previous approach where we were going from sequence to graph here |
---|
0:37:08 | we're going from graph to sequence because the graph is too big so |
---|
0:37:11 | you know you could try a graph encoder but we didn't do |
---|
0:37:14 | this |
---|
0:37:15 | it might be the next step but this is quite a big graph so i'm not |
---|
0:37:18 | sure how well those graph encoders would work |
---|
0:37:21 | so we linearise the graph but then to keep some information about the graph |
---|
0:37:25 | structure |
---|
0:37:26 | we use two additional embeddings per token so we have the encoder |
---|
0:37:31 | is a transformer in this case so since it's not a recurrent network |
---|
0:37:34 | we have the position embedding added to the word embedding |
---|
0:37:38 | to keep track of where in the sentence a word is |
---|
0:37:42 | and we add these two additional embeddings that give us information |
---|
0:37:46 | about you know the weight of each node and edge |
---|
0:37:49 | and the relevance to the query |
---|
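The extra embeddings work like position embeddings: they are summed into each token's representation. Below is a minimal sketch, with illustrative vocabulary sizes and bucket counts rather than the actual model configuration.

```python
# Input representation for the linearised graph: each token's word embedding is
# summed with a position embedding plus two extra embeddings encoding (bucketed)
# node/edge weight and query relevance.
import torch
import torch.nn as nn

class GraphTokenEmbedding(nn.Module):
    def __init__(self, vocab=10000, dim=512, max_len=512, weight_buckets=10, rel_buckets=10):
        super().__init__()
        self.word = nn.Embedding(vocab, dim)
        self.position = nn.Embedding(max_len, dim)
        self.weight = nn.Embedding(weight_buckets, dim)    # how often the node/edge was mentioned
        self.relevance = nn.Embedding(rel_buckets, dim)    # TF-IDF-style relevance to the query

    def forward(self, token_ids, weight_ids, relevance_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return (self.word(token_ids) + self.position(positions)
                + self.weight(weight_ids) + self.relevance(relevance_ids))

emb = GraphTokenEmbedding()
tokens = torch.randint(0, 10000, (1, 6))
weights = torch.randint(0, 10, (1, 6))
relevance = torch.randint(0, 10, (1, 6))
print(emb(tokens, weights, relevance).shape)   # (1, 6, 512)
```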
0:37:54 | so the global view of the of the model is |
---|
0:37:58 | you have your linearised graph as i said with four different embeddings |
---|
0:38:03 | for each node or edge |
---|
0:38:05 | you process it with a transformer we use |
---|
0:38:09 | memory compressed attention this is for scaling better and we use top-k attention that is |
---|
0:38:14 | you only look at the points in the encoder which have the top attention |
---|
0:38:22 | so we encode the graph as a sequence |
---|
0:38:26 | we encode the query we combine them both using attention |
---|
0:38:30 | and then we decode from that |
---|
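The top-k attention mentioned here can be sketched as follows (this is an assumption about the exact mechanism, kept deliberately simple): for each query position only the k highest-scoring encoder positions are kept before the softmax, which helps when the encoded input is very long.

```python
# Top-k attention sketch: mask out everything except the k best-scoring
# encoder positions for each query, then apply softmax as usual.
import torch

def topk_attention(queries, keys, values, k=8):
    scores = queries @ keys.transpose(-2, -1) / keys.size(-1) ** 0.5
    top_scores, top_idx = scores.topk(min(k, scores.size(-1)), dim=-1)
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, top_idx, top_scores)        # keep only the k best scores
    weights = torch.softmax(mask, dim=-1)
    return weights @ values

q, kv = torch.randn(1, 4, 64), torch.randn(1, 100, 64)
print(topk_attention(q, kv, kv).shape)            # (1, 4, 64)
```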
0:38:36 | so these pictures here show you the amount of reduction you get from the graph |
---|
0:38:41 | construction |
---|
0:38:42 | and |
---|
0:38:43 | and then |
---|
0:38:46 | the proportion of missing answer tokens because you might think you know okay |
---|
0:38:50 | maybe you lose because by compressing the text into this graph |
---|
0:38:54 | by reducing the redundancies |
---|
0:38:56 | maybe you lose some important information |
---|
0:38:58 | but it is actually not the case |
---|
0:39:00 | so what the first graph shows is that |
---|
0:39:03 | if you do the web search you have something like two hundred thousand tokens |
---|
0:39:09 | and if we run our graph construction process it reduces this |
---|
0:39:14 | to |
---|
0:39:14 | roughly ten thousand tokens |
---|
0:39:16 | right |
---|
0:39:17 | and then we compare this with just extracting triples from the text and not |
---|
0:39:21 | constructing the graph and you see that you still have a lot so that would |
---|
0:39:24 | not be enough to reduce the size |
---|
0:39:27 | and what the second the second graph shows is |
---|
0:39:31 | it shows the proportion of missing answer tokens |
---|
0:39:36 | so the lower the better for missing answer tokens |
---|
0:39:40 | you don't want too many |
---|
0:39:43 | so we are talking about comparing with the reference answer and you want to have |
---|
0:39:46 | as many tokens in your output |
---|
0:39:49 | that come from the reference as possible so you don't want too many missing tokens |
---|
0:39:53 | right so what this shows is the previous approach using tf-idf filtering where you don't |
---|
0:39:59 | consider the whole two hundred thousand tokens but simply |
---|
0:40:02 | i think it's eight hundred tokens |
---|
0:40:06 | this is what happens if you encode the graph built from those eight hundred and fifty |
---|
0:40:10 | tokens that the tf-idf approach is using |
---|
0:40:14 | and you see |
---|
0:40:16 | and so this is the number of missing tokens so you know the higher |
---|
0:40:19 | the worse |
---|
0:40:21 | but what we see is if we encode everything so this is |
---|
0:40:25 | the one where we encode the whole |
---|
0:40:28 | it's not very |
---|
0:40:29 | so we encode the whole |
---|
0:40:33 | input |
---|
0:40:34 | with this one at the end so if we encode the whole input |
---|
0:40:37 | text the one hundred web |
---|
0:40:40 | pages |
---|
0:40:41 | and so all the information we take from the web |
---|
0:40:43 | you see that actually the performance is better |
---|
0:40:53 | and these are the generation results again so in this case using rouge comparing against |
---|
0:40:59 | our reference answer again there are |
---|
0:41:02 | issues with this so here we compare with the tf-idf approach |
---|
0:41:07 | with |
---|
0:41:09 | extracting the triples but not constructing the graph and here with the graph and then |
---|
0:41:13 | you see you always get you know some improvement but the important point |
---|
0:41:18 | mainly is that we can so we get some improvement with respect to the tf |
---|
0:41:22 | idf approach |
---|
0:41:23 | you go from twenty eight something to twenty nine something |
---|
0:41:27 | but also what's important is that we can really scale to the whole |
---|
0:41:31 | two hundred web pages |
---|
0:41:35 | and here's an example showing the output of the system |
---|
0:41:39 | which is i'd say very |
---|
0:41:41 | i find it very impressive but it also illustrates some problems with the |
---|
0:41:46 | evaluation the automatic evaluation metrics so |
---|
0:41:49 | the question is why does touching microfibre give such an |
---|
0:41:52 | uncomfortable feeling |
---|
0:41:54 | then you have these and so this is the reference and this is the generated |
---|
0:41:57 | answer |
---|
0:41:58 | the generated answer you know makes sense the microfibre is made up of a |
---|
0:42:02 | bunch of tiny fibres |
---|
0:42:04 | that are attached to them |
---|
0:42:06 | when you touch them |
---|
0:42:07 | the fibres that make up the microfibre are attracted to each other |
---|
0:42:11 | when they are actually attracted to the other end of the fibre which is what makes |
---|
0:42:15 | them uncomfortable so this part is a bit strange but overall it makes sense |
---|
0:42:19 | and it's relevant to the question and you know you have to think it's generated |
---|
0:42:22 | from this two hundred thousand token input and so it's not so bad |
---|
0:42:26 | but what it also shows is that you know there are almost no overlapping words between the |
---|
0:42:31 | generated answer and the reference and so it's an example where |
---|
0:42:35 | you know the automatic metrics would give it a bad score |
---|
0:42:38 | whereas in fact this is a pretty okay answer |
---|
0:42:48 | how much time do i have |
---|
0:42:54 | fifteen |
---|
0:42:57 | so with this |
---|
0:43:00 | okay so one last thing about encoding |
---|
0:43:03 | is that sometimes |
---|
0:43:05 | sometimes again you don't have so much data |
---|
0:43:09 | so in your model abstracting away over the data might help in generalizing |
---|
0:43:15 | so here i'm going back to this task of generating from unordered |
---|
0:43:19 | dependency trees |
---|
0:43:20 | so you have this as input and this is what you have to produce |
---|
0:43:23 | as output |
---|
0:43:24 | and the idea here was so this was another work done with |
---|
0:43:29 | a |
---|
0:43:30 | phd student |
---|
0:43:34 | the idea was that here instead of what we did before you know in the |
---|
0:43:39 | other approach we were sort of |
---|
0:43:40 | turning the tree into a binary tree and then having this multilayer perceptron to |
---|
0:43:45 | order the trees so a local ordering of the nodes |
---|
0:43:49 | here what we do is we just have an encoder-decoder which basically learns to map |
---|
0:43:54 | a linearised version of the unordered dependency tree |
---|
0:43:57 | into |
---|
0:43:59 | the correct order of the lemmas |
---|
0:44:02 | so it's a different approach and also what we did is |
---|
0:44:08 | we thought well this word ordering problem is not so much determined by the words it's |
---|
0:44:12 | more dependent on syntax |
---|
0:44:14 | so maybe we can abstract over the words we can just get rid of the |
---|
0:44:18 | words |
---|
0:44:18 | and you know this would reduce data sparsity |
---|
0:44:22 | and it would be more general we don't want you know the specific words to |
---|
0:44:26 | have an impact basically |
---|
0:44:30 | so what we did is we actually got rid of the words |
---|
0:44:34 | so here you have your input word input |
---|
0:44:38 | input dependency tree that is not ordered say john eats apples for example |
---|
0:44:42 | and what we do is we linearize this tree |
---|
0:44:46 | and we remove the words so we have a factored representation of each node where we |
---|
0:44:50 | keep track of you know the pos tag the parent node |
---|
0:44:54 | and i can't remember what this one is |
---|
0:44:58 | i guess the position |
---|
0:45:06 | well i don't know |
---|
0:45:09 | zero one two okay i don't remember |
---|
0:45:11 | anyway the important point is that we rewrite the tree and remove |
---|
0:45:15 | the words |
---|
0:45:16 | so we only keep |
---|
0:45:17 | basically pos tag information |
---|
0:45:19 | structural information what the parent is |
---|
0:45:22 | and |
---|
0:45:24 | what is the grammatical relation the dependency relation between the node and the parent |
---|
0:45:29 | so here you see eats for example is replaced by this id one you know |
---|
0:45:34 | it's a verb |
---|
0:45:35 | you know the parent is the root |
---|
0:45:38 | ah sorry it's replaced by this id here it's related by the root relation to |
---|
0:45:42 | this node to the root so if i take another example that would be clearer |
---|
0:45:47 | john for example |
---|
0:45:49 | where is the subject john here is replaced by id four it's a proper noun |
---|
0:45:54 | so this is the pos tag and it's the subject and i think it's |
---|
0:45:57 | missing the parent node |
---|
0:46:00 | okay so we delexicalised we linearised and delexicalised the tree |
---|
0:46:05 | and then we build this corpus where the target is the delexicalised sequence |
---|
0:46:10 | with the correct order so here you see that you have the proper noun |
---|
0:46:13 | first the verb the determiner and the noun |
---|
0:46:16 | and basically we train a seq2seq model to go from here to here |
---|
0:46:20 | and then we relexicalise so we keep a mapping of you know |
---|
0:46:24 | what id one is and then when you generate you can just use |
---|
0:46:28 | the mapping to relexicalise the sentence |
---|
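A toy version of the delexicalise/relexicalise round trip is shown below; the node features, the feature layout and the example tree are made-up values, not the shared-task format.

```python
# Delexicalisation sketch: each node of the linearised dependency tree is replaced
# by an id plus its structural features (POS tag, dependency relation, parent id),
# and a mapping is kept so the generated id sequence can be relexicalised afterwards.
def delexicalise(nodes):
    mapping, delex = {}, []
    for i, node in enumerate(nodes):
        ident = f"id{i}"
        mapping[ident] = node["lemma"]
        delex.append(f"{ident}|{node['pos']}|{node['deprel']}|parent={node['parent']}")
    return delex, mapping

def relexicalise(ordered_ids, mapping):
    return " ".join(mapping[i] for i in ordered_ids)

tree = [{"lemma": "eat", "pos": "VERB", "deprel": "root", "parent": -1},
        {"lemma": "John", "pos": "PROPN", "deprel": "nsubj", "parent": 0},
        {"lemma": "apple", "pos": "NOUN", "deprel": "obj", "parent": 0}]
delex, mapping = delexicalise(tree)
print(delex)                                         # input side of the seq2seq model
print(relexicalise(["id1", "id0", "id2"], mapping))  # "John eat apple" before inflection
```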
0:46:32 | and what you see is that it really helps |
---|
0:46:35 | so this surface realization task |
---|
0:46:38 | has data for i think about ten languages so there is |
---|
0:46:42 | arabic czech english spanish finnish french |
---|
0:46:45 | italian dutch portuguese and russian |
---|
0:46:48 | and you see here the difference between |
---|
0:46:51 | doing the seq2seq where the tree |
---|
0:46:54 | contains all the words so where we haven't changed anything and this is doing |
---|
0:46:58 | it without the words the delexicalised version and you see that for all languages |
---|
0:47:02 | you get quite a big improvement in terms of bleu score |
---|
0:47:10 | and we used a similar idea here so this was generating from unordered |
---|
0:47:15 | dependency trees but |
---|
0:47:16 | there was this other task you know generating from abstract meaning representations |
---|
0:47:21 | in fact here we built a new dataset for french but the |
---|
0:47:25 | so here is the same idea we represent the nodes by a concatenation so it's |
---|
0:47:29 | a factored model where |
---|
0:47:30 | each node is the concatenation of different types of embeddings so you know the |
---|
0:47:34 | concept the pos tag the numbers and the and morphological and syntactic features |
---|
0:47:39 | and again we delexicalise everything |
---|
0:47:42 | and again we found oops |
---|
0:47:45 | yes and again we found that you know we get this improvement so this is |
---|
0:47:49 | the baseline which is not delexicalised and this is when we delexicalise so |
---|
0:47:54 | you get two points improvement |
---|
0:48:01 | okay |
---|
0:48:03 | so as i mentioned at the beginning you know the datasets |
---|
0:48:06 | that we have they are not very big so in particular for example |
---|
0:48:09 | the surface realization challenge |
---|
0:48:12 | is |
---|
0:48:15 | you know there is a training set of something like two thousand examples |
---|
0:48:22 | so you have to be a little bit |
---|
0:48:24 | sometimes you have to be clever or |
---|
0:48:27 | constructive in what you do with the training data |
---|
0:48:29 | one thing we found is |
---|
0:48:32 | it is often useful to |
---|
0:48:37 | to extend your training data with information that is implicit in it to be |
---|
0:48:42 | found |
---|
0:48:44 | that is implicit in the available training data |
---|
0:48:48 | so again, going back to this example where the problem was to order the nodes of
---|
0:48:53 | the
---|
0:48:53 | tree, the problem is
---|
0:48:54 | we attacked the problem by having this classifier that would determine
---|
0:48:59 | the relative order of a parent and child
---|
0:49:03 | and so, you know, your training data was like this: you
---|
0:49:07 | had the parent and you had the child and you had the position
---|
0:49:10 | of the child with respect to the parent
---|
0:49:16 | and this is the data we had, and we thought, well, you know, it should learn
---|
0:49:19 | that if this is true then also this is true, that is, if the
---|
0:49:24 | you know
---|
0:49:26 | if the child is to the left of the parent it should also learn
---|
0:49:30 | somehow that the parent is to the right of the child
---|
0:49:36 | but in fact we found that it
---|
0:49:37 | didn't learn that, so what we did is we just added the inverse
---|
0:49:41 | pairs: whenever we had this pair in the training data we added the inverse pair to
---|
0:49:46 | the training data, so we doubled the size of the training data
---|
0:49:50 | but we also give more explicit information about what possible constraints there are, so usually
---|
0:49:56 | you know, the subject is before the verb and, the other way round
---|
0:50:00 | the verb is after the subject
---|
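The inverse-pair augmentation just described could look roughly like the sketch below; the tuple format and the left/right encoding are invented for illustration, not the actual shared-task data format:

```python
# For every (parent, child, position-of-child) example, also add the symmetric
# (child, parent, opposite-position) example, doubling the training data and
# making the left/right ordering constraint explicit.

def augment_with_inverse_pairs(examples):
    augmented = list(examples)
    for parent, child, child_side in examples:
        parent_side = "right" if child_side == "left" else "left"
        augmented.append((child, parent, parent_side))
    return augmented

train = [("sleeps/VERB", "John/PROPN", "left"),   # subject before the verb
         ("sleeps/VERB", "sofa/NOUN", "right")]
print(augment_with_inverse_pairs(train))
# [..., ('John/PROPN', 'sleeps/VERB', 'right'), ('sofa/NOUN', 'sleeps/VERB', 'left')]
```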
0:50:03 | and again you know you see that there is a large improvement |
---|
0:50:14 | and also one nice way of expanding the data is to use computational linguistic tools that
---|
0:50:19 | are available, and that was done already in two thousand and seventeen by konstas et al
---|
0:50:25 | where the idea is, so this was for generating from amr data
---|
0:50:29 | so the training data was
---|
0:50:32 | either manually validated or constructed
---|
0:50:35 | for the task, for the shared task, but in fact there are
---|
0:50:39 | semantic parsers that, if you give them a sentence, they will give you the amr
---|
0:50:43 | so i mean they are not one hundred percent reliable but they can produce
---|
0:50:49 | an amr
---|
0:50:50 | so what you can do is you just generate a lot of, you generate a
---|
0:50:54 | lot of training data by simply using a semantic parser on available data
---|
0:50:58 | and this was i think what konstas did, so you basically parse two hundred thousand
---|
0:51:03 | gigaword sentences with this semantic parser
---|
0:51:07 | and then you do some pre-training on this data and then you do some
---|
0:51:11 | fine-tuning on the actual shared task data set
---|
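In outline, the silver-data recipe looks something like the sketch below; `amr_parse` and `train` are hypothetical placeholders for an off-the-shelf AMR parser and a training loop, not a specific library API:

```python
# Parse a large set of raw sentences with a semantic (AMR) parser to create
# noisy (AMR, sentence) pairs, pre-train the generator on them, then fine-tune
# on the small gold shared-task data.

def build_silver_corpus(raw_sentences, amr_parse):
    # each raw sentence becomes one (input AMR, target sentence) training pair
    return [(amr_parse(s), s) for s in raw_sentences]

def train_generator(model, gold_pairs, raw_sentences, amr_parse, train):
    silver_pairs = build_silver_corpus(raw_sentences, amr_parse)
    train(model, silver_pairs, epochs=5)    # pre-training on noisy silver data
    train(model, gold_pairs, epochs=20)     # fine-tuning on the small gold set
    return model
```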
0:51:18 | and so we
---|
0:51:20 | we used this again, you know, for the first approach i showed, the
---|
0:51:24 | graph encoder with the dual top-down bottom-up
---|
0:51:26 | encoding approach, and again, you know, like the other approaches
---|
0:51:29 | we see that this really improves performance
---|
0:51:36 | okay, so i'm getting to the end
---|
0:51:38 | so, you know, i mentioned some
---|
0:51:41 | things you can do: a better encoding of your input and better training data
---|
0:51:46 | there's of course many open issues
---|
0:51:49 | one that i find particularly interesting is multilingual generation, so we saw in
---|
0:51:54 | the surface realization shared task there are ten languages, but it is still a reasonably simple
---|
0:51:59 | task and you can have some data
---|
0:52:02 | taken from the universal dependency treebanks
---|
0:52:06 | so what would be interesting is, you know, how can you generate in multiple
---|
0:52:10 | languages from data, from knowledge bases
---|
0:52:12 | or even from text: if you are simplifying, can you simplify in different languages
---|
0:52:18 | there are also, of course, interpretability issues
---|
0:52:23 | as i said at the beginning, you know, the standard approach is this encoder and decoder
---|
0:52:27 | end-to-end approach
---|
0:52:29 | which did away with all those modules that we had before
---|
0:52:32 | but in fact now
---|
0:52:33 | you know, one way to make the model more interpretable is to reconstruct
---|
0:52:38 | those modules, right, so instead of having a single end-to-end system
---|
0:52:41 | you have different networks for each of the tasks, and people are
---|
0:52:46 | starting to work on this, in particular with a
---|
0:52:51 | coarse-to-fine approach
---|
0:52:52 | where you first, for example, generate the structure of the text and then you fill
---|
0:52:56 | in the details, for example
---|
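A toy illustration of such a coarse-to-fine pipeline, where a first model produces a plan and a second realises each part of it; both models are hypothetical and simulated with trivial stand-ins here:

```python
# Two-step (coarse-to-fine) generation: plan first, then realise each part.

def coarse_to_fine_generate(data_input, plan_model, realise_model):
    plan = plan_model(data_input)                # e.g. ordered list of content chunks
    sentences = [realise_model(chunk) for chunk in plan]
    return " ".join(sentences)

# trivial stand-ins for the two learned modules
plan_model = lambda records: [records[:2], records[2:]]        # split content into two sentences
realise_model = lambda chunk: " ".join(f"{k} is {v}." for k, v in chunk)

print(coarse_to_fine_generate([("name", "X"), ("type", "pub"), ("area", "riverside")],
                              plan_model, realise_model))
```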
0:52:59 | and generalisation versus memorisation
---|
0:53:02 | there have been problems with, you know, data sets that are very repetitive, and it's
---|
0:53:07 | really important to have very good test sets, to control for the test set, and
---|
0:53:10 | for example a lot of the data sets, a lot of the shared tasks, do not
---|
0:53:14 | provide a sort of unseen test set, in the sense that
---|
0:53:19 | i don't know, you are generating newspaper texts, but you would like the test set
---|
0:53:23 | also to contain a test for, you know, what happens if you apply your
---|
0:53:28 | model to different types of text, so i think, you know, having a sort
---|
0:53:32 | of out-of-domain test set is really important for testing
---|
0:53:36 | the generalisation of the system; it's also linked to, you know, what can you
---|
0:53:39 | do with transfer learning or domain adaptation to go from one type of
---|
0:53:43 | text to another
---|
0:53:44 | and that's it thank you |
---|
0:53:56 | are there any questions
---|
0:54:07 | hi, so you've shown some results on acceptability of text generation that were something like
---|
0:54:13 | seventy-five percent, before it was sixty percent, somewhere in the middle
---|
0:54:18 | that's annotation, and what i wanted to ask you is, is this like a
---|
0:54:22 | zero-one, people say whether they accept or don't accept, or is it a
---|
0:54:28 | degree, like, i don't know... so, human evaluation you mean? yes, the evaluation...
---|
0:54:32 | human annotation is usually on a scale from one to five
---|
0:54:36 | okay, because you've shown a percentage i think
---|
0:54:41 | at some point, so i'm wondering what that is
---|
0:54:50 | readability |
---|
0:54:52 | sorry, to compare, so no, in this case they just compare
---|
0:54:58 | the output of two systems
---|
0:55:00 | so they compare the output of the seq2seq to the output of the other
---|
0:55:04 | system
---|
0:55:04 | and then they say which one they prefer
---|
0:55:07 | so the percentage is like sixty-four percent of people prefer this one
---|
0:55:17 | if you have them from one to five, okay, because this is a preference test
---|
0:55:23 | right, do we know if these are similar in reliability
---|
0:55:31 | what is the score, between one and five
---|
0:55:34 | no, i think, i think a four
---|
0:55:41 | i'd have to go back to the paper, i don't remember
---|
0:55:44 | but i think it is not, it's not the one to five, or maybe that was
---|
0:55:47 | wrong, so it's a pairwise comparison between two outputs: they have the output of
---|
0:55:52 | two systems, they don't know which one is which, and they have to say which one they
---|
0:55:54 | prefer
---|
0:56:00 | hi |
---|
0:56:07 | okay |
---|
0:56:09 | the quite |
---|
0:56:15 | so i'd like to thank you for the great talk, since you covered many
---|
0:56:22 | kinds of generation tasks, many different inputs: generating text from data, summarization, you
---|
0:56:30 | can generate text for a
---|
0:56:33 | conversational system, and i was curious, these are
---|
0:56:39 | very different kinds of problems; have the architecture and the main state-of-the-art approach
---|
0:56:46 | converged to one architecture, or are they very different for the different tasks? well
---|
0:56:53 | so
---|
0:56:54 | so the question is whether we have very different neural approaches for generation depending
---|
0:56:59 | on the input or depending on the task
---|
0:57:01 | for these different tasks
---|
0:57:03 | so i'd say that initially, for maybe three or four years, everybody was using this encoder-decoder
---|
0:57:10 | often with a recurrent encoder
---|
0:57:14 | and the difference was what the input was, so in dialogue
---|
0:57:19 | for example
---|
0:57:21 | you are going to take as input
---|
0:57:23 | the current dialogue, the user turn, plus maybe the context or some information about
---|
0:57:28 | the dialogue, right
---|
0:57:30 | if you are doing question answering you take a question
---|
0:57:33 | and some supporting evidence
---|
0:57:35 | so it was really more about
---|
0:57:38 | you know, which kind of input you had, and that was the only difference, but
---|
0:57:41 | now more and more people are paying attention to, you know
---|
0:57:45 | the fact that there are differences between these tasks: what is the structure of the
---|
0:57:48 | input, what is the goal, do you want to focus on identifying important information or
---|
0:57:54 | you know
---|
0:57:55 | the problems, i think, remain very different, so you have to deal with
---|
0:57:59 | very different problems in a way, so this is what i was trying to
---|
0:58:02 | show in fact
---|
0:58:04 | but for dialogue and generation, there is a thing, people have tried different
---|
0:58:10 | approaches for the encoder and the decoder, and the problem is
---|
0:58:15 | you know, the encoder can learn to encode these units, it can
---|
0:58:18 | learn the encoding okay, but the thing is the decoder generates things, it's not
---|
0:58:26 | able to generate things that are more, you know, informative, and handle the state transitions
---|
0:58:33 | yes, so there is a known problem with dialogue systems, that they tend to generate
---|
0:58:38 | very generic answers like "i don't know" or "maybe" that are not very informative; we are actually
---|
0:58:44 | working on
---|
0:58:48 | using external information
---|
0:58:50 | to produce, to have dialogue systems that
---|
0:58:53 | actually produce more informative answers, and so the idea in this case, or
---|
0:58:57 | the problem, is how you retrieve it: so you have your dialogue context, you have your
---|
0:59:01 | user question, or user turn
---|
0:59:04 | and it's a bit similar to a retrieval approach: you look
---|
0:59:09 | on the web or in some sources for some information that is relevant to
---|
0:59:14 | what is being discussed
---|
0:59:16 | and you enrich the input, so now the model is conditioned on
---|
0:59:19 | this additional information, and this hopefully gives you more informative dialogue, so instead of
---|
0:59:25 | you know, producing these empty utterances
---|
0:59:27 | the system now has all this knowledge it can use to generate a more informative
---|
0:59:31 | response
---|
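Schematically, the knowledge-grounded setup described here might look like the sketch below; the word-overlap retriever, the separator tokens and the `generate` placeholder are illustrative assumptions rather than the actual system:

```python
# Retrieve snippets relevant to the current dialogue context and feed them to
# the response generator together with the user turn.

def retrieve(context, knowledge_snippets, top_k=2):
    # crude word-overlap scorer, purely for illustration; a real system would
    # use a search engine or a learned retriever
    ctx_words = set(context.lower().split())
    scored = sorted(knowledge_snippets,
                    key=lambda s: len(ctx_words & set(s.lower().split())),
                    reverse=True)
    return scored[:top_k]

def knowledge_grounded_reply(dialogue_context, user_turn, knowledge_snippets, generate):
    evidence = retrieve(dialogue_context + " " + user_turn, knowledge_snippets)
    # the generator is conditioned on the dialogue AND the retrieved evidence
    model_input = (" [KNOWLEDGE] ".join(evidence)
                   + " [DIALOGUE] " + dialogue_context + " " + user_turn)
    return generate(model_input)
```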
0:59:34 | so there are a number of, you know, the convai2 challenge for example is
---|
0:59:38 | providing this kind of data set where you have a dialogue plus some additional
---|
0:59:43 | information related to the topic of the dialogue, or image-chat where you have
---|
0:59:48 | you have an image
---|
0:59:50 | and so the dialogue is based on the image
---|
0:59:52 | so again the idea is that the dialogue system should actually use the content of
---|
0:59:58 | the image to provide some informative response
---|
1:00:04 | and so, again, this slide, i think, where you have real human
---|
1:00:10 | evaluation, is i think something that speaks to a lot of people in
---|
1:00:12 | this room, because at least for speech it's been shown that you really need
---|
1:00:17 | humans to judge whether or not something is adequate and natural and many of those
---|
1:00:22 | things, so
---|
1:00:24 | i wonder, because this, to my understanding, was perhaps the only subjective human evaluation
---|
1:00:30 | result that your talk contained, so mostly people are optimising towards objective metrics; do you think
---|
1:00:39 | there is a risk of overfitting to these metrics, maybe in particular tasks, and so
---|
1:00:44 | on
---|
1:00:45 | where do you see the role of humans judging generated text
---|
1:00:51 | in your field, now and in the future
---|
1:00:54 | so human evaluation is of course still important, because the automatic metrics
---|
1:00:59 | you need them, you need
---|
1:01:00 | them to develop the systems and you need them to compare, you know, exhaustively: if
---|
1:01:04 | you have the output of many systems, you need some automatic metrics
---|
1:01:08 | but they are imperfect, right
---|
1:01:11 | so you also need human evaluation
---|
1:01:14 | often the shared tasks actually organise a human evaluation, and they
---|
1:01:18 | they do this, i mean, i think it's getting better and better because
---|
1:01:21 | people are getting more experience
---|
1:01:23 | and there are better and better platforms and, you know, guidelines on how
---|
1:01:27 | to do this
---|
1:01:29 | we are not optimising with respect to those human
---|
1:01:33 | judgements because it's just impossible, right, so the overfitting would be with respect
---|
1:01:38 | to the training data, where you do, you know, maximum likelihood
---|
1:01:43 | trying to maximise the likelihood of the training data, mostly using cross-entropy; that
---|
1:01:48 | said, there is some work on using reinforcement learning where you optimise with respect
---|
1:01:53 | to your actual evaluation metric, for example the rouge number
---|
1:01:58 | that i mentioned this morning
---|
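The reinforcement-learning idea mentioned here, optimising directly for an evaluation metric, can be sketched as a REINFORCE-style loss; `sample_with_log_probs` and `rouge` are hypothetical stand-ins for the model's sampler and the metric:

```python
import torch

def reinforce_loss(model, source, reference, sample_with_log_probs, rouge, baseline=0.0):
    """REINFORCE-style loss: reward a sampled output by its metric score."""
    sampled_tokens, log_probs = sample_with_log_probs(model, source)  # log_probs: 1-D tensor
    reward = rouge(sampled_tokens, reference)                         # scalar, e.g. ROUGE-L in [0, 1]
    # maximising expected reward == minimising -(reward - baseline) * sum of log-probabilities
    return -(reward - baseline) * torch.as_tensor(log_probs).sum()
```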
1:02:02 | thank you, so to me it seems that the main problem that you
---|
1:02:07 | cover is kind of doing the
---|
1:02:09 | question answering task, right: the user asks about something, that's correct,
---|
1:02:13 | looking on the internet, finding the information that they want, it is about that
---|
1:02:17 | my question is: very often the type of answer that you give depends on
---|
1:02:21 | the type of person that is going to receive it; you would not give the
---|
1:02:24 | same sort of answer if you are going to talk to a young child
---|
1:02:28 | or to an expert in the field that will go and screen your answer
---|
1:02:31 | is there any research on how you can tune or limit
---|
1:02:35 | the answers so that it fits the user
---|
1:02:40 | not really, but i can think of ways; right now, i mean,
---|
1:02:44 | people often find that if you have some kind of parameter like this that
---|
1:02:49 | you want to use to influence the output, so you have one input and then
---|
1:02:55 | you want two different outputs depending on this parameter
---|
1:02:58 | often just adding this to your training data actually helps a lot
---|
1:03:04 | i think this was done
---|
1:03:06 | so people do this with emotions
---|
1:03:09 | for example, should the text express sadness or should it be happy; so what they
---|
1:03:15 | might do is they use an emotion detector, you know
---|
1:03:19 | something that gives you an emotion tag for the sentence and
---|
1:03:23 | then they would
---|
1:03:25 | produce
---|
1:03:26 | i mean, you need the training data, right, but if you
---|
1:03:30 | if you can have this training data and you can label the training examples with
---|
1:03:35 | the
---|
1:03:36 | personalities that you want to generate for, then it works reasonably well; in fact the
---|
1:03:41 | image-chat data is a nice example
---|
1:03:47 | it's
---|
1:03:49 | the dialogue, the dialogue has, for the same image you might have different
---|
1:03:54 | dialogues depending on the
---|
1:03:56 | personality, so the input includes a personality, and there are something like
---|
1:04:01 | two hundred and fifty personalities
---|
1:04:04 | which can be, you know
---|
1:04:06 | joking, serious or whatever, and so they had the database, the training data, taking into
---|
1:04:11 | account this personality
---|
1:04:13 | so you can generate dialogues that are
---|
1:04:15 | about the same image
---|
1:04:16 | with different tones depending on the personality... but for example, within the same model, would
---|
1:04:22 | it be possible to put a constraint on the vocabulary that you can use
---|
1:04:26 | in the output
---|
1:04:27 | yes |
---|
1:04:29 | so in the encoder-decoder
---|
1:04:30 | yes, you could do that
---|
1:04:33 | it's not something people normally do, they just use the whole vocabulary and then
---|
1:04:37 | they hope that the model is going to learn to focus on the vocabulary that
---|
1:04:42 | corresponds to a certain feature, but maybe you could do that
---|
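The conditioning trick discussed above, labelling training examples with an emotion or personality and prepending it as a control token, might be sketched as follows; the emotion classifier and the generation call are hypothetical placeholders:

```python
# Prepend the desired attribute (emotion, personality, ...) as a control token
# so that at generation time you can request a particular style.

def add_control_tokens(pairs, emotion_classifier):
    """pairs: list of (input_text, target_text); returns tagged training pairs."""
    tagged = []
    for src, tgt in pairs:
        emotion = emotion_classifier(tgt)          # e.g. "happy", "sad", "neutral"
        tagged.append((f"<{emotion}> {src}", tgt))
    return tagged

# at inference time, the same token steers the output style, e.g.:
#   model.generate("<happy> " + user_turn)
```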
1:04:57 | you already mentioned it somewhat, but
---|
1:05:00 | this also raises some ethical questions on, you know, the generated text, more than maybe
---|
1:05:06 | in synthesis
---|
1:05:08 | that you really need to get it right, or
---|
1:05:11 | you have some other problems, consistency, or some indication of something that is
---|
1:05:18 | wrong
---|
1:05:19 | is this a problem posed by the statistical approach, or can you
---|
1:05:24 | can you solve this
---|
1:05:26 | well, i mean, i think one
---|
1:05:30 | one problem, i think, that i see with
---|
1:05:34 | the, with the current approach, the neural approach to generation, is that they're not necessarily
---|
1:05:39 | semantically faithful, as i was saying, right, so you know they can produce things that
---|
1:05:45 | have nothing to do with the input, which is a problem
---|
1:05:47 | i'm not sure it's an ethical problem, in the sense, you know, that generators that are not
---|
1:05:51 | faithful are not super useful either, but in an application
---|
1:05:55 | so you know, for industrial people who want to develop applications, clearly it's a
---|
1:06:00 | problem, right, because there you don't want to sell a generator
---|
1:06:04 | that is not faithful
---|
1:06:08 | but i mean, ethical problems, we have plenty in general in nlp
---|
1:06:17 | that's all the time we have, so let's thank the speaker again
---|