0:00:09 | good morning everyone |
---|
0:00:13 | our speaker this morning is a senior researcher at the french national centre |
---|
0:00:20 | for scientific research working in the area of computer science |
---|
0:00:26 | her interests include databases and statistical models of |
---|
0:00:31 | natural language the acquisition of lexical resources for nlp |
---|
0:00:37 | syntactic and semantic parsing and technology for language learning |
---|
0:00:41 | she works on natural language generation from syntactic input and has led |
---|
0:00:48 | a shared task on generating text from sets of triples |
---|
0:00:53 | her talk this morning will be about planning-based and neural models for natural |
---|
0:00:59 | language generation over a variety of different types of nlg tasks |
---|
0:01:04 | okay |
---|
0:01:13 | can you hear me |
---|
0:01:15 | can you hear me at the back |
---|
0:01:17 | okay so good morning |
---|
0:01:19 | thank you for being here after last night |
---|
0:01:24 | so when i was invited to this workshop i was i was thrilled |
---|
0:01:27 | of course you know coming here and giving a talk |
---|
0:01:31 | but then i was a bit worried about the |
---|
0:01:33 | title of this workshop synthesis no |
---|
0:01:37 | i think that the introduction showed that i don't do synthesis i don't |
---|
0:01:41 | do speech in fact i just work on text |
---|
0:01:45 | but of course you know there is a link between |
---|
0:01:48 | text-to-speech synthesis and generation which is what i've been working on in recent years |
---|
0:01:54 | which is that natural language generation is the task of producing text so you can |
---|
0:01:59 | see natural language generation as a |
---|
0:02:02 | pre-step to text-to-speech synthesis |
---|
0:02:05 | and that is what i'm going to talk about today i'm going to talk about |
---|
0:02:09 | different types of natural language generation tasks |
---|
0:02:14 | and we'll start with |
---|
0:02:17 | let's see |
---|
0:02:20 | okay |
---|
0:02:21 | so i will start with an introduction to you know how generation was |
---|
0:02:26 | done before deep learning and then i will show how |
---|
0:02:31 | you know the deep learning |
---|
0:02:32 | paradigm completely changed the approach to natural language generation |
---|
0:02:37 | and i will talk about some issues with current neural approaches to |
---|
0:02:42 | text generation |
---|
0:02:45 | so the work i present here is joint work with phd students and colleagues |
---|
0:02:49 | whom i want to name here so one of them is a phd student |
---|
0:02:53 | at loria where i am based |
---|
0:02:55 | another colleague is from bar-ilan university and there is also a phd student |
---|
0:02:59 | doing a joint phd between fair in paris and |
---|
0:03:03 | loria |
---|
0:03:04 | there are also colleagues in the group of iryna gurevych at darmstadt university |
---|
0:03:08 | and two other phd students who are under joint supervision with |
---|
0:03:14 | them |
---|
0:03:15 | and then finally another phd student working with |
---|
0:03:19 | me |
---|
0:03:21 | okay so first what is natural language generation well it's the task of producing text |
---|
0:03:26 | but it's very different from natural language understanding because of the input so in natural |
---|
0:03:32 | language understanding the input is a text it is well defined everybody agrees on that and |
---|
0:03:37 | you know large quantities of text are available |
---|
0:03:41 | in natural language generation it is very different in that the input can be many |
---|
0:03:46 | different things |
---|
0:03:48 | and this is actually one of the reasons why the natural language generation community was very |
---|
0:03:52 | small for a very long time so compared to nlu you know the number |
---|
0:03:56 | of papers on nlg was very small |
---|
0:04:00 | and so what are the types of inputs there are three basic |
---|
0:04:03 | types of input data meaning representations |
---|
0:04:07 | or text |
---|
0:04:08 | so data that would be data from databases or knowledge bases |
---|
0:04:14 | structured data then you have meaning representations that are devised by linguists and can be |
---|
0:04:18 | produced by computational linguistics tools and that are basically devised to represent the meaning of a |
---|
0:04:24 | sentence |
---|
0:04:25 | sometimes of a text more generally of a sentence or a dialogue turn |
---|
0:04:30 | so sometimes you want to generate from this meaning representation for example |
---|
0:04:34 | in the context of a dialogue system |
---|
0:04:36 | you might want the you know the system the dialogue manager would produce |
---|
0:04:40 | a meaning representation which encodes a dialogue turn |
---|
0:04:43 | and then the generation task is to generate the text of |
---|
0:04:46 | a turn so the system turn in response to |
---|
0:04:50 | to this meaning representation |
---|
0:04:52 | and finally you can generate from text and that would be in applications such |
---|
0:04:56 | as text summarization text simplification sentence compression |
---|
0:05:03 | so those are the main types of input another complicating factor is that the |
---|
0:05:07 | what we call the communicative goal can be very different sometimes you want to verbalise |
---|
0:05:12 | so for instance if you have a knowledge base you might want the system to |
---|
0:05:16 | just verbalise the content of the knowledge base |
---|
0:05:18 | so it is |
---|
0:05:19 | you know readable by human users but you know other goals would be |
---|
0:05:23 | like to respond to a dialogue turn |
---|
0:05:25 | to summarize a text or to simplify a text or even to summarize the content of |
---|
0:05:30 | a knowledge base |
---|
0:05:31 | so those two factors together |
---|
0:05:33 | meant that |
---|
0:05:34 | before the neural era |
---|
0:05:36 | natural language generation was divided into many different subfields which didn't help |
---|
0:05:41 | given that the community was already pretty small |
---|
0:05:44 | and there wasn't much |
---|
0:05:45 | you know communication between those subfields |
---|
0:05:48 | so why did we have these different subfields because essentially the problems are very different |
---|
0:05:56 | so when you are generating from data there is a big gap between |
---|
0:06:01 | the input and the output since the input is a data structure the data |
---|
0:06:05 | does not look like the text at all it can even you know result from |
---|
0:06:09 | signal processing it can be numbers from weather |
---|
0:06:14 | forecasts numbers whatever |
---|
0:06:16 | so |
---|
0:06:16 | the input data is very different from text and so to bridge this gap |
---|
0:06:20 | you have to do many things and essentially what you have to do is to |
---|
0:06:25 | decide what to say |
---|
0:06:26 | and how to say it |
---|
0:06:28 | what to say is more an ai problem so deciding you know what parts |
---|
0:06:32 | of the data you want to select to |
---|
0:06:34 | actually verbalise because if you were you know if you verbalised all the numbers |
---|
0:06:37 | in the you know given by a sensor it would just make no sense at |
---|
0:06:41 | all |
---|
0:06:42 | so the output text would make no sense at all |
---|
0:06:44 | and so you have a content selection problem then you usually have to structure |
---|
0:06:49 | the content that you've selected into a structure that resembles the text |
---|
0:06:55 | so this would be more like an ai planning problem and in fact |
---|
0:06:59 | this was often handled with planning techniques |
---|
0:07:02 | and then it's more linguistics once you have the text structure how do you convert it |
---|
0:07:08 | into well-formed text |
---|
0:07:09 | and there you have to make many choices so generation is really a choice problem |
---|
0:07:15 | because there are many different ways of realising things |
---|
0:07:18 | so you have problems such as you know |
---|
0:07:21 | how to choose to lexicalise a given symbol which referring expression to use to describe an entity |
---|
0:07:27 | whether you are going to |
---|
0:07:29 | use a pronoun |
---|
0:07:31 | a proper name |
---|
0:07:32 | aggregation how to avoid repetitions this is basically about deciding when to use devices such |
---|
0:07:38 | as ellipsis or coordination to avoid redundancy in the output text |
---|
0:07:42 | even if there is redundancy in your input |
---|
0:07:45 | because repetition is frequent in a knowledge base |
---|
0:07:49 | and things like that so |
---|
0:07:50 | basically for generating from data the consensus was that there was this big nlg |
---|
0:07:54 | pipeline where you had to model all of these |
---|
0:07:57 | subproblems |
---|
0:08:00 | if you generate from a meaning representation the task was seen as completely different |
---|
0:08:04 | partly i mean mainly because the gap between the meaning representation and the sentence |
---|
0:08:10 | is much smaller in fact and in fact these meaning representations were devised |
---|
0:08:15 | by linguists |
---|
0:08:16 | so the |
---|
0:08:17 | consensus here was that |
---|
0:08:20 | if you can have a grammar that describes you can you know |
---|
0:08:25 | you can develop grammars that describe basically the mapping between text and meaning |
---|
0:08:31 | and because it's a grammar it also includes this notion of syntax so |
---|
0:08:34 | it ensures that the text will be well-formed |
---|
0:08:38 | so the idea was you have a grammar that defines this mapping this association |
---|
0:08:41 | between text and meaning |
---|
0:08:44 | you could use it in both directions either you have a text and you use |
---|
0:08:47 | the grammar to derive its meaning |
---|
0:08:49 | or you can use it for generation you start from the meaning |
---|
0:08:52 | and then you use the grammar to decide you know what is the corresponding sentence |
---|
0:08:57 | given by the grammar |
---|
0:08:59 | and of course with these grammars as soon as you have large coverage they become very |
---|
0:09:02 | ambiguous so there's a huge ambiguity problem |
---|
0:09:06 | it's not tractable basically you get you know thousands of intermediate results thousands of |
---|
0:09:12 | outputs |
---|
0:09:12 | and the initial search space is huge so you combine usually you combine this grammar |
---|
0:09:18 | with some statistical modules that are basically designed to reduce the search space |
---|
0:09:23 | and to limit the output to one or a few outputs |
---|
0:09:30 | and finally generating from text here again a very different approach the main point the |
---|
0:09:36 | consensus again was that when you generate from text there are basically four main operations you want |
---|
0:09:41 | to model i mean all or some of them depending on the application which are |
---|
0:09:46 | split rewrite reorder and delete |
---|
0:09:49 | split is about learning when to split a long sentence into several sentences |
---|
0:09:54 | for example in simplification where you want to simplify text |
---|
0:09:59 | reordering is just moving constituents around or the words around |
---|
0:10:04 | again because maybe you want to simplify or to paraphrase which is another text-to-text |
---|
0:10:09 | generation application |
---|
0:10:12 | you want to rewrite again maybe to simplify or to paraphrase so you rewrite a |
---|
0:10:16 | word or rewrite a phrase |
---|
0:10:18 | and you want to decide what you can delete in particular if |
---|
0:10:21 | you're doing simplification |
---|
0:10:24 | so in general three very different approaches to those three tasks depending on |
---|
0:10:30 | what the input is |
---|
0:10:32 | and this completely changed with the neural approach so what the neural approach did |
---|
0:10:37 | is it really completely changed the field so now whereas before generation was a |
---|
0:10:42 | very small field now at acl so the main computational linguistics conference |
---|
0:10:48 | generation is one of the topics |
---|
0:10:51 | you know that gets the top number of submissions i think the second ranking for |
---|
0:10:54 | number of submissions in the field |
---|
0:10:57 | so it's really |
---|
0:10:58 | changed completely |
---|
0:10:59 | and why did it change because the encoder-decoder framework really allows you to model all three tasks |
---|
0:11:07 | in the same way |
---|
0:11:09 | so all the techniques the |
---|
0:11:10 | methods that you can develop to improve the encoder-decoder framework |
---|
0:11:14 | they will be novel but you know |
---|
0:11:17 | it is a common framework which makes it much easier to |
---|
0:11:20 | take ideas from one field from one subfield to another |
---|
0:11:23 | so the encoder-decoder framework is it's very simple you have your input and it can |
---|
0:11:28 | be data |
---|
0:11:29 | text or meaning representation |
---|
0:11:31 | you encode it into a vector representation and then you use the power of the |
---|
0:11:37 | neural language model to decode so that the decoder is going to produce |
---|
0:11:41 | the text |
---|
0:11:41 | one word at a time using a recurrent network |
---|
0:11:44 | and we know you know that |
---|
0:11:46 | neural language models are much more powerful than previous |
---|
0:11:49 | language models because they take |
---|
0:11:52 | an unlimited amount of context into account |
---|
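As a rough illustration of the encoder-decoder framework described here, the following minimal sketch encodes an input sequence into a vector and decodes one word at a time; the GRU choice, the sizes and the greedy decoding loop are illustrative assumptions, not the specific models discussed in the talk.

```python
# Minimal encoder-decoder sketch: the input (linearised data, meaning
# representation or text) is encoded into a vector, and a recurrent decoder
# emits the output one word at a time.
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, max_len=30, bos_id=1):
        # Encode the whole input sequence into a single hidden state.
        _, hidden = self.encoder(self.src_emb(src_ids))
        # Decode greedily, one token at a time, feeding back the previous word.
        token = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            step, hidden = self.decoder(self.tgt_emb(token), hidden)
            token = self.out(step).argmax(-1)         # most probable next word
            outputs.append(token)
        return torch.cat(outputs, dim=1)

model = EncoderDecoder(src_vocab=1000, tgt_vocab=1000)
generated = model(torch.randint(0, 1000, (2, 12)))    # two toy inputs of length 12
print(generated.shape)                                # (2, 30)
```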
0:11:56 | okay so we have this unifying framework but what i want to show in this talk |
---|
0:12:01 | is that |
---|
0:12:03 | of course you know the problems still remain the tasks are different and |
---|
0:12:07 | you still have to handle them somehow |
---|
0:12:09 | so what i will do in this talk based on some work we've been |
---|
0:12:13 | doing is focus on two main points how to improve encoding or how to adapt encoding |
---|
0:12:19 | to various nlg tasks |
---|
0:12:21 | and if i have time to talk a little bit about training data again often |
---|
0:12:25 | you know there is this problem of data sparsity since it's all |
---|
0:12:29 | a supervised approach usually and supervised means you have training data and in this case the |
---|
0:12:34 | training data has to be |
---|
0:12:36 | the text and the input |
---|
0:12:38 | but these inputs can be already hard to get so these meaning representations you know |
---|
0:12:42 | where do you get them from or even you know getting an alignment a |
---|
0:12:47 | parallel corpus between |
---|
0:12:49 | database fragments and the corresponding text is also very difficult to get right |
---|
0:12:52 | so often you have you don't have much data |
---|
0:12:55 | and of course these neural networks they want a lot of data so often |
---|
0:12:58 | you have to be clever about what you do with the training data |
---|
0:13:09 | okay so on encoding we'll talk about three different points modeling graph structured input |
---|
0:13:15 | so we'll see that |
---|
0:13:18 | in the encoder-decoder framework initially at least in the first steps |
---|
0:13:22 | the encoder was usually always a recurrent network so |
---|
0:13:27 | no matter whether the input was a text or a meaning representation or a graph |
---|
0:13:32 | you know a knowledge base |
---|
0:13:33 | people were using this recurrent network i think partly because you know the |
---|
0:13:38 | encoder-decoder framework |
---|
0:13:40 | was very successful in machine translation and people were all building on this for doing |
---|
0:13:44 | this |
---|
0:13:45 | but of course you know after a while people thought well some of the |
---|
0:13:49 | input is graphs so maybe it's not such a good idea to model it as |
---|
0:13:52 | a sequence |
---|
0:13:53 | so let's do something else so we'll talk about |
---|
0:13:55 | how to model |
---|
0:13:57 | graph structured input |
---|
0:13:59 | then i will talk about generating from text where here i will focus on |
---|
0:14:05 | an application where the input is a very large quantity of text and the problem |
---|
0:14:10 | is that you know neural networks are only so good at encoding |
---|
0:14:14 | large quantities of text it's a known thing in fact for machine translation that |
---|
0:14:18 | you know the longer the input is |
---|
0:14:20 | the worse the performance |
---|
0:14:21 | and here we're not talking about long sentences we're talking about single long |
---|
0:14:25 | texts that have two hundred thousand tokens or something so what do you do in |
---|
0:14:29 | that case if you still want to do text-to-text generation |
---|
0:14:32 | and i will talk a little bit about delexicalisation so some device that can |
---|
0:14:36 | be used in some applications |
---|
0:14:39 | again because the data is not so big how can you |
---|
0:14:43 | improve it so you can generalise better |
---|
0:14:46 | okay so first encoding graphs |
---|
0:14:49 | so as i said the inputs are graphs |
---|
0:14:53 | they occur for example if you have an amr as the input or meaning representation |
---|
0:14:58 | so here you have an example from the amr two thousand and seventeen challenge |
---|
0:15:04 | where the task was |
---|
0:15:06 | given |
---|
0:15:07 | meaning representations so this amr amr means abstract meaning representation |
---|
0:15:13 | basically it's it's a meaning representation |
---|
0:15:17 | which can be written like it's written on the right |
---|
0:15:20 | but basically you can see it as a graph where the nodes are the concepts and |
---|
0:15:23 | the edges are the relations between the concepts |
---|
0:15:26 | right so here the |
---|
0:15:28 | this meaning representation here would correspond to the sentence here |
---|
0:15:31 | us officials held an expert group meeting in january two thousand two in new |
---|
0:15:36 | york and then you see that you know at the top of |
---|
0:15:39 | the tree you have the hold concept and then the arg zero is the person |
---|
0:15:46 | and then the country you can read it's the country basically so united states and |
---|
0:15:50 | then there are some other concepts |
---|
0:15:53 | so the task was to generate from these from this amr and the amr can |
---|
0:15:58 | be seen you know as a graph |
---|
0:15:59 | there was another challenge in two thousand and seventeen which is how to generate from sets of |
---|
0:16:04 | rdf triples |
---|
0:16:06 | and so here the |
---|
0:16:07 | what we did is we extracted the sets of rdf triples from |
---|
0:16:11 | dbpedia and |
---|
0:16:12 | we had a method to ensure that these sets of rdf triples |
---|
0:16:18 | could be matched into a meaningful natural text |
---|
0:16:22 | and then we had crowdsourcing people associating the sets of triples with the corresponding text |
---|
0:16:27 | so the dataset in this case was a parallel dataset where the input |
---|
0:16:30 | was a set of triples and the output was a text that verbalised |
---|
0:16:33 | the content of these triples |
---|
0:16:35 | so you probably can't see it here |
---|
0:16:37 | but for example the exact |
---|
0:16:39 | example i show here is like you have three triples where each triple is a |
---|
0:16:42 | subject property object |
---|
0:16:45 | the first one says john blaha birth date and then the date john blaha |
---|
0:16:49 | birth place and then the place and then john blaha occupation fighter pilot so |
---|
0:16:53 | for example you have these three triples |
---|
0:16:55 | and then the task would be to generate something like john blaha born in |
---|
0:16:59 | san antonio on nineteen forty two oh eight twenty six worked as a fighter pilot |
---|
0:17:04 | so this was the task |
---|
0:17:06 | and the point again here is that |
---|
0:17:09 | when you are generating from data like we're doing here then this data can |
---|
0:17:13 | be seen as a graph where the |
---|
0:17:15 | well it's a graph where the subjects and the objects the |
---|
0:17:19 | entities in your triples are the nodes and the edges are the relations between the triples |
---|
0:17:27 | okay so as i said initially people you know applied for these two |
---|
0:17:31 | tasks initially people were |
---|
0:17:33 | simply using recurrent networks so what they did is linearise the graph |
---|
0:17:37 | just do a traversal of the graph using |
---|
0:17:40 | you know some kind of traversal method |
---|
0:17:42 | and then they have a sequence of tokens |
---|
0:17:46 | and then they just encode it using a recurrent network |
---|
0:17:50 | and so here you see an example where you know the tokens |
---|
0:17:54 | input to the rnn are basically the concepts and the relations |
---|
0:18:00 | that are present in the meaning representation |
---|
0:18:03 | and then you decode from that |
---|
0:18:05 | okay |
---|
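The linearisation step described above can be pictured with a small toy script; the traversal function, the triples and the concept names below are made up for illustration and are not the challenge data.

```python
# Toy illustration of linearisation: a graph given as (subject, relation, object)
# triples is flattened into a token sequence by a depth-first traversal, so that
# a sequence encoder (RNN) can read it.
def linearise(triples, root):
    children = {}
    for s, r, o in triples:
        children.setdefault(s, []).append((r, o))
    tokens = []

    def visit(node):
        tokens.append(node)
        for rel, child in children.get(node, []):
            tokens.append(rel)
            visit(child)

    visit(root)
    return tokens

triples = [("hold-04", "ARG0", "person"), ("person", "mod", "official"),
           ("hold-04", "ARG1", "meet-03"), ("meet-03", "time", "january_2002")]
print(linearise(triples, "hold-04"))
# ['hold-04', 'ARG0', 'person', 'mod', 'official', 'ARG1', 'meet-03', 'time', 'january_2002']
```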
0:18:06 | so |
---|
0:18:07 | of course there are problems |
---|
0:18:08 | intuitively it's not very nice you know you're modeling a graph as a sequence well |
---|
0:18:14 | and then also there are technically some problems that occur in that |
---|
0:18:20 | now |
---|
0:18:21 | local dependencies that are local in the graph can become long range |
---|
0:18:27 | so |
---|
0:18:29 | okay so these two edges here they are at the same distance from the root in |
---|
0:18:35 | the initial graph but now when it's linearised you see that the crew |
---|
0:18:39 | members edge the first edge |
---|
0:18:40 | is much closer to the root node than |
---|
0:18:45 | this one right so you really |
---|
0:18:47 | the linearization is creating those long range dependencies and then again we know that lstms |
---|
0:18:52 | are not very good at dealing with long-range dependencies |
---|
0:18:55 | so also you know technically you think well maybe it's not such a great idea |
---|
0:19:00 | okay so people have been looking at this and they proposed various graph |
---|
0:19:04 | encoders so the idea is now instead of using an lstm to encode your linearised |
---|
0:19:09 | graph you propose to just use a graph encoder which is going to |
---|
0:19:14 | model the relations between the nodes inside the graph |
---|
0:19:19 | and then you decode from the output of the graph encoder so |
---|
0:19:24 | there were several proposals |
---|
0:19:26 | which i won't go into in detail but basically damonte and cohen proposed a graph |
---|
0:19:30 | convolutional network |
---|
0:19:31 | and the other two approaches use graph recurrent networks |
---|
0:19:37 | okay so we built on this idea here |
---|
0:19:40 | we |
---|
0:19:41 | and this is you know why i started with this introduction on pre-neural nlg because |
---|
0:19:46 | i think it's quite important to know about the history of nlg |
---|
0:19:51 | to have ideas about how to improve the neural approach and here this |
---|
0:19:55 | proposal was really based on the previous approach the previous work on grammar based |
---|
0:20:01 | grammar based generation so this idea that you have a grammar that |
---|
0:20:06 | you can use to produce a text |
---|
0:20:08 | so in this pre-neural work |
---|
0:20:12 | what people showed is okay you have a grammar and you have a meaning representation |
---|
0:20:16 | then you use the grammar to decide to tell you |
---|
0:20:20 | which sentences the grammar associates |
---|
0:20:24 | with this meaning representation |
---|
0:20:26 | so you |
---|
0:20:27 | see it's like it's you know it's a sort of reversed parsing problem |
---|
0:20:30 | if i say you know you have a sentence you have a grammar |
---|
0:20:32 | and then you want to decide what are the meaning representations or the syntactic trees |
---|
0:20:36 | associated by this grammar with the sentence |
---|
0:20:39 | it's a parsing problem |
---|
0:20:40 | so all i'm saying what i'm doing here is reversing the problem |
---|
0:20:44 | instead of starting from the text i start from the meaning representation and i say what |
---|
0:20:47 | you know what does the grammar tell me are the okay sentences that |
---|
0:20:51 | express this meaning |
---|
0:20:52 | so |
---|
0:20:54 | it was a parsing problem and then people started working on this reversed parsing |
---|
0:20:59 | problem to generate sentences |
---|
0:21:01 | and they found it was a very hard problem because of all this ambiguity |
---|
0:21:04 | and they had like two types of algorithms bottom-up and top-down |
---|
0:21:07 | either you start from the from the meaning representation and then you |
---|
0:21:11 | try to build the |
---|
0:21:12 | the syntactic tree that is allowed by the grammar and out |
---|
0:21:16 | of that you get the sentence or you go top-down so you just use the grammar |
---|
0:21:20 | and try to build the derivation that is going to map you onto the |
---|
0:21:23 | meaning representation |
---|
0:21:25 | so there were these two approaches and they both had problems and what people did in |
---|
0:21:28 | the end is |
---|
0:21:29 | they combined both approaches so they used both top-down and bottom-up they had some |
---|
0:21:34 | hybrid algorithm which was using both top-down and bottom-up |
---|
0:21:37 | information |
---|
0:21:39 | so here this is what we did more or less we |
---|
0:21:42 | the idea was okay those graph encoders they have a unique |
---|
0:21:48 | representation a graph encoding of the input graph of the input meaning representation |
---|
0:21:52 | what we want to do is to |
---|
0:21:53 | well there is this idea that both bottom-up and top-down information are important |
---|
0:21:58 | so we are going to encode each node in the graph using two encoders |
---|
0:22:03 | one that is that goes |
---|
0:22:04 | basically top-down through the graph and the other that goes bottom-up through the graph so |
---|
0:22:10 | what this gives us is that each node in the graph is going to |
---|
0:22:13 | have |
---|
0:22:14 | two encodings two embeddings one that reflects the top-down view of the graph and the |
---|
0:22:18 | other |
---|
0:22:19 | the bottom-up view of the graph |
---|
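A rough sketch of this dual-encoding idea is below; it uses a single toy message-passing layer per direction, which is an assumption made for illustration rather than the architecture of the published model.

```python
# Dual encoding sketch: every node gets one representation computed over the
# top-down edges and one over the reversed (bottom-up) edges, and the two are
# concatenated into the final node encoding.
import torch
import torch.nn as nn

class DualGraphEncoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.down = nn.Linear(2 * dim, dim)   # aggregates parent -> child messages
        self.up = nn.Linear(2 * dim, dim)     # aggregates child -> parent messages

    def encode(self, node_emb, edges, layer):
        agg = torch.zeros_like(node_emb)
        for src, tgt in edges:                # sum incoming messages per node
            agg[tgt] += node_emb[src]
        return torch.relu(layer(torch.cat([node_emb, agg], dim=-1)))

    def forward(self, node_emb, edges):
        top_down = self.encode(node_emb, edges, self.down)
        bottom_up = self.encode(node_emb, [(t, s) for s, t in edges], self.up)
        return torch.cat([top_down, bottom_up], dim=-1)   # one vector per node

enc = DualGraphEncoder(dim=64)
nodes = torch.randn(5, 64)                    # 5 node label embeddings
edges = [(0, 1), (0, 2), (2, 3), (2, 4)]      # parent -> child pairs
print(enc(nodes, edges).shape)                # (5, 128)
```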
0:22:21 | and so in terms of numbers |
---|
0:22:24 | we could show of course you know that with this dual encoding |
---|
0:22:26 | you know we could |
---|
0:22:30 | outperform the state of the art so those were the state of the art well this is a more recent |
---|
0:22:33 | one so of course we are no longer state of the art |
---|
0:22:38 | but at the time we were right so we were improving a little bit over |
---|
0:22:44 | the previous approaches |
---|
0:22:46 | more importantly with all those numbers what i wanted to point out here is |
---|
0:22:51 | bleu so of course there's always bleu but it is always very difficult to evaluate |
---|
0:22:56 | those |
---|
0:22:57 | generated texts |
---|
0:22:58 | because you don't want to look at them one by one well you can |
---|
0:23:02 | you have to do human evaluation in fact but if you have large quantities |
---|
0:23:06 | and if you want to compare many systems you have to have an automatic metric |
---|
0:23:09 | so what people use they learned from machine translation and there are well known problems |
---|
0:23:13 | which is you know you can generate a perfectly correct sentence that matches the input |
---|
0:23:17 | perfectly |
---|
0:23:18 | but if it does not look like the reference sentence which is what you compute |
---|
0:23:22 | your bleu against then it will get a very low score |
---|
0:23:25 | so you have to have some other evaluation or at least try to |
---|
0:23:30 | so what we did is one known problem with neural networks is semantic adequacy |
---|
0:23:35 | they |
---|
0:23:36 | generate very nice looking texts right because these |
---|
0:23:38 | you know language models are very powerful but often |
---|
0:23:42 | they don't really match the input so it's a bit problematic you know because |
---|
0:23:47 | when you want to have a generation application |
---|
0:23:49 | it has to match the input otherwise |
---|
0:23:51 | it's very dangerous in a way |
---|
0:23:54 | so |
---|
0:23:55 | what we tried to do here is we wanted to measure the semantic adequacy because |
---|
0:23:59 | the semantic adequacy of a generator |
---|
0:24:01 | meaning |
---|
0:24:02 | how much does it match you know how much the generated text |
---|
0:24:06 | matches the input |
---|
0:24:08 | and then what we did is we used a |
---|
0:24:11 | textual entailment system that basically given two sentences tells you whether the |
---|
0:24:16 | first one entails the other |
---|
0:24:18 | so is the first |
---|
0:24:19 | you know is the second sentence implied so entailed by the first sentence |
---|
0:24:23 | and then if you do it both ways |
---|
0:24:25 | on the |
---|
0:24:26 | two sentences so does t entail q and does q entail t |
---|
0:24:30 | then you know that t and q are semantically equivalent right |
---|
0:24:33 | logically that would be the thing |
---|
0:24:36 | so we did something similar we wanted to check semantic equivalence on text |
---|
0:24:41 | we used these tools that have been developed in computational linguistics to determine whether |
---|
0:24:45 | two sentences are in a relation of entailment |
---|
0:24:47 | and we looked at both directions so we're comparing the reference and the generated |
---|
0:24:53 | sentence and what you see here is that the always the graph approach is much |
---|
0:24:59 | better |
---|
0:25:00 | at that i mean at producing sentences that are entailed by the reference |
---|
0:25:04 | and also much better |
---|
0:25:06 | at producing |
---|
0:25:08 | sentences that entail the reference |
---|
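The bidirectional entailment check can be approximated with an off-the-shelf NLI model; the sketch below uses the Hugging Face `roberta-large-mnli` checkpoint as an illustrative stand-in for the textual entailment system mentioned in the talk, and it assumes a recent version of the transformers library.

```python
# Hedged sketch of the bidirectional entailment check used as a proxy for
# semantic equivalence between the generated text and the reference.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def entails(premise, hypothesis):
    result = nli([{"text": premise, "text_pair": hypothesis}])[0]
    return result["label"] == "ENTAILMENT", result["score"]

def mutually_entail(reference, generated):
    fwd, _ = entails(reference, generated)   # reference -> generated
    bwd, _ = entails(generated, reference)   # generated -> reference
    return fwd and bwd                       # both directions ~ semantic equivalence

print(mutually_entail("US officials held an expert group meeting in January 2002.",
                      "An expert group meeting was held by US officials in January 2002."))
```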
0:25:14 | we also did a human evaluation |
---|
0:25:17 | where basically we gave two questions to the human evaluators |
---|
0:25:21 | is it semantically adequate does the output text match the input |
---|
0:25:25 | and is it readable and then again you see so in orange |
---|
0:25:30 | this is our system and the rest are the sequential systems and you see that there is |
---|
0:25:35 | a large improvement |
---|
0:25:36 | so this you know this all points to a direction where you know using a graph |
---|
0:25:41 | encoder when you have a graph at least |
---|
0:25:43 | a meaning representation that is a graph is a good idea |
---|
0:25:47 | okay another thing we found is |
---|
0:25:49 | it's also valuable it's often useful to combine local and global information local information |
---|
0:25:57 | meaning local to the node in the graph |
---|
0:26:00 | and global sort of giving information about the structure |
---|
0:26:03 | of the surrounding graph |
---|
0:26:06 | so in this so this is still the same |
---|
0:26:11 | dual bottom-up so this is the top-down bottom-up setup |
---|
0:26:14 | this is a picture of the system |
---|
0:26:15 | we have this graph encoder that encodes the top-down view of the of the |
---|
0:26:21 | of the graph and the bottom-up view and then you |
---|
0:26:24 | so you have |
---|
0:26:27 | these are then the encodings of the nodes |
---|
0:26:32 | and then what you do is you |
---|
0:26:36 | okay so you end up with three embeddings for each node one |
---|
0:26:40 | embedding is basically the embedding of the label |
---|
0:26:43 | the content of the node so the concept |
---|
0:26:46 | so it's a word embedding basically |
---|
0:26:48 | and the other two are the bottom-up and top-down embeddings of the node |
---|
0:26:52 | and what we do is we also run an lstm |
---|
0:26:55 | so we have a notion of context for each node |
---|
0:26:58 | which is given by you know the preceding nodes in the graph and we found |
---|
0:27:02 | that this also improves our results |
---|
0:27:05 | and we also applied this idea so this local plus global information |
---|
0:27:10 | idea to another task here the task was the surface realization |
---|
0:27:16 | shared task on generating from unordered dependency trees |
---|
0:27:20 | so the idea is the input meaning representation in this case is an |
---|
0:27:24 | unordered dependency tree |
---|
0:27:27 | where the where the nodes are decorated with lemmas so it looks |
---|
0:27:31 | something like this |
---|
0:27:33 | and then what you have to do is to generate a sentence from it so |
---|
0:27:37 | basically this task has two subtasks one of them is how to reorder those |
---|
0:27:41 | lemmas into a correct sentence |
---|
0:27:44 | and then when you have the correct order |
---|
0:27:47 | how to inflect the words so you want |
---|
0:27:50 | for example you want apple to become apples |
---|
0:27:53 | in this case |
---|
0:27:57 | so we worked on that so this was also some work we did with |
---|
0:28:01 | colleagues and a phd student |
---|
0:28:05 | so what we did again is so then we transform basically what happened so |
---|
0:28:10 | where we handle this |
---|
0:28:16 | the way we handle this was |
---|
0:28:19 | as follows so here i'm just i'm just focusing here on the word ordering |
---|
0:28:22 | problem how to |
---|
0:28:24 | map this unordered tree |
---|
0:28:26 | to a sequence of elements |
---|
0:28:28 | so i'm not talking about the word inflection problem |
---|
0:28:31 | so what we do is we basically binarize the tree first |
---|
0:28:36 | so everything becomes binary |
---|
0:28:38 | and then we use a multilayer perceptron to decide on the order |
---|
0:28:43 | of each child with respect to the head so here we're going to say that |
---|
0:28:48 | we have a |
---|
0:28:49 | we have a |
---|
0:28:51 | we build a training corpus where we say okay i know you know from the |
---|
0:28:54 | corpus from the from the reference that i precedes likes |
---|
0:28:59 | et cetera and then you so this is a training corpus and the task is |
---|
0:29:02 | basically given two |
---|
0:29:03 | given the child and the head |
---|
0:29:06 | the parent |
---|
0:29:06 | how do you order them is the parent first or is the parent second |
---|
0:29:11 | so we were doing this and then we found again that combining local with |
---|
0:29:16 | global information helps |
---|
0:29:18 | and this is the this is a picture of the of the model |
---|
0:29:22 | you have the embeddings of your two nodes |
---|
0:29:25 | and you concatenate them so you build a new representation |
---|
0:29:28 | and then you have |
---|
0:29:30 | the embedding of all of the nodes that are below the parent node |
---|
0:29:35 | so the subtree that is dominated by the by the parent node |
---|
0:29:38 | and again we found that you know if you combine these two pieces of information |
---|
0:29:42 | you get much better results in the word ordering task |
---|
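A simplified version of this ordering classifier might look as follows; the embedding dimensions and the way the subtree embedding is obtained are toy assumptions.

```python
# Word-ordering classifier sketch: given the embeddings of a head and of one of
# its dependents (plus a "global" embedding of the subtree below the dependent),
# a small MLP predicts whether the dependent is realised before or after the head.
import torch
import torch.nn as nn

class OrderClassifier(nn.Module):
    def __init__(self, dim=100):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 2),               # 0 = child before head, 1 = child after head
        )

    def forward(self, head_emb, child_emb, subtree_emb):
        return self.mlp(torch.cat([head_emb, child_emb, subtree_emb], dim=-1))

clf = OrderClassifier()
head, child, subtree = torch.randn(1, 100), torch.randn(1, 100), torch.randn(1, 100)
logits = clf(head, child, subtree)
print(logits.argmax(-1))   # predicted relative order for this (head, child) pair
```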
0:29:45 | so what this shows is that |
---|
0:29:47 | taking into account in this case the subtree |
---|
0:29:51 | you know the |
---|
0:29:51 | top down view of the node of that node |
---|
0:29:54 | really helps |
---|
0:29:57 | so here you see you again have this bleu score this is the basic version |
---|
0:30:00 | this is the one where you do data expansion i'll talk about it later and this is the |
---|
0:30:06 | one with the new encoder so when you when you do take into account this |
---|
0:30:09 | additional global information so you see there's quite a big improvement |
---|
0:30:16 | okay so this was about encoding graphs |
---|
0:30:19 | what i want to talk about now is about |
---|
0:30:21 | what you do with what you what can you do when you know the |
---|
0:30:25 | the input text that you have to handle is very large |
---|
0:30:32 | so in particular we looked at two different tasks |
---|
0:30:42 | we looked at two different tasks one of them is free-form question answering |
---|
0:30:45 | from web text and the other is multi document summarization |
---|
0:30:50 | the first task is you have you have a query |
---|
0:30:56 | and basically what you're going to do is you're going to retrieve a lot of |
---|
0:30:59 | information from the web |
---|
0:31:01 | meaning a lot meaning something like two hundred thousand tokens |
---|
0:31:05 | so basically the first one hundred or so hits from the web |
---|
0:31:12 | and then you are going to use all this text as input plus the question |
---|
0:31:15 | as input to generation |
---|
0:31:18 | and the task is to generate a summary that answers the question |
---|
0:31:22 | so it's quite difficult this is a text-to-text generation task |
---|
0:31:26 | and the other one is multi document summarization |
---|
0:31:29 | we use the wikisum dataset |
---|
0:31:32 | what |
---|
0:31:33 | you do is you keep the article title so you have the title of a wikipedia |
---|
0:31:37 | article |
---|
0:31:38 | then you retrieve |
---|
0:31:41 | information again some data from the web you know using this title as the |
---|
0:31:46 | query |
---|
0:31:47 | and the goal is to generate the wikipedia paragraph the paragraph that talks |
---|
0:31:53 | about this title |
---|
0:31:54 | so basically the first paragraph of the wikipedia page you have to generate the first paragraph |
---|
0:32:00 | of that wikipedia page |
---|
0:32:03 | so here's an example you have the question so this is the eli5 |
---|
0:32:07 | dataset eli5 is explain like i'm five so |
---|
0:32:11 | the answer is supposed to be in simple language |
---|
0:32:14 | so the question would be why are consumers still terrified of genetically modified |
---|
0:32:19 | organisms |
---|
0:32:21 | though there is little debate in the scientific community over whether they are safe or not |
---|
0:32:25 | and then you retrieve some documents through web search |
---|
0:32:28 | from the web and then this is this would be in this in this case |
---|
0:32:33 | the |
---|
0:32:33 | the target answer so |
---|
0:32:35 | so not only is the input text very long but the output text is also |
---|
0:32:39 | not short it is not a single sentence it's really a paragraph |
---|
0:32:44 | so the question is how to encode two hundred thousand words and then generate from that |
---|
0:32:49 | previous work you know took a way out in a way they basically used |
---|
0:32:54 | tf-idf to select the most relevant |
---|
0:32:58 | web hits or even sentences |
---|
0:33:01 | so they are not taking the whole results of the web search they |
---|
0:33:06 | just take you know they limit the results to a few thousand words using |
---|
0:33:10 | basically a tf-idf score |
---|
0:33:13 | okay so what we wanted to do was to see whether there |
---|
0:33:18 | was a way that we could encode all |
---|
0:33:20 | all these two hundred thousand words that were retrieved from the web |
---|
0:33:25 | and encode it and use it for generation and the idea was |
---|
0:33:29 | to convert the text into a graph |
---|
0:33:31 | and in this case not |
---|
0:33:34 | not a new model not a graph encoding but rather a knowledge graph like |
---|
0:33:38 | we used to do in |
---|
0:33:40 | information extraction and see whether that helps |
---|
0:33:45 | so let's see how do we do this |
---|
0:33:49 | and so no so here is an example |
---|
0:33:51 | the query is explain the theory of relativity |
---|
0:33:54 | and here's a toy example right we have those two documents the idea is |
---|
0:33:59 | that building this graph allows us to reduce |
---|
0:34:04 | to reduce the size of the input drastically |
---|
0:34:07 | we'll see why later |
---|
0:34:09 | so the idea is we use |
---|
0:34:12 | two existing tools from computational linguistics coreference resolution and information extraction tools |
---|
0:34:20 | what coreference resolution does is it gives us is that it tells us what are |
---|
0:34:24 | the mentions in the text that talk about the same entity |
---|
0:34:27 | and then once we know this we group them into a single node in |
---|
0:34:30 | the knowledge graph |
---|
0:34:31 | and the triples the information extraction transforms the text into sets of triples |
---|
0:34:36 | basically binary relations between entities and so these relations |
---|
0:34:41 | are used as the edges in the graph and the entities are the |
---|
0:34:44 | nodes |
---|
0:34:45 | so here's an example those two documents and then you have like in blue you |
---|
0:34:51 | have those four mentions of albert einstein they will all be combined into one node |
---|
0:34:56 | and then the information extraction |
---|
0:34:58 | will tell us that there is this triple that you can extract from |
---|
0:35:02 | from this sentence here albert einstein a german theoretical physicist published the theory |
---|
0:35:07 | of relativity you can |
---|
0:35:09 | the open ie tool will tell you |
---|
0:35:12 | that you can transform the sentence into these two triples here |
---|
0:35:17 | the german |
---|
0:35:22 | the german part so this part here maps into this triple |
---|
0:35:28 | and similarly it takes this one here developed the theory of relativity and gives |
---|
0:35:31 | you this triple |
---|
0:35:36 | and so this is and that's how you build the graph so basically by using |
---|
0:35:39 | coreference resolution and |
---|
0:35:41 | information extraction |
---|
0:35:44 | and another thing that is that was important was that building this graph |
---|
0:35:48 | sort of |
---|
0:35:50 | gives us a notion of |
---|
0:35:53 | the important information or the information that is repeated in the input |
---|
0:35:58 | because every time we are going to have |
---|
0:36:00 | you know a different mention of the same entity |
---|
0:36:02 | we will we will keep score of how many times this entity is mentioned |
---|
0:36:07 | and we will use this in the graph representation to give a score |
---|
0:36:10 | a weight |
---|
0:36:11 | to each node and to each edge in the graph |
---|
0:36:15 | so here if i go a little bit in more detail so you |
---|
0:36:18 | you |
---|
0:36:20 | construct the graph incrementally by going through your sentences so you first add the sentence |
---|
0:36:23 | here |
---|
0:36:24 | add it to the graph or you add the corresponding triples to the |
---|
0:36:27 | graph |
---|
0:36:28 | and now you add this one here |
---|
0:36:30 | and you see the theory of relativity was already mentioned |
---|
0:36:33 | so now the corresponding node the weight of this corresponding node |
---|
0:36:39 | is incremented by one and you go on like this |
---|
0:36:42 | we also have a filter operation that says you know if |
---|
0:36:45 | if a sentence has |
---|
0:36:46 | nothing to do with the query we don't include it |
---|
0:36:49 | right so and we do this using tf-idf |
---|
0:36:52 | to avoid including in the graph information that is totally irrelevant |
---|
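The incremental, weighted graph construction can be sketched as below; triple extraction and coreference resolution are assumed to have already been done by external tools, and `is_relevant` is a placeholder standing in for the TF-IDF filter against the query.

```python
# Toy sketch of local knowledge-graph construction: per-sentence triples (already
# coreference-resolved) are merged into one graph, identical entities collapse into
# a single node, and node/edge weights count how often each one is mentioned.
from collections import defaultdict

def build_graph(sentence_triples, query, is_relevant):
    node_weight = defaultdict(int)
    edge_weight = defaultdict(int)
    for sentence, triples in sentence_triples:
        if not is_relevant(sentence, query):        # drop off-topic sentences
            continue
        for subj, rel, obj in triples:
            node_weight[subj] += 1                  # repeated mentions raise the weight
            node_weight[obj] += 1
            edge_weight[(subj, rel, obj)] += 1
    return node_weight, edge_weight

docs = [
    ("Albert Einstein published the theory of relativity.",
     [("Albert Einstein", "published", "theory of relativity")]),
    ("Albert Einstein was a German theoretical physicist.",
     [("Albert Einstein", "was", "German theoretical physicist")]),
]
nodes, edges = build_graph(docs, "theory of relativity", lambda s, q: True)
print(nodes["Albert Einstein"])   # 2 -- the two mentions were merged into one node
```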
0:36:57 | okay so we built this graph and then we are going to linearise the graph |
---|
0:37:02 | right |
---|
0:37:03 | so it's different from the previous approach where we were going from sequence to graph here |
---|
0:37:08 | we're going from graph to sequence because the graph is too big so |
---|
0:37:11 | you know you could try a graph encoder but we didn't do |
---|
0:37:14 | this |
---|
0:37:15 | it might be the next step but this is quite a big graph so i'm not |
---|
0:37:18 | sure how well those graph encoders would work |
---|
0:37:21 | so we linearise the graph but then to keep some information about the graph |
---|
0:37:25 | structure |
---|
0:37:26 | we use two additional embeddings per token so we have the encoder |
---|
0:37:31 | is a transformer in this case so since it's not a recurrent network |
---|
0:37:34 | we have the position embedding added to the word embedding |
---|
0:37:38 | to keep track of where in the sentence a word is |
---|
0:37:42 | and we add these two additional embeddings that give us information |
---|
0:37:46 | about you know the weight of each node and edge |
---|
0:37:49 | and the relevance to the query |
---|
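The extra embeddings work like position embeddings: they are summed into each token's representation. Below is a minimal sketch, with illustrative vocabulary sizes and bucket counts rather than the actual model configuration.

```python
# Input representation for the linearised graph: each token's word embedding is
# summed with a position embedding plus two extra embeddings encoding (bucketed)
# node/edge weight and query relevance.
import torch
import torch.nn as nn

class GraphTokenEmbedding(nn.Module):
    def __init__(self, vocab=10000, dim=512, max_len=512, weight_buckets=10, rel_buckets=10):
        super().__init__()
        self.word = nn.Embedding(vocab, dim)
        self.position = nn.Embedding(max_len, dim)
        self.weight = nn.Embedding(weight_buckets, dim)    # how often the node/edge was mentioned
        self.relevance = nn.Embedding(rel_buckets, dim)    # TF-IDF-style relevance to the query

    def forward(self, token_ids, weight_ids, relevance_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return (self.word(token_ids) + self.position(positions)
                + self.weight(weight_ids) + self.relevance(relevance_ids))

emb = GraphTokenEmbedding()
tokens = torch.randint(0, 10000, (1, 6))
weights = torch.randint(0, 10, (1, 6))
relevance = torch.randint(0, 10, (1, 6))
print(emb(tokens, weights, relevance).shape)   # (1, 6, 512)
```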
0:37:54 | so the global view of the of the model is |
---|
0:37:58 | you have your linearised graph as i said with four different embeddings |
---|
0:38:03 | for each node or edge |
---|
0:38:05 | you process it with a transformer we use |
---|
0:38:09 | memory compressed attention this is for scaling better and we use top-k attention that is |
---|
0:38:14 | you only look at the points in the encoder which have the top attention |
---|
0:38:22 | so we encode the graph as a sequence |
---|
0:38:26 | we encode the query we combine them both using attention |
---|
0:38:30 | and then we decode from that |
---|
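The top-k attention mentioned here can be sketched as follows (this is an assumption about the exact mechanism, kept deliberately simple): for each query position only the k highest-scoring encoder positions are kept before the softmax, which helps when the encoded input is very long.

```python
# Top-k attention sketch: mask out everything except the k best-scoring
# encoder positions for each query, then apply softmax as usual.
import torch

def topk_attention(queries, keys, values, k=8):
    scores = queries @ keys.transpose(-2, -1) / keys.size(-1) ** 0.5
    top_scores, top_idx = scores.topk(min(k, scores.size(-1)), dim=-1)
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, top_idx, top_scores)        # keep only the k best scores
    weights = torch.softmax(mask, dim=-1)
    return weights @ values

q, kv = torch.randn(1, 4, 64), torch.randn(1, 100, 64)
print(topk_attention(q, kv, kv).shape)            # (1, 4, 64)
```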
0:38:36 | so these pictures here show you the amount of reduction you get from the graph |
---|
0:38:41 | construction |
---|
0:38:42 | and |
---|
0:38:43 | and then |
---|
0:38:46 | the proportion of missing answer tokens because you might think you know okay |
---|
0:38:50 | maybe you lose because by compressing the text into this graph |
---|
0:38:54 | by reducing the redundancies |
---|
0:38:56 | maybe you lose some important information |
---|
0:38:58 | but it is actually not the case |
---|
0:39:00 | so what the first graph shows is that |
---|
0:39:03 | if you do the web search you have something like two hundred thousand tokens |
---|
0:39:09 | and if we run our graph construction process it reduces this |
---|
0:39:14 | to |
---|
0:39:14 | roughly ten thousand tokens |
---|
0:39:16 | right |
---|
0:39:17 | and then we compare this with just extracting triples from the text and not |
---|
0:39:21 | constructing the graph and you see that you still have a lot so that would |
---|
0:39:24 | not be enough to reduce the size |
---|
0:39:27 | and what the second the second graph shows is |
---|
0:39:31 | it shows the proportion of missing answer tokens |
---|
0:39:36 | so the lower the better for missing answer tokens |
---|
0:39:40 | you don't want too many |
---|
0:39:43 | so we are talking about comparing with the reference answer and you want to have |
---|
0:39:46 | as many tokens in your output |
---|
0:39:49 | that come from the reference as possible so you don't want too many missing tokens |
---|
0:39:53 | right so what this shows is the previous approach using tf-idf filtering where you don't |
---|
0:39:59 | consider the whole two hundred thousand tokens but simply |
---|
0:40:02 | i think it's eight hundred tokens |
---|
0:40:06 | this is what happens if you encode the graph built from those eight hundred and fifty |
---|
0:40:10 | tokens that the tf-idf approach is using |
---|
0:40:14 | and you see |
---|
0:40:16 | and so this is the number of missing tokens so you know the higher |
---|
0:40:19 | the worse |
---|
0:40:21 | but what we see is if we encode everything so this is |
---|
0:40:25 | the one where we encode the whole |
---|
0:40:28 | it's not very |
---|
0:40:29 | so we encode the whole |
---|
0:40:33 | input |
---|
0:40:34 | with this one at the end so if we encode the whole input |
---|
0:40:37 | text the one hundred web |
---|
0:40:40 | pages |
---|
0:40:41 | and so all the information we take from the web |
---|
0:40:43 | you see that actually the performance is better |
---|
0:40:53 | and these are the generation results again so in this case using rouge comparing against |
---|
0:40:59 | our reference answer again there are |
---|
0:41:02 | issues with this so here we compare with the tf-idf approach |
---|
0:41:07 | with |
---|
0:41:09 | extracting the triples but not constructing the graph and here with the graph and then |
---|
0:41:13 | you see you always get you know some improvement but the important point |
---|
0:41:18 | mainly is that we can so we get some improvement with respect to the tf |
---|
0:41:22 | idf approach |
---|
0:41:23 | you go from twenty eight something to twenty nine something |
---|
0:41:27 | but also what's important is that we can really scale to the whole |
---|
0:41:31 | two hundred web pages |
---|
0:41:35 | and here's an example showing the output of the system |
---|
0:41:39 | which is i'd say very |
---|
0:41:41 | i find it very impressive but it also illustrates some problems with the |
---|
0:41:46 | evaluation the automatic evaluation metrics so |
---|
0:41:49 | the question is why does touching microfibre give such an |
---|
0:41:52 | uncomfortable feeling |
---|
0:41:54 | then you have these and so this is the reference and this is the generated |
---|
0:41:57 | answer |
---|
0:41:58 | the generated answer you know makes sense the microfibre is made up of a |
---|
0:42:02 | bunch of tiny fibres |
---|
0:42:04 | that are attached to them |
---|
0:42:06 | when you touch them |
---|
0:42:07 | the fibres that make up the microfibre are attracted to each other |
---|
0:42:11 | when they are actually attracted to the other end of the fibre which is what makes |
---|
0:42:15 | them uncomfortable so this part is a bit strange but overall it makes sense |
---|
0:42:19 | and it's relevant to the question and you know you have to think it's generated |
---|
0:42:22 | from this two hundred thousand token input and so it's not so bad |
---|
0:42:26 | but what it also shows is that you know there are almost no overlapping words between the |
---|
0:42:31 | generated answer and the reference and so it's an example where |
---|
0:42:35 | you know the automatic metrics would give it a bad score |
---|
0:42:38 | whereas in fact this is a pretty okay answer |
---|
0:42:48 | how much time do i have |
---|
0:42:54 | fifteen |
---|
0:42:57 | so with this |
---|
0:43:00 | okay so one last thing about encoding |
---|
0:43:03 | is that sometimes |
---|
0:43:05 | sometimes again you don't have so much data |
---|
0:43:09 | so in your model abstracting away over the data might help in generalizing |
---|
0:43:15 | so here i'm going back to this task of generating from unordered |
---|
0:43:19 | dependency trees |
---|
0:43:20 | so you have this as input and this is what you have to produce |
---|
0:43:23 | as output |
---|
0:43:24 | and the idea here was so this was another work done with |
---|
0:43:29 | a |
---|
0:43:30 | phd student |
---|
0:43:34 | the idea was that here instead of what we did before you know in the |
---|
0:43:39 | other approach we were sort of |
---|
0:43:40 | turning the tree into a binary tree and then having this multilayer perceptron to |
---|
0:43:45 | order the trees so a local ordering of the nodes |
---|
0:43:49 | here what we do is we just have an encoder-decoder which basically learns to map |
---|
0:43:54 | a linearised version of the unordered dependency tree |
---|
0:43:57 | into |
---|
0:43:59 | the correct order of the lemmas |
---|
0:44:02 | so it's a different approach and also what we did is |
---|
0:44:08 | we thought well this word ordering problem is not so much determined by the words it's |
---|
0:44:12 | more dependent on syntax |
---|
0:44:14 | so maybe we can abstract over the words we can just get rid of the |
---|
0:44:18 | words |
---|
0:44:18 | and you know this would reduce data sparsity |
---|
0:44:22 | and it would be more general we don't want you know the specific words to |
---|
0:44:26 | have an impact basically |
---|
0:44:30 | so what we did is we actually got rid of the words |
---|
0:44:34 | so here you have your input word input |
---|
0:44:38 | input dependency tree that is not ordered say john eats apples for example |
---|
0:44:42 | and what we do is we linearize this tree |
---|
0:44:46 | and we remove the words so we have a factored representation of each node where we |
---|
0:44:50 | keep track of you know the pos tag the parent node |
---|
0:44:54 | and i can't remember what this one is |
---|
0:44:58 | i guess the position |
---|
0:45:06 | well i don't know |
---|
0:45:09 | zero one two okay i don't remember |
---|
0:45:11 | anyway the important point is that we rewrite the tree and remove |
---|
0:45:15 | the words |
---|
0:45:16 | so we only keep |
---|
0:45:17 | basically pos tag information |
---|
0:45:19 | structural information what the parent is |
---|
0:45:22 | and |
---|
0:45:24 | what is the grammatical relation the dependency relation between the node and the parent |
---|
0:45:29 | so here you see eats for example is replaced by this id one you know |
---|
0:45:34 | it's a verb |
---|
0:45:35 | you know the parent is the root |
---|
0:45:38 | ah sorry it's replaced by this id here it's related by the root relation to |
---|
0:45:42 | this node to the root so if i take another example that would be clearer |
---|
0:45:47 | john for example |
---|
0:45:49 | where is the subject john here is replaced by id four it's a proper noun |
---|
0:45:54 | so this is the pos tag and it's the subject and i think it's |
---|
0:45:57 | missing the parent node |
---|
0:46:00 | okay so we delexicalised we linearised and delexicalised the tree |
---|
0:46:05 | and then we build this corpus where the target is the delexicalised sequence |
---|
0:46:10 | with the correct order so here you see that you have the proper noun |
---|
0:46:13 | first the verb the determiner and the noun |
---|
0:46:16 | and basically we train a seq2seq model to go from here to here |
---|
0:46:20 | and then we relexicalise so we keep a mapping of you know |
---|
0:46:24 | what id one is and then when you generate you can just use |
---|
0:46:28 | the mapping to relexicalise the sentence |
---|
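A toy version of the delexicalise/relexicalise round trip is shown below; the node features, the feature layout and the example tree are made-up values, not the shared-task format.

```python
# Delexicalisation sketch: each node of the linearised dependency tree is replaced
# by an id plus its structural features (POS tag, dependency relation, parent id),
# and a mapping is kept so the generated id sequence can be relexicalised afterwards.
def delexicalise(nodes):
    mapping, delex = {}, []
    for i, node in enumerate(nodes):
        ident = f"id{i}"
        mapping[ident] = node["lemma"]
        delex.append(f"{ident}|{node['pos']}|{node['deprel']}|parent={node['parent']}")
    return delex, mapping

def relexicalise(ordered_ids, mapping):
    return " ".join(mapping[i] for i in ordered_ids)

tree = [{"lemma": "eat", "pos": "VERB", "deprel": "root", "parent": -1},
        {"lemma": "John", "pos": "PROPN", "deprel": "nsubj", "parent": 0},
        {"lemma": "apple", "pos": "NOUN", "deprel": "obj", "parent": 0}]
delex, mapping = delexicalise(tree)
print(delex)                                         # input side of the seq2seq model
print(relexicalise(["id1", "id0", "id2"], mapping))  # "John eat apple" before inflection
```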
0:46:32 | and what you see is that it really helps |
---|
0:46:35 | so this surface realization task |
---|
0:46:38 | has data for i think about ten languages so there is |
---|
0:46:42 | arabic czech english spanish finnish french |
---|
0:46:45 | italian dutch portuguese and russian |
---|
0:46:48 | and you see here the difference between |
---|
0:46:51 | doing the seq2seq where the tree |
---|
0:46:54 | contains all the words so where we haven't changed anything and this is doing |
---|
0:46:58 | it without the words the delexicalised version and you see that for all languages |
---|
0:47:02 | you get quite a big improvement in terms of bleu score |
---|
0:47:10 | and we used a similar idea here so this was generating from unordered |
---|
0:47:15 | dependency trees but |
---|
0:47:16 | there was this other task you know generating from abstract meaning representations |
---|
0:47:21 | in fact here we built a new dataset for french but the |
---|
0:47:25 | so here is the same idea we represent the nodes by a concatenation so it's |
---|
0:47:29 | a factored model where |
---|
0:47:30 | each node is the concatenation of different types of embeddings so you know the |
---|
0:47:34 | concept the pos tag the numbers and the and morphological and syntactic features |
---|
0:47:39 | and again we delexicalise everything |
---|
0:47:42 | and again we found oops |
---|
0:47:45 | yes and again we found that you know we get this improvement so this is |
---|
0:47:49 | the baseline which is not delexicalised and this is when we delexicalise so |
---|
0:47:54 | you get two points improvement |
---|
0:48:01 | okay |
---|
0:48:03 | so as i mentioned at the beginning you know the datasets |
---|
0:48:06 | that we have they are not very big so in particular for example |
---|
0:48:09 | the surface realization challenge |
---|
0:48:12 | is |
---|
0:48:15 | you know there is a training set of something like two thousand examples |
---|
0:48:22 | so you have to be a little bit |
---|
0:48:24 | sometimes you have to be clever or |
---|
0:48:27 | constructive in what you do with the training data |
---|
0:48:29 | one thing we found is |
---|
0:48:32 | it is often useful to |
---|
0:48:37 | to extend your training data with information that is implicit in it to be |
---|
0:48:42 | found |
---|
0:48:44 | that is implicit in the available training data |
---|
0:48:48 | so again, going back to this example where the problem was to order the nodes of
---|
0:48:53 | the
---|
0:48:53 | tree, the problem is
---|
0:48:54 | we attacked the problem by having this classifier that would determine
---|
0:48:59 | the relative order of a parent and child
---|
0:49:03 | and so, you know, your training data was like this: you
---|
0:49:07 | had the parent and you had the child and you had the position
---|
0:49:10 | of the child with respect to the parent
---|
0:49:16 | and this is the data we had, and we thought, well, you know, it should learn
---|
0:49:19 | that if this is true then also this is true, that is, if the
---|
0:49:24 | you know
---|
0:49:26 | if the child is to the left of the parent it should also learn
---|
0:49:30 | somehow that the parent is to the right of the child
---|
0:49:36 | but in fact we found that it
---|
0:49:37 | didn't learn that, so what we did is we just added the inverse
---|
0:49:41 | pairs: whenever we had this pair in the training data we added the inverse pair to
---|
0:49:46 | the training data, so we doubled the size of the training data
---|
0:49:50 | but we also give more explicit information about what possible constraints there are, so usually
---|
0:49:56 | you know, the subject is before the verb and, the other way round
---|
0:50:00 | the verb is after the subject
---|
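The inverse-pair augmentation just described could look roughly like the sketch below; the tuple format and the left/right encoding are invented for illustration, not the actual shared-task data format:

```python
# For every (parent, child, position-of-child) example, also add the symmetric
# (child, parent, opposite-position) example, doubling the training data and
# making the left/right ordering constraint explicit.

def augment_with_inverse_pairs(examples):
    augmented = list(examples)
    for parent, child, child_side in examples:
        parent_side = "right" if child_side == "left" else "left"
        augmented.append((child, parent, parent_side))
    return augmented

train = [("sleeps/VERB", "John/PROPN", "left"),   # subject before the verb
         ("sleeps/VERB", "sofa/NOUN", "right")]
print(augment_with_inverse_pairs(train))
# [..., ('John/PROPN', 'sleeps/VERB', 'right'), ('sofa/NOUN', 'sleeps/VERB', 'left')]
```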
0:50:03 | and again you know you see that there is a large improvement |
---|
0:50:14 | and also one nice way of expanding the data is to use computational linguistic tools that
---|
0:50:19 | are available, and that was done already in two thousand and seventeen by konstas et al
---|
0:50:25 | where the idea is, so this was for generating from amr data
---|
0:50:29 | so the training data was
---|
0:50:32 | either manually validated or constructed
---|
0:50:35 | for the task, for the shared task, but in fact there are
---|
0:50:39 | semantic parsers that, if you give them a sentence, they will give you the amr
---|
0:50:43 | so i mean they are not one hundred percent reliable but they can produce
---|
0:50:49 | an amr
---|
0:50:50 | so what you can do is you just generate a lot of, you generate a
---|
0:50:54 | lot of training data by simply using a semantic parser on available data
---|
0:50:58 | and this was i think what konstas did, so you basically parse two hundred thousand
---|
0:51:03 | gigaword sentences with this semantic parser
---|
0:51:07 | and then you do some pre-training on this data and then you do some
---|
0:51:11 | fine-tuning on the actual shared task data set
---|
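In outline, the silver-data recipe looks something like the sketch below; `amr_parse` and `train` are hypothetical placeholders for an off-the-shelf AMR parser and a training loop, not a specific library API:

```python
# Parse a large set of raw sentences with a semantic (AMR) parser to create
# noisy (AMR, sentence) pairs, pre-train the generator on them, then fine-tune
# on the small gold shared-task data.

def build_silver_corpus(raw_sentences, amr_parse):
    # each raw sentence becomes one (input AMR, target sentence) training pair
    return [(amr_parse(s), s) for s in raw_sentences]

def train_generator(model, gold_pairs, raw_sentences, amr_parse, train):
    silver_pairs = build_silver_corpus(raw_sentences, amr_parse)
    train(model, silver_pairs, epochs=5)    # pre-training on noisy silver data
    train(model, gold_pairs, epochs=20)     # fine-tuning on the small gold set
    return model
```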
0:51:18 | and so we
---|
0:51:20 | we used this again, you know, for the first approach i showed, the
---|
0:51:24 | graph encoder with the dual top-down bottom-up
---|
0:51:26 | encoding approach, and again, you know, like the other approaches
---|
0:51:29 | we see that this really improves performance
---|
0:51:36 | okay, so i'm getting to the end
---|
0:51:38 | so, you know, i mentioned some
---|
0:51:41 | things you can do: a better encoding of your input and better training data
---|
0:51:46 | there's of course many open issues
---|
0:51:49 | one that i find particularly interesting is multilingual generation, so we saw in
---|
0:51:54 | the surface realization shared task there are ten languages, but it is still a reasonably simple
---|
0:51:59 | task and you can have some data
---|
0:52:02 | taken from the universal dependency treebanks
---|
0:52:06 | so what would be interesting is, you know, how can you generate in multiple
---|
0:52:10 | languages from data, from knowledge bases
---|
0:52:12 | or even from text: if you are simplifying, can you simplify in different languages
---|
0:52:18 | there are also, of course, interpretability issues
---|
0:52:23 | as i said at the beginning, you know, the standard approach is this encoder and decoder
---|
0:52:27 | end-to-end approach
---|
0:52:29 | which did away with all those modules that we had before
---|
0:52:32 | but in fact now
---|
0:52:33 | you know, one way to make the model more interpretable is to reconstruct
---|
0:52:38 | those modules, right, so instead of having a single end-to-end system
---|
0:52:41 | you have different networks for each of the tasks, and people are
---|
0:52:46 | starting to work on this, in particular with a
---|
0:52:51 | coarse-to-fine approach
---|
0:52:52 | where you first, for example, generate the structure of the text and then you fill
---|
0:52:56 | in the details, for example
---|
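A toy illustration of such a coarse-to-fine pipeline, where a first model produces a plan and a second realises each part of it; both models are hypothetical and simulated with trivial stand-ins here:

```python
# Two-step (coarse-to-fine) generation: plan first, then realise each part.

def coarse_to_fine_generate(data_input, plan_model, realise_model):
    plan = plan_model(data_input)                # e.g. ordered list of content chunks
    sentences = [realise_model(chunk) for chunk in plan]
    return " ".join(sentences)

# trivial stand-ins for the two learned modules
plan_model = lambda records: [records[:2], records[2:]]        # split content into two sentences
realise_model = lambda chunk: " ".join(f"{k} is {v}." for k, v in chunk)

print(coarse_to_fine_generate([("name", "X"), ("type", "pub"), ("area", "riverside")],
                              plan_model, realise_model))
```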
0:52:59 | and generalisation versus memorisation
---|
0:53:02 | there have been problems with, you know, data sets that are very repetitive, and it's
---|
0:53:07 | really important to have very good test sets, to control for the test set, and
---|
0:53:10 | for example a lot of the data sets, a lot of the shared tasks, do not
---|
0:53:14 | provide a sort of unseen test set, in the sense that
---|
0:53:19 | i don't know, you are generating newspaper texts, but you would like the test set
---|
0:53:23 | also to contain a test for, you know, what happens if you apply your
---|
0:53:28 | model to different types of text, so i think, you know, having a sort
---|
0:53:32 | of out-of-domain test set is really important for testing
---|
0:53:36 | the generalisation of the system; it's also linked to, you know, what can you
---|
0:53:39 | do with transfer learning or domain adaptation to go from one type of
---|
0:53:43 | text to another
---|
0:53:44 | and that's it thank you |
---|
0:53:56 | are there any questions
---|
0:54:07 | hi, so you've shown some results on acceptability of text generation that were something like
---|
0:54:13 | seventy-five percent, before it was sixty percent, somewhere in the middle
---|
0:54:18 | that's annotation, and what i wanted to ask you is, is this like a
---|
0:54:22 | zero-one, people say whether they accept or don't accept, or is it a
---|
0:54:28 | degree, like, i don't know... so, human evaluation you mean? yes, the evaluation...
---|
0:54:32 | human annotation is usually on a scale from one to five
---|
0:54:36 | okay, because you've shown a percentage i think
---|
0:54:41 | at some point, so i'm wondering what that is
---|
0:54:50 | readability |
---|
0:54:52 | sorry, to compare, so no, in this case they just compare
---|
0:54:58 | the output of two systems
---|
0:55:00 | so they compare the output of the seq2seq to the output of the other
---|
0:55:04 | system
---|
0:55:04 | and then they say which one they prefer
---|
0:55:07 | so the percentage is like sixty-four percent of people prefer this one
---|
0:55:17 | if you have them from one to five, okay, because this is a preference test
---|
0:55:23 | right, do we know if these are similar in reliability
---|
0:55:31 | what is the score, between one and five
---|
0:55:34 | no, i think, i think a four
---|
0:55:41 | i'd have to go back to the paper, i don't remember
---|
0:55:44 | but i think it is not, it's not the one to five, or maybe that was
---|
0:55:47 | wrong, so it's a pairwise comparison between two outputs: they have the output of
---|
0:55:52 | two systems, they don't know which one is which, and they have to say which one they
---|
0:55:54 | prefer
---|
0:56:00 | hi |
---|
0:56:07 | okay |
---|
0:56:09 | the quite |
---|
0:56:15 | so i'd like to thank you for the great talk, since you covered many
---|
0:56:22 | kinds of generation tasks, many different inputs: generating text from data, summarization, you
---|
0:56:30 | can generate text for a
---|
0:56:33 | conversational system, and i was curious, these are
---|
0:56:39 | very different kinds of problems; have the architecture and the main state-of-the-art approach
---|
0:56:46 | converged to one architecture, or are they very different for the different tasks? well
---|
0:56:53 | so
---|
0:56:54 | so the question is whether we have very different neural approaches for generation depending
---|
0:56:59 | on the input or depending on the task
---|
0:57:01 | for these different tasks
---|
0:57:03 | so i'd say that initially, for maybe three or four years, everybody was using this encoder-decoder
---|
0:57:10 | often with a recurrent encoder
---|
0:57:14 | and the difference was what the input was, so in dialogue
---|
0:57:19 | for example
---|
0:57:21 | you are going to take as input
---|
0:57:23 | the current dialogue, the user turn, plus maybe the context or some information about
---|
0:57:28 | the dialogue, right
---|
0:57:30 | if you are doing question answering you take a question
---|
0:57:33 | and some supporting evidence
---|
0:57:35 | so it was really more about
---|
0:57:38 | you know, which kind of input you had, and that was the only difference, but
---|
0:57:41 | now more and more people are paying attention to, you know
---|
0:57:45 | the fact that there are differences between these tasks: what is the structure of the
---|
0:57:48 | input, what is the goal, do you want to focus on identifying important information or
---|
0:57:54 | you know
---|
0:57:55 | the problems, i think, remain very different, so you have to deal with
---|
0:57:59 | very different problems in a way, so this is what i was trying to
---|
0:58:02 | show in fact
---|
0:58:04 | but for dialogue and generation, there is a thing, people have tried different
---|
0:58:10 | approaches for the encoder and the decoder, and the problem is
---|
0:58:15 | you know, the encoder can learn to encode these units, it can
---|
0:58:18 | learn the encoding okay, but the thing is the decoder generates things, it's not
---|
0:58:26 | able to generate things that are more, you know, informative, and handle the state transitions
---|
0:58:33 | yes, so there is a known problem with dialogue systems, that they tend to generate
---|
0:58:38 | very generic answers like "i don't know" or "maybe" that are not very informative; we are actually
---|
0:58:44 | working on
---|
0:58:48 | using external information
---|
0:58:50 | to produce, to have dialogue systems that
---|
0:58:53 | actually produce more informative answers, and so the idea in this case, or
---|
0:58:57 | the problem, is how you retrieve it: so you have your dialogue context, you have your
---|
0:59:01 | user question, or user turn
---|
0:59:04 | and it's a bit similar to a retrieval approach: you look
---|
0:59:09 | on the web or in some sources for some information that is relevant to
---|
0:59:14 | what is being discussed
---|
0:59:16 | and you enrich the input, so now the model is conditioned on
---|
0:59:19 | this additional information, and this hopefully gives you more informative dialogue, so instead of
---|
0:59:25 | you know, producing these empty utterances
---|
0:59:27 | the system now has all this knowledge it can use to generate a more informative
---|
0:59:31 | response
---|
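Schematically, the knowledge-grounded setup described here might look like the sketch below; the word-overlap retriever, the separator tokens and the `generate` placeholder are illustrative assumptions rather than the actual system:

```python
# Retrieve snippets relevant to the current dialogue context and feed them to
# the response generator together with the user turn.

def retrieve(context, knowledge_snippets, top_k=2):
    # crude word-overlap scorer, purely for illustration; a real system would
    # use a search engine or a learned retriever
    ctx_words = set(context.lower().split())
    scored = sorted(knowledge_snippets,
                    key=lambda s: len(ctx_words & set(s.lower().split())),
                    reverse=True)
    return scored[:top_k]

def knowledge_grounded_reply(dialogue_context, user_turn, knowledge_snippets, generate):
    evidence = retrieve(dialogue_context + " " + user_turn, knowledge_snippets)
    # the generator is conditioned on the dialogue AND the retrieved evidence
    model_input = (" [KNOWLEDGE] ".join(evidence)
                   + " [DIALOGUE] " + dialogue_context + " " + user_turn)
    return generate(model_input)
```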
0:59:34 | so there are a number of, you know, the convai2 challenge for example is
---|
0:59:38 | providing this kind of data set where you have a dialogue plus some additional
---|
0:59:43 | information related to the topic of the dialogue, or image-chat where you have
---|
0:59:48 | you have an image
---|
0:59:50 | and so the dialogue is based on the image
---|
0:59:52 | so again the idea is that the dialogue system should actually use the content of
---|
0:59:58 | the image to provide some informative response
---|
1:00:04 | and so, again, this slide, i think, where you have real human
---|
1:00:10 | evaluation, is i think something that speaks to a lot of people in
---|
1:00:12 | this room, because at least for speech it's been shown that you really need
---|
1:00:17 | humans to judge whether or not something is adequate and natural and many of those
---|
1:00:22 | things, so
---|
1:00:24 | i wonder, because this, to my understanding, was perhaps the only subjective human evaluation
---|
1:00:30 | result that your talk contained, so mostly people are optimising towards objective metrics; do you think
---|
1:00:39 | there is a risk of overfitting to these metrics, maybe in particular tasks, and so
---|
1:00:44 | on
---|
1:00:45 | where do you see the role of humans judging generated text
---|
1:00:51 | in your field, now and in the future
---|
1:00:54 | so human evaluation is of course still important, because the automatic metrics
---|
1:00:59 | you need them, you need
---|
1:01:00 | them to develop the systems and you need them to compare, you know, exhaustively: if
---|
1:01:04 | you have the output of many systems, you need some automatic metrics
---|
1:01:08 | but they are imperfect, right
---|
1:01:11 | so you also need human evaluation
---|
1:01:14 | often the shared tasks actually organise a human evaluation, and they
---|
1:01:18 | they do this, i mean, i think it's getting better and better because
---|
1:01:21 | people are getting more experience
---|
1:01:23 | and there are better and better platforms and, you know, guidelines on how
---|
1:01:27 | to do this
---|
1:01:29 | we are not optimising with respect to those human
---|
1:01:33 | judgements because it's just impossible, right, so the overfitting would be with respect
---|
1:01:38 | to the training data, where you do, you know, maximum likelihood
---|
1:01:43 | trying to maximise the likelihood of the training data, mostly using cross-entropy; that
---|
1:01:48 | said, there is some work on using reinforcement learning where you optimise with respect
---|
1:01:53 | to your actual evaluation metric, for example the rouge number
---|
1:01:58 | that i mentioned this morning
---|
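The reinforcement-learning idea mentioned here, optimising directly for an evaluation metric, can be sketched as a REINFORCE-style loss; `sample_with_log_probs` and `rouge` are hypothetical stand-ins for the model's sampler and the metric:

```python
import torch

def reinforce_loss(model, source, reference, sample_with_log_probs, rouge, baseline=0.0):
    """REINFORCE-style loss: reward a sampled output by its metric score."""
    sampled_tokens, log_probs = sample_with_log_probs(model, source)  # log_probs: 1-D tensor
    reward = rouge(sampled_tokens, reference)                         # scalar, e.g. ROUGE-L in [0, 1]
    # maximising expected reward == minimising -(reward - baseline) * sum of log-probabilities
    return -(reward - baseline) * torch.as_tensor(log_probs).sum()
```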
1:02:02 | thank you, so to me it seems that the main problem that you
---|
1:02:07 | cover is kind of doing the
---|
1:02:09 | question answering task, right: the user asks about something, that's correct,
---|
1:02:13 | looking on the internet, finding the information that they want, it is about that
---|
1:02:17 | my question is: very often the type of answer that you give depends on
---|
1:02:21 | the type of person that is going to receive it; you would not give the
---|
1:02:24 | same sort of answer if you are going to talk to a young child
---|
1:02:28 | or to an expert in the field that will go and screen your answer
---|
1:02:31 | is there any research on how you can tune or limit
---|
1:02:35 | the answers so that it fits the user
---|
1:02:40 | not really, but i can think of ways; right now, i mean,
---|
1:02:44 | people often find that if you have some kind of parameter like this that
---|
1:02:49 | you want to use to influence the output, so you have one input and then
---|
1:02:55 | you want two different outputs depending on this parameter
---|
1:02:58 | often just adding this to your training data actually helps a lot
---|
1:03:04 | i think this was done
---|
1:03:06 | so people do this with emotions
---|
1:03:09 | for example, should the text express sadness or should it be happy; so what they
---|
1:03:15 | might do is they use an emotion detector, you know
---|
1:03:19 | something that gives you an emotion tag for the sentence and
---|
1:03:23 | then they would
---|
1:03:25 | produce
---|
1:03:26 | i mean, you need the training data, right, but if you
---|
1:03:30 | if you can have this training data and you can label the training examples with
---|
1:03:35 | the
---|
1:03:36 | personalities that you want to generate for, then it works reasonably well; in fact the
---|
1:03:41 | image-chat data is a nice example
---|
1:03:47 | it's
---|
1:03:49 | the dialogue, the dialogue has, for the same image you might have different
---|
1:03:54 | dialogues depending on the
---|
1:03:56 | personality, so the input includes a personality, and there are something like
---|
1:04:01 | two hundred and fifty personalities
---|
1:04:04 | which can be, you know
---|
1:04:06 | joking, serious or whatever, and so they had the database, the training data, taking into
---|
1:04:11 | account this personality
---|
1:04:13 | so you can generate dialogues that are
---|
1:04:15 | about the same image
---|
1:04:16 | with different tones depending on the personality... but for example, within the same model, would
---|
1:04:22 | it be possible to put a constraint on the vocabulary that you can use
---|
1:04:26 | in the output
---|
1:04:27 | yes |
---|
1:04:29 | so in the encoder-decoder
---|
1:04:30 | yes, you could do that
---|
1:04:33 | it's not something people normally do, they just use the whole vocabulary and then
---|
1:04:37 | they hope that the model is going to learn to focus on the vocabulary that
---|
1:04:42 | corresponds to a certain feature, but maybe you could do that
---|
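The conditioning trick discussed above, labelling training examples with an emotion or personality and prepending it as a control token, might be sketched as follows; the emotion classifier and the generation call are hypothetical placeholders:

```python
# Prepend the desired attribute (emotion, personality, ...) as a control token
# so that at generation time you can request a particular style.

def add_control_tokens(pairs, emotion_classifier):
    """pairs: list of (input_text, target_text); returns tagged training pairs."""
    tagged = []
    for src, tgt in pairs:
        emotion = emotion_classifier(tgt)          # e.g. "happy", "sad", "neutral"
        tagged.append((f"<{emotion}> {src}", tgt))
    return tagged

# at inference time, the same token steers the output style, e.g.:
#   model.generate("<happy> " + user_turn)
```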
1:04:57 | you already mentioned it somewhat, but
---|
1:05:00 | this also raises some ethical questions on, you know, the generated text, more than maybe
---|
1:05:06 | in synthesis
---|
1:05:08 | that you really need to get it right, or
---|
1:05:11 | you have some other problems, consistency, or some indication of something that is
---|
1:05:18 | wrong
---|
1:05:19 | is this a problem posed by the statistical approach, or can you
---|
1:05:24 | can you solve this
---|
1:05:26 | well, i mean, i think one
---|
1:05:30 | one problem, i think, that i see with
---|
1:05:34 | the, with the current approach, the neural approach to generation, is that they're not necessarily
---|
1:05:39 | semantically faithful, as i was saying, right, so you know they can produce things that
---|
1:05:45 | have nothing to do with the input, which is a problem
---|
1:05:47 | i'm not sure it's an ethical problem, in the sense, you know, that generators that are not
---|
1:05:51 | faithful are not super useful either, but in an application
---|
1:05:55 | so you know, for industrial people who want to develop applications, clearly it's a
---|
1:06:00 | problem, right, because there you don't want to sell a generator
---|
1:06:04 | that is not faithful
---|
1:06:08 | but i mean, ethical problems, we have plenty in general in nlp
---|
1:06:17 | that's all the time we have, so let's thank the speaker again
---|