0:00:17 | but we have a session chair |
---|
0:00:24 | okay thank you so now we move on to the first keynote |
---|
0:00:34 | so to make good use of this time |
---|
0:00:37 | the first keynote |
---|
0:00:40 | the first keynote speaker is |
---|
0:00:43 | mirella lapata |
---|
0:00:44 | school of informatics university of edinburgh |
---|
0:00:47 | mirella lapata is a professor of natural language processing in the school |
---|
0:00:52 | of informatics at the university of edinburgh okay |
---|
0:00:55 | her research as you will see is on |
---|
0:00:59 | getting computers to understand reason with and generate natural language so she will talk about |
---|
0:01:06 | that and her kind of research activities |
---|
0:01:11 | there is more information in the proceedings i noted but |
---|
0:01:16 | she doesn't need a longer introduction |
---|
0:01:18 | okay right okay you can hear me at the back right |
---|
0:01:22 | okay |
---|
0:01:23 | right so like the chair was saying earlier this talk is gonna be |
---|
0:01:27 | about learning |
---|
0:01:28 | natural language interfaces with neural models |
---|
0:01:31 | and so i'm gonna give you a bit of |
---|
0:01:33 | an introduction as to what these natural language interfaces are |
---|
0:01:38 | and then we're gonna see how we build models and what problems are related to them |
---|
0:01:41 | and you know what future lies ahead |
---|
0:01:44 | okay so what is a natural language interface it's the most intuitive thing one wants |
---|
0:01:51 | to do to a computer |
---|
0:01:53 | you just want to speak to it the computer in an ideal world understands and |
---|
0:02:00 | executes what you wanted to do |
---|
0:02:02 | and this |
---|
0:02:03 | believe it or not it is like one of the first things that people |
---|
0:02:06 | wanted to do with nlp so in the sixties |
---|
0:02:10 | when we barely had computers the computers didn't have memory |
---|
0:02:13 | we didn't have neural networks none of this |
---|
0:02:15 | the first systems that appeared out there |
---|
0:02:19 | had to do with |
---|
0:02:21 | speaking to the computer |
---|
0:02:23 | and |
---|
0:02:23 | getting some response so green et al in nineteen fifty nine |
---|
0:02:29 | presented this system called the conversation machine |
---|
0:02:32 | and this was a system that was having conversations with a human can people guess |
---|
0:02:38 | or no what about |
---|
0:02:41 | the weather |
---|
0:02:43 | well it's always the weather |
---|
0:02:45 | first the weather and then everything else so then they said okay the |
---|
0:02:49 | weather is a bit boring let's talk about baseball |
---|
0:02:51 | and these were very primitive systems they just had rules they had grammars you know |
---|
0:02:56 | it was all manual but the intent was there we want to communicate with computers |
---|
0:03:02 | well in a little bit more formally |
---|
0:03:04 | what the task entails |
---|
0:03:06 | is we have a natural language |
---|
0:03:08 | and natural language has to be translated |
---|
0:03:11 | by what you see at the arrow there by a parser you can think of it as a |
---|
0:03:15 | model or some black box that takes the natural language |
---|
0:03:18 | and translates it |
---|
0:03:19 | to something |
---|
0:03:21 | the computer can understand |
---|
0:03:22 | and this cannot be natural language it must be |
---|
0:03:25 | either sql or lambda calculus or some internal representation that the computer has |
---|
0:03:33 | to give you an answer |
---|
0:03:34 | okay |
---|
0:03:35 | so as an example |
---|
0:03:37 | this again has been very popular within the semantic parsing field you query a |
---|
0:03:42 | database |
---|
0:03:44 | but you actually don't want to learn the syntax of the database and you don't |
---|
0:03:47 | want to learn sql you just ask the question what are the capitals |
---|
0:03:51 | of states bordering texas |
---|
0:03:53 | you translate this into the logical form you see down there |
---|
0:03:58 | okay you don't need to understand this it's just something that the computer understands you |
---|
0:04:02 | can see there are variables it's a formal language |
---|
0:04:05 | and then you get the answer and i'm not gonna tell you the answer you |
---|
0:04:07 | can see here texas is bordering a lot of states |
---|
0:04:11 | now apart from asking databases questions another task and this is an |
---|
0:04:18 | actual task that people have deployed in the real world |
---|
0:04:21 | is instructing a robot to do something that you want it to do |
---|
0:04:25 | again this is another example you can tell the robot if you have |
---|
0:04:29 | one of these little robots that make you coffee and you know go up and |
---|
0:04:32 | down the corridor |
---|
0:04:33 | you can say at the chair move forward three steps past the sofa |
---|
0:04:38 | again the robot has to translate this into some internal representation that it understands |
---|
0:04:43 | in order not to crash against the sofa |
---|
0:04:48 | another example is actually doing question answering and |
---|
0:04:52 | there are a lot of systems like this using a big knowledge base like |
---|
0:04:58 | freebase it doesn't exist anymore |
---|
0:05:01 | it's called the knowledge graph now |
---|
0:05:03 | but this is a huge graph with millions of entities and connections between them |
---|
0:05:07 | and the way google is using |
---|
0:05:09 | it is when you ask a question i mean it has many modules but one of |
---|
0:05:13 | them is that |
---|
0:05:14 | so |
---|
0:05:14 | one of the questions you may want to ask is who were the male actors in |
---|
0:05:18 | the titanic and again this has to be translated |
---|
0:05:21 | in some language |
---|
0:05:23 | that freebase or your knowledge graph understands and you can see here this is expressed |
---|
0:05:27 | in |
---|
0:05:28 | lambda calculus but you have to translate it into something that freebase |
---|
0:05:33 | again |
---|
0:05:34 | understands |
---|
0:05:35 | so you see there are many applications in the real world that |
---|
0:05:39 | necessitate semantic parsing or some interface with a computer |
---|
0:05:44 | and |
---|
0:05:45 | here comes the man himself so bill gates |
---|
0:05:48 | you know mit publishes this technology review it's actually |
---|
0:05:54 | very interesting i suggest that you take a look |
---|
0:05:56 | and it's not very mit centric they talk about many things |
---|
0:06:00 | and so this year they went and asked bill gates they said to him |
---|
0:06:04 | okay what do you think are the new technological breakthroughs and inventions of two |
---|
0:06:09 | thousand nineteen |
---|
0:06:10 | that will actually change the world |
---|
0:06:12 | and so if you read the review |
---|
0:06:14 | he starts by saying you know i want to be able to detect premature babies |
---|
0:06:19 | fine |
---|
0:06:20 | then he says you know the cow-free burger |
---|
0:06:23 | so no meat |
---|
0:06:24 | you make burgers so you know because the world has so many animals |
---|
0:06:28 | then he talks about drugs for cancer and the very last |
---|
0:06:33 | is |
---|
0:06:35 | smooth-talking ai assistants so semantic parsing comes last which means that you know it's |
---|
0:06:41 | very important to bill gates |
---|
0:06:43 | now |
---|
0:06:44 | i don't know why i mean i know why |
---|
0:06:47 | but anyway he thinks it's really cool |
---|
0:06:50 | and |
---|
0:06:50 | of course is not only bill gates |
---|
0:06:53 | every company you can think of has a smooth-talking ai system or is |
---|
0:06:58 | working on one |
---|
0:07:00 | or it's in the back of their head or they have prototypes |
---|
0:07:02 | and there's so many of them |
---|
0:07:05 | so alexa is your sponsor |
---|
0:07:08 | there is cortana and at least google |
---|
0:07:13 | decided to be different and call it google home not some female name |
---|
0:07:17 | thank god |
---|
0:07:18 | so there is a zillion of these things |
---|
0:07:21 | and can i see a show of hands how many people have one of them at home |
---|
0:07:27 | very good |
---|
0:07:28 | do you think do you think they work |
---|
0:07:31 | how many how do you think they work |
---|
0:07:35 | exactly so i own one of these things it sets alarms for me all the time |
---|
0:07:42 | i mean they work if you're in the kitchen you say alexa set a timer |
---|
0:07:45 | for half an hour |
---|
0:07:47 | or you know you have to monitor the kids homework |
---|
0:07:49 | but |
---|
0:07:50 | we want these things to |
---|
0:07:52 | go beyond simple commands |
---|
0:07:56 | now i'll just show here |
---|
0:07:58 | and there is a reason why there's so much talk about these smooth-talking ai |
---|
0:08:01 | assistants because |
---|
0:08:03 | the impact they could have in society for |
---|
0:08:06 | less able people for people who cannot see for people who you know are |
---|
0:08:10 | disabled |
---|
0:08:11 | is actually pretty huge if it worked |
---|
0:08:14 | now i'm gonna show here |
---|
0:08:18 | a video |
---|
0:08:19 | the video is a parody of amazon alexa |
---|
0:08:23 | and you see it and then you understand immediately |
---|
0:08:26 | why |
---|
0:08:28 | there's no sound |
---|
0:08:30 | hello |
---|
0:08:33 | we checked the sound as well before |
---|
0:08:39 | should i do something |
---|
0:08:44 | i raised it the volume is raised |
---|
0:08:48 | to the max |
---|
0:08:53 | amazon and everyone asking for help |
---|
0:09:05 | technology isn't always easy to use for people of a certain age |
---|
0:09:12 | that's why amazon partnered with aarp to present the amazon echo |
---|
0:09:20 | silver the smart speaker device designed specifically to be used by the greatest generation it's super loud and |
---|
0:09:26 | responds to any name even remotely close to alexa |
---|
0:09:31 | [video playing audio partly unintelligible] |
---|
0:10:21 | it says here your thermostat is set to ten |
---|
0:10:28 | [video continues] |
---|
0:10:30 | the amazon echo silver plays all the music they loved when they were young |
---|
0:10:44 | it also has a quick scan feature to help them find things |
---|
0:10:50 | right |
---|
0:10:55 | and an uh-huh feature for long rambling stories |
---|
0:11:03 | so i |
---|
0:11:07 | get yours today the amazon echo |
---|
0:11:17 | silver send a check or money order to somewhere that i think does |
---|
0:11:21 | not exist |
---|
0:11:22 | okay |
---|
0:11:23 | it's a saturday night live sketch |
---|
0:11:25 | but you can see how it could help the elderly |
---|
0:11:30 | or those in need it could remind you for example to take your pills |
---|
0:11:33 | or you know it could help you feel more comfortable in your own home |
---|
0:11:38 | now |
---|
0:11:40 | let's get a bit more formal so what are we going to try to |
---|
0:11:43 | do here we will try to learn this mapping from the natural language |
---|
0:11:48 | to the |
---|
0:11:49 | formal |
---|
0:11:50 | representation that the computer understands and the learning setting is we have |
---|
0:11:55 | sentence logical form |
---|
0:11:57 | and by logical form i will use the terms logical form |
---|
0:12:01 | and meaning representation interchangeably because |
---|
0:12:05 | the models we will be talking about do not care about what the |
---|
0:12:09 | meaning representation the program if you like that the computer will execute is |
---|
0:12:15 | so we assume we have sentence logical form pairs |
---|
0:12:19 | and this is the setting that most of the work has focused on previously |
---|
0:12:26 | so it's like machine translation except that you know the target is an |
---|
0:12:32 | executable language now |
---|
0:12:33 | this task |
---|
0:12:35 | is harder than it seems for three reasons |
---|
0:12:38 | first of all |
---|
0:12:40 | there is |
---|
0:12:41 | a severe mismatch between |
---|
0:12:44 | the natural language and the logical form |
---|
0:12:49 | so if you look at this example how much does it cost a flight to |
---|
0:12:53 | boston |
---|
0:12:54 | and look at the representation here |
---|
0:12:56 | you will immediately notice that |
---|
0:12:59 | they're not very similar their structures mismatch |
---|
0:13:02 | and not only is there a mismatch between the logical form |
---|
0:13:06 | and the natural language string |
---|
0:13:09 | but also its syntactic representation so you couldn't even use syntax if you wanted to |
---|
0:13:13 | get the matching |
---|
0:13:15 | so here for example |
---|
0:13:17 | flight |
---|
0:13:18 | would align to fly |
---|
0:13:20 | and to and boston to boston but then fare corresponds to this huge natural language |
---|
0:13:26 | phrase how much does it cost and the system must |
---|
0:13:30 | infer all of that |
---|
0:13:32 | now |
---|
0:13:33 | this is the first challenge the structural mismatching |
---|
0:13:36 | the second challenge has to do with the fact |
---|
0:13:39 | that |
---|
0:13:40 | the formal language |
---|
0:13:42 | the program if you like that we have to execute with a computer |
---|
0:13:46 | has structure and it has to be well-formed |
---|
0:13:50 | you cannot just generate anything and hope that the computer will give you an answer |
---|
0:13:54 | so this is a structure prediction problem and |
---|
0:13:57 | if you look here for the male actors in the titanic there are |
---|
0:14:01 | three meaning representations |
---|
0:14:03 | do people see which one is the right one |
---|
0:14:06 | i mean they all look similar you have to squint at it |
---|
0:14:09 | the first one |
---|
0:14:12 | has unbound variables the second one has a parenthesis that is missing |
---|
0:14:17 | so the only right one is the last one |
---|
0:14:20 | you cannot do it approximately |
---|
0:14:22 | it's not like machine translation where you're gonna get the gist of it you actually need |
---|
0:14:25 | to get the right logical form that the computer executes |
---|
0:14:29 | now the third challenge |
---|
0:14:31 | and this is when you deploy google home and alexa what the people who developed these |
---|
0:14:35 | things immediately noticed is that people will say |
---|
0:14:38 | i mean |
---|
0:14:40 | so |
---|
0:14:41 | the same intent can be realized in very many different expressions who created microsoft |
---|
0:14:47 | microsoft was created by |
---|
0:14:50 | who founded microsoft who is the founder of microsoft and so on and so forth |
---|
0:14:55 | and all that maps to this little bit from the knowledge graph which is |
---|
0:15:01 | paul allen and bill gates are the founders of microsoft |
---|
0:15:04 | and we have to be able the system has to be able your semantic parser |
---|
0:15:08 | to actually deal |
---|
0:15:09 | with all of these |
---|
0:15:10 | different ways that we can express |
---|
0:15:13 | our intent |
---|
0:15:14 | okay |
---|
0:15:15 | so in this talk we have three parts |
---|
0:15:18 | so first i'm gonna show you how with neural models we |
---|
0:15:23 | are dealing with this |
---|
0:15:24 | structural mismatch |
---|
0:15:25 | using something that is very familiar to all of you the encoder decoder paradigm |
---|
0:15:30 | then i will talk about the |
---|
0:15:33 | structure prediction problem and the fact that your output if you like your formal |
---|
0:15:38 | representation has to be well-formed using this coarse-to-fine decoding algorithm i will explain |
---|
0:15:43 | it and then finally i will show you a solution to the coverage problem |
---|
0:15:49 | okay |
---|
0:15:49 | now i should point out that there are many more challenges out there |
---|
0:15:53 | that i'm not going to talk about but it's good to flag them |
---|
0:15:57 | where do we get the training data from so i told you that we have |
---|
0:16:00 | to have |
---|
0:16:01 | natural language logical form pairs to train the models who creates this and some of |
---|
0:16:06 | it is actually quite complicated |
---|
0:16:08 | what happens if you have out-of-domain queries if you have a parser trained on one |
---|
0:16:12 | domain let's say the weather and then you want to use it for baseball |
---|
0:16:17 | what happens if you don't actually have only |
---|
0:16:20 | independent questions and answers but you have dependent ones there's coreference between the queries now we're |
---|
0:16:26 | getting into the territory of dialogue |
---|
0:16:29 | what about speech we all pretend here that speech is a solved problem it isn't |
---|
0:16:33 | and a lot of times alexa doesn't understand children doesn't understand some people with |
---|
0:16:37 | accents like me |
---|
0:16:39 | and then you talk to the amazon people and you say okay so do |
---|
0:16:42 | you use the lattice and they go oh the lattice we use a lattice |
---|
0:16:46 | of one because you know |
---|
0:16:48 | if it's too big it slows us down so there are many |
---|
0:16:52 | technical and research challenges that you know |
---|
0:16:56 | have to all work together to make this thing work |
---|
0:16:59 | okay |
---|
0:17:00 | so let's talk about the structure mismatches |
---|
0:17:03 | and so here the model is something you all must be a bit familiar with |
---|
0:17:08 | and it's |
---|
0:17:09 | one of the like |
---|
0:17:11 | there are three or four things with neural models that get recycled over |
---|
0:17:16 | and over again the encoder-decoder framework is one of them |
---|
0:17:19 | so we have natural language as input |
---|
0:17:22 | we encode it using an lstm or whatever favourite model you have you can |
---|
0:17:28 | use a transformer although transformers don't work that well for this task |
---|
0:17:31 | because the datasets are small |
---|
0:17:34 | whatever the next thing is you encode it you get a vector out of it then |
---|
0:17:38 | this encoded vector serves as an input to |
---|
0:17:41 | another lstm that actually decodes it into |
---|
0:17:46 | a logical form |
---|
0:17:47 | and you will notice here i say you decode it into a sequence |
---|
0:17:51 | or a tree |
---|
0:17:53 | i will not talk about trees but i should flag that there is a lot |
---|
0:17:57 | of work trying to decode |
---|
0:17:59 | the natural language into this tree structure which makes sense since |
---|
0:18:04 | the logical form has structure there's parentheses there is recursion |
---|
0:18:10 | however in my experience these models |
---|
0:18:13 | are way too complicated to get to work |
---|
0:18:15 | and |
---|
0:18:17 | the advantage over assuming that the logical form is a sequence is not that |
---|
0:18:21 | great so for the rest of the talk we will assume that we have sequences |
---|
0:18:25 | in and we get sequences out and we will pretend |
---|
0:18:28 | that the logical form is a sequence even though it isn't |
---|
0:18:32 | okay |
---|
0:18:33 | a little bit more formally the model will map |
---|
0:18:36 | the natural language input |
---|
0:18:38 | which is a sequence of tokens x to a logical form |
---|
0:18:41 | representation of its meaning which is a sequence of tokens y |
---|
0:18:46 | and we are modeling the probability of |
---|
0:18:49 | the |
---|
0:18:50 | meaning representation |
---|
0:18:51 | given |
---|
0:18:51 | the input |
---|
0:18:53 | and the encoder |
---|
0:18:56 | will just encode the language into a vector this vector then will be fed |
---|
0:19:01 | into the decoder which will generate the logical form conditioned on the encoding vector |
---|
0:19:05 | and of course we have the |
---|
0:19:08 | very important |
---|
0:19:09 | attention here the attention mechanism the original models did not use attention but then |
---|
0:19:16 | everybody realised in particular in semantic parsing it's very important because it deals with this |
---|
0:19:22 | structural mismatching problem |
---|
0:19:25 | so i'm assuming people are familiar here instead of actually generating the tokens in |
---|
0:19:32 | the logical form one by one without considering the input the attention will look at |
---|
0:19:37 | the input and will |
---|
0:19:38 | weight |
---|
0:19:39 | the output given the input and you will get things you will get some sort |
---|
0:19:44 | of certainty that you know |
---|
0:19:46 | the generated mountain maps to mountain in my input |
---|
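(The attention computation described above can be sketched in a few lines of plain Python. This is a toy illustration only: the `attend` helper and the example vectors are assumptions for exposition, not the actual model from the talk.)

```python
import math

def softmax(scores):
    # numerically stable softmax over a list of floats
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    # score each encoder state by its dot product with the current
    # decoder state, normalise the scores into weights, and return
    # the weighted sum (the context vector) together with the weights
    scores = [sum(d * h_i for d, h_i in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(len(decoder_state))]
    return context, weights

# a decoder state most similar to one encoder state attends to it most,
# which is how output tokens get soft-aligned to input tokens
context, weights = attend([1.0, 0.0],
                          [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
```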
0:19:52 | now |
---|
0:19:53 | this is a very sort of simplistic view of semantic parsing |
---|
0:19:58 | it assumes that not only natural language is a string |
---|
0:20:01 | but that the logical form |
---|
0:20:03 | is also a string |
---|
0:20:06 | and this may be okay but maybe it isn't |
---|
0:20:10 | there is a problem and i'll explain |
---|
0:20:12 | so we train this model by maximizing the likelihood of the logical forms |
---|
0:20:17 | given the natural language input this is standard |
---|
0:20:21 | at test time |
---|
0:20:22 | we have to predict the logical form for any input utterance |
---|
0:20:28 | and we have to find the one that actually maximizes this probability |
---|
0:20:33 | of the output given the input |
---|
0:20:35 | now trying to find this |
---|
0:20:38 | argmax can be very computationally intensive and if you're google you can do beam search |
---|
0:20:50 | if you're the university of edinburgh you just do greedy search and it works just fine |
---|
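(The beam-versus-greedy trade-off mentioned here can be sketched as below. The per-step token distributions are a toy assumption; with `beam_size=1` the procedure reduces to the greedy search used in the talk.)

```python
import heapq
import math

def beam_search(step_dists, beam_size=3):
    # step_dists: one {token: probability} dict per decoding step;
    # keep the beam_size highest-scoring partial sequences at each
    # step, scoring by summed log-probability
    beams = [(0.0, [])]  # (log-probability, tokens so far)
    for dist in step_dists:
        candidates = []
        for logp, seq in beams:
            for tok, p in dist.items():
                candidates.append((logp + math.log(p), seq + [tok]))
        beams = heapq.nlargest(beam_size, candidates)
    return beams[0][1]  # best-scoring full sequence
```

With `beam_size=1` only the single best token survives each step, which is exactly greedy search.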
0:20:50 | now |
---|
0:20:52 | can people see the problem with this assumption of actually decoding into a string |
---|
0:20:58 | remember the second problem that i said we have to make sure |
---|
0:21:03 | that the logical form is a well formed |
---|
0:21:07 | and by assuming that everything is a sequence i have no way to check for |
---|
0:21:12 | example that my parentheses are being matched |
---|
0:21:15 | i cannot do this because i've forgotten what i've generated |
---|
0:21:19 | so i keep generating and at some point i |
---|
0:21:22 | hit the end of sequence and that's it |
---|
0:21:24 | so we actually want |
---|
0:21:26 | to be able to enforce some constraints of well-formedness on the output |
---|
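(A minimal illustration of the well-formedness point: a plain sequence decoder has no memory of the brackets it has emitted, but even a simple post-hoc check like the hypothetical `well_formed` below can reject unbalanced logical forms.)

```python
def well_formed(tokens):
    # check that parentheses in a token sequence are balanced,
    # one of the constraints a plain sequence decoder cannot enforce
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:  # closed a parenthesis we never opened
                return False
    return depth == 0  # every opened parenthesis was closed
```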
0:21:32 | so how are we gonna do that |
---|
0:21:34 | we're gonna do this with this idea of coarse to fine decoding which i'm gonna |
---|
0:21:38 | explain |
---|
0:21:39 | so again we will have our natural language input here all flights from dallas |
---|
0:21:43 | before ten am |
---|
0:21:45 | and what we would do before is we would decode the entire |
---|
0:21:49 | natural language string into this logical form representation but now we can insert a second |
---|
0:21:55 | stage |
---|
0:21:56 | where we first |
---|
0:21:58 | decode |
---|
0:21:59 | to a meaning sketch |
---|
0:22:01 | what the meaning sketch does is it abstracts away details |
---|
0:22:05 | from the very detailed logical form it's an abstraction |
---|
0:22:11 | it doesn't have arguments it doesn't have variable names you can think of it |
---|
0:22:15 | if you're familiar with |
---|
0:22:16 | templates it's a template of the |
---|
0:22:19 | logical form of the meaning representation |
---|
0:22:22 | so first we will have the natural language |
---|
0:22:25 | decoded into this meaning sketch and then we will use this meaning sketch |
---|
0:22:29 | to fill in the details |
---|
0:22:31 | now why does this make sense |
---|
0:22:34 | well there is several arguments first of all you disentangle higher level information from low-level |
---|
0:22:41 | information |
---|
0:22:43 | so there are some things that are the same |
---|
0:22:45 | across logical forms |
---|
0:22:47 | that you want to capture |
---|
0:22:49 | so your meaning representation in this case at the sketch level is gonna be |
---|
0:22:53 | more compact so for example in atis which is the dataset we |
---|
0:22:57 | work with |
---|
0:22:58 | the sketch uses nine point two tokens as opposed to twenty one tokens |
---|
0:23:04 | it's a very long logical form |
---|
0:23:06 | another thing that is important is at the model level because then you explicitly share |
---|
0:23:12 | the core structure |
---|
0:23:14 | that is the same for multiple examples so you use your data more efficiently |
---|
0:23:19 | and you learn to represent commonalities across examples which the other model did not do |
---|
0:23:24 | so you do provide global context |
---|
0:23:27 | to the fine meaning decoding now i have a graph coming up in a |
---|
0:23:31 | minute |
---|
0:23:32 | now |
---|
0:23:32 | the formulation of the problem is the same as before we again map natural language |
---|
0:23:37 | input to the logical form representation |
---|
0:23:40 | except that now we have two stages in this model and so again we |
---|
0:23:45 | model the probability of the output given the input |
---|
0:23:48 | but now |
---|
0:23:49 | this is factorized into two terms |
---|
0:23:51 | the probability of |
---|
0:23:53 | the meaning sketch given the input |
---|
0:23:56 | and the probability of the output |
---|
0:23:59 | given the input and the meaning sketch |
---|
0:24:02 | so the meaning sketch |
---|
0:24:04 | is shared between those two terms |
---|
0:24:07 | and i show you a graph here so the |
---|
0:24:11 | green nodes are the encoder units the orange or brown i don't know how |
---|
0:24:16 | this colour comes out here |
---|
0:24:18 | are the decoder units so in the beginning we have the natural language which |
---|
0:24:23 | we will encode with your favourite encoder |
---|
0:24:25 | here you see a bidirectional lstm |
---|
0:24:29 | then we will use this encoding |
---|
0:24:31 | to decode the sketch |
---|
0:24:33 | which is this abstraction of the high-level meaning representation |
---|
0:24:38 | once we decode this sketch we will |
---|
0:24:41 | encode it again |
---|
0:24:43 | with another bidirectional lstm into some representation |
---|
0:24:47 | that we will feed into our final decoder that fills in all the details |
---|
0:24:52 | we're missing |
---|
0:24:54 | and you can see there the red bits are the information that i'm filling |
---|
0:24:59 | in |
---|
0:25:00 | you will see also that this decoder |
---|
0:25:03 | this decoder takes into account |
---|
0:25:05 | not only the encoding |
---|
0:25:07 | of the sketch |
---|
0:25:08 | but also the input |
---|
0:25:10 | remember in the probability terms it is |
---|
0:25:13 | the probability of y given x and a |
---|
0:25:16 | where y is our output x is the |
---|
0:25:21 | input and a is the encoding of my sketch |
---|
0:25:26 | okay this is why we say |
---|
0:25:29 | the sketch provides context for the decoding |
---|
0:25:33 | okay |
---|
0:25:34 | now training and inference work the same way again maximizing the log-likelihood of the |
---|
0:25:38 | generated meaning representations given the natural language |
---|
0:25:42 | and at test time again we have to predict both the sketch and the |
---|
0:25:49 | more detailed logical form |
---|
0:25:51 | and we do this via greedy search |
---|
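(The two-stage factorization p(y|x) = p(a|x) * p(y|x, a) with greedy decoding at each stage can be sketched as follows. The `sketch_model` and `fine_model` callables are toy stand-ins for the two LSTM decoders, not the actual implementation.)

```python
def coarse_to_fine(x, sketch_model, fine_model):
    # first decode the sketch a from the input x, then decode the full
    # logical form y conditioned on both x and a, mirroring the
    # factorization p(y|x) = p(a|x) * p(y|x, a)
    def greedy(step_dists):
        out = []
        for dist in step_dists:  # one {token: probability} dict per step
            tok = max(dist, key=dist.get)
            if tok == "<eos>":
                break
            out.append(tok)
        return out

    sketch = greedy(sketch_model(x))      # coarse stage: p(a|x)
    full = greedy(fine_model(x, sketch))  # fine stage: p(y|x, a)
    return sketch, full
```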
0:25:55 | okay so a question that i have not addressed is where do these templates come |
---|
0:26:00 | from |
---|
0:26:01 | where do we find the meaning sketches |
---|
0:26:04 | and the answer that i would like to give you is in our work we |
---|
0:26:09 | would just learn them |
---|
0:26:11 | now |
---|
0:26:12 | that is fine we can learn them |
---|
0:26:14 | but first we'll try something very simple i'll show you examples because if the |
---|
0:26:19 | simple thing doesn't work then learning will never work |
---|
0:26:22 | so |
---|
0:26:24 | here are actual examples of the different meaning sketches |
---|
0:26:27 | for different kinds of meaning representations |
---|
0:26:31 | so here we have logical form lambda calculus |
---|
0:26:34 | and it's very trivial |
---|
0:26:36 | to understand how you would get the meaning sketches you would just |
---|
0:26:40 | get rid of variable information |
---|
0:26:43 | you know lambda count and argmax stay but anything that is specific |
---|
0:26:48 | we would remove we would remove any notions of arguments |
---|
0:26:53 | and |
---|
0:26:54 | a any sort of |
---|
0:26:56 | information that may be specific to the logical form so you see here |
---|
0:27:00 | this is the detailed form and this |
---|
0:27:03 | whole expression becomes abstracted there is no numeric information so these |
---|
0:27:09 | are variables |
---|
0:27:10 | this is for logical form |
---|
0:27:13 | if you have source code this is python things are very easy actually we would |
---|
0:27:17 | just substitute tokens with token types |
---|
0:27:22 | so here is the python code and |
---|
0:27:25 | s will become a name four will become a number |
---|
0:27:30 | name here is the name of the function and then this is a string |
---|
0:27:34 | of course |
---|
0:27:35 | we want to keep the structure of the expression as it is so we will |
---|
0:27:39 | not substitute delimiters operators or built-in keywords |
---|
0:27:43 | because that would actually change what the program is meant to do |
---|
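(The token-type substitution described for Python can be reproduced almost directly with the standard library: replace names and literals by their token types while keeping keywords, operators and delimiters. The `meaning_sketch` helper below is a simplified reconstruction, not the paper's exact preprocessing.)

```python
import io
import keyword
import tokenize

def meaning_sketch(code):
    # substitute identifiers and literals with their token types while
    # keeping keywords, operators and delimiters intact, so the program
    # structure survives but the specifics are abstracted away
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("NAME")
        elif tok.type == tokenize.NUMBER:
            out.append("NUMBER")
        elif tok.type == tokenize.STRING:
            out.append("STRING")
        elif tok.type in (tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
                          tokenize.DEDENT, tokenize.ENDMARKER,
                          tokenize.COMMENT):
            continue  # drop layout tokens, keep everything else verbatim
        else:
            out.append(tok.string)
    return " ".join(out)
```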
0:27:49 | if we have sql queries |
---|
0:27:52 | it's again simple to get these meaning sketches so above you can see |
---|
0:27:56 | this is the sql syntax |
---|
0:27:58 | so we have a select clause and we have two |
---|
0:28:02 | first select the columns so in sql we have tables and they have columns |
---|
0:28:07 | here we have to select the column and then |
---|
0:28:10 | we have the where clause that has conditions on it so in the example we're |
---|
0:28:14 | selecting a record company |
---|
0:28:16 | and here we are saying |
---|
0:28:19 | the where clause puts some conditions so the year of recording in this record company has |
---|
0:28:24 | to be after nineteen ninety six and the conductor has to be |
---|
0:28:28 | michael a russian conductor now if you want to create a meaning sketch it's |
---|
0:28:33 | very simple |
---|
0:28:34 | well we'll just have the syntax of the where clause where |
---|
0:28:37 | greater than |
---|
0:28:39 | and equals |
---|
0:28:40 | so we'll just have the where clause and the conditions on it |
---|
0:28:43 | these are not filled out yet so it could apply |
---|
0:28:47 | to many different columns in an sql table |
---|
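(For SQL a crude version of the same idea works: keep the WHERE-clause skeleton, its comparison operators and its AND/OR connectives, and drop column names and values. The regex-based `sql_sketch` below is a hypothetical simplification of that abstraction.)

```python
import re

def sql_sketch(where_clause):
    # keep only the comparison operators and AND/OR connectives of a
    # WHERE clause, dropping column names and values, so one sketch
    # can apply to many different columns and tables
    ops = re.findall(r"<=|>=|<>|!=|=|<|>|\bAND\b|\bOR\b",
                     where_clause, flags=re.IGNORECASE)
    return "WHERE " + " ".join(op.upper() for op in ops)
```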
0:28:53 | okay let me show you some results so i'm gonna compare |
---|
0:28:56 | the simple model that i have shown you the simple sequence-to-sequence model |
---|
0:29:02 | with this more sophisticated model that does constrained decoding |
---|
0:29:06 | and this is comparing to the state of the art of course |
---|
0:29:10 | the state of the art is a moving target in the sense that now all these numbers with |
---|
0:29:15 | bert |
---|
0:29:16 | people are familiar with bert and so these numbers with bert |
---|
0:29:20 | go up by some percent so whatever i show you |
---|
0:29:23 | you can add in your head |
---|
0:29:25 | two or three percent |
---|
0:29:28 | so these models do not use bert so this is |
---|
0:29:31 | the previous state of the art this is geoquery and atis i'm gonna |
---|
0:29:35 | show results for |
---|
0:29:37 | different datasets |
---|
0:29:38 | and it's important to see that it works on different datasets with very different meaning |
---|
0:29:43 | representations so some have logical forms geoquery and atis have logical forms |
---|
0:29:48 | and then we have an example with python code and with sql so here is
---|
0:29:53 | a system that
---|
0:29:55 | uses syntactic decoding
---|
0:29:58 | so it uses
---|
0:29:59 | some
---|
0:30:00 | quite sophisticated grammatical operations that then get composed with neural networks
---|
0:30:05 | to perform semantic parsing |
---|
0:30:07 | this is the simple sequence to sequence model i showed you before
---|
0:30:10 | and this is coarse to fine decoding so |
---|
0:30:13 | you do get a three percent increase |
---|
0:30:16 | with regards to atis now atis is very interesting it has very
---|
0:30:20 | long utterances and very long logical forms
---|
0:30:24 | again on atis you do almost as well
---|
0:30:27 | remember what i said about you know
---|
0:30:29 | syntactic decoding does not give so much of an advantage
---|
0:30:33 | and then again |
---|
0:30:35 | we get a boost with coarse to fine
---|
0:30:37 | and a similar pattern can be observed when you use |
---|
0:30:40 | sql |
---|
0:30:43 | where you jump from seventy four to seventy nine
---|
0:30:45 | and django uses
---|
0:30:50 | python so you execute python code and again from seventy to seventy four
---|
0:30:57 | okay |
---|
0:30:59 | now this is an aside i'll just mention it very briefly
---|
0:31:04 | all the tasks that i'm talking about here
---|
0:31:08 | are dealing with the fact that you have |
---|
0:31:10 | your input and your output pre-specified some human goes and writes down the logical
---|
0:31:17 | form
---|
0:31:17 | for the utterance |
---|
0:31:19 | and the community has realised that this is not scalable
---|
0:31:22 | so what we're also trying to do is to work with weak supervision where you |
---|
0:31:27 | have the question |
---|
0:31:28 | and then you have the answer |
---|
0:31:30 | no logical form |
---|
0:31:32 | the logical form is latent |
---|
0:31:33 | and you have to |
---|
0:31:34 | come up with it the model has to come up with it so now this |
---|
0:31:37 | is good because it's more realistic |
---|
0:31:39 | but it opens another huge can of worms which is you have to come up
---|
0:31:43 | with a logical forms you have to have a way of generating them |
---|
0:31:47 | and then you have spuriousness because you don't know which ones
---|
0:31:50 | are correct and which ones are not
---|
0:31:52 | so here we show you a table you're given the table
---|
0:31:56 | you're given the question how many silver medals did the nation of turkey win
---|
0:32:00 | and the answer which is zero and then you have to hallucinate all the rest
---|
0:32:04 | so this idea of actually using the meaning sketches
---|
0:32:08 | is very useful in this scenario |
---|
0:32:10 | because it sort of restricts the search space |
---|
0:32:14 | so rather than actually looking for all the types of logical forms you can
---|
0:32:20 | have you sort of first generate an abstract
---|
0:32:24 | program or a meaning sketch
---|
0:32:26 | and then |
---|
0:32:27 | once you have that |
---|
0:32:29 | you can fill in the details so this idea of abstraction
---|
0:32:32 | is helpful i would say
---|
0:32:33 | in this scenario even more |
---|
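To see why a sketch restricts the search space under weak supervision, here is a toy illustration in Python; the table, executor, and search procedure are all invented for this example:

```python
# Under weak supervision we only observe (question, answer); instead of
# enumerating every possible program, we fix an abstract sketch
# lookup(<col>, <nation>) and only enumerate its slot fillers, keeping
# instantiations that execute to the observed answer.
from itertools import product

table = {"nation": ["Turkey", "Sweden"], "silver": [0, 5]}

def execute(column, nation):
    # toy executor: look up `column` for the row whose nation matches
    idx = table["nation"].index(nation)
    return table[column][idx]

def search(sketch_columns, answer):
    consistent = []
    for col, nat in product(sketch_columns, table["nation"]):
        if execute(col, nat) == answer:
            consistent.append((col, nat))
    return consistent

print(search(["silver"], 0))  # [('silver', 'Turkey')]
```

Without the sketch the search would range over all program shapes; with it, only the slots vary, which is exactly the restriction of the search space described above.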
0:32:37 | okay |
---|
0:32:37 | now |
---|
0:32:38 | let's go back to the third challenge which has to do with linguistic coverage |
---|
0:32:44 | and this is the problem |
---|
0:32:46 | that will always be with us whoever uses the tool the human
---|
0:32:50 | is unpredictable
---|
0:32:51 | they will say things that your model does not anticipate
---|
0:32:55 | and so we have to have a way of dealing with it |
---|
0:33:00 | okay so |
---|
0:33:03 | this is not a new idea
---|
0:33:05 | whoever has done question answering has come up with this problem
---|
0:33:09 | of gee how do i increase the coverage of my system
---|
0:33:14 | so what people have done and this is actually an obvious thing to do you have
---|
0:33:18 | a question there and you paraphrase it in ir for example people do query
---|
0:33:24 | expansion it's the analogous idea i have a question i will have some paraphraser
---|
0:33:28 | that will paraphrase it and then
---|
0:33:31 | you know i will submit the paraphrases and i will get some answers and
---|
0:33:34 | then the problem is solved
---|
0:33:36 | except that it isn't and if any of you have worked with paraphrases you see
---|
0:33:40 | but you know |
---|
0:33:42 | the paraphrases can be really bad |
---|
0:33:44 | and so you get bad answers so now you had a problem and then
---|
0:33:49 | you've created another problem and the reason why this happens is because the
---|
0:33:55 | paraphrases are generated |
---|
0:33:58 | independently |
---|
0:33:59 | of your task of the qa module so you have a qa module
---|
0:34:04 | you paraphrase the questions and then you get answers and at no point does the
---|
0:34:08 | answer
---|
0:34:09 | communicate with the paraphraser
---|
0:34:12 | to get something that you know |
---|
0:34:14 | is appropriate for the task or for the qa model |
---|
0:34:18 | so what i'm gonna show you now is how |
---|
0:34:20 | we train this paraphrase model jointly
---|
0:34:24 | with the qa model for an end task and our task is again semantic
---|
0:34:28 | parsing except that this time because this is a more realistic tasks we're gonna be |
---|
0:34:33 | asking a knowledge base like freebase or a large knowledge graph
---|
0:34:37 | and of course there is a question that i will address in a bit which is where
---|
0:34:41 | do the paraphrases come from |
---|
0:34:43 | who gives them to us where do they come from
---|
0:34:48 | okay so this is a dense slide but it's actually really simple and
---|
0:34:52 | i'm gonna take you through it so this is how we see the
---|
0:34:58 | modeling framework
---|
0:35:00 | we have a question who created microsoft |
---|
0:35:03 | and we have some paraphrases |
---|
0:35:06 | that are given to us and i will tell you in a minute who gives us the
---|
0:35:09 | paraphrases assume for a moment we have these paraphrases
---|
0:35:13 | now what we will do is we will first take all these paraphrases here |
---|
0:35:19 | and score them |
---|
0:35:22 | okay |
---|
0:35:22 | so we will encode them we will get question vectors we will have a model
---|
0:35:27 | that gives a score how good is this paraphrase for the question
---|
0:35:31 | how good is who founded microsoft as a paraphrase for who created microsoft
---|
0:35:36 | now once we normalize these scores
---|
0:35:39 | then we have our question answering module so we have two modules one is the
---|
0:35:43 | paraphrasing module and one the question answering module and they're trained jointly
---|
0:35:47 | so once i have my scores for my paraphrases these are gonna be used
---|
0:35:52 | to weight the answers given the question
---|
0:35:56 | so this is gonna tell your model well look |
---|
0:35:59 | this answer is quite good given your paraphrase or this answer is not so good
---|
0:36:05 | given your paraphrase so you see now that you kind of learn which paraphrases are
---|
0:36:10 | important for your task
---|
0:36:12 | for your question answering model
---|
0:36:14 | and your answer jointly |
---|
0:36:18 | okay |
---|
0:36:20 | so |
---|
0:36:20 | a bit more formally we have |
---|
0:36:23 | the modeling problem is we have an answer
---|
0:36:26 | and we want to model the probability of the answer given the question |
---|
0:36:30 | and this is factorized into two models one is the question answering model |
---|
0:36:35 | and the other one is the paraphrasing model |
---|
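The factorization just described can be written out as a toy computation; the numbers are made up, and a real system would produce these probabilities with neural modules:

```python
# p(a|q) marginalizes over paraphrases q': each QA score p(a|q') is
# weighted by the normalized paraphrase score p(q'|q). Toy numbers only.

def answer_prob(qa_scores, para_scores):
    """p(a|q) = sum over paraphrases q' of p(a|q') * p(q'|q)."""
    z = sum(para_scores)  # normalize the paraphrase scores
    return sum(p_a * (s / z) for p_a, s in zip(qa_scores, para_scores))

# two paraphrases of "who created microsoft":
qa = [0.9, 0.4]      # p(answer | paraphrase) from the QA module
para = [3.0, 1.0]    # unnormalized paraphrase scores
print(answer_prob(qa, para))  # 0.9*0.75 + 0.4*0.25 ≈ 0.775
```

Because the two factors are trained jointly, gradients from the answer flow into the paraphrase scores, which is exactly how the model learns which paraphrases matter for the task.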
0:36:37 | now for the question answering model you can use whatever you like |
---|
0:36:41 | your latest neural qa model you can plug in there and |
---|
0:36:46 | the same goes for the paraphrase model
---|
0:36:48 | whatever you have as long as you can actually
---|
0:36:52 | encode them somehow
---|
0:36:54 | it doesn't really matter |
---|
0:36:56 | now i will not talk a lot about the question answering model we used an |
---|
0:37:01 | in-house model that is based on graphs and that
---|
0:37:05 | is quite simple it just does graph matching on the knowledge graph
---|
0:37:10 | and i'm gonna tell you a bit more about the paraphrasing model |
---|
0:37:15 | okay so this is how we score the paraphrases
---|
0:37:20 | we have a question |
---|
0:37:22 | we generate paraphrases for this question |
---|
0:37:25 | and then for each of these paraphrases we will just
---|
0:37:30 | score them how good are they given
---|
0:37:33 | my question
---|
0:37:34 | and this is you know a dot product essentially
---|
0:37:37 | is it a good paraphrase or not
---|
0:37:39 | but it's trained end to end
---|
0:37:42 | with the answer in mind |
---|
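A minimal sketch of that scoring step, assuming the question and the paraphrases have already been encoded as vectors (the vectors here are made up, and a real encoder would be neural):

```python
# Score each paraphrase against the question with a dot product, then
# softmax-normalize across candidates to get the weights used above.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

q_vec = [0.2, 0.9]                    # encoded question (made up)
para_vecs = [[0.1, 1.0], [0.8, 0.1]]  # encoded paraphrases (made up)
weights = softmax([dot(q_vec, p) for p in para_vecs])
# because training is joint, these weights come to reflect usefulness
# for finding the answer, not just surface similarity
```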
0:37:44 | so |
---|
0:37:46 | is this paraphrases going to help me to find the right answer |
---|
0:37:50 | and now |
---|
0:37:51 | as far as the paraphrases are concerned again this is a plug-and-play module you
---|
0:37:55 | can use your favourite so if you are in a limited domain you can write them
---|
0:38:00 | yourself
---|
0:38:02 | manually |
---|
0:38:03 | you could use wordnet |
---|
0:38:05 | or ppdb which is this database which has a lot of paraphrases
---|
0:38:10 | but we do something else
---|
0:38:12 | using neural machine translation |
---|
0:38:17 | okay so i like to put up this slide i know everybody knows it but it's my
---|
0:38:20 | favourite slide of all time
---|
0:38:22 | because |
---|
0:38:23 | we actually tried to do this slide again
---|
0:38:26 | it's not as good as the original
---|
0:38:29 | people love it in particular if you go to machine translation talks because
---|
0:38:32 | this is machine translation
---|
0:38:34 | nobody has ever come to capture so beautifully
---|
0:38:37 | the fact that you have this language here
---|
0:38:41 | you have this english language and that you have attention weights so beautiful |
---|
0:38:46 | and then you take these attention weights and you weight them
---|
0:38:49 | with the decoder and hey presto you get the french language
---|
0:38:53 | so |
---|
0:38:54 | this is your usual machine translation your vanilla machine translation engine |
---|
0:38:59 | it's again and encoder-decoder model with attention |
---|
0:39:02 | and we assume we have access to this engine |
---|
0:39:06 | now |
---|
0:39:07 | you may wonder how i'm now gonna get paraphrases out of this
---|
0:39:12 | this is again an old idea which goes back actually to martin kay
---|
0:39:17 | i think in the eighties
---|
0:39:19 | and it is this so what we want to do is
---|
0:39:23 | in the case of english goal from english to english |
---|
0:39:27 | so we want to be able to sort of paraphrase an english expression to another
---|
0:39:31 | english expression but in machine translation i don't have any direct path
---|
0:39:35 | from english to english |
---|
0:39:37 | what i do have is a path from english to german
---|
0:39:40 | and german to english |
---|
0:39:42 | so |
---|
0:39:43 | the theory goes if i have two english phrases
---|
0:39:47 | like here under control |
---|
0:39:49 | and |
---|
0:39:50 | in check |
---|
0:39:51 | if they are aligned or if they correspond to the same phrase in another language |
---|
0:39:57 | they are likely to be a paraphrase
---|
0:40:00 | now i'm not gonna use these alignments this is for you to understand the concept but you
---|
0:40:04 | can see that i have english i translate english to german |
---|
0:40:09 | then german gets back translated to english |
---|
0:40:13 | i have my paraphrase |
---|
0:40:19 | more specifically |
---|
0:40:20 | i have my input which is in one language |
---|
0:40:24 | okay i encode it i decode it into some translations in the foreign language g stands
---|
0:40:29 | here for german
---|
0:40:31 | i encode my german and then i decoded back to english |
---|
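The pivoting loop can be illustrated as follows; the toy translation table stands in for a real NMT k-best decoder, and everything here is invented for illustration:

```python
# Pivot-based paraphrasing: translate english -> pivot language with
# k-best outputs, back-translate each pivot to english, and collect any
# back-translation that differs from the input. Keeping k-best on both
# legs guards against a single bad translation ruining everything.

# toy stand-in for an NMT k-best decoder (a real system is neural)
TOY_TABLE = {
    ("en", "de", "under control"): ["unter Kontrolle"],
    ("de", "en", "unter Kontrolle"): ["under control", "in check"],
}

def translate(text, src, tgt, k=3):
    return TOY_TABLE.get((src, tgt, text), [])[:k]

def paraphrase_via_pivot(sentence, pivot_langs=("de",), k=3):
    candidates = set()
    for lang in pivot_langs:
        for pivot in translate(sentence, "en", lang, k):
            for back in translate(pivot, lang, "en", k):
                if back != sentence:
                    candidates.add(back)
    return candidates

print(paraphrase_via_pivot("under control"))  # {'in check'}
```

As in the talk, nothing is stored as an explicit paraphrase database in the real model; this loop only shows the control flow of pivoting through one or more languages.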
0:40:36 | there is |
---|
0:40:41 | two or three things you should note about this thing
---|
0:40:41 | first of all |
---|
0:40:42 | these things in the middle the translations are so-called pivots
---|
0:40:46 | and you see that we have k pivots
---|
0:40:49 | i don't have one translation but i have multiple translations this turns out to be really
---|
0:40:53 | important because a single translation may be very wrong and then i'm completely screwed i
---|
0:40:58 | have very bad paraphrases
---|
0:41:00 | so i have to have multiple pivots and not only that i could also
---|
0:41:05 | have multiple pivots in multiple languages
---|
0:41:08 | which then i take into account while i'm decoding
---|
0:41:12 | now this is very different from what you may think of as paraphrases because
---|
0:41:17 | the paraphrases are never
---|
0:41:20 | explicitly stored anywhere they're all model internal
---|
0:41:23 | so what this thing learns is i give it english and it just paraphrases english into english
---|
0:41:30 | but i don't have an explicit database |
---|
0:41:32 | with paraphrases |
---|
0:41:34 | and of course they are all vectors and they're all scored but |
---|
0:41:37 | you know i cannot go in and say
---|
0:41:39 | here is the paraphrase but i can give the model a question and it generates another
---|
0:41:44 | one which is very nice because you get generation for free in the past if
---|
0:41:49 | you had rules you have to see how you actually use them to generate something |
---|
0:41:53 | that is meaningful and so on |
---|
0:41:55 | okay |
---|
0:41:55 | let me show you an example
---|
0:41:57 | this is paraphrasing the question what is the zip code of the largest car
---|
0:42:02 | manufacturer if we pivot through french
---|
0:42:06 | so french tells us what is the zip code of the largest vehicle manufacturer or |
---|
0:42:11 | what is the zip code of the largest car producer |
---|
0:42:14 | if we pivot through german
---|
0:42:16 | what's the postal code of the biggest automobile manufacturer |
---|
0:42:20 | what is the postcode of the biggest car manufacturer |
---|
0:42:24 | and if we pivot through czech
---|
0:42:25 | what is the largest car manufacturers postal code |
---|
0:42:29 | or zip code of the largest car manufacturer |
---|
0:42:32 | can i see a show of hands which pivot language do you think
---|
0:42:36 | gives you the best |
---|
0:42:37 | paraphrases |
---|
0:42:39 | i mean it's a sample of two |
---|
0:42:43 | czech
---|
0:42:44 | very good |
---|
0:42:44 | czech
---|
0:42:45 | turned out to be the best
---|
0:42:47 | followed by german
---|
0:42:49 | french was not so good
---|
0:42:51 | and again here there's the question how many pivots to use what languages do
---|
0:42:56 | you choose i mean these are all experimental variables that you can manipulate okay |
---|
0:43:00 | let me show you some results
---|
0:43:03 | the grey you don't need to understand |
---|
0:43:05 | these are all baselines that somebody can use
---|
0:43:10 | to show that the model is doing something over and above the obvious things |
---|
0:43:16 | this is |
---|
0:43:17 | the system here using no paraphrases so you go from forty nine
---|
0:43:23 | to fifty one
---|
0:43:25 | and this is from sixteen to twenty
---|
0:43:27 | these are web questions and graph questions which are datasets that people have developed
---|
0:43:33 | graph questions is very difficult it has like
---|
0:43:36 | very complicated questions that need multihop reasoning so what is obama's daughter's friend's dog
---|
0:43:43 | called it's very difficult that's why the performance is really bad
---|
0:43:48 | what you should see is that
---|
0:43:52 | pink here is the paraphrasing model
---|
0:43:54 | so in all cases
---|
0:43:56 | the model using the paraphrases is pink
---|
0:44:00 | here is the second best system
---|
0:44:03 | and |
---|
0:44:05 | red here is the best system and you can see that it does very well on
---|
0:44:08 | the difficult dataset
---|
0:44:09 | in the other dataset there is another system that is better |
---|
0:44:12 | but they use a lot of external knowledge which we don't have and they better exploit
---|
0:44:16 | the graph itself which is another avenue for future work
---|
0:44:21 | okay |
---|
0:44:22 | now this is my last slide and then i'll take questions
---|
0:44:27 | what have we learned so there is a couple of things that are interesting
---|
0:44:31 | first of all it's that
---|
0:44:34 | encoder-decoder models
---|
0:44:36 | are
---|
0:44:37 | good enough
---|
0:44:38 | for mapping natural language to meaning representations with minimal engineering effort and i cannot emphasise
---|
0:44:46 | that
---|
0:44:48 | enough
---|
0:44:49 | before
---|
0:44:50 | this paradigm shift
---|
0:44:53 | what we used to do is we would spend ages coming up with
---|
0:44:56 | features that we would have to re engineer |
---|
0:44:58 | for every single domain so if i go from lambda calculus to sql and then
---|
0:45:02 | to python code i would have to do the whole process from scratch
---|
0:45:05 | here you have one model |
---|
0:45:08 | with some experimental variables that you know you can keep fixed or change and it |
---|
0:45:13 | works very well of across domains |
---|
0:45:17 | constrained decoding improves performance not only for the setting that i showed to you
---|
0:45:22 | but for more weakly supervised settings
---|
0:45:25 | and now people are using this constrained decoding even
---|
0:45:29 | outside semantic parsing so you know in generation for example
---|
0:45:34 | the paraphrases enhance the robustness of the model and in general i would
---|
0:45:38 | say they're useful
---|
0:45:40 | if you have other tasks even for dialogue for example
---|
0:45:43 | you could give robustness to a dialogue model to generate answers for a chatbot
---|
0:45:49 | and the models could transfer to other tasks or architectures i've shown for the purposes |
---|
0:45:54 | of this talk |
---|
0:45:56 | you know so as not to overwhelm people |
---|
0:45:59 | simple architectures but you know you can put neural networks left right and centre as you
---|
0:46:03 | feel like
---|
0:46:04 | now in the future i think there is a couple of avenues for future
---|
0:46:08 | work worth pursuing one is of course learning the sketches so they could be
---|
0:46:12 | a latent variable in your model trying to you know generalise and that would mean
---|
0:46:18 | that you don't need to do any preprocessing you don't need to give the algorithm
---|
0:46:21 | the sketches
---|
0:46:23 | how do you deal with multiple languages i have a semantic parser in english
---|
0:46:27 | how do i transfer it to chinese a big problem in particular in industry they come
---|
0:46:33 | up against this problem a lot and their answer is we hire annotators
---|
0:46:39 | how do you |
---|
0:46:42 | train this model if you have no data at all just a database
---|
0:46:47 | and of course there is something that would be of interest to you
---|
0:46:51 | which is how do i actually
---|
0:46:53 | do coreference how do i |
---|
0:46:56 | model a sequence of turns |
---|
0:46:59 | as opposed to a single turn
---|
0:47:01 | and without further ado i have one last slide and it's a very depressing slide |
---|
0:47:07 | so |
---|
0:47:08 | when i gave this talk like a couple of months ago i used to have this
---|
0:47:11 | joke where it was theresa may
---|
0:47:13 | and this is on twitter the joke is that theresa may
---|
0:47:18 | will ask alexa to negotiate for her
---|
0:47:21 | and it will be fine i tried to find another one with boris johnson
---|
0:47:25 | and failed i don't think he does technology
---|
0:47:28 | and he doesn't do negotiating either
---|
0:47:30 | so she at least would have negotiated and at this point i'll
---|
0:47:35 | just take questions thank you very much
---|
0:47:38 | really |
---|
0:47:43 | and now it's
---|
0:47:45 | time for questions
---|
0:47:48 | thank you i'm from j p morgan so my question is do
---|
0:47:53 | we really need
---|
0:47:56 | to extract the logical forms
---|
0:47:58 | given the fact that |
---|
0:48:00 | probably humans don't really do that except in really complicated
---|
0:48:05 | cases
---|
0:48:06 | i doubt that my daughter does that
---|
0:48:10 | do we really need to do that well in the related world of machine translation
---|
0:48:15 | we don't really extract all these things
---|
0:48:18 | but we do translate even
---|
0:48:22 | like personal data stuff
---|
0:48:24 | that's a that's a good question so the answer is
---|
0:48:27 | yes and no
---|
0:48:28 | so if you look at alexa or google these people
---|
0:48:33 | they have very complicated systems where they have |
---|
0:48:37 | one module that does what you say i don't translate to a logical form i just
---|
0:48:41 | do you know query matching and then extract the answer
---|
0:48:44 | but some of the highly compositional queries they do get to execute against
---|
0:48:49 | databases
---|
0:48:51 | and they all have internal representations of what the query means
---|
0:48:55 | also |
---|
0:48:56 | if you are a developer for example
---|
0:48:59 | whenever you have a database
---|
0:49:02 | say i sell jeans or i sell fruit and i have a
---|
0:49:06 | database and i deal with
---|
0:49:07 | customers and i have to have a spoken interface there you would have to extract it
---|
0:49:12 | somehow now for the phone when you say siri set my alarm clock i
---|
0:49:17 | would agree with you there you just need to recognize intents
---|
0:49:20 | and do the attribute slot filling |
---|
0:49:22 | and then you're done |
---|
0:49:24 | but whenever you have a
---|
0:49:27 | more elaborate infrastructure in the
---|
0:49:30 | output or the answer space then you do this
---|
0:49:39 | thanks for a very nice talk i
---|
0:49:42 | had a question on the paraphrase
---|
0:49:47 | scoring it seemed to me something wasn't quite right if i understood it
---|
0:49:51 | well you have an equation with a summation in it that's
---|
0:49:57 | what bothered me so intuitively
---|
0:50:01 | to me the right thing is to look for the closest paraphrase that actually
---|
0:50:06 | has an answer a good quality one that you actually can find so you're
---|
0:50:09 | trying to optimize two things finding something that means the same and where
---|
0:50:14 | i can find an answer if i can't find an answer for the original question
---|
0:50:17 | but when you sum the problem is paraphrases don't have an equal
---|
0:50:22 | distribution some phrases have many paraphrases or many paraphrases in a particular direction
---|
0:50:27 | but maybe not so many in the others just depending on how many synonyms you
---|
0:50:31 | have so trying to add them up and weight them if you have a lot
---|
0:50:35 | of paraphrases here for the wrong answer and one for something that's better you know
---|
0:50:39 | it seems like the
---|
0:50:40 | closeness should dominate if you have a very high quality answer and it seems like
---|
0:50:45 | your model is trying to do something different so i'm wondering if that
---|
0:50:48 | is causing problems or something that i'm not seeing no right so this is
---|
0:50:52 | how the model is trained because we have to make it robust
---|
0:50:55 | and you can manipulate the n-best paraphrases |
---|
0:50:59 | at test time you're absolutely right we just find the one the max the one
---|
0:51:03 | that is best
---|
0:51:05 | so you are right and i did not explain it well but you are absolutely
---|
0:51:09 | right that you know you don't have
---|
0:51:11 | you know you can be all over the place if you're just looking at the
---|
0:51:14 | sum but at test time we just want the one
---|
0:51:21 | hi thank you for the great talk i'm from microsoft research so my
---|
0:51:25 | question is for the coarse to fine decoding what do you think of its potential in
---|
0:51:30 | generating natural language outputs like dialogue like summarisation
---|
0:51:35 | what come again ask the question again
---|
0:51:40 | what do i think of the potential of coarse to fine that's a good question
---|
0:51:44 | a generation question so
---|
0:51:46 | i think well i think it's very interesting now
---|
0:51:51 | for a |
---|
0:51:52 | sentence generation so you mentioned summarisation i'll do one thing at a time so if
---|
0:51:57 | you just want to generate
---|
0:51:59 | from some input a sentence
---|
0:52:02 | you want to do surface realization people have already done this there is work that
---|
0:52:06 | has a very similar model where they first sort of
---|
0:52:11 | produce a template which they learn and from the template they surface realize the
---|
0:52:15 | sentence
---|
0:52:16 | however summarization which is the more interesting case |
---|
0:52:20 | you would have to have a document template |
---|
0:52:24 | and |
---|
0:52:25 | it's not clear what this document template might look like and how you might learn
---|
0:52:29 | it so you may
---|
0:52:31 | for example assume that the template uses some sort of a tree or
---|
0:52:36 | a graph |
---|
0:52:37 | with generalizations and then from there you just generate the summary |
---|
0:52:41 | and i believe
---|
0:52:44 | we should do this but it will not be as trivial as
---|
0:52:50 | what we do right now which is encode the document in a vector
---|
0:52:53 | have attention and then a bit of copying and then here's your summary
---|
0:52:57 | so the question there of what the template is
---|
0:53:01 | nobody has an answer to
---|
0:53:13 | i was wondering if you could elaborate on this very latest work on generating
---|
0:53:19 | the abstract meaning representations because of course my reaction to
---|
0:53:23 | what you were saying in the first part was
---|
0:53:26 | well
---|
0:53:27 | it's all good when you have you know
---|
0:53:29 | a
---|
0:53:30 | corpus where you have the mapping between the query and the
---|
0:53:35 | logical form what do you do if you don't have it which is the majority of
---|
0:53:40 | cases
---|
0:53:41 | okay so this is a tough problem so how do you do inference
---|
0:53:47 | with weak supervision
---|
0:53:49 | and there is two things there that we found out
---|
0:53:56 | because the space you have to enumerate is huge it's a space
---|
0:53:59 | of
---|
0:54:01 | potential programs that execute and we have no signal
---|
0:54:04 | other than the right answer
---|
0:54:06 | so because the only signal is the right answer there's two things that can happen |
---|
0:54:10 | one is ambiguity |
---|
0:54:12 | so |
---|
0:54:13 | the entities may be ambiguous turkey can be the bird or
---|
0:54:17 | turkey the country for
---|
0:54:20 | example
---|
0:54:21 | and so then you're screwed and you will get things wrong and the other one
---|
0:54:24 | is spuriousness so you have things that execute to the right answer
---|
0:54:29 | they don't have the right intent the right semantics |
---|
0:54:31 | and so what people do and what we do is we use the templates here
---|
0:54:36 | and then we have another step which actually again tries to do |
---|
0:54:41 | some structural matching and tries to say okay so i have this abstract program |
---|
0:54:46 | this will cut down the search space |
---|
0:54:49 | and then |
---|
0:54:49 | you also have to do some alignment and put some constraints in the sense that for
---|
0:54:55 | example
---|
0:54:55 | i cannot have |
---|
0:54:57 | column silver repeated twice |
---|
0:55:00 | because this is not well formed
---|
0:55:02 | but |
---|
0:55:02 | the accuracy of this i didn't put it up is like forty four percent
---|
0:55:06 | not you know
---|
0:55:09 | not anywhere near i mean google and amazon would laugh
---|
0:55:12 | there is more work to be done
---|
0:55:18 | so thank you for the talk i have a question about your coarse to fine
---|
0:55:21 | decoding so in your coarse to fine decoding you use a meaning representation but
---|
0:55:27 | both the coarse and the fine decoding are trained based on the cross entropy
---|
0:55:33 | of the output tokens
---|
0:55:37 | and it means that there is no guarantee that the meaning representation you use
---|
0:55:42 | will constrain the final decoding in some cases so we need to
---|
0:55:48 | consider such things because if we consider the semantics some arguments of the
---|
0:55:54 | predicates
---|
0:55:56 | should be included in the output
---|
0:56:00 | that is a very good point i'm glad you guys were paying attention so
---|
0:56:04 | yes we don't have we don't have this guarantee
---|
0:56:08 | we said constrained decoding but what you really do is you constrain the encoding
---|
0:56:12 | hoping that your decoder will be more constrained by the encoding
---|
0:56:17 | you could include it we did an analysis where we saw two things one is how
---|
0:56:22 | good are the templates so if your templates are
---|
0:56:25 | not great then what you're saying
---|
0:56:28 | will be more problematic
---|
0:56:31 | and we did an analysis let me see if i have a slide that shows that
---|
0:56:34 | actually the templates are working quite well
---|
0:56:37 | i might have a slide i don't remember
---|
0:56:41 | yes |
---|
0:56:42 | so this slide shows you see |
---|
0:56:46 | the sequence to sequence model the first row uses the sequence to sequence model
---|
0:56:50 | and without any sketches |
---|
0:56:53 | and the second is a coarse to fine where you have to predict the sketch |
---|
0:56:56 | and you see that the coarse to fine predicts the sketches much better
---|
0:57:01 | than the one stage model that does sequence to sequence
---|
0:57:04 | so this tells you that you |
---|
0:57:06 | are kind of winning but not exactly |
---|
0:57:09 | so i don't know what would happen if you include these constraints
---|
0:57:14 | my
---|
0:57:14 | my answer would be this doesn't happen a lot it could be but for the
---|
0:57:18 | logical forms we tried if you have very long very complicated ones and
---|
0:57:23 | really huge sql queries then
---|
0:57:25 | i would say that your approach
---|
0:57:27 | would be required
---|
0:57:30 | okay no it's |
---|
0:57:33 | this could do |
---|
0:57:35 | so may i ask one question okay so in the last slide you
---|
0:57:40 | said that the model doesn't do multiple turns
---|
0:57:43 | so what i mean is that all you said related to the
---|
0:57:47 | qa was one question and one answer but in a dialogue case we have
---|
0:57:52 | multiple turns
---|
0:57:54 | so what is the common problem and what model would be good
---|
0:57:57 | yes so i have a nice example of this so we did
---|
0:58:01 | try to do this
---|
0:58:02 | there's a paper in submission on multiple turns
---|
0:58:06 | so where you say for example i want to buy these levi's jeans
---|
0:58:14 | how much does it cost do you have it in another size
---|
0:58:18 | or in another colour what is the colour so you know you elaborate with
---|
0:58:22 | new questions and there's patterns of you know this multiturn dialogue that you can do
---|
0:58:28 | and
---|
0:58:29 | you can do this but the one thing that we actually need to sort out
---|
0:58:34 | before doing this
---|
0:58:35 | is coreference |
---|
0:58:37 | and |
---|
0:58:37 | because right now these models don't take coreference into account if you model coreference
---|
0:58:42 | in the simple way of like looking at the past and modeling it
---|
0:58:45 | as a sequence it doesn't really work that well so i think definitely
---|
0:58:49 | sequential question answering is the way to go i have not seen any models that
---|
0:58:54 | make me go like oh this is great but
---|
0:59:00 | yes it's a very hard problem and i'm really not sure but you know one step
---|
0:59:04 | at a time
---|
0:59:05 | so thank you very much let's thank the speaker again
---|