Okay, thank you. So now we move on to the first keynote. Our first keynote speaker is a professor of natural language processing in the School of Informatics at the University of Edinburgh. Her research focuses on getting computers to understand, reason with, and generate natural language, and she will tell us about this line of research. There is more information in the proceedings.

Okay, right. Can you hear me? Alright. Okay.

Right, so as was said earlier, this talk is going to be about learning natural language interfaces with neural models. I'm going to give you a bit of an introduction as to what these natural language interfaces are, then we're going to see how we build them and what problems are related to them, and, you know, what future lies ahead.

Okay, so what is a natural language interface? It's the most intuitive thing one wants to do with a computer: you just speak to it, and in an ideal world the computer understands and executes what you wanted it to do.

And this, believe it or not, is one of the first things that people wanted to do with NLP. In the sixties, when computers didn't have memory, we didn't have neural networks, none of this, the first systems that appeared out there had to do with speaking to the computer and getting some response. So Green et al. in 1959 presented a system called the conversation machine, and this was a system that held conversations with a human. Can people guess what about? The weather. Well, it's always the weather first, and then everything else. Then they said, okay, the weather is a bit boring, let's talk about baseball. These were very primitive systems: they just had rules, they had grammars, it was all manual, but the intent was there; we want to communicate with computers.

A little bit more formally, what the task entails is this: we have natural language, and the natural language has to be translated, by what you see at the arrow there, a parser, which you can think of as a model or some black box, into something the computer can understand. And this cannot be natural language; it must be either SQL or lambda calculus or some internal representation that the computer has, so that it can give you an answer.

Okay, so as an example, and this has been very popular within the semantic parsing field, you query a database, but you actually don't want to learn the syntax of the database and you don't want to learn SQL; you just ask the question: what are the capitals of states bordering Texas? You translate this into the logical form you see down there. You don't need to understand it; it's just something that the computer understands. You can see there are variables; it's a formal language. And then you get the answer. I'm not going to read you the answer, but you can see that Texas borders a lot of states.

Now, aside from asking databases questions, another task, and this is an actual task that people have deployed in the real world, is instructing a robot to do something that you want it to do. Again, this is another example: if you have one of these little robots that make you coffee and go up and down the corridor, you can say 'go to the chair, move forward three steps past the sofa'. The robot has to translate this into some internal representation that it understands, in order not to crash into the sofa.

Another example is doing question answering, and there are a lot of systems like this, using a big knowledge base like Freebase. Freebase doesn't exist anymore; it's now called the Knowledge Graph, and it's what Google is using. It is a huge graph with millions of entities and connections between them, and when you ask Google a question, well, it has many modules, but one of them is this. So one of the questions you may want to ask is: who were the male actors in Titanic? Again, this has to be translated into some language that Freebase or your knowledge graph understands. You can see here it is expressed in lambda calculus, but you have to translate it into some SQL that Freebase, again, understands.

So you see, there are many applications in the real world that necessitate semantic parsing or some interface with a computer.

And here comes the man himself, Bill Gates. So, MIT publishes this Technology Review; it's actually very interesting, I suggest that you take a look, and it's not very MIT-centric, they talk about many things. This year they went and asked Bill Gates: okay, what do you think are the new technological breakthroughs, the inventions of 2019, that will actually change the world? And if you read the review, he starts by saying, you know, I want to be able to detect premature babies. Fine. Then he says, you know, the cow-free burger, so no meat, you make burgers without it, because the world has so many animals. Then he talks about drugs for cancer, and the very last one is smooth-talking AI assistants. So semantic parsing comes last, which means that, you know, it's very important to Bill Gates. Now, I don't know why, I mean, I do know why, but anyway, he thinks it's really cool.

And of course it's not only Bill Gates. Every company you can think of has a smooth-talking AI system, or is working on one, or has it in the back of their heads, or they have prototypes, and there are so many of them. So there is Alexa, which is your sponsor; there is Cortana; and Google decided to be different and call theirs Google Home, not some female name, thank God. So there are gazillions of these things.

Can I see a show of hands: how many people have one of them at home? Very good. Do you think they work? How do you think they work? Exactly. So I have one of these things; it sets alarms for me all the time. I mean, they work if you're in the kitchen and you say 'Alexa, set a timer for half an hour', or you have to monitor the kids' homework. But we want these things to go beyond simple commands.

Now, there is a reason why there's so much talk about these smooth-talking AI assistants: the impact they could have in society, for people who cannot see, for people who are, you know, disabled, is actually pretty huge, if it worked. So I'm going to show a video here; the video is a parody of Amazon Alexa, and once you see it you will understand immediately why.

There's no sound. Hello? We checked the sound before as well. Should I do something? The volume is raised to the max.

[The video plays: a parody advertisement for an 'Amazon Echo Silver', a smart speaker presented as designed specifically to be used by the greatest generation. It responds to any name even remotely close to Alexa, plays the music its owners loved when they were young, has a quick-scan feature to help them find things, and a feature for long, rambling stories.]

Okay, it's a Saturday Night Live sketch, but you can see how this could help the elderly or those in need: it could remind you, for example, to take your pills, or, you know, it could help you feel more comfortable in your own home.

Now, let's get a bit more formal. What are we going to try to do here? We will try to learn this mapping from the natural language to the formal representation that the computer understands, and the learning setting is that we have sentence and logical form pairs. And by logical form: I will use the terms logical form and meaning representation interchangeably, because the models we will be talking about do not care what the meaning representation, the program if you like that the computer will execute, is. So we assume we have sentence and logical form pairs, and this is the setting that most of the previous work has focused on. It's like machine translation, except that, you know, the target is an executable language.

This task is harder than it seems, for three reasons. First of all, there is a severe mismatch between the natural language and the logical form. If you look at this example, 'how much does it cost, a flight to Boston', and look at the representation here, you will immediately notice that they're not very similar; the structures mismatch. And not only is there a mismatch between the logical form and the natural language string, but also its syntactic representation, so you couldn't even use the syntax if you wanted to get the matching. Here, for example, 'flight' would align to 'fly', 'to' and 'Boston' to 'Boston', but then 'fare' corresponds to this huge natural language phrase 'how much does it cost', and the system must infer all of that.

Now, that was the first challenge, the structural mismatch. The second challenge has to do with the fact that the formal language, the program if you like that we have to execute on the computer, has structure and has to be well-formed. You cannot just generate anything and hope that the computer will give you an answer. So this is a structured prediction problem, and if you look here, for 'the male actors in Titanic' there are three meaning representations. Do people see which one is the right one? I mean, they all look similar, you have to squint at it. The first one has unbound variables, the second one has a parenthesis missing, so the only right one is the last one. You cannot do it approximately; it's not like machine translation where you're going to get the gist of it. You actually need to get the right logical form that the computer executes.

Now, the third challenge, and this is something that the people who develop these things immediately notice when they deploy Google Home or Alexa, is that people will say... I mean, the same intent can be realized in very many different expressions: 'who created Microsoft', 'Microsoft was created by', 'who founded Microsoft', 'who is the founder of Microsoft', and so on and so forth. And all of that maps to this little bit from the knowledge graph, which is: Paul Allen and Bill Gates are the founders of Microsoft. And the system, your semantic parser, has to be able to deal with all of these different ways that we can express our intent.

okay

So this talk has three parts. First, I'm going to show you how, with neural models, we are dealing with this structural mismatch, using something that is very familiar to all of you, the encoder-decoder paradigm. Then I will talk about the structured prediction problem, the fact that your formal representation has to be well-formed, using a coarse-to-fine decoding algorithm that I will explain. And then finally I will show you a solution to the coverage problem.

okay

Now, I should point out that there are many more challenges out there that I'm not going to talk about, but it's good to flag them. Where do we get the training data from? I told you that we have to have natural language and logical form pairs to train the models; who creates these? Some of it is actually quite complicated. What happens if you have out-of-domain queries, if you have a parser trained on one domain, let's say the weather, and then you want to use it for baseball? What happens if you don't have only independent questions and answers but they are co-dependent, and there's coreference between the queries? Now we're getting into the territory of dialogue. What about speech? We all pretend here that speech is a solved problem; it isn't, and a lot of the time Alexa doesn't understand children, doesn't understand some people with accents like me. And then you talk to the Amazon people and you say, okay, so do you use the lattice, the good old lattice? We use a lattice of one, because, you know, if it's too rich it slows us down. So there are many technical and practical challenges that all have to work together to make this thing work.

okay

So let's talk about the structural mismatches. The model here is something you all must be a bit familiar with; there are three or four things in neural modelling that get recycled over and over again, and the encoder-decoder framework is one of them. We have natural language as input; we encode it using an LSTM or whatever favourite model you have. You could use a transformer, although transformers don't work so well for this task because the datasets are small. Whatever your encoder is, you encode the input and get a vector out of it; then this encoded vector serves as input to another LSTM that actually decodes it into a logical form. And you will notice that I say you decode it into a sequence or a tree. I will not talk about trees, but I should flag that there is a lot of work trying to decode the natural language into a tree structure, which makes sense, since the logical form has structure: there are parentheses, there is recursion. However, in my experience these models are very complicated to get to work, and the advantage over assuming that the logical form is a sequence is not that great. So for the rest of the talk we will assume that we have sequences in and we get sequences out, and we will pretend that the logical form is a sequence, even though it isn't.

Okay, a little bit more formally, the model will map the natural language input, which is a sequence of tokens x, to a logical form representation of its meaning, which is a sequence of tokens y. We are modeling the probability of the logical form given the natural language input. The encoder will just encode the language into a vector; this vector is then fed into the decoder, which generates the logical form conditioned on the encoding vector. And of course we have the very important attention mechanism here. The original models did not use attention, but then everybody realised that, in particular for semantic parsing, it's very important, because it deals with this structural mismatch problem.

I'm assuming people are familiar with this: instead of generating the tokens in the logical form one by one without considering the input, the attention mechanism will look at the input and weight the output given the input, and you will get some sort of certainty that, you know, if I generate 'mountain', it maps to 'mountain' in my input.
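To make this concrete, here is a minimal sketch of such an encoder-decoder semantic parser with dot-product attention. It is my own illustration written in PyTorch, not the system from the talk; the class name, layer sizes, and the exact attention form are assumptions.

```python
import torch
import torch.nn as nn

class Seq2SeqParser(nn.Module):
    """Toy utterance-to-logical-form model: BiLSTM encoder, LSTM decoder,
    dot-product attention over the encoder states (illustrative only)."""
    def __init__(self, n_src, n_tgt, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(n_src, dim)
        self.tgt_emb = nn.Embedding(n_tgt, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.decoder = nn.LSTM(dim, 2 * dim, batch_first=True)
        self.out = nn.Linear(4 * dim, n_tgt)   # [decoder state; attention context]

    def forward(self, x, y):
        enc, _ = self.encoder(self.src_emb(x))          # (B, Tx, 2*dim) utterance states
        dec, _ = self.decoder(self.tgt_emb(y))          # (B, Ty, 2*dim) teacher-forced states
        scores = torch.bmm(dec, enc.transpose(1, 2))    # (B, Ty, Tx) attention scores
        context = torch.bmm(torch.softmax(scores, dim=-1), enc)
        return self.out(torch.cat([dec, context], dim=-1))  # logits over logical-form tokens
```

The attention weights are exactly the alignment just described: each output token, such as 'mountain', can attend to the corresponding words of the utterance.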

Now, this is a very simplistic view of semantic parsing: it assumes that not only the natural language is a string but that the logical form is also a string, and this may be okay, but maybe it isn't; there is a problem, and I'll explain it. We train this model by maximizing the likelihood of the logical forms given the natural language input; this is standard. At test time, we have to predict the logical form for any input utterance, and we have to find the one that maximizes this probability of the output given the input. Now, trying to find this argmax can be very computationally intensive; if you're Google you can do beam search, and if you're the University of Edinburgh you just do greedy search and it works just fine.
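Written out, in my notation rather than the slides', the training objective and the inference rule just described are:

```latex
% Training: maximize the log-likelihood of the gold logical form y given the utterance x
\max_{\theta} \sum_{(x,y)\in\mathcal{D}} \log p_{\theta}(y \mid x),
\qquad
p_{\theta}(y \mid x) = \prod_{t} p_{\theta}(y_t \mid y_{<t}, x)

% Inference: pick the highest-scoring logical form (approximated by greedy or beam search)
\hat{y} = \arg\max_{y} \, p_{\theta}(y \mid x)
```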

Now, can people see the problem with this assumption of decoding into a string? Remember the second problem I mentioned: we have to make sure that the logical form is well-formed. By assuming that everything is a sequence, I have no way to check, for example, that my parentheses are matched. I don't know any of this, because I've forgotten what I've generated, so I keep going until at some point I hit the end of sequence, and that's it. So we actually want to be able to enforce some constraints of well-formedness on the output.
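As a toy illustration of what enforcing such a constraint could look like, and this is not the approach the talk takes, one could mask out illegal tokens during greedy decoding so that brackets always stay balanced. The `step_scores` callable below is a hypothetical stand-in for a trained decoder:

```python
def constrained_greedy_decode(step_scores, max_len=50):
    """Greedy decoding that masks tokens violating bracket balance.
    step_scores(prefix) returns a dict mapping candidate tokens
    (including "<eos>") to scores; it stands in for a trained decoder."""
    prefix, depth = [], 0
    for _ in range(max_len):
        scores = dict(step_scores(prefix))
        if depth == 0:
            scores.pop(")", None)        # nothing to close yet
        else:
            scores.pop("<eos>", None)    # cannot stop with an open bracket
        token = max(scores, key=scores.get)
        if token == "<eos>":
            break
        prefix.append(token)
        depth += (token == "(") - (token == ")")
    return prefix
```

The talk's answer to the well-formedness problem is different and more general: coarse-to-fine decoding, described next.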

So how are we going to do that? We're going to do it with this idea of coarse-to-fine decoding, which I'm going to explain. Again, we have our natural language input, here 'all flights from Dallas before 10 am'. What we would do before is decode the entire natural language string into the logical form representation, but now we insert a second stage, where we first decode into a meaning sketch. What the meaning sketch does is abstract away details from the very detailed logical form; it's an abstraction. It doesn't have arguments, it doesn't have variable names; you can think of it, if you're familiar with templates, as a template of the logical form, of the meaning representation. So first we take the natural language and decode it into this meaning sketch, and then we use the meaning sketch to fill in the details. Now, why does this make sense?

Well, there are several arguments. First of all, you disentangle higher-level information from low-level information: there are some things that are the same across logical forms, and you want to capture them. Your meaning representation at the sketch level is going to be more compact; in ATIS, for example, which is a dataset we work with, the sketch is 9.2 tokens on average as opposed to 21 tokens, which is a very long logical form. Another thing that is important is at the model level: you explicitly share the core structure that is the same across multiple examples, so you use your data more efficiently and you learn to represent commonalities across examples, which the other model did not do. And you provide global context for the fine meaning decoding; I have a graph coming up in a minute.

Now, the formulation of the problem is the same as before: we again map the natural language input to the logical form representation, except that now we have two stages in this model. So we again model the probability of the output given the input, but now this is factorized into two terms: the probability of the meaning sketch given the input, and the probability of the output given the input and the meaning sketch. So the meaning sketch is shared between those two terms.
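In symbols, using my own notation, with x the utterance, a the meaning sketch, and y the full logical form:

```latex
p(y \mid x) \;=\; p(a \mid x)\, p(y \mid x, a),
\qquad
p(a \mid x) = \prod_{t} p(a_t \mid a_{<t}, x),
\quad
p(y \mid x, a) = \prod_{t} p(y_t \mid y_{<t}, x, a)
```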

I show you a graph here. The green nodes are the encoder units; the orange, or brown, I don't know how this colour comes out here, are the decoder units. In the beginning we have the natural language, which we encode with your favourite encoder, here you see a bidirectional LSTM. Then we use this encoding to decode the sketch, which is the abstraction, the high-level meaning representation. Once we have decoded the sketch, we encode it again, with another bidirectional LSTM, into some representation that we feed into our final decoder, which fills in all the details we're missing. You can see that the red bits are the information I'm filling in. You will notice that this decoder takes into account not only the encoding of the sketch but also the input. Remember, in the probability terms, it is the probability of y given x and a: y is our output, x is the input, and a is the encoding of my sketch. This is why we say the sketch provides context for the decoding.
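Here is a compact sketch of that two-stage architecture, again in PyTorch and again only an illustration under my assumptions (the released model ties the fine decoder to the sketch positions it expands, which is omitted here):

```python
import torch
import torch.nn as nn

class CoarseToFineParser(nn.Module):
    """Two-stage toy model: utterance -> sketch -> full logical form."""
    def __init__(self, n_src, n_sketch, n_tgt, dim=256):
        super().__init__()
        self.emb_x = nn.Embedding(n_src, dim)
        self.emb_a = nn.Embedding(n_sketch, dim)
        self.emb_y = nn.Embedding(n_tgt, dim)
        self.enc_x = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.enc_a = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.dec_a = nn.LSTM(dim, 2 * dim, batch_first=True)   # sketch decoder
        self.dec_y = nn.LSTM(dim, 2 * dim, batch_first=True)   # fine decoder
        self.out_a = nn.Linear(4 * dim, n_sketch)               # [state; attn over x]
        self.out_y = nn.Linear(6 * dim, n_tgt)                  # [state; attn over x; attn over a]

    @staticmethod
    def attend(queries, keys):
        weights = torch.softmax(torch.bmm(queries, keys.transpose(1, 2)), dim=-1)
        return torch.bmm(weights, keys)

    def forward(self, x, a, y):
        hx, _ = self.enc_x(self.emb_x(x))                  # encode the utterance
        da, _ = self.dec_a(self.emb_a(a))                  # sketch decoder states (teacher forced)
        p_a = self.out_a(torch.cat([da, self.attend(da, hx)], dim=-1))   # scores for p(a | x)
        ha, _ = self.enc_a(self.emb_a(a))                  # re-encode the sketch
        dy, _ = self.dec_y(self.emb_y(y))                  # fine decoder states (teacher forced)
        ctx = torch.cat([self.attend(dy, hx), self.attend(dy, ha)], dim=-1)
        p_y = self.out_y(torch.cat([dy, ctx], dim=-1))     # scores for p(y | x, a)
        return p_a, p_y
```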

Okay, training and inference work the same way as before: we maximize the log-likelihood of the generated meaning representations given the natural language, and at test time we have to predict both the sketch and the more detailed logical form, which we do via greedy search.

Okay, so a question I have not addressed is: where do these templates come from? Where do we find the meaning sketches? The answer that I would like to give you is that we just learn them. Now, that is fine, we can learn them, but first we'll try something very simple, and I'll show you examples, because if the simple thing doesn't work, then learning will never work.

So here are actual examples of meaning sketches for different kinds of meaning representations. Here we have a logical form, lambda calculus, and it's fairly trivial to see how you would get the meaning sketch: you just get rid of variable information, you know, lambda counts, argmax, anything that is specific to the instance; we remove any notion of arguments and any sort of information that may be specific to that logical form. So you see here, this detail, and this whole expression, gets abstracted; there is no numeric information, these are just variables. That is for logical forms. If you have source code, this is Python, things are very easy: you just substitute tokens with token types. So here is the Python code: 's' will become NAME, '4' will become NUMBER, the name here is the name of a function, and then this is a STRING. Of course, we want to keep the structure of the expression as it is, so we will not substitute delimiters, operators, or built-in keywords, because that would change what the program is actually meant to do.
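To make that concrete, here is a small, self-contained illustration, my own rather than the paper's exact preprocessing, that uses Python's standard tokenize module to build such a sketch; the example line of code is made up:

```python
import io
import keyword
import token
import tokenize

def python_sketch(src: str) -> str:
    """Replace identifiers and literals with their token types while keeping
    keywords, operators and delimiters, yielding a meaning sketch."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type == token.NAME and not keyword.iskeyword(tok.string):
            out.append("NAME")
        elif tok.type == token.NUMBER:
            out.append("NUMBER")
        elif tok.type == token.STRING:
            out.append("STRING")
        elif tok.type in (token.NEWLINE, token.NL, token.ENDMARKER):
            continue
        else:
            out.append(tok.string)   # keywords, operators, delimiters stay as-is
    return " ".join(out)

print(python_sketch("s = timedelta(hours=4).total_seconds()"))
# -> NAME = NAME ( NAME = NUMBER ) . NAME ( )
```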

If we have SQL queries, it's again simple to get the meaning sketches. Above you can see the SQL syntax: we have a SELECT clause, and we first select the columns, because in SQL we have tables and the tables have columns, so here we have to select a column, and then we have the WHERE clause that puts conditions on it. In the example we're selecting a record company, and the WHERE clause puts some conditions: the year of recording at this record company has to be after 1996, and the conductor has to be Mikhail Snitko, who is a Russian composer. Now, if you want to create a meaning sketch, it's very simple: we just keep the syntax of the WHERE clause, WHERE, greater-than, AND, equals. So we just have the WHERE clause and the conditions on it; these are not filled out yet, so they could apply to many different columns in an SQL table.

Okay, let me show you some results. I'm going to compare the simple model I have shown you, the plain sequence-to-sequence model, with the more sophisticated model that does constrained decoding, and I'm comparing to the state of the art. Of course, the state of the art is a moving target, in the sense that now all these numbers go up by some percent with BERT, people are familiar with BERT, so whatever I show you, you can add two or three percent in your head. These models do not use BERT. This is the previous state of the art, and this is GeoQuery and ATIS; I'm going to give results for different datasets, and it's important to see that it works on different datasets with very different meaning representations: some have logical forms, GeoQuery and ATIS have logical forms, and then we have an example with Python code and one with SQL. Here is a system that uses syntactic decoding, so it uses quite sophisticated grammatical operations that then get composed with neural networks to perform semantic parsing; this is the simple sequence-to-sequence model I showed you before, and this is coarse-to-fine decoding, so you do get a three percent increase. With regard to ATIS, this is very interesting, it has very long utterances and very long logical forms; again, the sequence-to-sequence model does almost as well, remember what I said about syntactic decoding not giving that much of an advantage, and again we get a boost with coarse-to-fine. A similar pattern can be observed with SQL, where you jump from seventy-four to seventy-nine, and on Django, where you execute Python code, again from seventy to seventy-four.

Okay, now this is an aside, I'll just mention it very briefly. All the tasks I'm talking about here rely on having your input and your output pre-specified: some human goes and writes down the logical form for the utterance. The community has realised that this is not scalable. So what we're also trying to do is work with weak supervision, where you have the question and you have the answer, but no logical form; the logical form is latent, and the model has to come up with it. Now, this is good because it's more realistic, but it opens another huge can of worms: you have to come up with the logical forms, you have to have a way of generating them, and then you have this variance because you don't know which ones are correct and which ones aren't. So here we show you a table; you're given the table, you're given 'how many silver medals did the nation of Turkey win', and the answer, which is zero, and you have to hallucinate all the rest. This idea of using the meaning sketches is very useful in this scenario, because it restricts the search space: rather than looking over all the logical forms you could have, you first generate an abstract program, a meaning sketch, and then, once you have that, you can fill in the details. So this idea of abstraction is helpful, I would say, in this scenario even more.

Okay, now let's go back to the third challenge, which has to do with linguistic coverage. This is the problem that will always be with us: whatever you do, the human is unpredictable and will say things that your model does not anticipate, so we have to have a way of dealing with it.

Okay, so this is not a new idea; whoever has done question answering has come up against this problem of, gee, how do I increase the coverage of my system? What people have done, and this is actually an obvious thing to do, is: you have a question and you paraphrase it. In IR, for example, people do query expansion; it's the analogous idea. I have a question, I have some paraphraser that will paraphrase it, and then I will submit the paraphrases, I will get some answers, and the problem is solved. Except that it isn't, and if any of you have worked with paraphrases you will have seen that the paraphrases can be really bad, and so you get crappy answers. So you had a problem, and now you've created another problem. The reason this happens is that the paraphrases are generated independently of your task, of the QA module. You have a QA module, you paraphrase the questions, and then you get answers, and at no point does the answer communicate with the paraphrase to get something that is appropriate for the task, for the QA model.

So what I'm going to show you now is how we train this paraphrase model jointly with a QA model for an end task, and our task is again semantic parsing, except that this time, because this is a more realistic task, we're going to be querying a knowledge base like Freebase, or Google's Knowledge Graph. And of course there is a question that I will address in a bit: where do the paraphrases come from, who gives them to us, where are they?

Okay, this is a busy slide, but it's actually really simple and I'm going to take you through it. This is how we see the modeling framework. We have a question, 'who created Microsoft', and we have some paraphrases that are given to us; I will tell you in a minute who gives us the paraphrases, assume for a moment we have them. Now, what we do first is take all these paraphrases and score them. So we encode them, we get question vectors, and we have a model that gives a score: how good is this paraphrase for the question? How good is 'who founded Microsoft' as a paraphrase of 'who created Microsoft'? Once we normalize these scores, we have our question answering module. So we have two modules, the paraphrasing module and the question answering module, and they're trained jointly. Once I have the scores for my paraphrases, these are used to weight the answers given the question. This is going to tell your model, well, look, this answer is quite good given your paraphrase, or this answer is not so good given your paraphrase. Do you see now that you learn which paraphrases are important for your task, for your question answering model, and your answer, jointly?

A bit more formally, the modeling problem is: we have an answer, and we want to model the probability of the answer given the question, and this is factorized into two models: one is the question answering model, and the other one is the paraphrasing model.
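Roughly, in my notation, with q the question, q' a paraphrase drawn from a candidate set P(q), and a the answer:

```latex
p(a \mid q) \;\approx\; \sum_{q' \in \mathcal{P}(q)}
  \underbrace{p(a \mid q')}_{\text{QA model}}\;
  \underbrace{p(q' \mid q)}_{\text{paraphrase model}}
```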

Now, for the question answering model you can use whatever you like; you can plug in your latest neural QA model. The same goes for the paraphrase model: whatever you have, as long as you can encode the paraphrases somehow, it doesn't really matter. I will not talk a lot about the question answering model; we used an in-house model that is based on graphs and is quite simple, it just does graph matching on the knowledge graph. I'm going to tell you a bit more about the paraphrasing model.

Okay, so this is how we score the paraphrases. We have a question, we generate paraphrases for this question, and then for each of these paraphrases we score how good they are given my question, and this is, you know, essentially a dot product: is it a good paraphrase or not? But it's trained end-to-end with the answer in mind: is this paraphrase going to help me find the right answer?
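A simplified sketch of that joint scoring, under my own assumptions: `encode` is any sentence encoder and `qa_scores` any QA model returning scores over candidate answers; both are hypothetical, pluggable components.

```python
import torch

def answer_distribution(question, paraphrases, encode, qa_scores):
    """Weight the QA model's answer distributions by dot-product paraphrase scores."""
    q_vec = encode(question)                                   # (d,)
    p_vecs = torch.stack([encode(p) for p in paraphrases])     # (K, d)
    weights = torch.softmax(p_vecs @ q_vec, dim=0)             # p(q' | q)
    answers = torch.stack([torch.softmax(qa_scores(p), dim=0)  # p(a | q') per paraphrase
                           for p in paraphrases])              # (K, A)
    return weights @ answers                                   # p(a | q)
```

Training end-to-end means the gradient of the answer log-likelihood flows into both the paraphrase scores and the QA component, which is exactly how the model learns which paraphrases are useful.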

Now, as far as the paraphrases are concerned, this is again a plug-and-play module; you can use your favourite. If you are in a limited domain you can write them yourself, manually; you could use WordNet, or PPDB, which is a database that has a lot of paraphrases; but we do something else, using neural machine translation.

Okay, so this slide, I had to put it in; I know everybody knows it, but it's my favourite slide of all time, because people have tried to redo this slide and it's never as good as the original. If you go to machine translation talks you see it everywhere; whoever made it captured so beautifully the fact that you have this English language here, you have these attention weights, so beautiful, and then you take the attention weights and you weight them with the decoder and, hey presto, you get the French language. So this is your usual machine translation, your vanilla machine translation engine; it's again an encoder-decoder model with attention, and we assume we have access to this engine.

Now, you may wonder how I'm going to get paraphrases out of this. This is again an old idea, which goes back, I think, to Martin Kay in the eighties, and it is this: in the case of English, we want to go from English to English. We want to be able to paraphrase an English expression into another English expression, but in machine translation I don't have any direct path from English to English. What I do have is a path from English to German, and from German to English. So the theory goes: if I have two English phrases, like here 'under control' and 'in check', and they are aligned to, or correspond to, the same phrase in another language, they are likely to be paraphrases. Now, I'm going to use these alignments, this is just for you to understand the concept, but you can see that I have English, I translate the English to German, then the German gets back-translated to English, and I have my paraphrase.

More specifically, I have my input, which is in one language; I encode it and decode it into some translations in the foreign language, G stands here for German; I encode my German and then decode it back into English. There are two or three things you should note about this. First of all, these things in the middle, the translations, are so-called pivots, and you see that we have k pivots: I don't have one translation, I have multiple translations. This turns out to be really important, because a single translation may be very wrong and then I'm completely screwed, I have very bad paraphrases. So I have to have multiple pivots, and not only that, I can also have multiple pivots in multiple languages, which I then take into account while I'm decoding.
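A small sketch of the round-trip idea, assuming a hypothetical `translate(text, src, tgt, k)` helper that returns the k-best translations from some NMT engine (the function and its signature are made up for illustration). Note that, as explained next, the actual model keeps the pivots internal as vectors rather than materializing paraphrase strings; this just shows the pivoting concept.

```python
def pivot_paraphrases(sentence, pivot_langs=("de", "fr", "cs"), k=5):
    """Round-trip a sentence through pivot languages and collect the distinct
    back-translations as candidate paraphrases."""
    candidates = set()
    for lang in pivot_langs:
        for pivot in translate(sentence, src="en", tgt=lang, k=k):    # k-best pivots
            for back in translate(pivot, src=lang, tgt="en", k=k):    # back-translate each pivot
                if back.lower() != sentence.lower():
                    candidates.add(back)
    return sorted(candidates)
```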

Now, this is very different from what you may think of as paraphrases, because the paraphrases are never explicitly stored anywhere; they're all model-internal. What this thing learns is: I give it English and it paraphrases English into English, but I don't have an explicit database of paraphrases. Of course they are all vectors and they're all scored, but I cannot go in and point and say 'where is that paraphrase'. I can, however, give the model a sentence and it generates another one, which is very nice, because you get generation for free; in the past, if you had rules, you had to figure out how to actually use them to generate something that is meaningful, and so on.

Okay, let me show you an example. This is paraphrasing the question 'what is the zip code of the largest car manufacturer'. If we pivot through French, French gives us 'what is the zip code of the largest vehicle manufacturer' or 'what is the zip code of the largest car producer'. If we pivot through German: 'what's the postal code of the biggest automobile manufacturer', 'what is the postcode of the biggest car manufacturer'. And if we pivot through Czech: 'what is the largest car manufacturer's postal code', or 'zip code of the largest car manufacturer'. Can I see a show of hands: which pivot language do you think gives the best paraphrases? I mean, it's a sample of two. Czech? Very good. Czech turned out to be the best pivot, followed by German; French was not so good. And again, there's the question of how many pivots to use and which languages to choose; these are all experimental variables that you can manipulate.

Let me show you some results. The grey ones you don't need to understand; these are all baselines that one can use to show that the model is doing something over and above the obvious things. You can see that, compared to using nothing, you go from forty-nine to fifty-one here, and from sixteen to twenty there. These are WebQuestions and GraphQuestions, datasets that people have developed. GraphQuestions is very difficult; it has very complicated questions that require multi-hop reasoning, like 'what is Obama's daughter's friend's dog called', very difficult, and that's why the performance is really bad. What you should see is that pink here is our paraphrasing paradigm; in all cases the full model is pink, here it is the second-best system, and red here is the best system, and you can see that it does very well on the difficult dataset. On the other dataset there is another system that is better, but it uses a lot of external knowledge, which we don't; it better exploits the graph itself, which is another avenue for future work.

Okay, this is my last slide and then I'll take questions. What have we learned? There are a couple of things that are interesting. First of all, encoder-decoder models are good enough for mapping natural language to meaning representations with minimal engineering effort, and I cannot emphasise that enough. Before this paradigm shift, what we used to do was spend ages coming up with features that we would have to re-engineer for every single domain; if I went from lambda calculus to SQL and then to Python code, I would have to do the whole process from scratch. Here you have one model, with some experimental variables that you can keep fixed or change, and it works very well across domains. Constrained decoding improves performance, not only in the setting I showed you but also in more weakly supervised settings, and people are using this constrained decoding even outside semantic parsing, in generation for example. The paraphrases enhance the robustness of the model, and in general I would say they are useful if you have other tasks, dialogue for example; you could give robustness to a dialogue model, to the generated answer of a chatbot. And the models could transfer to other tasks or architectures; for the purposes of this talk, so as not to overwhelm people, I've shown simple architectures, but you can put neural networks left, right, and centre as you feel like.

Now, for the future, I think there are a couple of avenues of future work worth pursuing. One is, of course, learning the sketches, so that they could be a latent variable in your model, trying to generalise, and that would mean you don't need to do any preprocessing, you don't need to give the algorithm the sketches. How do you deal with multiple languages? I have a semantic parser in English; how do I do it, say, in Chinese? This is a big problem, in particular in industry; they come up against it a lot, and their answer is: we hire annotators. How do you train this model if you have no data at all, just a database? And of course there is something that would be of interest to you: how do I actually do coreference, how do I model a sequence of turns as opposed to a single turn?

And without further ado, I have one last slide, and it's a very depressing slide. When I gave this talk a couple of months ago, I used to have one here with Theresa May; it's from Twitter, and the joke is that Theresa May will ask Alexa to negotiate for her, and it will be fine. I tried to find another one with Boris Johnson and failed; I don't think he does technology, and he doesn't do negotiating either. So she would at least have negotiated. And at this point I'll just take questions. Thank you very much.

Thank you. We have time for questions.

Thank you, I'm from J.P. Morgan. My question is: do we really need to extract the logical forms, given that humans probably don't, except in really complicated cases? Do we really need to do that? In machine translation, for example, we don't really extract all these things, but we do translate, even quite difficult material.

That's a good question, and the answer is yes and no. If you look at Alexa or Google, these people have very complicated systems where they have one module that does what you say: it doesn't translate to logical form, it just does query matching and then extracts the answer. But for the highly compositional queries, which they need to execute against databases, they all have internal representations of what the queries mean. Also, if you are a developer, for example, and you have a database, say I sell jeans or I sell fruit and I have a database and I deal with customers, and I have to have a spoken interface, there you would have to extract it somehow. Now, for the phone, when you say 'Siri, set my alarm clock', I would agree with you: there you just need to recognise intents and do the attribute slot filling, and then you're done. But whenever you have more structure in the output, in the answer space, then you do this.

Thanks for a very nice talk. I had a question on the paraphrase scoring; it seemed to me something wasn't quite right, if I understood it well. You have an equation with a summation in it, and intuitively the right thing is to look for the closest paraphrase that actually has an answer that you can find, of good quality. So you're trying to optimize two things: finding something that means the same, and for which I can find an answer if I can't find one for the original question. But when you sum, the problem is that paraphrases don't have an equal distribution: some phrases have many paraphrases, or many paraphrases in a particular direction but maybe not so many in another, just depending on how many synonyms you have. So if you add them up and weight them, and you have a lot of paraphrases for the wrong answer and one for something that's better, it seems like the closeness should dominate if you have a very high-quality answer, and it seems like your model is trying to do something different. I'm wondering if that is causing problems, or whether there is something I'm not seeing.

Right, so this is how the model is trained, to make it robust, and you can manipulate the n-best paraphrases. At test time, you're absolutely right, we just take the one, the max, the one that is best. So you are right, and I did not explain it well, but you are absolutely right that you can be all over the place if you're just looking at the sum; at test time we just want the one.

Hi, thank you for the great talk, I'm from Microsoft Research. My question is about the coarse-to-fine decoding: what do you think of its potential for generating natural language outputs, like dialogue, like summarisation?

Come again, ask the question again: what do I think of the potential of coarse-to-fine? That's a good question, the connection question. I think it's very interesting. For sentence generation, and you mentioned summarisation, so I'll do one thing at a time: if you just want to generate a sentence from some input, you want to do surface realization, people have already done this; they have a very similar model where they first produce a template, which they learn, and from the template they surface-realize the sentence. For summarization, however, which is the more interesting case, you would have to have a document template, and it's not clear what this document template might look like or how you might learn it. You might, for example, assume that the template is some sort of tree or graph with generalizations, and then from there you generate the summary. I believe we should do this, but it will not be as trivial as what we do right now, which is: encode the document into a vector, have attention and a bit of copying, and here's your summary. So as to the question of what the template is, nobody has an answer.

I was wondering if you could elaborate on your very last slide, the work on generating the abstract meaning representation, because of course my reaction to what you were saying in the first part was: well, it's all good when you have a corpus with the mapping between the query and the logical form; what do you do if you don't, which is the majority of cases?

Okay, so this is a tough problem: how do you do inference with weak supervision? There are two things there that we found out. Because the search space is huge, you are searching over, I mean, millions of potential programs that execute, and we have no signal other than the right answer. And because the only signal is the right answer, two things can happen. One is ambiguity: the entities may be ambiguous, Turkey can be the bird or it can be the country, and then you're screwed and you will get the wrong things. The other one is spuriousness: you have things that execute to the right answer but don't have the right intent, the right semantics. So what people do, and what we do, is use the templates here, and then we have another step which again tries to do some structural matching and says, okay, I have this abstract program, and this cuts down the search space. You also have to do some alignment and put in some constraints, in the sense that, for example, I cannot have the column 'silver' repeated twice, because that is not well-formed. But the accuracy of these, I didn't put it up, is something like forty-four percent, which is, you know, not anywhere near good; I mean, Google and Amazon would laugh. There is more work to be done.

Thank you for the talk. I have a question about your coarse-to-fine decoding. In the coarse step you predict a meaning representation, the sketch, but then the fine decoding is based on the coarse result, and the two are predicted separately. It means that there is no guarantee that the meaning representation you predicted will agree with the final output in some cases. Don't we need to consider such things? Because if we consider the semantics, some arguments of the sketch should be included in the final output.

That is a very good point; I'm glad you guys were paying attention. So yes, we don't have this. We say constrained decoding, but what you really do is constrain the encoding, hoping that your decoder will be more constrained by the encoding. You could include it. We did an analysis where we looked at two things. One is how good the sketches are, because if your sketches are not great, what you're describing becomes more problematic. And we did an analysis, let me see if I have a slide that shows that the sketches are actually working quite well; I might have a slide, I don't remember. Yes. This slide shows the sequence-to-sequence model, the first row is the sequence-to-sequence model without any sketches, and the second is coarse-to-fine, where you have to predict the sketch, and you see that the coarse-to-fine model predicts the sketch much better than the one-stage sequence-to-sequence model. So this tells you that you are kind of winning, but not exactly, and I don't know what would happen if you included these constraints. My answer would be that this doesn't happen a lot; it could, but for the logical forms we tried it wasn't an issue. If you have very long, very complicated ones, really huge SQL WHERE clauses, then I would say that your approach would be required.

Okay, we have time for maybe one more question. In the last part you said that the models don't handle this yet: everything you showed is related to QA, one question and one answer, but in a dialogue case we have multiple turns. So what are the main problems, and what would a good model be?

Yes, so I have a slide on this, I'll send it to you. We did try to do this; there is a paper in submission on multiple turns, where you say, for example, 'I want to buy these Levi's jeans', 'how much do they cost', 'do you have them in another size', 'what is the colour', so you elaborate with new questions, and there are patterns in these multi-turn dialogues that you can handle. You can do this, but the one thing that we actually need to sort out before doing it is coreference, because right now these models don't take coreference into account, and if you model coreference in the simple way of, like, looking at the past and modelling it as a sequence, it doesn't really work that well. So I think sequential question answering is definitely the way to go; I have not seen any models yet that make me go, oh, this is great. It's a very hard problem, but, you know, one step at a time.

So thank you very much; let's thank the speaker again.