Okay, thank you. So we now move on to the first keynote. The first keynote speaker is a professor of natural language processing in the School of Informatics at the University of Edinburgh. Her research focuses on getting computers to understand, reason with, and generate natural language, so she will talk about that and about her research activities. There is more information in the proceedings.
Okay, right. Can you hear me? All right, okay.
Right, so as was said earlier, this talk is going to be about learning natural language interfaces with neural models. I'm going to give you a bit of an introduction as to what these natural language interfaces are, then we're going to see how we build them and what problems are related to them, and, you know, what future lies ahead.
Okay, so what is a natural language interface? It's the most intuitive thing one wants to do with a computer: you just want to speak to it, and the computer, in an ideal world, understands and executes what you wanted it to do. And, believe it or not, this is one of the first things people wanted to do with NLP. In the sixties, when computers had hardly any memory, when we didn't have neural networks, none of this, the first systems that appeared out there had to do with speaking to the computer and getting some response. So Green et al. in 1959 presented this system called the Conversation Machine, and this was a system that held conversations with a human — can people guess about what? The weather. Well, it's always the weather: first the weather, and then everything else. Then they said, okay, the weather is a bit boring, let's talk about baseball. And these were very primitive systems: they just had hand-written grammars, you know, it was all manual, but the intent was there — we want to communicate with computers.
Now, a little more formally, what the task entails is this: we have natural language, and the natural language has to be translated — by what you see at the arrow there, the parser; you can think of it as a model or some black box — into something the computer can understand. And this cannot be natural language: it must be SQL, or lambda calculus, or some internal representation that the computer uses to give you an answer. Okay.
So, as an example — and this again has been very popular within the semantic parsing field — you query a database, but you don't actually want to learn the syntax of the database, you don't want to learn SQL; you just ask the question, "What are the capitals of states bordering Texas?". You translate this into the logical form you see down there. You don't need to understand it; it's just something the computer understands. You can see there are variables; it's a formal language. And then you get the answer — I'm not going to read the answer out; you can see Texas borders a lot of states.
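Since the slide isn't reproduced here, a GeoQuery-style logical form for that question might look roughly like the following (an illustrative reconstruction, not necessarily the exact formalism on the slide):

    answer(C, (capital(S, C), state(S), next_to(S, T), const(T, stateid(texas))))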
Now, apart from asking databases questions, another task — and this is an actual task that people have deployed in the real world — is instructing a robot to do something you want it to do. Again, this is another example: if you have one of these little robots that make you coffee and go up and down the corridor, you can say, "At the chair, move forward three steps past the sofa". Again, the robot has to translate this into some internal representation that it understands, in order not to crash into the sofa.
Another example is question answering, and there are a lot of systems like this, using a big knowledge base like Freebase — it doesn't exist anymore, it's now the Knowledge Graph — but this is a huge graph with millions of entities and connections between them. The systems that query it have many modules, but one of them is a semantic parser. So one of the questions you may want to ask is, "Who were the male actors in Titanic?", and again this has to be translated into some language that Freebase or your knowledge graph understands. You can see here it is expressed in lambda calculus, but you have to translate it into some query language that Freebase, again, understands. So you see, there are many applications in the real world that necessitate semantic parsing, or some interface with a computer.
And here comes the man himself, Bill Gates. Of course, MIT publishes this Technology Review; it's actually very interesting, I suggest you take a look, and it's not very MIT-centric, they talk about many things. So this year they went and asked Bill Gates: what do you think are the new technological breakthroughs, the inventions of 2019, that will actually change the world? And if you read the review, he starts by saying, you know, I want to be able to detect premature babies — fine. Then he says, you know, the cow-free burger — so no meat, you make a burger, because the world has so many animals. Then he talks about drugs for cancer, and the very last one is smooth-talking AI assistants. So semantic parsing comes last, which means that, you know, it's very important to Bill Gates. Now, I don't know why — I mean, I do know why — but anyway, he thinks it's really cool.
And of course it's not only Bill Gates. Every company you can think of has a smooth-talking AI system, or is working on one, or has one in the back of their heads, or they have prototypes. And there are so many of them: there is Alexa — your sponsor — there is Cortana, there is Siri; Google decided to be different and call theirs Google Home, not some female name. So there is a proliferation of these things. Can I see a show of hands: how many people have one of them at home? Very good.
Do you think they work? How do you think they work? Exactly. So I have one of these things, and it sets alarms for me all the time. I mean, they work if you're in the kitchen and you say, "Alexa, set a timer for half an hour", or you have to monitor the kids' homework. But we want these things to go beyond simple commands.
Now, there is a reason why there is so much talk about these smooth-talking AI assistants: the impact they could have in society, for people who are not able-bodied, for people who cannot see, for people who are, you know, disabled, is actually pretty huge — if it worked. Now I'm going to show a video here; the video is a parody of Amazon Alexa, and once you see it you will understand immediately why. ...There's no sound. Hello? We checked the sound before as well. Should I do something? The volume is raised to the max.
[Video: a Saturday Night Live parody advert for an "Amazon Echo Silver", a smart speaker designed specifically for elderly users; the soundtrack is only partially captured in the recording.]
Okay — it's a Saturday Night Live sketch, but you can see how this could help the elderly or those in need: it could remind you, for example, to take your pills, or, you know, it could help you feel more comfortable in your own home.
Now, let's get a bit more formal. What are we going to try to do here? We will try to learn this mapping from the natural language to the formal representation that the computer understands, and the learning setting is that we have sentence–logical form pairs. By logical form — I will use the terms logical form and meaning representation interchangeably, because the models we will be talking about do not care what the meaning representation, the program the computer will execute, actually is. So we assume we have sentence–logical form pairs, and this is the setting most previous work has focused on. It's like machine translation, except that the target is an executable language.
Now, this task is harder than it seems, for three reasons. First of all, there is a severe mismatch between the natural language and the logical form. If you look at this example, "How much does it cost, a flight to Boston?", and look at the representation here, you will immediately notice that they are not very similar: the structures mismatch. And not only is there a mismatch between the logical form and the natural language string, but also with its syntactic representation, so you couldn't even use the syntax if you wanted to get the matching. Here, for example, "flight" would align to "flight", "to" to "to", and "Boston" to "Boston", but then "fare" corresponds to this huge natural language phrase, "how much does it cost", and the system must infer all of that.
Now, this was the first challenge, the structure mismatch. The second challenge has to do with the fact that the formal language — the program, if you like, that the computer has to execute — has structure and has to be well-formed. You cannot just generate anything and hope that the computer will give you an answer. So this is a structured prediction problem, and if you look here, for "the male actors in Titanic", there are three meaning representations. Do people see which one is the right one? I mean, they all look similar; you have to squint at it. The first one has unbound variables, the second one has a parenthesis that is missing, so the only right one is the last one. You cannot do it approximately. It's not like machine translation, where you're going to get the gist of it; you actually need the right logical form that executes on the computer.
Now, the third challenge — and this is one the people who developed Google Home or Alexa noticed immediately when they deployed them — is that people will say anything. The same intent can be realized in very many different expressions: "who created Microsoft", "Microsoft was created by", "who founded Microsoft", "who is the founder of Microsoft", and so on and so forth. And all of that maps to this little bit of the knowledge graph, which says that Paul Allen and Bill Gates are the founders of Microsoft. And the system — your semantic parser — has to be able to deal with all of these different ways that we can express our intent.
Okay. So this talk has three parts. First, I'm going to show you how, with neural models, we deal with this structural mismatch, using something that is very familiar to all of you, the encoder-decoder paradigm. Then I will talk about the structured prediction problem — the fact that your formal representation has to be well-formed — using a coarse-to-fine decoding algorithm, which I will explain. And then, finally, I will show you a solution to the coverage problem.
Okay. Now, I should point out that there are many more challenges out there that I am not going to talk about, but it's good to flag them. Where do we get the training data from? I told you that we have to have natural language–logical form pairs to train the models; who creates these? Some of it is actually quite complicated. What happens if you have out-of-domain queries — if you have a parser trained on one domain, let's say the weather, and then you want to use it for baseball? What happens if you don't just have independent questions and answers, but the queries are co-dependent and there is coreference between them? Now we're getting into the territory of dialogue. What about speech? We all pretend here that speech is a solved problem; it isn't, and a lot of the time Alexa doesn't understand children, doesn't understand some people with accents, like me. And then you talk to the Amazon people and you say, okay, so do you use the lattice, the good old lattice? "We use a lattice of one" — because, you know, anything richer slows them down. So there are many technical and practical challenges that all have to work together to make this thing work.
Okay, so let's talk about the structure mismatches. Here the model is something you all must be a bit familiar with; it's one of the three or four things in neural modelling that get recycled over and over again — the encoder-decoder framework is one of them. So we have natural language as input; we encode it using an LSTM or whatever favourite model you have — you can use a transformer, although transformers don't work that well for this task, because the datasets are small — but whatever: you encode it, you get a vector out of it, and then this encoded vector serves as input to another LSTM that decodes it into a logical form. And you will notice here I say we decode it into a sequence or a tree. I will not talk about trees, but I should flag that there is a lot of work trying to decode the natural language into a tree structure, which makes sense, since the logical form has structure: there are parentheses, it is recursive. However, in my experience these models are way too complicated to get to work, and the advantage over assuming that the logical form is a sequence is not that great. So for the rest of the talk we will assume that we have sequences in and we get sequences out, and we will pretend that the logical form is a sequence, even though it isn't.
Okay, a little bit more formally: the model will map the natural language input, which is a sequence of tokens x, to a logical form representation of its meaning, which is a sequence of tokens y, and we are modelling the probability of the meaning representation given the input. The encoder will just encode the language into a vector; this vector will then be fed into the decoder, which will generate the logical form conditioned on the encoding vector. And of course we have the very important attention mechanism here. The original models did not use attention, but then everybody realized — in particular in semantic parsing — that it's very important, because it deals with this structure mismatch problem.
So — I'm assuming people are familiar with this — instead of generating the tokens in the logical form one by one without considering the input, the attention will look at the input and weight the output given the input, and you will get some sort of soft alignment, so that, for example, if I generate "mountain", it maps to "mountain" in my input.
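As a rough illustration of that weighting (my own minimal sketch, not the exact model in the talk), attention at one decoding step is just a softmax over similarity scores between the decoder state and each encoder state:

    import numpy as np

    def attend(decoder_state, encoder_states):
        # decoder_state: vector of size d; encoder_states: matrix (n_input_tokens, d)
        scores = encoder_states @ decoder_state      # one score per input token
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                     # softmax -> soft alignment weights
        context = weights @ encoder_states           # weighted sum of encoder states
        return weights, context                      # context conditions the next prediction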
Now, this is a very simplistic view of semantic parsing: it assumes that not only is the natural language a string, but that the logical form is also a string, and this may be okay, but maybe it isn't — there is a problem, and I'll explain. We train this model by maximizing the likelihood of the logical forms given the natural language input; this is standard. At test time, we have to predict the logical form for any input utterance, and we have to find the one that actually maximizes this probability of the output given the input. Now, trying to find this argmax can be very computationally intensive: if you're Google, you can do beam search; if you're the University of Edinburgh, you just do greedy search, and it works just fine.
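For concreteness, greedy decoding can be sketched like this, assuming a hypothetical model object with an init_state and a step method that returns next-token log-probabilities (an interface invented for illustration, not the authors' code):

    def greedy_decode(model, input_tokens, max_len=100, eos="<eos>"):
        state = model.init_state(input_tokens)            # encode the utterance
        output = []
        for _ in range(max_len):
            log_probs, state = model.step(state, output)  # distribution over next tokens
            token = max(log_probs, key=log_probs.get)     # greedy: take the argmax token
            if token == eos:
                break
            output.append(token)
        return output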
Now, can people see the problem with this assumption of decoding into a string? Remember the second problem I mentioned: we have to make sure that the logical form is well-formed, and by assuming that everything is a sequence, I have no way to check, for example, that my parentheses are being matched. I can't do this, because I've forgotten what I've generated; I just keep going until at some point I hit the end-of-sequence token, and that's it. So we actually want to be able to enforce some constraints of well-formedness on the output.
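(As an aside, the crudest option would be a post-hoc validity filter like the sketch below, rejecting outputs with unbalanced parentheses. That is not the approach taken in this talk, which instead builds the constraint into decoding, but it shows what "well-formedness" minimally means here.)

    def balanced_parentheses(tokens):
        # Reject candidate logical forms whose parentheses do not match.
        depth = 0
        for tok in tokens:
            if tok == "(":
                depth += 1
            elif tok == ")":
                depth -= 1
                if depth < 0:
                    return False
        return depth == 0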
So how are we going to do that inside the model? We are going to do it with this idea of coarse-to-fine decoding, which I am going to explain.
So again we have our natural language input — here, "all flights from Dallas before 10 am" — and what we did before was decode the entire natural language string into the logical form representation. But now we insert a second stage, where we first decode to a meaning sketch. What the meaning sketch does is abstract away details from the very detailed logical form; it's an abstraction. It doesn't have arguments, it doesn't have variable names; you can think of it, if you are familiar with templates, as a template of the logical form, of the meaning representation. So first we take the natural language and decode it into this meaning sketch, and then we use the meaning sketch to fill in the details.
Now, why does this make sense? Well, there are several arguments. First of all, you disentangle higher-level information from low-level information, so there are some things that are the same across logical forms, and you want to capture them. Your meaning representation at the sketch level is going to be more compact: in ATIS, for example, which is a dataset we work with, the sketch uses 9.2 tokens on average, as opposed to 21 tokens — it is a very long logical form. Another important thing, at the model level, is that you explicitly share the coarse structure that is the same across multiple examples, so you use your data more efficiently, and you learn to represent commonalities across examples, which the other model did not do. And you provide global context to the fine meaning decoding — I have a graph coming up in a minute.
Now, the formulation of the problem is the same as before: we again map the natural language input to the logical form representation, except that now we have two stages in this model. So we again model the probability of the output given the input, but now this is factorized into two terms: the probability of the meaning sketch given the input, and the probability of the output given the input and the meaning sketch. So the meaning sketch is shared between those two terms.
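Written out, with x the utterance, a the sketch, and y the logical form, the factorization being described is

    p(y | x) = p(a | x) · p(y | x, a),

where each term is itself decomposed token by token, p(a | x) = ∏_t p(a_t | a_<t, x) and p(y | x, a) = ∏_t p(y_t | y_<t, x, a).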
And here is the graph. The green nodes are the encoder units; the orange, or brown — I don't know how this colour comes out here — are the decoder units. So in the beginning we have the natural language, which we encode with your favourite encoder — here you see a bidirectional LSTM. Then we use this encoding to decode the sketch, which is this abstraction, the high-level meaning representation. Once we have decoded the sketch, we encode it again, with another bidirectional LSTM, into some representation that we feed into our final decoder, which fills in all the details we are missing — you can see there that the red bits are the information being filled in. And you will notice that this decoder takes into account not only the encoding of the sketch but also the input. Remember, in the probability terms it is the probability of y given x and a: y is our output, x is the input, and a is the encoding of my sketch. This is why we say the sketch provides context for the decoding.
Okay. Now, training and inference work the same way as before: we are again maximizing the log-likelihood of the generated meaning representations given the natural language, and at test time we again have to predict both the sketch and the more detailed logical form, and we do this via greedy search.
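Continuing the earlier greedy-decoding sketch, two-stage inference might look like this, again with hypothetical model components rather than the authors' code:

    def coarse_to_fine_decode(coarse_model, fine_model, utterance):
        # Stage 1: decode an abstract meaning sketch from the utterance.
        sketch = greedy_decode(coarse_model, utterance)
        # Stage 2: decode the full logical form conditioned on both the utterance
        # and the re-encoded sketch, filling in arguments and variable names.
        logical_form = greedy_decode(fine_model, (utterance, sketch))
        return sketch, logical_form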
Okay, so a question that I have not addressed is: where do these templates come from? Where do we find the meaning sketches? And the answer I would like to give you is that we would just learn them. Now, that is fine, we can learn them, but first we will try something very simple — I'll show you examples — because if the simple thing doesn't work, learning will never work.
So here are actual examples of the different meaning sketches for different kinds of meaning representations. Here we have a logical form in lambda calculus, and it's fairly trivial to understand how you would get the meaning sketches: you just get rid of the variable information — the lambda terms, the argmax, anything that is specific to that instance we remove; we remove any notion of arguments, and any sort of information that may be specific to the logical form. So you see here, this is the detailed form, and this whole expression becomes something like "lambda 2 and flight": there is no numeric information, and these are just variable placeholders. That is for logical forms.
If you have source code — this is Python — things are very easy, actually: we just substitute tokens with token types. So here is the Python code: "s" will become NAME, "4" will become NUMBER, the name here is the name of a function, and this is a STRING. Of course, we want to keep the structure of the expression as it is, so we will not substitute delimiters, operators, or built-in keywords, because that would change what the program is actually meant to do.
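A rough way to build that kind of sketch for Python source, using the standard tokenize module (my own illustration of the idea, not the preprocessing code used in the work):

    import io, keyword, tokenize

    def python_sketch(source):
        # Replace identifiers, numbers and strings with their token types,
        # but keep keywords, operators and delimiters so the structure survives.
        out = []
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
                out.append("NAME")
            elif tok.type == tokenize.NUMBER:
                out.append("NUMBER")
            elif tok.type == tokenize.STRING:
                out.append("STRING")
            elif tok.type in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
                              tokenize.INDENT, tokenize.DEDENT):
                continue
            else:
                out.append(tok.string)
        return " ".join(out)

    # e.g. python_sketch("s = 4") -> "NAME = NUMBER"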
If you have SQL queries, it's again simple to get the meaning sketches. Above you can see the SQL syntax: we have a SELECT clause — in SQL we have tables, and tables have columns, and here we have to select the column — and then we have the WHERE clause, which has conditions on it. So in the example we're selecting a record company, and the WHERE clause puts some conditions on it: the year of recording has to be after 1996, and the conductor has to be a particular person — a Russian conductor. Now, if you want to create a meaning sketch, it's very simple: we just keep the syntax of the WHERE clause — WHERE, greater-than, AND, equals. So we just have the WHERE clause and the conditions on it, with the arguments not filled out yet, so it could apply to many different columns in an SQL table.
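Schematically, with an illustrative query rather than the exact one on the slide, the mapping is something like:

    query:   SELECT record_company WHERE year_of_recording > 1996 AND conductor = <name>
    sketch:  WHERE > AND =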
Okay, let me show you some results. I'm going to compare the simple sequence-to-sequence model I showed you with this more sophisticated model that does constrained decoding, and of course compare against the state of the art. The state of the art is a moving target, in the sense that now all these numbers, with BERT — people are familiar with BERT — go up by a few percent, so whatever I show you, you can add two or three percent in your head; these models do not use BERT. So this is the previous state of the art. This is GeoQuery, and ATIS — I'm going to give you results for four different datasets, and it is important to see that it works on different datasets with very different meaning representations: GeoQuery and ATIS have logical forms, and then we have an example with Python code and one with SQL. So here is a system that uses syntactic decoding — it uses quite sophisticated grammatical operations that then get composed with neural networks to perform semantic parsing; this is the simple sequence-to-sequence model I showed you before; and this is coarse-to-fine decoding, so you do get a three percent increase. With regard to ATIS — and this is very interesting, it has very long utterances and very long logical forms — again you do almost as well; remember what I said, syntactic decoding does not give that much of an advantage; and then, again, we get a boost with coarse-to-fine. And a similar pattern can be observed when you use SQL, where you jump from seventy-four to seventy-nine, and on Django, where you execute Python code, again from seventy to seventy-four.
Okay, now this is a side note; I'll just mention it very briefly. All the tasks I'm talking about here assume that you have your input and your output pre-specified: some human goes and writes down the logical form for the utterance. And the community has realized that this is not scalable. So what we're also trying to do is work with weak supervision, where you have the question and you have the answer, but no logical form: the logical form is latent, and the model has to come up with it. Now, this is good because it's more realistic, but it opens another huge can of worms, which is that you have to come up with the logical forms — you have to have a way of generating them — and then you have all this variance, because you don't know which ones are correct and which ones are not. So here, you're given the table, you're given "how many silver medals did the nation of Turkey win", and the answer, which is zero, and you have to hallucinate all the rest. So this idea of using the meaning sketches is very useful in this scenario, because it restricts the search space: rather than looking over all the possible types of logical forms, you first generate an abstract program, a meaning sketch, and then, once you have that, you can fill in the details. So this idea of abstraction is helpful, I would say, in this scenario even more.
Okay, now let's go back to the third challenge, which has to do with linguistic coverage. And this is a problem that will always be with us, whatever we do, because the human is unpredictable: people will say things that your model does not anticipate, and so we have to have a way of dealing with it. Okay, so this is not a new idea: whoever has done question answering has come up against this problem of, gee, how do I increase the coverage of my system? So what people have done — and this is actually an obvious thing to do — is: you have a question, and you paraphrase it. In IR, for example, people do query expansion; it's the analogous idea. I have a question, I have some paraphraser that paraphrases it, and then, you know, I submit the paraphrases, I get some answers, and that's it — problem solved. Except that it isn't, and if any of you have worked with paraphrases you will have seen that the paraphrases can be really bad, and so you get bad answers; so you had a problem and now you've created another problem. And the reason why this happens is that the paraphrases are generated independently of your task, of the QA module. So you have a QA module, you paraphrase the questions, and then you get answers, and at no point does the answer communicate with the paraphrase to get something that is appropriate for the task, or for the QA model.
So what I'm going to show you now is how we train this paraphrase model jointly with a QA model for an end task, and our task is again semantic parsing, except that this time, because it is a more realistic task, we are going to be querying a knowledge base like Freebase, or what is now the Knowledge Graph. And of course there is the question, which I will address in a bit, of where the paraphrases come from — who gives them to us, where are they?
Okay, so this is a dense slide, but it's actually really simple, and I'm going to take you through it. This is how we see the modelling framework. We have a question, "who created Microsoft", and we have some paraphrases — bear with me, I will tell you in a minute who gives us the paraphrases; assume for a moment we have them. Now, what we will do is first take all these paraphrases and score them. So we will encode them, we will get question vectors, and we will have a model that gives a score: how good is this paraphrase for the question? How good is "who founded Microsoft" as a paraphrase for "who created Microsoft"? Then, once we normalize these scores, we have our question answering module. So we have two modules — one is the paraphrasing module, and one is the question answering module — and they are trained jointly. Once I have the scores for my paraphrases, these are going to be used to weight the answers given the question. So this is going to tell your model: well, look, this answer is quite good given your paraphrase, or this answer is not so good given your paraphrases. Do you see now that you kind of learn which paraphrases are important for your task, for your question answering model, and your answer, jointly?
Okay. So, a bit more formally, the modelling problem is: we have an answer, and we want to model the probability of the answer given the question. And this is factorized into two models: one is the question answering model, and the other one is the paraphrasing model.
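Written out, with q the question, p a paraphrase, and a the answer, the factorization being described is, roughly,

    p(a | q) = Σ_p p_qa(a | p) · p_para(p | q),

so the paraphrase model supplies a distribution over reformulations and the QA model answers each one.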
Now, for the question answering model you can use whatever you like — your latest neural QA model, you can plug it in there — and the same goes for the paraphrase model: whatever you have, as long as you can encode the paraphrases somehow, it doesn't really matter.
Now, I will not talk a lot about the question answering model; we used an in-house model that is based on graphs and is quite simple — it just does graph matching on the knowledge graph — and I'm going to tell you a bit more about the paraphrasing model. Okay, so this is how we score the paraphrases: we have a question, we generate paraphrases for this question, and then we score each of these paraphrases — how good are they given my question? And this is, you know, essentially a dot product: is it a good paraphrase or not? But it's trained end to end, with the answer in mind: is this paraphrase going to help me find the right answer?
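As a simplified sketch of that scoring and weighting step (my own simplification; in the actual model the encoders are neural sequence models trained jointly with the QA component):

    import numpy as np

    def paraphrase_weights(question_vec, paraphrase_vecs):
        # Dot-product score of each paraphrase against the question, softmax-normalized.
        scores = paraphrase_vecs @ question_vec
        w = np.exp(scores - scores.max())
        return w / w.sum()

    def answer_distribution(question_vec, paraphrase_vecs, qa_probs):
        # qa_probs[i] is the QA model's distribution over answers for paraphrase i.
        w = paraphrase_weights(question_vec, paraphrase_vecs)
        return w @ qa_probs      # p(a | q) = sum_p p(a | p) p(p | q)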
Now, as far as the paraphrases are concerned, again this is a plug-and-play module, and you can use your favourite. If you are in a limited domain, you can write them yourself, manually; you could use WordNet, or PPDB, which is a database that has a lot of paraphrases; but we do something else, using neural machine translation.
Okay, so this slide — I know everybody knows it, but it's my favourite slide of all time, because — we actually tried to redo this slide, and it's not as good as the original. If you go to machine translation talks you will see it; no other slide captures so beautifully the fact that you have this English language here, you have the attention weights — so beautiful — and then you take these attention weights, you weight them with the decoder, and hey presto, you get the French language. So this is your usual, vanilla machine translation engine: it's again an encoder-decoder model with attention, and we assume we have access to this engine.
you may wonder how i'm not gonna get paraphrases out of this
this again an old idea which goes back a back actually the martin k somatic
a i think can be eighties
notice this thing so what we wanted to ease
in the case of english goal from english to english
so we want to be able to sort of paraphrase and english expression to another
english expression but in machine translation i don't have any direct path
from english to english
what i don't have is a path from english to german
and german to english
so
the theory goal is if i have to english phrase is
like here under control
and
in check
if they are aligned or if they correspond to the same phrase in another language
there are likely to be a paraphrase
now i'm gonna use these alignments this is for you'd understand the concept but you
can see that i have english i translate english to german
then german gets back translated to english
i have my paraphrase
More specifically, I have my input, which is in one language; I encode it and decode it into some translations in the foreign language — G here stands for German; then I encode my German, and I decode it back into English. There are two or three things you should note about this. First of all, these things in the middle, the translations, are called pivots, and you see that we have k pivots: I don't have one translation, I have multiple translations. This turns out to be really important, because a single translation may be very wrong, and then I'm completely screwed — I have very bad paraphrases. So I have to have multiple pivots, and not only that, I can also have multiple pivots in multiple languages, which I then take into account while I'm decoding.
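A rough sketch of the pivoting idea, assuming a hypothetical translate(text, src, tgt, k) helper that returns the k-best outputs of an NMT engine (purely illustrative: in the actual model the pivots stay internal as vectors and are never written out as text):

    def pivot_paraphrases(sentence, pivot_lang="de", k=5):
        # English -> pivot language: keep k candidate translations, because a
        # single translation may be wrong and would poison the paraphrases.
        pivots = translate(sentence, src="en", tgt=pivot_lang, k=k)
        paraphrases = []
        for p in pivots:
            # Pivot language -> English: back-translate each pivot.
            paraphrases.extend(translate(p, src=pivot_lang, tgt="en", k=k))
        # Remove duplicates and the original sentence itself.
        return [s for s in dict.fromkeys(paraphrases) if s != sentence]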
Now, this is very different from what you may usually think of as paraphrases, because the paraphrases are never explicitly stored anywhere; they are all model-internal. So what this thing learns is: I give it English and it just paraphrases English into English, but I don't have an explicit database of paraphrases. They are all vectors, and they are all scored, but, you know, I cannot point and say, "where is that paraphrase?". I can give the model a phrase and it generates another one, which is very nice, because you get generation for free. In the past, if you had rules, you had to figure out how to actually use them to generate something that is meaningful, and so on.
Okay, let me show you an example. This is paraphrasing the question "what is the zip code of the largest car manufacturer". If we pivot through French, French gives us "what is the zip code of the largest vehicle manufacturer" or "what is the zip code of the largest car producer". If we pivot through German: "what's the postal code of the biggest automobile manufacturer", "what is the postcode of the biggest car manufacturer". And if we pivot through Czech: "what is the largest car manufacturer's postal code", or "zip code of the largest car manufacturer". Can I see a show of hands: which pivot language do you think gives the best paraphrases? I mean, it's a sample of two. Czech? Very good — Czech proved to be the best, followed by German; French was not so good. And again, here there is the question of how many pivots to use and which languages to choose — these are all experimental variables that you can manipulate.
Now let me show you some results. The grey bars you don't need to understand; these are all baselines that one can use to show that the model is doing something over and above the obvious things. This graph here is using no paraphrases at all, so you go from forty-nine to fifty-one, and here from sixteen to twenty. These are WebQuestions and GraphQuestions, datasets that people have developed; GraphQuestions is very difficult, it has very complicated questions that require multi-hop reasoning — "what is Obama's daughter's friend's dog called" — very difficult, and that's why the performance is really low. What you should see is that pink here is the paraphrase model, so in all cases using the paraphrases — the pink bar — helps. Here is the second-best system, and red here is the best system, and you can see that we do very well on the difficult dataset. On the other dataset there is another system that is better, but it uses a lot of external knowledge, which we don't, and it better exploits the graph itself, which is another avenue for future work.
Okay, now this is my last slide, and then I'll take questions. What have we learned? There are a couple of things that are interesting. First of all, encoder-decoder models are good enough for mapping natural language to meaning representations with minimal engineering effort, and I cannot emphasize that enough: before this paradigm shift, what we used to do was spend ages coming up with features that we would have to re-engineer for every single domain, so if I went from lambda calculus to SQL and then to Python code, I would have to do the whole process from scratch. Here you have one model, with some experimental variables that you can keep fixed or change, and it works very well across domains. Constrained decoding improves performance, and not only in the setting I showed you, but also in more weakly supervised settings; and people are using this constrained decoding even outside semantic parsing, in generation for example. The paraphrases enhance the robustness of the model, and in general I would say they are useful if you have other tasks: for dialogue, for example, you could use them to give robustness to a dialogue model, to the generated answer of a chatbot. And the models can transfer to other tasks or architectures: for the purposes of this talk, so as not to overwhelm people, I've shown simple architectures, but you can put neural networks left, right, and centre if you feel like it.
Now, for the future, I think there are a couple of avenues worth pursuing. One is, of course, learning the sketches, so the sketch could be a latent variable in your model; that would mean you don't need to do any preprocessing, you don't need to give the algorithm the sketches. How do you deal with multiple languages? I have a semantic parser in English; how do I do the same in Chinese? This is a big problem, in particular in industry — they come up against it a lot, and their answer is "we hire annotators". How do you train these models if you have no data at all, just a database? And of course there is something that would be of interest to you: how do I actually do coreference, how do I model a sequence of turns, as opposed to a single turn?
And without further ado, I have one last slide, and it's a very depressing slide. When I gave this talk a couple of months ago, I used to have a slide here with Theresa May — this is from Twitter, where it was joked that Theresa May will just ask Alexa to negotiate for her, and it will be fine. I tried to find another one with Boris Johnson and failed; I don't think he does technology. And he doesn't do negotiating either — so she would at least have negotiated. And at this point I'll just take questions. Thank you very much.

Thank you. We have time for questions.
Thank you — I'm from JP Morgan. My question is: do we really need to extract the logical forms, given that humans probably don't, except in really complicated cases? In machine translation we don't really extract all these things, but we still manage to translate, even quite difficult material.
That's a good question, and the answer is yes and no. If you look at Alexa or Google, these people have very complicated systems, where they have one module that does what you say: it doesn't translate to a logical form, it just does query matching and then extracts the answer. But for some of the highly compositional queries, which they do get and have to execute against databases, they all have internal representations of what the queries mean. Also, if you are a developer — for example, whenever you have a database — say I sell jeans, or I sell fruit, and I have a database and I deal with customers and I have to have a spoken interface to it — you would have to extract the meaning somehow. Now, for the phone, when you say "set my alarm clock", I would agree with you: there you just need to recognize intents and do the attribute slot filling, and then you're done. But whenever you have a more complicated infrastructure in the output, in the answer space, then you do this.
Thanks for a very nice talk. I had a question on the paraphrase scoring; it seemed to me something wasn't quite right, if I understood it well. You have an equation with a summation over paraphrases, but intuitively the right thing is to look for the closest paraphrase that actually has an answer you can find: you're trying to optimize two things, finding something that means the same and for which I can find an answer, if I can't find one for the original question. But when you sum, the problem is that paraphrases don't come in an equal distribution: some phrases have many paraphrases, or many paraphrases in one particular direction but not so many in the other, just depending on how many synonyms you have. So if you add them up and weight them, and you have a lot of paraphrases for the wrong answer and one for something that's better, it seems like the closeness should dominate if you have one very high-quality paraphrase, and it seems like your model is trying to do something different. I'm wondering if that is causing problems, or if there is something I'm not seeing.

No, you're right. This is how the model is trained; at test time, to make it robust, you can manipulate the n-best paraphrases. At test time you're absolutely right: we just take the one, the max, the one that is best. So you are right — I did not explain it well — but you're absolutely right that you can be all over the place if you're just looking at the sum; at test time we just use the one.
Hi, thank you for the great work — I'm from Microsoft Research. My question is: for the coarse-to-fine decoding, what do you think of its potential for generating natural language outputs, like dialogue, like summarization?

Can you ask the question again? What do I think of the potential of coarse-to-fine — that's a good question, and a natural connection. I think it's very interesting. For sentence generation — you mentioned summarization; I'll do one thing at a time. If you just want to generate a sentence from some input, if you want to do surface realization, people have already done this: there is work with a very similar model, where they first produce a template, which they learn, and from the template they surface-realize the sentence. However, summarization, which is the more interesting case — there you would have to have a document template, and it's not clear what this document template might look like or how you might learn it. You might, for example, assume that the template is some sort of tree or graph with generalizations, and then from there you generate the summary. And I believe we should do this, but it will not be as trivial as what we do right now, which is encode the document into a vector, add attention and a bit of copying, and here's your summary. So the question there — what the template is — nobody has an answer.
I was wondering if you could elaborate on that very last point, the work on generating the abstract meaning representations, because of course my reaction to what you were saying in the first part was: well, it's all good when you have a corpus with the mapping between the query and the logical form; what do you do if you don't have that, which is the majority of cases?
Okay, so this is a tough problem: how do you do inference with weak supervision? There are two things we found out. Because of the space — you have to search over a huge space of potential programs that execute, and the only signal we have is the right answer — two things can happen. One is ambiguity: the entities may be ambiguous — "Turkey" can be the animal or the country — and then you're stuck and you will get things wrong. The other one is spuriousness: you have things that execute to the right answer but don't have the right intent, the right semantics. So what people do — we do two things: we use the sketches here, and then we have another step that again tries to do some structural matching and says, okay, I have this abstract program — this cuts down the search space — and then you also have to do some alignment and put in some constraints, so that, for example, I cannot have the column "silver" repeated twice, because that is not well-formed. But the accuracy of these systems — I didn't put it up — is something like forty-four percent, which is not, you know, anywhere near good enough; I mean, Google and Amazon would laugh. There is more work to be done.
Thank you for the talk. I have a question about your coarse-to-fine decoding. In the coarse-to-fine decoding you first predict a meaning sketch, and the fine decoding is based on that coarse prediction, but both are predicted. That means there is no guarantee that the final meaning representation is consistent with the sketch, and in some cases we need to consider such things, because if we consider the semantics, some arguments required by the sketch should be included in the final output.
That is a very good point — I'm glad you guys were paying attention. So yes, we don't have this: we say constrained decoding, but what we really do is constrain the encoding, hoping that the decoder will be more constrained by the encoding. You could include hard constraints. We did an analysis where we looked at two things. One is how good the sketches are — if your sketches are not great, then what you're describing would be more problematic — and we did an analysis; let me see if I have a slide that shows that the sketches actually work quite well; I might have a slide, I don't remember. Yes — so this slide shows the sequence-to-sequence model: the first row is the sequence-to-sequence model without any sketches, and the second is coarse-to-fine, where you have to predict the sketch, and you see that coarse-to-fine predicts the sketch much better than the one-stage model that just does sequence-to-sequence. So this tells you that you are kind of winning, but not entirely. So I don't know exactly what would happen if you included these constraints. My answer would be that this doesn't happen a lot, but if the logical forms are very long and very complicated — if you have really huge SQL queries — then I would say your approach would be required.
Okay, this will do. So maybe I'll ask one question. In the last slide you said that the models at the moment don't handle this: what you showed is all related to QA with a single question and a single answer, but in a dialogue case we have multiple turns. So what are the main problems, and what models would be good?
Yes — I'll send you, I have a nice slide for this. We did try to do multiple turns; there is a paper in submission. So, for example, you say "I want to buy these Levi's jeans", "how much do they cost", "do you have them in another size", "what is the colour" — you elaborate with new questions, and there are patterns to these multi-turn dialogues that you can handle, and you can do this. But the one thing that we actually need to sort out before doing this is coreference, because right now these models don't take coreference into account, and if you model coreference in a simple way — like, look at the past and model it as a sequence — it doesn't really work that well. So I think sequential question answering is definitely the way to go; I have not seen any models yet that make me go "oh, this is great", but — yes, it's a very hard problem, and, you know, one step at a time.
So thank you very much — let's thank the speaker again.