So, my name is Marc Dymetman, and I'm presenting today work which was actually done by my intern at the beginning of the year.
Okay.
So this work is based on the same dataset that was just presented by Ondřej Dušek. The first release of that dataset was in March, with an updated version in June, actually.
This dataset, as you have just seen, consists of about 50k meaning representation / reference utterance pairs of the following kind, as you can see here.
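Just to make the format concrete, here is an illustrative pair written in the style of the E2E data; the particular attribute values are my own example, not necessarily an exact record from the release:

    MR:        name[The Eagle], eatType[coffee shop], food[French], priceRange[moderate], area[riverside], near[Burger King]
    reference: The Eagle is a moderately priced French coffee shop in the riverside area, near Burger King.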
And the starting point of this work was to test an idea that has been advocated in a blog post which is quite well known in the RNN and deep learning community, "The Unreasonable Effectiveness of Recurrent Neural Networks" by Andrej Karpathy, which especially stresses, you know, working with characters rather than words.
And we wanted to test this simple idea: how far can we go with an out-of-the-box char-based seq2seq model, with minimal intervention on our part?
So in April, at about the same time as the data for the challenge was first released, this framework by Denny Britz and collaborators was released: the tf-seq2seq framework, on top of TensorFlow. It was originally developed for massive experiments with different configurations, options and so on, in neural machine translation, and it has many options and parameters which are pretty simple to configure.
Namely: the number of layers of the RNNs; whether the cell is a GRU or an LSTM; optimization regimes, with stochastic gradient descent of different types; bidirectional encoding of the source is possible (we saw that in the previous talk, bidirectional encoding); different attention mechanisms also; and the option of word-based as opposed to char-based processing.
And this is the pictorial representation from that paper, an overview of the model: a standard encoder-decoder setup with these possibilities, these options.
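Just to give an idea of the kind of knobs involved, here is a schematic sketch in Python of the choices one makes in such a framework; this is purely illustrative, it is not the actual tf-seq2seq configuration syntax, and the option names are mine:

    # Illustrative only: the kind of hyper-parameters exposed by such a framework.
    config = {
        "encoder.cell": "LSTM",          # or "GRU"
        "encoder.num_layers": 2,         # number of RNN layers
        "encoder.bidirectional": True,   # bidirectional encoding of the source
        "decoder.cell": "LSTM",
        "decoder.num_layers": 2,
        "attention": "bahdanau",         # one of several attention mechanisms
        "optimizer": "SGD",              # various stochastic gradient descent regimes
        "unit": "char",                  # char-based as opposed to word-based
    }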
So what we did: we directly trained a complex version of this framework, with bidirectional encoding of the source plus an attention mechanism, on the data.
Namely, this means that if you look at this data, at the meaning representation you see here, we take it as a simple string of characters, without any preprocessing, without any change to it. And similarly for the utterance, the human-produced reference utterance: we take it as a string of characters. We don't do any preprocessing or post-processing, we don't do any tokenization, no lowercasing, and, maybe very importantly, no delexicalization. Delexicalization, as we have seen discussed in some talks, is the process by which you replace certain named entities by placeholders, such as the name or the nearby landmark, which are then substituted back at test time.
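As a rough sketch, delexicalization would look something like the following hypothetical Python helper; again, this is just for illustration, since we did not do this:

    import re

    # Hypothetical sketch of delexicalization (which we deliberately do NOT do):
    # named-entity slot values are replaced by placeholders in both the MR and
    # the utterance, and substituted back after generation at test time.
    def delexicalize(mr, utterance, slots=("name", "near")):
        for slot in slots:
            match = re.search(slot + r"\[(.*?)\]", mr)
            if match:
                value = match.group(1)
                placeholder = "X_" + slot.upper()
                mr = mr.replace(value, placeholder)
                utterance = utterance.replace(value, placeholder)
        return mr, utterance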
I want to make a note here that there is a well-known problem with word-based seq2seq models, called the rare word problem, which is due to the fact that you need very big vocabularies, and that this type of seq2seq model doesn't know how to copy words: it only knows how to learn that a word occurring in the source corresponds to a word in the target, and these correspondences have to be learned independently of each other. Delexicalization is one way to avoid this problem, and there are other mechanisms, like copy mechanisms, to handle it too. But with a char-based model you don't have this problem at all, because the vocabulary of symbols is very small: in our case, on the order of fifty to seventy characters were used in total, and there is no need to delexicalize.
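A minimal sketch of what this means in practice, assuming the meaning representations and utterances are available as plain Python strings:

    # Build the symbol vocabulary directly from the raw strings:
    # no tokenization, no lowercasing, no delexicalization.
    def char_vocabulary(mrs, utterances):
        vocab = set()
        for text in list(mrs) + list(utterances):
            vocab.update(text)           # every character is a symbol
        return sorted(vocab)

    # On this data the result is tiny, on the order of 50-70 distinct characters,
    # versus tens of thousands of word types for a word-based model.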
And then we conducted an evaluation of our results on the original dataset. The BLEU score there was around twenty-five, which is pretty low, but this was due to the fact that the original dataset didn't group the human references: each meaning representation has several, around five, human references, but they appear as separate entries, and the original release did not group them. This means that the BLEU evaluation we did was basically a single-reference evaluation, which gives a much lower result than a more recent evaluation that we did on the properly grouped data, with multiple references, and this gave us on the order of seventy BLEU points, which is much more reasonable.
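To make the difference between the two evaluations concrete, here is a minimal sketch using NLTK's corpus_bleu; the grouping by meaning representation is the important part, and the function and variable names are mine:

    from collections import defaultdict
    from nltk.translate.bleu_score import corpus_bleu

    def multi_ref_bleu(records, predictions):
        # records: list of (mr, human_reference) pairs, as in the ungrouped release;
        # predictions: dict mapping each mr to the model's predicted utterance.
        refs_by_mr = defaultdict(list)
        for mr, ref in records:
            refs_by_mr[mr].append(ref.split())
        references, hypotheses = [], []
        for mr, refs in refs_by_mr.items():
            references.append(refs)                 # all references for this MR
            hypotheses.append(predictions[mr].split())
        # Scoring against all references per MR (multi-reference) gives much higher
        # BLEU than scoring each ungrouped record against its single reference.
        return corpus_bleu(references, hypotheses)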
We also did a small-scale human evaluation to assess the outputs, and what we found there is that the predictions of the model were almost perfect in terms of linguistic quality, in terms of grammaticality and naturalness. There were no unknown words produced and no invented words, which can happen with char-based models, because they produce the output character by character and don't have a built-in notion of word. And the annotators often judged that the prediction of the model was actually superior to the human reference, which sometimes was not great in terms of linguistic quality.
On the content side, however, there were some important semantic adequacy issues. The top prediction of the model was semantically correct in only around fifty percent of the cases, and the main problem, actually almost the only problem, was omissions: sometimes the model omitted some semantic material. So in around fifty percent of the test cases we had a perfect solution, both linguistically and from the point of view of semantic content, and such a perfect solution was found somewhere in the twenty-best list in around seventy percent of the cases.
And this is where we stopped at submission time, but since then we have been working on trying to exploit reranking models and similar things. Since I don't have a lot of time, I think I'll skip many details, because I want to leave time for the posters also.