Okay, I think we can get started. I'm extremely glad that our keynote speaker was able to come. He is a professor of linguistics at the University of California, San Diego, and in the past he has worked at SRI International and Sun Microsystems. He has served as an associate editor for a journal on logic and computation, and on its executive board as well. His research focuses on discourse interpretation: he does a lot of computational modeling, but also a great deal of experimental work, spanning the semantic, pragmatic, and psycholinguistic aspects of language. Among the things he is best known for is his work on coherence and reference, including his book Coherence, Reference, and the Theory of Grammar, which is widely cited. When we were putting together a set of possible keynote speakers who would appeal broadly to people who think about and work on dialogue, he seemed like an ideal choice. He has been talking with a number of you yesterday and today, and since you come from a variety of backgrounds, computational and psycholinguistic, hopefully by the end you will all have something to ask him. I'll just let him take it from there.
Okay, thank you. Are we on? Alright.
Well, thank you very much for having me here, and for coming in this morning.

So, Zipf famously posited two competing desiderata in language design. One he called the auditor's economy, which is biased towards hearers: languages should enable hearers to recover the speaker's intended message with minimal interpretive and inferential effort. That pushes languages towards having more prolixity and less ambiguity. And that's what we would like language to have when we're building systems: we want the information right there where we can grab it. Unfortunately for systems, there's a competing desideratum, the speaker's economy, which says that languages should allow speakers to get their message across with minimal articulatory effort. That pushes towards less prolixity and greater amounts of ambiguity. In the limit you get something like Groot from Guardians of the Galaxy: Groot always says "I am Groot," and everybody else has to infer what he means by that.
so
one way to speakers can be economical
i and still be expressive in getting them there'll a message across
it's a designer utterances would take advantage to be here's
cognitive apparatus mental state incapacity for inputs
so is to be able to convey more information than what they explicitly say
and this voices problem week constantly face when we're building discourse and dialogue systems because
the systems don't have that same apparatus thing capability that languages kind of wrapped itself
around
Now, of course, the source of these pragmatically determined aspects of meaning has been a focus of pragmatics since its birth, and it has become an industry of its own since the seminal work of Grice. What I'm going to focus on in this part is a type of pragmatic enrichment that I'll claim doesn't fit neatly into any of the other kinds of enrichment of interest in the linguistics and philosophy literatures.
So let's illustrate by diving right into some examples. Take (1a), 'The jogger was hit by a car in Palo Alto last night.' You're probably getting not just that the victim was somebody who jogs, but that he was actually jogging at the time. The sentence doesn't entail that, and you can see that by comparing it with (1b), 'The farmer was hit by a car in Palo Alto last night.' There it's far less inevitable that you get the inference that the victim was farming at the time; in fact, if you know anything about how farms work, it's pretty unlikely, even though it could be that the car veered off the road and went so far into the field that it hit the farmer on his tractor. You're probably not getting it; it's not that you couldn't draw that inference, but a case like (1b) would cause one to ask why you're getting it, in a way that (1a) doesn't.
And it's not limited to the choice of nominal; you get it with adjectives as well. Take (2a), 'The drug-addled undergrad fell off the cliff.' You're probably getting not only that the victim fell off the cliff and was on drugs, but that he fell off the cliff because he was on drugs. But if you compare that with (2b), 'The well-liked undergrad fell off the cliff,' you're probably not thinking: why would being well liked cause somebody to fall off a cliff? And in (2c), 'The normally risk-averse undergrad fell off the cliff,' you're probably getting a kind of contrary-to-expectation inference: you're wondering why somebody who is risk-averse would find themselves in such a predicament.
Finally, you get it with relative clauses in referring expressions as well. Take (3a), 'The company fired the manager who was embezzling money.' Again, you're probably getting not merely that they were embezzling money and that they were fired, but that they were fired because they were embezzling money. You can compare that with (3b), 'The company fired the manager who was hired in 2002,' which doesn't send you off on a search for why being hired in 2002 would cause one to be fired. And (3c) is another case of the violated-expectation kind of inference.

If you think about a dialogue system, it would be perfectly natural to respond to (3b) by saying 'Why?' But it would be a little odd to respond that way to (3a), because the speaker was trying to convey the reason for the firing; if you ask why, you haven't picked up on the inference that the speaker intended to get across.
So, for want of an appropriate term, I'm going to brand these as conversational elicitures. The term is meant to play on the other terms in pragmatics (implicature, explicature, impliciture, and so forth), which we'll be talking about in a moment, and to get at the idea that what you have is a speaker who is choosing her referring expressions among alternatives so as to trigger inferences on the part of her hearer that wouldn't otherwise be drawn.
So here's what I want to do in this talk. First, I'm going to dig in a little bit on the linguistics and philosophy side (so bear with me on that) and talk about why this is a new type of enrichment in the literature, and about what aspects of people's cognitive apparatus speakers are taking advantage of in being able to communicate this extra content. That's largely joint work with Jonathan Cohen in the philosophy department at UCSD. Then I'm going to go experimental, with joint work with Hannah Rohde at the University of Edinburgh, and talk about how elicitures are not just important for getting all of the content out of the message, but also actually impact the interpretation of language in unexpected places, in this case illustrated with pronoun interpretation. And then I will conclude with some slides on the ramifications of the model that we'll build for computational work in the area.
So if you know Gricean pragmatics, you probably reacted to these examples by thinking: that sounds familiar, those sound like they could be cases of Gricean implicature. I assume a lot of you know what implicature is, so I won't go into detail, but the important thing is that, according to Grice, implicature results from assumptions of rationality and cooperativity among interlocutors. He cashed that out in terms of four maxims. I won't read them all, but we'll be most interested in the first Quantity maxim, which says to make your contribution as informative as is required; the Manner maxim, which says to be brief and avoid unnecessary prolixity; and finally the Relation maxim, which says to be relevant.
The important thing I want to focus on is that implicature is a failure-driven process, meaning the hearer encounters a problem and draws an implicature to fix it. Basically, what happens is that the speaker says something that has a literal meaning, call it P, and the hearer thinks: gee, if she really just meant P, she wouldn't be being very cooperative. But rather than conclude that, if I can identify some additional proposition Q that I can assume she's trying to convey, then she becomes cooperative again. And so I take it that, in fact, she intended that I do this whole calculation and draw the inference Q in addition to the content P.

So, to illustrate: we're going to be talking about referring expressions in this talk,
and Grice was the first to note that the choice of referring expression can in some cases have the hallmarks of implicature. His rather dated example was 'X is meeting a woman this evening,' which would normally implicate that the woman being mentioned is not X's wife, sister, or mother, even though those are all women. The idea is that if the speaker had been talking about X's wife, she would have said 'wife,' in accordance with the maxim of Quantity: give as much information as is required. Since the speaker didn't do that, we draw the inference that, in fact, the space of possibilities for the referent of 'a woman' doesn't include those other salient possible referents that would be denoted by terms like 'his wife' or 'his sister.'
So implicatures can be diagnosed with standard tests. Basically, when you have implicated content, you can do a few things with it. You can actually assert it and put it on the record; that's reinforcement: you can say 'X is meeting a woman this evening,' where the implicatum is 'not his wife,' and then actually add 'but not his wife,' and that doesn't have a strong sense of redundancy. You can cancel it: 'in fact, she is his wife.' Or you can get on the record that you don't know the truth status of the implicated content, as in 'in fact, that's possibly his wife'; that's suspension. Well, our eliciture examples satisfy these tests as well. You can say 'The company fired the manager who was embezzling money; in fact, that's why he got fired': a reinforcement. 'But that's not why he got fired': cancellation. And 'that may be why he got fired': suspension.
So are elicitures just implicatures? There is one person who I think has really given a serious pragmatic analysis of examples of the general character that I'm talking about here. I took this example from the first Hillary Clinton versus Donald Trump presidential debate in the US. Lester Holt of NBC was the moderator, and as he was getting ready to ask Trump a question,
he did not say (7a): 'Mr. Trump, for five years you perpetuated a false claim that Barack Obama was not a natural-born citizen.' That is not what he said. What he said instead was (7b): 'Mr. Trump, for five years you perpetuated a false claim that the nation's first Black president was not a natural-born citizen.' Those two sentences are extensionally equivalent: they differ only in the overt referring expressions, which denote the same individual. But (7b) goes beyond (7a) in giving rise to the idea that there could be some kind of causal relation between Trump hassling Obama about this and Obama's status as the nation's first Black president. And unfortunately, nothing has happened since then to make us stop worrying about rampant racism.
And if we compare that with (7c), which is the same except that the referring expression picks out Obama by way of some other, unrelated distinction, things get a little confusing: you're left wondering why the speaker used that referring expression, even though it does successfully pick out Obama.
So the Gricean analysis would go like this: you see that these referring expressions are longer and more descriptive than they need to be, so they violate the brevity submaxim of Manner and the maxim of Quantity. And basically what you do, as happens with some kinds of implicature, is rescue the utterance by way of another maxim, in this case Relation: you find a relevance relationship that justifies the use of the more prolix, more informative referring expression. There's a lot of technical detail here that I'm just going to gloss over.
Okay, so have I made the case so far that elicitures are a species of implicature? In general, these cases do not pattern with implicatures. Could they be triggered by the maxim of Manner? Not really. Prolixity isn't the issue, and prolixity isn't required. In (8a), 'John fired the employee who was always late,' you get the eliciture. In (8b), 'John fired the employee who has red hair,' we generally don't; the relative clause is just picking out one salient employee, and there's no meaningful difference in prolixity between those two referring expressions. And (8c), 'John fired the employee who has red hair and wears glasses,' is more prolix, but you still don't get a causal inference. What the maxim of Manner does tell us is that (8c) might be odd in a situation where the extra words aren't needed (if there's only one employee with red hair, why go on about the red hair and the glasses?), but that's orthogonal to the existence of a causal inference of the kind we get in (8a).
Another reason for doubting that the maxim of Manner is what's relevant here is that these examples lack the canonical hallmark of Manner-driven implicatures: what Larry Horn calls the division of pragmatic labor. If we compare 'John killed Bill' with 'John caused Bill to die,' those essentially have the same denotation, but you get a division in which the shorter version tends to describe the more typical situation and the longer version the more atypical one. So when I say 'John caused Bill to die,' you'd probably be surprised to learn that John just went up and shot Bill; you get the sense that there may have been indirect causation, or an accidental killing, or something like that, precisely because, if John had shot Bill, the speaker probably would have just said 'John killed Bill.' In our cases you don't have this: you're just talking about competing referring expressions that all denote the same referent, and there is no such characteristic division of the denotational space.
So what about the maxim of Relation? You might be thinking that these are just a kind of relevance implicature, but that doesn't really work either, because restrictive relative clauses, which constrain the denotation of the NP to which they attach, are by definition relevant. So it's perfectly fine to say 'the company fired the manager who was hired in 2002'; that relative clause is fine even though it doesn't give rise to any kind of causal inference. So Relation by itself doesn't explain why you go beyond that and draw a causal inference in a case like (10a). Really, the intuition is that these inferences are not triggered by a Gricean maxim violation; it's our ordinary machinery for recognizing relevance that gives rise to the inference. By the time you could even think of it in terms of triggering the maxim of Relation, you've already identified the relevance relation. It's a more automatic process.
There are a number of other types of pragmatic enrichment that have been discussed in the literature; I'll go through this fairly quickly. From Grice you get a pretty simple picture: hearers interpret sentences, doing a little work on the literal content in terms of fixing reference, interpreting indexicals and tense, and resolving ambiguity, and then everything else is left to implicature. Other researchers have argued that there are additional types of enrichment that go beyond what is literally said but that we wouldn't want to call implicatures. That's where Bach's impliciture comes in, and part of what constitutes an explicature in Relevance Theory. These are cases like (11a), 'The lemonade isn't strong enough': we can't even assign a truth value to that unless we know what it's supposed to be strong enough for. That's called a completion. And there are other cases, like (11b), 'I haven't had breakfast,' which doesn't usually mean 'ever'; it just means 'today.' You can compare that to a sentence like 'I haven't had sex,' which usually means 'ever' and not just 'today', unless, of course, you live in a society where people typically have sex every morning but very rarely have breakfast, in which case presumably the judgments would flip.
There's a lot to be said about each of these, but the crucial thing is that they all constitute developments (expansions or completions) of the logical form of a single utterance, and again they are failure-driven: either the sentence isn't even complete enough to assign a truth value, or it is complete but represents something the speaker wouldn't plausibly want to say, as in the breakfast example, so you have to narrow its denotation. Elicitures don't have that characteristic at all: the sentences are perfectly well formed without the inferences in question, and the inferences aren't triggered by any risk of communicative failure. And they involve an inference that is not the completion of a logical form; it's an additional inference, an additional proposition. 'The company fired the employee who was always late,' and then there's another proposition: the lateness was the cause of the firing. There's a lot more that could be said with respect to other types of enrichment, but I won't; I think you get the picture. So then the question is: where do these elicitures come from?
I'm going to argue that they come from a part of our cognitive apparatus that many of you in this audience will be familiar with, less so for some of the other audiences I've presented this to. It's basically the same machinery that we use to establish that our world is coherent. It's well known that when we interpret our world, we go well beyond what our perceptions give us. If we're working at Walmart or someplace like that, and you see the chronically tardy employee show up late for work, and then a few minutes later you witness him getting fired, you'll probably draw the inference that there's a causal relation between the two. It's defeasible, you could be wrong, but you draw these kinds of inferences anyway. Whereas if you see that tardy employee coming in late again, and a couple of minutes later a customer asks him where the automotive department is, you don't draw a causal relation between those two; they're just two events that happened, and the world is perfectly coherent otherwise.
So we make these kinds of enrichments when we're interpreting a situation. And since we interpret our world that way, it only makes sense that we would make similar kinds of inferences when we understand natural language descriptions of the world. That's why, when you read 'the boss fired the employee who came in late again,' you might draw this inference, and when you read 'a customer asked the employee who came in late again where the automotive department is,' you don't draw a causal inference. So in many ways these inferences are of the most pedestrian sort: they're just the kind of inferences we draw to establish the coherence of our environment. And, as I'll argue, that's a very different kind of process from the more failure-driven processes that underlie the other kinds of pragmatic enrichment.
So what are these cognitive principles? They'll be familiar to a lot of you: they're the same kinds of principles that underlie the establishment of coherence in discourse, between sentences. In (17a), 'The boss fired the employee who came in late again,' it's essentially the same kind of inference that you would draw to establish an Explanation coherence relation for (17b), 'The boss fired the employee. He came in late again,' where you typically infer a causal relation. We've also seen Violated Expectation relations: 'The company fired the manager who had a long history of corporate awards' gives the same inference you get if you break it up between sentences, 'The company fired the manager. He had a long history of corporate awards.'
We've also seen cases of non-causal, more enablement-like relations, as with 'The employee who went to the store bought a bottle of scotch for the office party.' If somebody said that to you, and somebody later asked where the employee got the scotch, you'd probably say 'at the store.' But notice the sentence never says that; it's just an inference you draw to connect the going to the store and the buying of the scotch, just as you would across clauses: 'The employee went to the store. She bought a bottle of scotch for the office party.'
The crucial difference, however, is that when you're establishing coherence between sentences, that's a failure-driven process: language mandates that when you have sentences within the same discourse segment, you have to find some kind of coherence relation between them, or else you'll be left unsatisfied. A discourse like (20b), 'The employee broke his leg. He likes plums,' will probably strike you as an incoherent discourse. You don't get to say, 'I just wanted to tell you two things about the employee, great, moving on.'
Now, you might object and say, 'Wait a second, I think that could be coherent: maybe the employee happened upon a plum tree, tried to climb it to get a plum, and that's how he broke his leg.' But as Hobbs pointed out many years ago, that very objection shows you that you, as the interpreter having to make sense of this, want to search for coherence between the utterances, and you're willing to accommodate a certain amount of context to do it. That's totally different from (20a): 'The employee who likes plums broke his leg' does not send you off on a search for coherence. It just says the employee broke his leg, and 'who likes plums' tells you which one among many others. So it's the same kind of machinery, but (20a) is, so to speak, failure-free: nothing in the sentence explicitly tells you that you have to search for coherence in the way that (20b) does.
So really, what's happening here is just like other kinds of pragmatic enrichment: the speaker is taking advantage of some aspect of her hearer's cognitive apparatus in constructing her utterances. In the case of implicature, again, it's reasoning about rationality and cooperativity: if I'm assigning grades and somebody asks me about the grades in my class, and I say 'some students will get an A,' I'm not being cooperative if it turns out that I gave every student an A, even though 'some students get an A' is literally true when all of them do. You have cases like indirect speech acts, which are all over dialogue, where you have to reason about the plan-based goals of the interlocutors, their beliefs, desires, and intentions, and all that kind of thing. This is the same kind of thing, except that the aspect of the hearer's cognitive apparatus being taken advantage of is a more basic, associative kind of reasoning, the kind that extracts coherence from a temporally extended sequence of eventualities. So basically, we have this machinery for understanding coherence in our world; we use it for understanding coherence across utterances in dialogue and discourse; and then the speaker takes advantage of it in choosing referring expressions within a sentence, to give rise to these inferences even though they're not mandated by anything explicit in the utterance.
So I think elicitures pose a particularly difficult challenge when you're building computational systems, precisely because of this. When we build systems, we think in terms of triggers for interpretation processes: we see an utterance and we have to interpret it; we see a pronoun and we have to search for a referent; we see multiple sentences and we have to find the coherence relations. In the case of elicitures, there's just nothing there saying, 'hey, you have to search for every possible causal relation that could hold between the content of any two constituents.' It's something that arises automatically when you have the cognitive apparatus that we have.

So hopefully, at this point, I've convinced you that elicitures are an important part of extracting the full meaning out of utterances. Now let me switch into experimental mode, with the joint work with Hannah Rohde, and argue that elicitures are an important part of tracking discourse meaning and can ultimately affect the interpretation of downstream linguistic forms. I'm going to make that case with respect to a particular problem: pronoun interpretation.
I think it's safe to say that there has been a common wisdom in the reference literature for decades, which is that there's a unified notion of entity salience, or prominence, that mediates between pronoun production and interpretation: speakers use pronouns to refer to salient referents, and hearers use that same salience to interpret the pronouns. They're mirror images of each other; how could it be any other way? And then the task for the discourse theorist is just to identify the different contributors to entity salience; I've put a very partial list of them on the slide.
In the next fifteen minutes or so, I'm going to try to disabuse you of this idea. The experiments I'm going to describe all involve implicit causality contexts, so let me take a moment to tell you what those are. These are verbs that are very well studied in the psychology literature, and they're said to impute causality to one of their two arguments, such that the imputed causality then affects downstream referential biases. So if you run a little experiment in your lab, or on Mechanical Turk, and ask people to complete the sentence 'Amanda amazes Brittney because she...', and then have three annotators tell you who 'she' refers to in each completion, I can tell you what's going to happen: by and large, the vast majority are going to write something about Amanda; we get to find out why Amanda is amazing. Those are subject-biased implicit causality verbs. Compare that to the second case, 'Amanda detests Brittney because she...': now we're going to hear about Brittney; we get to find out what makes Brittney so detestable. Those are object-biased implicit causality verbs.
Now, a couple of things are worth mentioning here. The experiments in the psycholinguistics literature usually include the 'because,' and of course the 'because' indicates a particular type of coherence relation, an Explanation relation: you're going to hear a cause or a reason in what follows, and that's really what these strong biases are tied to. If you ran a study with just 'Amanda amazes Brittney.' and let people write the next sentence, a couple of things would happen. One is that you'd still get the biases, but they wouldn't be as strong, because you're going to get other coherence relations besides Explanation, and you don't get the same biases if somebody tells you what happened next, or something like that. But the other interesting thing that happens, as Rohde and I showed years ago, is that you get many more Explanation relations in an implicit causality context than for other kinds of verbs. That should make some sense: if I say 'Amanda detests Brittney,' what are you thinking? Why? You want to know why; a reason is what would tell you what's going on between Amanda and Brittney. You're not thinking, 'wow, I need to know what happened next.' So these verbs generate a greater expectation that you're going to get a cause or reason in an IC context. And I'm foreshadowing: that's going to become important a couple of slides from now.
so
to give some background there was this study is very influential
in my thinking by rosemary stevenson and colleagues and nineteen ninety four
where they did set task completion studies vary across a different context types including the
two implicit
and they compared
what happens if you give people a pronoun prompt verses no problem
so in the first case you get my pronoun it's ambiguous between the two then
participants and you see how they assign to run
in the in the three prompt condition you find out to things
you find out who they mention next
and
what form a reference to they choose
do they use a pronoun where they use any
They found two really interesting facts. One is that when you give people the pronoun, you always get more references to the previous subject than when you don't, across all context types. Now, the overall bias might not be to the subject; it might be to the object in an object-biased implicit causality context. But you still get more references to the subject than when you let them choose the referring expression themselves. The second thing is that, again across all context types, there's a strong production tendency: when they refer to the previous subject, they strongly prefer to use a pronoun, and when they refer to the previous non-subject, they prefer to repeat the name.
So that left people puzzled for a little while. If people clearly have this production bias, pronominalize the previous subject and don't pronominalize the previous object, why would you ever get an object-biased interpretation, as you find in object-biased implicit causality contexts? It turns out that's actually not paradoxical at all, once you cast the relationship between interpretation and production in terms of Bayes' rule.
The term on the left is the interpretation problem: the interpreter sees the pronoun and has to figure out who the referent is. The first term in the numerator is the production bias: the speaker knows who she wants to refer to and has to decide whether to use a pronoun or not. Bayes' rule tells us that these two are mirror images of each other. But there's another term in the numerator: the prior, the prior probability that a particular referent is going to get mentioned next, regardless of the linguistic form the speaker chooses to do it with.
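To make the equation being described here concrete, this is a sketch of the standard Bayesian formulation (the notation on the slide may differ slightly):

\[
P(\textit{referent} \mid \textit{pronoun}) \;=\; \frac{P(\textit{pronoun} \mid \textit{referent})\; P(\textit{referent})}{\sum_{r \in \textit{referents}} P(\textit{pronoun} \mid r)\; P(r)}
\]

The left-hand side is the interpretation bias, the first factor in the numerator is the production bias, and P(referent) is the next-mention prior.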
So there's nothing paradoxical about having a production bias that says pronominalize the subject much more than the object, and an interpretation bias that goes to the object, as long as the prior probability of who's going to get mentioned next is weighted strongly enough towards the object, as it is in object-biased implicit causality contexts.
Now, the theory comes in two forms, a weak form and a strong form. The weak form just says that we expect interpretation and production to be related by Bayesian principles. But we posited the stronger form, because all of the evidence we had seen at the time pointed to the conclusion that the types of contextual factors that condition the two terms in the numerator are very different. All the semantic and pragmatic stuff (semantics like verbal implicit causality, pragmatics like coherence relations) seems to affect not pronoun interpretation directly, but the prior: those factors push around your expectations about who is going to get mentioned next. The production bias seems much more basic, based on things like grammatical role, or probably, more accurately, information structure: what's the topic. Pronouns work a lot like Centering Theory basically says: 'hey, I was talking about this before, and I'm still talking about it.' Now, you can see that this makes an extremely counterintuitive prediction, which is that the speaker, in deciding whether to use a pronoun, is ignoring a rich set of semantic and pragmatic biases (the ones conditioning the prior) that the interpreter is nonetheless going to bring to bear in interpreting that pronoun. That seems very odd. But despite its oddness, a number of experiments have provided evidence that this is in fact the case.
So here is an experiment from Hannah Rohde's thesis. It's a three-by-two design: a three-way verb-type comparison (subject-biased implicit causality verbs, object-biased implicit causality verbs, and non-IC verbs) crossed with the prompt manipulation, free prompt versus pronoun prompt. The prediction is that verb type should affect the prior, and we expect the effect on the prior to cascade so as to affect interpretation; but verb type should not affect production. Again, in the free-prompt condition we get to measure two things: we see who they mention next, which is our measurement of the prior, and we see what form of reference they choose, whether they use a pronoun, which gives us the production bias. And then, in the pronoun-prompt condition, we get direct access to their interpretations: given the pronoun, how do they interpret it?
Okay. So we're predicting an effect of verb type on both the prior and pronoun interpretation, and that's exactly what we see: you get the most subject references in the subject-biased IC condition, the fewest in the object-biased IC condition, and the non-IC verbs fall somewhere in between. And you see that the light blue bars, the pronoun-prompt data, are always a little higher than the prior, the dark blue bars. That's the production bias coming in through the production term, tilting everything towards the subject relative to the baseline set by the prior. So that works out. Now, did verb type affect production, that is, whether speakers used pronouns versus names? The answer is no, not at all. The only thing that mattered was grammatical role: lots of pronouns for subjects, not a whole lot for objects. To put a fine point on it: people were no more likely to use a pronoun to refer to the direct object in an object-biased implicit causality context than in a subject-biased one, and no more likely to use a pronoun to refer to the subject in a subject-biased context than in an object-biased one. There is a dissociation between production biases and interpretation biases.
So now let me take the last two parts of the talk and bring them together in one new little experiment. It's a two-by-two: we have the prompt manipulation as before, and we have a manipulation that involves elicitures. You compare 'The boss fired the employee who was hired in 2002' with 'The boss fired the employee who was embezzling money.' Now, most theories of pronoun interpretation, pretty much all of them, I think, don't predict any difference in pronoun biases between those two cases: same subject, same verb, same object; the relative clause is a little different, but it doesn't introduce any new referents, so who cares? But our analysis, the Bayesian analysis, does predict a difference, based on an interconnected chain of referential and coherence-driven dependencies.
gives a crucial slide
what are we expecting that
when you have
the when you have
you know at in the literature
in the relative clock so we call that you split at all
or three condition
right the relative also gives you an explanation
versus the control condition when it doesn't
i told to first that
when you have a these are all gonna be uttered by simplistic causality verbs when
you have an icy context
you're really expecting an explanation to come
we exploit the lot of a
exhalation coherence relations exact
in the explanation or c condition
we are defined explanation
it was in the relative cost
so we predict that you're gonna get fewer explanation coherence relations
after those cases then in the control condition
why give an explanation when the proper already have one
That, in turn, should affect the prior, the next-mention bias. These are object-biased IC verbs: if we have a lot of Explanation relations, we expect a lot of object references; but if you have fewer Explanation relations in the Explanation-RC condition, then you're going to get fewer object mentions, because the object bias is tied to there being an Explanation relation. So we expect an effect on the prior. We also expect an effect of the production bias, as we've seen before: we expect people to produce more pronouns to refer to subjects than to objects. And when you put those two together, at the bottom, both terms, the prior and the likelihood term, should affect interpretation: fewer references to the object, that is, more to the subject, in the Explanation-RC condition, and also in the pronoun-prompt condition as compared to the free-prompt condition.
The crucial thing about this slide is this: it's a little graphical model of the influences on pronoun interpretation, and all the interesting stuff is on the right-hand side, all the stuff that's completely independent of pronominalization. Everything on the right is about predicting the message: who is going to get mentioned next. The most boring part of the slide is the part over here where the pronoun comes into play. Notice that the right-hand part of the apparatus has no way to affect interpretation directly, only indirectly.
So, first prediction: do we get fewer explanations when the relative clause already gives you one? Yes. People still produce some explanations, but not as many as in the control condition: people want to explain why the person hired in 2002 got fired more than they want to explain why the person who was embezzling money got fired. Does that affect the next-mention bias? Yes: as we expected, you get more mentions of the direct object in the control condition than in the Explanation-RC condition. Did the existence of a causal eliciture in the relative clause affect production? Not at all; it's the same pattern we've seen before, and all that mattered was grammatical role. And when you put those two things together, you get the expected interpretation pattern: the existence of the eliciture pushes around the prior (on this slide those are the light blue bars, and I've plotted object references here), and when you give people a pronoun prompt, those bars go down, because the production bias kicks in and biases everything towards subject reference, so you get fewer object references when you give them a pronoun.
Okay. So this idea that production and interpretation are mirror images of each other is clearly not right. And something as subtle as the existence of an eliciture, way up at the top of that chain, cascades to affect several other things and ultimately, down at the bottom, tweaks your biases for how you interpret a pronoun. Quickly, we can also do a little model comparison. Passage completion studies don't rate very highly on the sex-appeal meter in psycholinguistics, but I like doing them because they give us actual fine-grained numerical measurements of biases, and so we can use them to compare different models.
and so we can use that to compare different models so again what we can
do
we can estimate interpretation by using our free prompt condition
we get can measure
really mentioned next that gives us the prior we get to see whether they use
the pronoun are not that gives us the production bias we can plug them into
this equation get interpretation by s
then we can compare that with the actual interpretation by s
there we
c in the pronoun prompt condition
right so we're estimating
we coming up and the estimated bias from the free from condition using this formula
in comparing it to the actual one we find in the pronoun condition
We can compare that with two kinds of competing models that are out there. One is what I've been calling the mirror model: which referent was the speaker most likely to use a pronoun to refer to? We can calculate that by taking the production bias and normalizing it. I've written it this way just to point out that it's essentially Bayes' rule, but without the prior. The other model is Jennifer Arnold's expectancy model. She said: look, what's happening is that you're generating expectations about who is going to get mentioned next, and if you see a pronoun and it matches in gender and number, that tells you, hey, it's the thing you were expecting to get mentioned next. That's essentially just the prior. Now, the prior is already a probability distribution, so P(referent) would have sufficed, but I've written it this way to show you that it's basically Bayes' rule without the production bias.
when you compare the numbers basically the bayesian model when so these in the actual
column or the actual numbers we get
for article percentage of object references
in the problem i'm condition
and then you see three sets of numbers
four where we plug in the frequencies that we get in the free problem condition
into those different equations and you see
the bayesian members of predictions are actually pretty close and have a higher degree of
correlation
we expect all
the other models to have some correlation because
as i just showed you
essentially those models of being combined in the bayesian model but it's the combination of
the two that
that makes the best predictions
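As a rough illustration of the procedure described here (not the actual analysis code or the experiments' data), the following sketch plugs hypothetical free-prompt counts into the three formulas to get predicted interpretation biases:

```python
# A minimal sketch, with hypothetical numbers, of how free-prompt completion
# counts could be turned into predicted interpretation biases under the three
# models discussed in the talk. Not the authors' code or data.

def interpretation_biases(next_mention_counts, p_pronoun_given_ref):
    """next_mention_counts: referent -> count of next mentions (free prompt).
    p_pronoun_given_ref: referent -> proportion of those mentions realized
    as a pronoun (the production bias)."""
    total = sum(next_mention_counts.values())
    prior = {r: c / total for r, c in next_mention_counts.items()}

    # Bayesian model: production bias times prior, renormalized.
    joint = {r: p_pronoun_given_ref[r] * prior[r] for r in prior}
    bayes = {r: v / sum(joint.values()) for r, v in joint.items()}

    # Mirror model: production bias alone, renormalized (no prior).
    mirror = {r: p_pronoun_given_ref[r] / sum(p_pronoun_given_ref.values())
              for r in p_pronoun_given_ref}

    # Expectancy model: just the prior (after gender/number filtering).
    return {"bayes": bayes, "mirror": mirror, "expectancy": prior}


if __name__ == "__main__":
    # Hypothetical object-biased IC context: the object is strongly favored
    # for next mention, but subjects are pronominalized far more often.
    counts = {"subject": 20, "object": 80}
    production = {"subject": 0.8, "object": 0.3}
    for name, model in interpretation_biases(counts, production).items():
        print(name, {r: round(p, 2) for r, p in model.items()})
```

With these made-up numbers, the Bayesian prediction lands between the other two (about 40/60 subject/object), which is the qualitative pattern described above: the pronoun pulls an 80% object prior back toward the subject without erasing it.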
So, to summarize this part of the talk: we see that pronoun interpretation is sensitive to a very subtle, coherence-driven factor where production isn't, which is counterintuitive, but it's exactly the dissociation that the Bayesian model would predict. So, contrary to the common wisdom, there is no unified notion of salience that sits between production and interpretation. There has always been this problem in the pronoun interpretation literature: somewhere in the first paragraph of a paper you read that pronouns refer to salient referents. You ask, okay, well, what are the contributors to salience? Well, go look at a corpus and see what kinds of entities pronouns refer to. But that's completely circular (pronouns refer to the kinds of entities that pronouns refer to), so it doesn't have any meaning. Your notion of salience has to be derived from something that is independent of the choice of referential form, which is exactly what we're trying to predict. So for me, the element closest to a notion of salience here is the next-mention bias, the prior. That's the best measurement we have for salience: who you're expecting to get mentioned next. But as we've seen, pronoun production biases don't align directly with that notion of salience.
Okay. So let me conclude with a few quick slides, because I think there are some lessons for computational work here, ideas that I've wanted to follow up on for a long time but I can never get a student interested enough, so I hope somebody here does instead. I think it's safe to say that if you look at computational work on reference over the last number of years, we've made a lot more progress on the machine learning and modeling side than on the feature engineering side: many new machine learning methods, not a lot in the way of new linguistic features. People still basically use the same three dozen or so features: gender, number, distance, maybe a little grammatical role information, that kind of thing. And for good reason: because we're training these systems in supervised mode, and you can't ask people to annotate more than two or three thousand pronouns, you can never ask questions in your features like 'is this an object-biased implicit causality verb?' You'd never have enough data to do something like that.
Well, the Bayesian model tells you that you don't need that annotated data, because the prior doesn't care about pronouns: all the semantic and pragmatic stuff conditions the prior, and the prior is something you can calculate using coreference, coreference in general, not just pronouns. You can go into data and have your system find cases of coreference that it's really sure about (repeated proper names, definite descriptions with substantial lexical overlap with their antecedents) and pretend that a human went in and said 'that's coreference.' You could get millions of examples like that out of a corpus, and then have a model with very fine-grained features; you might have a hundred thousand or two hundred thousand lexical features in there, implicit-causality-type verbs among them, and be able to model that and get some predictive power out of it. All you need annotated data for is the pronoun-specific part, the production bias, and a couple of thousand pronouns is going to be plenty to learn that people pronominalize subjects the most, and then less and less as you move down the obliqueness hierarchy. It was not at all obvious before that you could take the factors that you learn for coreference in general, using only those high-probability cases of coreference, and then port them directly onto the pronoun interpretation problem.
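Here is a hedged sketch of that bootstrapping idea. Everything in it (the heuristic, the function names, the toy data) is hypothetical; it is just meant to make the proposal concrete: harvest next-mention statistics from high-confidence coreference links in unannotated text, and reserve the small annotated pronoun set for the production bias alone.

```python
# Hypothetical sketch of the proposal above; not an existing system.
from collections import Counter, defaultdict

def high_confidence_link(mention, antecedent):
    """Toy heuristic for 'sure' coreference: a repeated proper name, or a
    definite description with substantial lexical overlap with its antecedent."""
    if mention == antecedent and mention[0].isupper():
        return True
    m, a = set(mention.lower().split()), set(antecedent.lower().split())
    overlap = len(m & a) / max(len(m), 1)
    return mention.lower().startswith("the ") and overlap >= 0.5

def estimate_next_mention_priors(harvested):
    """harvested: (verb, grammatical_role_of_next_mention) pairs collected
    wherever high_confidence_link fires. Returns a lexicalized prior:
    per-verb P(next mention = subject). No pronoun annotation needed."""
    counts = defaultdict(Counter)
    for verb, role in harvested:
        counts[verb][role] += 1
    return {v: c["subject"] / sum(c.values()) for v, c in counts.items()}

if __name__ == "__main__":
    print(high_confidence_link("the embezzling manager", "the manager"))  # True
    toy = [("detests", "object")] * 7 + [("detests", "subject")] * 3 \
        + [("amazes", "subject")] * 8 + [("amazes", "object")] * 2
    print(estimate_next_mention_priors(toy))  # {'detests': 0.3, 'amazes': 0.8}
```

The production bias, that is, how often each grammatical role gets pronominalized, is the only piece that would still need the couple of thousand annotated pronouns mentioned above.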
The situation is entirely analogous to Bayesian models of other kinds of things: machine translation, or, in this case, speech recognition. If you're doing speech recognition with a Bayesian model, you could say: well, we could try to train a model that maps directly from the acoustic signal to words. But we don't do that, because then when somebody says 'to,' you have no idea whether they said 't-o,' 't-w-o,' or 't-o-o.' So instead we reverse it into a production model: given the words, what's the likelihood that the speaker produced that acoustic signal for that word? And then we can plug in a prior, a language model like an n-gram model, that can help tell us whether, in that context, it's 't-o,' 't-w-o,' or 't-o-o.' It's the same idea here: pronouns, just like ambiguous words, are underspecified signals; they place strong constraints on their interpretation, but you need context to fully resolve them.
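As a sketch, the analogy can be written side by side, with w a word string, a the acoustic signal, and r a candidate referent:

\[
\hat{w} \;=\; \arg\max_{w} P(a \mid w)\, P(w)
\qquad\Longleftrightarrow\qquad
\hat{r} \;=\; \arg\max_{r} P(\textit{pronoun} \mid r)\, P(r)
\]

In both cases the likelihood term is a production model and the prior is supplied by context: a language model on the left, the next-mention expectations on the right.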
So the idea is that an efficient language should allow speakers to take advantage of whatever aspects of their interlocutors' cognitive apparatus they can get their hands on. For implicature, that's cooperativity and rationality; for indirect speech acts, it's planning and the satisfaction of goals, and inferring the speaker's intentions; for elicitures, it's a more basic aspect of our cognitive abilities, the inferring of relations having to do with causality, contiguity, and the other basic associative principles.
When we build systems, it's easy to think of language interpretation as a reactive process; that's our overall scheme: I need to interpret, I see a pronoun, I need to do a search. Everything happens when you see the trigger. The Bayesian model, on the other hand, more directly captures what has become a more modern view of interpretation: not as a reactive process, but as one in which interpretation is what happens when your top-down, proactive expectations about the ensuing message come into contact with the bottom-up linguistic evidence provided by the utterances. And I think the case of elicitures really spells out the importance of doing that proactive modeling, of recognizing these kinds of inferences and having that discourse update occur, so that it's ready by the time particular linguistic forms, like pronouns, show up in the input. You don't want to wait until you see a pronoun and only then run around the context trying to figure out whether there's some eliciture you should have drawn. And I will stop there. Thanks.
Thanks, that was very inspiring, and I definitely agree with this kind of approach to these kinds of inferences, and the Bayesian stuff is great. I had a couple of questions. I guess you made a distinction between inter-sentential coherence relations and intra-sentential ones, and I don't think there's really a difference there. I think your sentences (20a) and (20b) weren't really parallel, and exactly the same kinds of coherence issues arise whether it's within one sentence or across two. So if (20a) is 'The employee who likes plums broke his leg,' that's fine; but similarly, 'The employee likes plums and broke his leg' is just as weird as (20b). So the thing is, I think...
Well, let me ask: are you commenting on my characterization of it as inter-sentential versus intra-sentential? I'd agree that's probably not a good term for what I want. Really it's something like intra-clausal versus inter-clausal, because the cases where you have an 'and' in there are still inter-sentential for me in the relevant sense; but once you start saying 'intra-clausal,' well, now relative clauses are clauses too, and everything gets murky. If you put a 'because' in there, or an 'and,' or something like that, I'd still treat those as the inter-clausal case: there, we need our coherence machinery to come along and do its work. With something like 'the boss fired the employee who was late because he likes plums,' you don't get to say, 'okay, there must be a causal relationship, it's been asserted, I'm happy.' No, you need to actually establish the causal relation; you're not happy until you do.
So the crucial point is that in (20a), although an inference does need to happen for the utterance to be felicitous, so that we can pick out which employee we're talking about, it doesn't trigger this search process. Now, you might say: isn't it that very search process at work? The reason for the relative clause in the first place is to identify a particular employee, and whether that identification counts as a coherence relation per se depends on your theory of coherence. But the crucial thing is that you're not off and running here in the way (20b) sends you off and running, trying to figure out how liking plums could relate, causally or otherwise, to breaking a leg; (20a), most of the time, doesn't do that. On the normal reading of the relative clause there's no causal relation there, and that doesn't mean we're confused by any of those utterances. So the question for a theory of pragmatics is: when there is such an inference, why would you ever draw it? And that's what's problematic for just about every type of enrichment out there; the triggers that these different accounts posit (implicature, impliciture, explicature in Relevance Theory, and there are others too, like work on local pragmatic strengthening and things like that) are not the triggers needed to give rise to these inferences.
Let me play devil's advocate a little bit. In your last few slides, about the computational approaches, the suggestion is that we need a corpus, and then we estimate these probabilities for implicit causality verbs, the cases where we have the referring expressions and can actually see the biases. But suppose I just take my corpus of, say, blog stories, go through all of these verbs, and see who gets mentioned next. Maybe I don't need to look for implicit causality verbs at all; maybe I just end up with lexical next-mention probabilities for every verb. Would you be okay with that?
Yes, that's exactly right. If you had enough data, you could calculate a probability for every verb, or every verb-and-participant configuration, so there's no reason to restrict yourself. You know, implicit causality is a very weird kind of concept in that it's really a cover term for a set of verbs that tend to have strong biases of this sort; there is no deeper definition of what an implicit causality verb is. There are consistent subclasses: experiencer-stimulus verbs, for example, like 'annoy,' 'surprise,' and 'detest,' tend to be implicit causality verbs. But there are others, just verbs like 'hit' and things like that, that simply have strong biases and so get called implicit causality. So there's no reason, I think, if you're going to do modeling and you have enough data, to limit yourself to those kinds of verbs, because in fact all verbs have some kind of bias you might want to account for; it's just going to be more meaningful when it's a strong one, in one direction or the other.
And this hits on a real problem in psycholinguistics. In one of the very first experiments we ran, we had these so-called implicit causality verbs, twenty of them (now we have forty), and we calculated what the next-mention biases, the priors, were for them, and they landed at every spot between zero and one: from verbs that really should be considered implicit causality verbs, to ones so far towards the other end that there's no real causality involved, with ones in the middle at every point in between. So in the psycholinguistics literature on pronoun interpretation, people just make up a bunch of examples and say 'there's no pragmatic bias here.' There's no such thing as a sentence that doesn't have biases. If you want me to show you that there's a subject bias, I can pick my verbs and run a study with those twenty that shows it; if you want me to show that there's no such bias, I can run it with a different twenty.
Now, there I'm gaming it, because I know which verbs have which biases, but not based on pronoun interpretation studies, only on the next-mention biases. I can take a verb that has a next-mention prior of 80% to the object, run a pronoun interpretation study, and the pronoun will pull that 80% down towards 50%, and people will say, 'see, there's no bias.' That's exactly what happens with transfer-of-possession verbs: 'John handed the book to Bill. He...' will get you about fifty-fifty; but 'John handed the book to Bill,' without the 'he' there, is 85% to Bill. This is a huge problem in the literature, because everybody treats fifty-fifty between subject and object as the baseline. That's not the baseline. The baseline is the prior. And so there's always confusion: people say pronouns are subject-biased, except when they're not, like with transfer-of-possession verbs, where they come out around fifty-fifty, or when they go towards the object, as with object-biased implicit causality verbs. All of that is wrong. In every context, when you add a pronoun, it contributes a subject bias over the baseline of what the next-mention bias would have been. So it may look like a fifty-fifty bias, but it's actually a strong subject bias, because if you don't give them the pronoun, it's 85% to the object. Does that make sense?
So this is a long-winded way of saying yes. Not only do you want to capture these biases, because they're going to be important for your statistical model, for all verbs and contexts, if you want computational systems that work, but it's also really important for psycholinguistic work. We've been talking about this for a decade, and I still get papers to review every year that say 'here's my experiment' without having controlled for next-mention biases or for coherence relations; none of that stuff is in there. I'm sorry, I'm talking too much.
Just a quick one about whether the mirror image is really there. The way it's cast looks symmetrical, because the hearer's model uses both a production distribution and a next-mention distribution. But isn't one problem that the probability distribution over referents may be different for the hearer than for the speaker? The speaker has her own distribution, so the mirror image isn't quite real: the hearer has to estimate what the speaker's distributions are, and they may not match. Something like that.
Right, so there are two notions of asymmetry here. The one I was talking about is that production and interpretation are really based on different factors. I'm saying that even though the speaker could have a model of the hearer's prior, it isn't coming into her decision about whether to produce a pronoun; that's where the symmetry breaks down. Now, you're pointing out another asymmetry: the hearer doesn't have direct access to the speaker's production biases; he has to estimate her production biases and put that estimate into his interpretation equation, and that estimate could be off too. So when we have these mix-ups, where interlocutors aren't tracking each other's referents, it could be due to either of those asymmetries. It could be that I'm not tracking the discourse right from the speaker's perspective; or it could be that the speaker thinks the discourse is going in one direction while I take it in another, and she's producing pronouns based on her view, and I get messed up because I'm not tracking the prior right, while she's not even using the prior to produce her pronoun to begin with. So it could be either.
Okay, let's take the next question.