Hello again, everybody, and welcome.

I would like to talk about a data-driven model of explanations for a chatbot that helps to practise conversation in a foreign language.

This work was done when I was at the University of Luxembourg, which is why you see that affiliation here, but now I am in a different situation.

This is a different kind of data-driven approach. It differs a lot from the data-driven approaches that the first keynote speaker at this conference presented, but there is still a lot we can do with data besides statistical approaches.

But first, let me outline what is going to happen in the next twenty minutes. First, I would like to give you a little more background about the study itself; the rest is written in the paper, and this is just an extra, a premium service for the participants of the conference. Then I will explain why I did it and why I did it this way. I will present the data, or a piece of the data, and explain the empirical findings, and then we will go to the part that is maybe more interesting for you: the computational modelling, or generalization, of these empirical findings, and the implementation case study (I will explain why I call it that). Then we will finish with an overview of the results and zoom out to the bigger picture.

Where did it all start? It started with artificial companions: the idea was a machine that interacts with language learners just like an artificial friend, a kind of friend in your instant messenger. It was 2011, before the current wave of chatbots. You would just start this chat, which takes at least the context into account, start talking, and practise your foreign language this way.

But later I found out that there are a lot of other people who want to do similar things, and they work in the area called ICALL, intelligent computer-assisted language learning. These two things overlap, so this work sits at the intersection of those fields, and you can imagine how many people from different disciplines were already working there: natural language processing for language teaching, second language acquisition, computer science, learner corpus research, and computational linguistics in general.

On the other hand, there are so many publications in conversation analysis that focus exactly on interactions between language learners, between non-native and native speakers, or only between L2 speakers. So my idea was to look at what conversation analysis has found for these interactions and see what we can model.

That was because I initially had this idea of having a machine that behaves like a language expert in chat, but not like a teacher. I did not have a clear idea of what exactly that means, so I set up an experiment for data collection, because I didn't know what it means for a person to behave like a language expert in an informal chat.

I have made the dataset available for future work: you can take it for free; it is in a language resource repository in Germany.

It is a dataset of chat conversations between German native speakers and L2 speakers of German: seventy-two dialogues with thousands of turns, and that was my treasure.

So I took this data and thought, okay, what now? I used the methods of conversation analysis, because I didn't have any hypothesis about what to look for, and that's what they call unmotivated looking: you just look at the data without any idea of what you will find. Then you make collections of interesting, typical sequences, and then you try to generalize and describe the prototypical structure of these sequences.

And then, as a computer scientist, I looked at these prototypes and transformed them into grammars and rules, and sometimes it was even possible to do very simple machine learning. Then I set up the implementation case study. I call it a case study because you could take a dialogue system of any complexity, but I took the simplest one, an AIML-based chatbot with pattern-based language understanding, to see how far one can go with it.

Just to give an overview of what I found: there are different interactional practices by which participants of an interaction can orient to their linguistic identities as language learners or language experts in chat. They include different forms of face work, where language learners made excuses for their insufficient knowledge or for errors, or did self-assessment. But that was not real self-assessment: it happened at the very beginning of the interaction and was more like, you know, fishing for compliments.

Or learners got praise for their excellent language skills during this data collection. Then there are different types of meta-talk about language, language learning, and collaborative learning: people practised, as in role play, exam situations for instance, or they compared the grammatical systems of their native languages. So it was talk about the language.

And then there is this very prominent, though less obvious, type of positioning: repair. There are different kinds of repair; in this case it was repair of linguistic trouble, and in all these repair sequences the trouble was due to insufficient knowledge of the foreign language.

The focus of this talk is marked in red: explanations upon request. This is only one subtype of one type among all the possible incarnations of a language expert.

And this is the research objective of this paper: I wanted to create computational models of the interactional practices with which L2 speakers of German deal with troubles in comprehension in chat-based conversation for learning with native speakers. Why conversation for learning? It was an informal chat, but the participants met because of their status as native and non-native speakers of German; they were brought together because they have these different statuses. That's why it was a conversation for learning and not just an ordinary conversation.

Why is that challenging? As I said in the beginning, I had about forty-five thousand turns. Maybe you remember from the keynote talk that other-initiated repairs are challenging for speech recognition and that they occur regularly in spoken conversation. I had only thirty, so I can forget all the machine learning.

Here is an example of these repair sequences. The original data are in German; these are translations. The non-native speaker has difficulty understanding an idiomatic expression and requests clarification. How is this clarification request formatted? It's just a repeat. There is no "did you mean" or "what is"; it's just a repetition and a question mark.

This is only one format of a repair initiation; there are many others. After the repair initiation, the native speaker provides the explanation, that is, carries out the repair. This is the prototypical structure of a repair sequence. We have a trouble source, which can be anything: you never know what will cause a problem in comprehension. Then there is a repair initiation, which can theoretically occur anywhere, even after silence, as has already been shown. And then it can be followed by a repair carry-out, but it doesn't have to be.
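
The prototypical structure just described (trouble source, repair initiation, optional repair carry-out) is the kind of prototype that can be turned into a small grammar. Here is a minimal Python sketch, assuming a hypothetical one-letter annotation of turns that is not the talk's own scheme; other turns are allowed to intervene because in chat the initiation or the carry-out may be delayed.

```python
import re

# Hypothetical one-letter turn labels (an illustration, not the author's scheme):
#   T = trouble source, I = repair initiation, C = repair carry-out, O = other turn.
# Other turns may intervene before the initiation and before the carry-out,
# and the carry-out itself is optional.
REPAIR_SEQUENCE = re.compile(r"TO*I(?:O*C)?")


def find_repair_sequences(labels: str) -> list[tuple[int, int, bool]]:
    """Return (start, end, has_carry_out) for each match of the prototype."""
    return [(m.start(), m.end(), m.group(0).endswith("C"))
            for m in REPAIR_SEQUENCE.finditer(labels)]


print(find_repair_sequences("OTICOO"))  # [(1, 4, True)]: immediate initiation and carry-out
print(find_repair_sequences("TOOIOO"))  # [(0, 4, False)]: delayed initiation, no carry-out
```

A flat regular expression only finds non-overlapping matches, so interleaved sequences in multi-threaded chat would need a richer grammar over the annotated turns.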

Okay, the empirical part would be finished at this point. What I found about repair initiation was not really my own finding: I just confirmed what has been found before for oral interaction, and it was the same in chat.

There are specific devices, interactional resources that we have in chat to signal that we have trouble, and there are also specific interactional resources at our disposal to point to the trouble source. So every repair initiation contains some kind of signal and some kind of reference to the trouble source.

All the repair initiations I'm talking about correspond to second-position repair initiations, so the first structurally defined place where the other speaker can initiate repair. But they can still be immediate or delayed, and this is because of the specific structure of chat: we can have multiple threads or insert things in between. They are still all of the same type, second position, but some of them come directly after the trouble source and some of them a little bit later, and this influences the resources that need to be employed for pointing to the trouble source.

In this example, a repeat was used as the reference to the trouble source. But because we have to deal with non-native speakers, we cannot say that only syntactic units can be repeated: a piece of the trouble source can be copied and pasted regardless of unit boundaries, so you cannot rely on the completeness of the syntactic structure.
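
Because a learner may copy an arbitrary piece of the trouble source, or retype it with errors, a recognizer cannot look only for complete syntactic units. A minimal sketch, assuming plain-string turns and a hypothetical similarity threshold of 0.8: it slides a character window over earlier turns, most recent first, and scores approximate matches with the standard library's difflib.SequenceMatcher, so word and phrase boundaries play no role.

```python
from difflib import SequenceMatcher


def locate_trouble_source(repeat, previous_turns, threshold=0.8):
    """Find the earlier turn containing an approximate match for `repeat`.

    Searches the most recent turn first and compares character windows, so the
    match does not have to respect word or phrase boundaries. Returns a tuple
    (turn_index, matched_text) or None. The threshold is an illustrative guess.
    """
    target = repeat.strip(" ?!.\"'").lower()
    if not target:
        return None  # e.g. "???": nothing to match, only adjacency can point
    n = len(target)
    for idx in range(len(previous_turns) - 1, -1, -1):
        original = previous_turns[idx]
        turn = original.lower()
        best_ratio, best_start = 0.0, 0
        # One window per start position; a turn shorter than the repeat gets one window.
        for start in range(max(1, len(turn) - n + 1)):
            ratio = SequenceMatcher(None, target, turn[start:start + n]).ratio()
            if ratio > best_ratio:
                best_ratio, best_start = ratio, start
        if best_ratio >= threshold:
            return idx, original[best_start:best_start + n].strip()
    return None
```

With the turns "Hallo, wie geht's?" and "Das ist mir Wurst.", both "Wurst?" and the misspelled "wursst??" resolve to "Wurst" in the second turn. The window compares character by character, which is what allows partial copies, at the cost of occasional false matches on short, common strings.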

Then there is something that is very common in oral interaction but that I can't find in chat: when you didn't understand something at all, just acoustically, or because it's difficult to follow the overall talk of native speakers, then simply presenting the trouble source again is an acceptable repair. You don't find that in chat, because you can just reread everything. Still, I was surprised that some people really read something in the wrong way. This becomes visible not through the repair initiation but through other places where people try to repeat what they read: you can see from this retyping that they misread it.

There are also things that are typical for non-native speakers and differ very much from native speaker talk, and this concerns the design of the repair itself. It's more about the sense of the word, the meaning of the word or of the semantic unit being used, and it's less about something functional, like the intention. Not the intention, but the meaning of the word was repeated or explained.

For the repair carry-out, the participants again used different interactional resources, like synonyms or paraphrases. But sometimes they also just used Google Translate and translated everything into the native language of the L2 learner, not to be funny or something: they really did translate it with machine translation.

Or, because some phenomena were difficult to explain in words, like what some particular thing is, they just pasted a link, for example, and then it was clear somehow.

Again, just as with the repair initiation, the repair carry-out can be delayed or immediate, for the same reasons.

And we have a distinct type of repair carry-out here: if a whole utterance is unclear, or a longer part of a longer utterance, then not every word is explained, only what is supposed to be difficult. So each difficult unit is explained, but not everything is rephrased, paraphrased, or otherwise elaborated.
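
The selective behaviour described here, explaining only the difficult units, can be sketched at word level. This is a minimal sketch assuming a hypothetical set of "easy" words (for example taken from a frequency list) and a toy lexicon; difficult units that span several words are ignored.

```python
import re


def explain_difficult_units(utterance, easy_words, lexicon):
    """Return (unit, explanation) pairs for words assumed to be difficult.

    A word is explained only if it is not in `easy_words` and the lexicon has an
    entry for it; everything else is left as it is rather than rephrased.
    """
    explanations = []
    seen = set()
    for token in re.findall(r"\w+", utterance):
        key = token.lower()
        if key in easy_words or key in seen or key not in lexicon:
            continue
        seen.add(key)
        explanations.append((token, lexicon[key]))
    return explanations


# Hypothetical toy resources, for illustration only.
EASY = {"der", "hat", "schon", "wieder"}
LEXICON = {"schiri": "referee (colloquial, short for Schiedsrichter)",
           "gepfiffen": "whistled (past participle of pfeifen)"}

print(explain_difficult_units("Der Schiri hat schon wieder gepfiffen", EASY, LEXICON))
```

Words that are neither easy nor in the lexicon are silently skipped here; a real system would have to decide whether they need an explanation from another resource.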

So what do we need to know for the chatbot? First, what does the chatbot need to be able to do in order to do the same job as a native speaker here? The chatbot needs to recognise the repair initiation, then detect and extract the trouble source, and then generate the repair proper. Because you cannot predict what the trouble source will be, you cannot just use scripts for the repair carry-out; it needs to be generated from some linguistic database.

And that's what I've done. I just used a dictionary as the linguistic resource and filled templates with knowledge from the dictionary. The interactional resources my machine looked for were all the signals that I found in my corpus: question marks, dashes, and quotation marks, and then lexical items like "unclear" or "I don't understand". The interactional resources for pointing to the trouble source include repeats, but also simply the adjacent position: a repair initiation may consist of only three question marks, and then only its position points to the trouble source, which is exactly the previous turn. This type of pointing, for instance, cannot be used in the delayed position.
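
A pattern-based recognizer for these signals might look like the following minimal Python sketch. The marker list is a hypothetical sample (two German items plus English renderings of the examples above), not the inventory derived from the corpus. The function reports both the signal and how the trouble source is referenced; a bare "???" is treated as pointing only by adjacency.

```python
import re

# Hypothetical sample of lexical markers (two German items plus English
# renderings); a real inventory would be collected from the corpus.
LEXICAL_MARKERS = ("unklar", "verstehe nicht", "unclear", "don't understand")
ONLY_PUNCTUATION = re.compile(r"^\s*[?!.\-]+\s*$")  # e.g. "???" or "--"
QUOTED = re.compile(r'["„“](.+?)["“”]')              # "X", „X“ or “X”


def analyse_repair_initiation(message, previous_turn=""):
    """Return a dict with the signal type and the kind of reference, or None."""
    text = message.strip()
    lower = text.lower()
    if ONLY_PUNCTUATION.match(text):
        # No repeat at all: only the position points to the trouble source,
        # i.e. the immediately preceding turn.
        return {"signal": "punctuation", "reference": "adjacency", "target": None}
    quoted = QUOTED.search(text)
    has_marker = any(marker in lower for marker in LEXICAL_MARKERS)
    if quoted and (has_marker or text.endswith("?")):
        return {"signal": "quotation", "reference": "repeat", "target": quoted.group(1)}
    if has_marker:
        return {"signal": "lexical", "reference": "unspecified", "target": None}
    body = text.rstrip("?").strip()
    if text.endswith("?") and body and body.lower() in previous_turn.lower():
        # Echo question: a repeat of (part of) the previous turn plus "?".
        return {"signal": "question_mark", "reference": "repeat", "target": body}
    return None
```

The echo check compares only against the previous turn, so a delayed repeat would need a search over earlier turns. An utterance that matches these patterns may also turn out to be a joke or a correction rather than a request for repair.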

For the implementation case study, as I said, I used an AIML-based chatbot. It was Program D, an AIML interpreter. For German, I used the German AIML set as a baseline and extended it by several categories that allow the chatbot to render the repair carry-out based on the dictionary.
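
The dictionary-based carry-out amounts to filling templates. This minimal Python sketch is not the AIML categories themselves. The idiom entry, the template wording (English renderings of what would be German output), and the function name are all hypothetical. A content question gets an explanation; a candidate understanding, that is, a polar question, gets a confirmation or a correction.

```python
from string import Template

# Hypothetical toy entry; the talk used a real dictionary resource.
IDIOMS = {
    "das ist mir wurst": {
        "definition": "I don't care, it's all the same to me",
        "synonyms": ["das ist mir egal", "das ist mir gleich"],
    },
}

EXPLAIN = Template('"$item" means: $definition. Similar expressions: $synonyms.')
CONFIRM = Template('Yes, "$item" means roughly the same as "$candidate".')
CORRECT = Template('Not quite: "$item" means $definition.')


def generate_repair(item, candidate=None):
    """Fill a repair template from the dictionary; None if there is no entry."""
    entry = IDIOMS.get(item.lower())
    if entry is None:
        return None  # no dictionary knowledge: a real bot needs a fallback here
    if candidate is None:  # content question: explain
        return EXPLAIN.substitute(item=item, definition=entry["definition"],
                                  synonyms=", ".join(entry["synonyms"]))
    if candidate.lower() in entry["synonyms"]:  # polar question: confirm
        return CONFIRM.substitute(item=item, candidate=candidate)
    return CORRECT.substitute(item=item, definition=entry["definition"])
```

The confirmation branch checks only exact membership in the synonym list, which is exactly where the hard problem of deciding whether two expressions mean the same shows up.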

I also added two processors. The processors in Program D process different tags, and I added two tags that have to do with repair: the explanation tag and the meaning tag. Why these two? Because there are two basic types of questions to which all the repair initiations can be mapped: polar questions, requiring a yes or no, and content questions, requiring an explanation such as synonyms, paraphrases, or a translation. I needed to distinguish between these two automatically, so all the requests were mapped to only these two functions, and that's why I had only these two processors.
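
One way to approximate the mapping onto the two question types is to use word order, since German content questions usually open with a w-word and polar questions with the finite verb. This is a minimal sketch with hypothetical, incomplete cue lists. It also assumes that bare repeats and "???" count as content questions, that is, as requests for an explanation.

```python
import re
from typing import Literal

# Hypothetical, incomplete cue lists for German word order.
W_WORDS = {"was", "wie", "wer", "wo", "warum", "wieso", "weshalb",
           "welche", "welcher", "welches"}
POLAR_OPENERS = {"heißt", "bedeutet", "ist", "meinst", "meinen", "sagt"}


def question_type(repair_initiation: str) -> Literal["polar", "content"]:
    """Map a repair initiation onto one of the two basic question types."""
    tokens = re.findall(r"\w+", repair_initiation.lower())
    if tokens and tokens[0] in W_WORDS:
        return "content"   # e.g. "Was heißt Wurst?"
    if tokens and tokens[0] in POLAR_OPENERS:
        return "polar"     # verb-first, e.g. "Heißt das, es ist egal?"
    # Assumption: bare repeats ("Wurst?") and "???" ask for an explanation.
    return "content"
```

Defaulting to "content" makes the explanation path the fallback, which seems the safer choice when a request cannot be classified.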

What does that mean for the linguistic knowledge that we need? To recognize repair initiations, it might be sufficient just to have pattern-based language understanding, because the formats that can be used to initiate repair can be described as patterns. But we still have related NLP problems that are really hard, for instance referring expression generation, because our pointers to the trouble source are referring expressions. Only the domain is a different one: we don't have the whole conversation, only the local repair sequences that we need here.

And in contrast to the general problem of referring expression generation, where nouns and pronouns are normally seen as the main resources, here we also see entire sentences, phrases, or single words, because a repetition of a word points to the trouble source as well.

Then, for the repair carry-out, we can use definitions, paraphrasing, synonyms, translations, and demonstrations, and you probably know that paraphrasing is a hard problem, and finding synonyms automatically is a hard problem too. In a confirmation situation, I mean when the learner checks whether two expressions mean the same, it is also hard to say yes or no without specific resources.

What's more, low numbers are not the only challenge; another challenge is contingency. Utterances formatted as repair initiations can also have other actions or functions, like jokes, error corrections, rejections, surprise, and many others. That's why it still remains challenging; I don't have a solution for that.

I have only one minute left, or maybe the time is already over, so I'm just finishing. We have different results regarding the complexity of repair initiations and their repair carry-out. I compared them with the literature I mentioned before, including work from dialogue research and from conversation analysis, like Dingemanse and Enfield, who, by the way, described repair initiation formats across languages, and I think that this is quite language-independent.

And that's why, for me, the most positive outcome of this work was that we can use this model, first, to cover other languages and, second, to cover other domains, because definition talk works the same way in engineering and in every other domain where we need to explain something. So we can go beyond this particular application case.

Just to zoom out: I think conversation analysis as a method helps us understand what's going on in human interaction and helps to ground our computational models and build on them. But we need datasets: good datasets, not necessarily really large, but of a specific quality, corresponding to the speech exchange systems that we want to simulate. So if I want to simulate a dialogue between a language learner and an artificial friend, I first want to see how it works in a similar setting; I cannot take interviews as the dataset for that. Maybe we can have just simple chatbots as a minimum viable product in this case, but if you want to cover everything, it becomes very complex very quickly, and we need all the kinds of knowledge that people have produced to cover all the phenomena of interest.

Okay, we have two and a half minutes for questions.

So, I'm also interested in computer-mediated human interaction, and I wonder whether you observed some kind of interleaving of comments in these interactions. I imagine that would be a problem with two humans having a conversation over a messenger, rather than a human and a robot, because there would be more interleaving, in the way people overlap in spoken conversation. How much interleaving is there between the utterances in your computer-mediated dialogues, and is it similar to spoken conversation? Is there a lot of interleaving?

I didn't compare datasets; I only compared what I found with the findings described in the literature. There are things that are the same, like the formats of repair initiations: some of them are the same as in oral interaction.

But we have different interactional resources available in chat: we don't have prosody, for instance, we don't have the face, we don't have the voice. These are somehow replaced by other things, like emoticons instead of laughter, or, when participants wanted to emphasise something, they wrote words in uppercase or stretched letters.
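
Such chat-specific substitutes for prosody and laughter are easy to detect on the surface. This is a minimal sketch built on simple hypothetical heuristics: an all-uppercase word of two or more characters, a letter repeated at least three times, and a few common ASCII emoticons.

```python
import re

WORD = re.compile(r"\w+")
STRETCH = re.compile(r"([^\W\d_])\1{2,}")  # the same letter 3+ times, e.g. "sooo"
EMOTICON = re.compile(r"[:;]-?[()DPp]")     # e.g. ":)", ";-)", ":D"


def chat_resources(message):
    """List the emphasis and affect resources found in a chat message."""
    found = []
    if any(w.isupper() and len(w) > 1 for w in WORD.findall(message)):
        found.append("uppercase")
    if STRETCH.search(message):
        found.append("letter stretch")
    if EMOTICON.search(message):
        found.append("emoticon")
    return found
```

These heuristics only show that a resource is present. Deciding what it does in the interaction, emphasis, laughter, or something else, is the analytic work the talk describes.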

I had one example: the data collection took place in 2012, when the European football championship was on, and sometimes participants typed at the same time as they sat in front of their TV and watched the match. That's how the German word for "goal" appeared with sixty-two O's in it, and this is really the written equivalent of what people do orally when they scream. So I would say the same things are there, but they are expressed by different interactional resources; that's the first difference. The other is that some things are not replaced because they become irrelevant: we don't have the voice, for instance, and that's why I didn't find any repair initiation that requests a repetition. It's not necessary, because you can reread everything. These are the two differences I would describe.

Okay, a question about the dataset. I'm just curious about longitudinal assessment: did the learning increase over time? That would also be potentially useful for future work.

Thanks. Evaluating learning was not the focus, but let me say this.

Normally, when you talk about learning, at least with second language acquisition theory in the background, people look at error corrections as a sign of learning, or at any kind of meaning negotiation; all these repair sequences where something is explained are technically called meaning negotiations. Normally only these two things are analysed.

But I also saw another kind of learning; I'm sorry, I forgot the word for it. The learners didn't know something or didn't use a structure, then, based on an example, they repeated it without any repair. So it was not just observation, but mimicking native speakers.

And I also found that they learn from implicit corrections, which are really hard to capture and are normally not used in this kind of research: the native speaker just uses the correct form instead of the learner's wrong word. Normally people don't pay attention to that, because it's not evident enough; there is no evidence that learners notice these corrections. But I have evidence in the data, because learners later, in later sessions for instance, repeated things that had been corrected through implicit, embedded corrections, and they repeated them in the correct way.
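
Once embedded corrections are annotated, this kind of evidence, a learner later using a form that a native speaker had supplied, could be searched for semi-automatically. This is a minimal sketch. The data layout is a hypothetical assumption: turns are (speaker, text) pairs, and corrections are (turn_index, wrong_form, corrected_form) triples.

```python
def find_uptake(turns, corrections, learner):
    """Return (correction, later_turn_index) for the first later learner turn
    that uses the corrected form; corrections without uptake are omitted.

    turns: list of (speaker, text) in chronological order.
    corrections: list of (turn_index, wrong_form, corrected_form), where
    turn_index is the turn containing the embedded correction.
    """
    uptake = []
    for correction in corrections:
        index, _wrong, corrected = correction
        for later in range(index + 1, len(turns)):
            speaker, text = turns[later]
            if speaker == learner and corrected.lower() in text.lower():
                uptake.append((correction, later))
                break
    return uptake


# Hypothetical example: "gegeht" is corrected implicitly to "gegangen".
turns = [("L", "Ich bin gestern nach Hause gegeht."),
         ("N", "Ah, du bist nach Hause gegangen?"),
         ("L", "Ja."),
         ("L", "Heute bin ich früh gegangen.")]
print(find_uptake(turns, [(1, "gegeht", "gegangen")], "L"))
```

Substring matching is crude, and this sketch does not check whether the learner had already used the correct form before the correction. Any hit would still need a qualitative look at the sequence.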

It's more than just that. I'm afraid that drifted in a slightly different direction.

Anyway, regarding changes over time: that's why I explained this idea of artificial companions in the beginning. I imagined having this artificial friend who knows the user and the user's preferences and everything, and that's why I set up the study and the data collection in this way, and why I'm talking about specific speech exchange systems. Each learner in the study was paired with a native speaker, they interacted in pairs for a longer time, and I wanted to see the development. I can say that the development in learning was not only because they interacted longer, but because some of them engaged in these corrections and in these evident learning sequences in the beginning, and that's why it developed more intensively later. In other pairs it was not relevant.

So we can continue offline. Let's thank the speaker.