Hi everybody. This talk is about creating and characterizing a diverse corpus of sarcasm in dialogue. I want to start by explaining why we study sarcasm, then the need for a large-scale corpus of sarcasm and different examples of sarcasm in the wild, followed by how we built our corpus, some experimental results and linguistic analysis, and then conclusions.
So why study sarcasm? Well, as we all kind of know, it's creative, complex, and diverse. Here are some examples: things like missing the point, "i love it when you bash people for stating opinions and not facts, then you turn around and do the same thing", and even more complex, "my pyramidal tinfoil hat is an antenna for knowledge and truth; it reflects idiocy ... into deep space". As we can see, it's very creative, it's very diverse, and it gets more and more ambiguous and complex: a very long-tail problem.
Further motivation is that it's very prevalent: an estimated ten percent or so of debate forum dialogue, which is our domain of interest. And this sort of dialogue is very different from traditional mediums like independent tweets or product reviews, so it's very interesting to our group.
Also part of the motivation is that things like sentiment analysis systems are thwarted by misleading sarcastic posts: people being sarcastic, saying something is really great about a product, when it isn't. For question answering systems, too, it's important to know when things are not sarcastic, so that you can trust that it's good data. So it's important to differentiate between the classes: sometimes you want to look at the not-sarcastic posts, and sometimes you care about the sarcastic ones.
So, some examples of sarcasm in the wild. Sarcasm is clearly not a unitary phenomenon. Gibbs (2000) developed a taxonomy of five different categories of sarcasm in conversations between friends. He talks about sarcasm as speaking positively to convey negative intent; this is a generally accepted way to define sarcasm. But he also defines different categories where sarcasm appears: rhetorical questions, where somebody asks a question implying a humorous or critical assertion; hyperbole, expressing a non-literal meaning by exaggeration; on the other side of the scale, understatement, underplaying the reality of a situation; and jocularity, teasing in humorous ways. So this is a somewhat more fine-grained taxonomy for sarcasm.
It's generally accepted that people use the term sarcasm to mean all of these things, as a big umbrella term for anything that could be sarcastic. But the theoretical models agree that there is often a contrast between what is said and a literal description of the actual situation; that's a very common thing that characterizes much of sarcasm across domains. No previous work has really operationalized these different categories that Gibbs and others have defined, so that is the focus of our corpus building.
We explore in great detail rhetorical questions and hyperbole, two very prevalent subcategories of sarcasm in our debate forums. They can in fact be used either sarcastically or not sarcastically, so there's an interesting binary classification question.
To showcase why that's true, here are examples of rhetorical questions, where the top row is used sarcastically and the bottom row not sarcastically: something like "then what do you call a politician who ran such measures? liberal? yes, 'cause you're Republican and you're conservative ...", versus "what, without proof? we would certainly show that an animal adapted to ...", more of an informative sort of thing. So rhetorical questions exist in both categories. Similarly for hyperbole: something like "thank you for making my point better than i ever do", or again, "i'm astonished by the fact that you think i would do this". So there are different ways these categories can be used, with sarcastic or not-sarcastic intent.
So, moving to why we need a large-scale corpus of sarcasm. First of all, as I described, creativity and diversity make it difficult to model generalizations, and subjectivity makes it very difficult to get high-agreement annotations; we see that in lots of previous work on sarcasm. People often use hashtags like #sarcasm, or use positive or negative sentiment in different mediums, to try to highlight where sarcasm exists, because it's very difficult to get high-agreement annotations, and these annotations are costly and require something like expert workers.

For example, out of the blue, with no context, something like "got your ... i think i love you" is hard to call as really sarcastic, and something like "humans are not the only mammal that ..." is very subtle; we just don't know. So it's pretty hard to ask people to do these sorts of annotations; you have to be a little bit clever about it, and that's what we try to do. We need a way to get more labeled data in the short term to study sarcasm, allowing for better linguistic generalizations and more powerful classifiers in the long term. That's the promise of our corpus building stage.
How do we do it? We do bootstrapping. We begin by replicating Lukin and Walker's bootstrapping setup from 2013, and the idea behind it is that you begin with a small set of annotated sarcastic and not-sarcastic posts, and use some kind of linguistic pattern extractor to find cues that you think are highly precise indicators of sarcasm and not-sarcasm in the data. Once you have those cues, you can go out against huge sets of unannotated data, look for the cues, and call anything that matches the bootstrapped data; you drop it back in with the original annotated data and iteratively expand your dataset that way. That's the premise that we use.
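To make that loop concrete, here is a minimal sketch of the bootstrapping cycle in Python. The unigram cue learner is a simplified stand-in I'm assuming for illustration, not the actual Lukin and Walker (2013) implementation:

```python
from collections import Counter, defaultdict

def learn_cues(labeled, min_freq=2, min_prob=0.9):
    # Unigram stand-in for the lexico-syntactic pattern extractor:
    # keep words that are frequent and highly precise for one class.
    per_class, overall = defaultdict(Counter), Counter()
    for post, label in labeled:
        for w in set(post.lower().split()):
            per_class[label][w] += 1
            overall[w] += 1
    return {w: label
            for label, counts in per_class.items()
            for w, f in counts.items()
            if f >= min_freq and f / overall[w] >= min_prob}

def bootstrap(seed, pool, iterations=3):
    """Grow the labeled set: learn cues, label matching posts, repeat."""
    labeled, pool = list(seed), list(pool)
    for _ in range(iterations):
        cues = learn_cues(labeled)
        remaining = []
        for post in pool:
            hits = {cues[w] for w in set(post.lower().split()) if w in cues}
            if len(hits) == 1:          # unambiguous high-precision match
                labeled.append((post, hits.pop()))
            else:
                remaining.append(post)
        pool = remaining
    return labeled
```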
Really, the crux of this is that to do good bootstrapping, we need this portion right here, the high-precision linguistic patterns, to be really good: we need really good high-precision patterns. So we try to get them using the linguistic patterns out of AutoSlog-TS. AutoSlog-TS (Riloff, 1996) is a weakly supervised pattern learner, and we use it to extract lexico-syntactic patterns highly associated with both sarcastic and not-sarcastic utterances.
The way it works is that it has a set of defined pattern templates, things like a subject followed by a passive verb phrase, et cetera, and it uses these templates to find instantiations in the text, then ranks those instantiations based on probability of occurrence in a certain class and frequency of occurrence. So if your data had the sentence "there are millions of people saying all sorts of stupid things about the president" and you ran AutoSlog-TS over it, it would match, for example, the "noun phrase + preposition + noun phrase" pattern with "millions of people", and if this pattern were very frequent and highly probable in sarcasm, it would float up to the top of the ranked list. So we do this, giving each extraction pattern a frequency cutoff θf and a probability cutoff θp, and we classify a post as belonging to a class if it has at least n of those patterns.
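As a rough illustration of that ranking step, here is a sketch with simple regexes standing in for AutoSlog-TS's lexico-syntactic templates (the real system matches templates over parsed text, not raw regexes); the θf and θp thresholds and the at-least-n rule follow the description above:

```python
import re
from collections import Counter

# Toy regex stand-ins for AutoSlog-TS template instantiations.
TEMPLATES = [
    re.compile(r"\b\w+ of \w+\b"),   # <NP> prep <NP>, e.g. "millions of people"
    re.compile(r"\bwas \w+ed\b"),    # <subj> passive verb phrase
]

def extract_patterns(post):
    return [m for t in TEMPLATES for m in t.findall(post.lower())]

def rank_patterns(posts, labels, theta_f=2, theta_p=0.75):
    """Keep patterns that are frequent (>= theta_f) and strongly
    associated with one class (>= theta_p)."""
    freq, sarc = Counter(), Counter()
    for post, label in zip(posts, labels):
        for p in extract_patterns(post):
            freq[p] += 1
            if label == "sarcastic":
                sarc[p] += 1
    kept = {}
    for p, f in freq.items():
        prob = sarc[p] / f
        if f >= theta_f and max(prob, 1.0 - prob) >= theta_p:
            kept[p] = "sarcastic" if prob >= 0.5 else "not_sarcastic"
    return kept

def classify(post, kept, n=1):
    # Assign a class only if at least n retained patterns agree.
    votes = Counter(kept[p] for p in extract_patterns(post) if p in kept)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= n else None
```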
In the first round, looking at the small sample of data, here are some examples of what we observe: things like "... your head" and "get over ..." occur in sarcastic posts, with these frequencies and probabilities of association, while things like "natural selection" and "big bang theory" are high-probability cues for not-sarcastic posts. To summarize, we find that the not-sarcastic class contains a lot of very technical jargon, scientific language, topic-specific things, and we can get high precision when classifying posts based on just these templates, up to about eighty percent. The sarcastic patterns, as you can see, are much more varied and not high precision, around thirty percent. And it's difficult to do bootstrapping on data where the precision of these patterns is relatively low.
So we decided to make use of this high-precision set of not-sarcastic patterns that we could collect to actually expand our data, trying to find posts that would be good to get annotated, posts that we think have a higher probability than ten percent of being sarcastic, that original estimate from a sample of debate forums data. Using a pool of thirty thousand posts, we filter out posts that we think are not sarcastic, that is, posts containing any of the not-sarcastic patterns we identified, and we end up with about eleven thousand posts that we believe have a higher likelihood of being sarcastic. We put those out for annotation on Mechanical Turk.
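A minimal sketch of that filtering step, with a made-up cue list standing in for the learned not-sarcastic patterns:

```python
# Hypothetical cue list standing in for the learned high-precision
# not-sarcastic patterns (technical / scientific jargon and the like).
NOT_SARCASTIC_CUES = ("natural selection", "fossil record", "peer reviewed")

def likely_sarcastic_pool(posts):
    """Drop posts matching any not-sarcastic pattern, leaving a pool with
    a higher expected rate of sarcasm to send out for annotation."""
    return [p for p in posts
            if not any(cue in p.lower() for cue in NOT_SARCASTIC_CUES)]
```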
The way the annotation task looks is that annotators get a definition of sarcasm, with examples of responses that do and don't contain sarcasm, and then we show them a quote-response pair, a dialogic pair where we have a dialogic parent and the response, and we ask them to identify sarcasm in the response. That's what our annotators are seeing. Using this method, we were able to skew the distribution of sarcasm from ten percent up to thirty-one percent when annotating that pool of eleven thousand posts. Depending on where we set our agreement threshold, we can skew this distribution quite high; here, from nineteen to twenty-three percent using a relatively conservative threshold of six out of nine annotators agreeing that a post is sarcastic. Since sarcasm is so subjective and diverse, we want to make sure that our annotations are clean, and that's why we use a relatively high threshold.
Having more data means we can do better at the bootstrapping task, but we still observe some of the same trends: highly precise not-sarcastic patterns, less precise sarcastic ones. We're still not quite at the point we want to be at to do bootstrapping. So, given the diversity of the data, we decided to revisit the categorization I talked about earlier: sarcasm, rhetorical questions, hyperbole, understatement, jocularity.
We make the observation that some lexico-syntactic cues are frequently used sarcastically. For example, cue words like "oh" and "well": "oh well, let's all ... that great argument ... as well"; "well, what's your plan? how ... how realistic, my friend"; "interesting, someone hijacked your account ..."; and some pretty funny, really creative combinations of words. So terms like these are pretty prevalent in sarcastic posts, and we try to make use of this observation in our data collection.
The way we do that is we develop regexes to search for different patterns in our unannotated data, and we get annotations for different cues that we think are quite prevalent in the data, things like "well", "oh", "really", "fantastic", et cetera. And we find that we're able to get distributions that are much higher than ten percent by searching for posts that contain only a single cue. It's interesting to note that just a single cue can have such a large distribution of sarcasm: something like "well" is used sarcastically forty-four percent of the time in our sample of posts.
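A sketch of that single-cue search; the cue regexes here are illustrative, not the paper's actual inventory:

```python
import re

# Illustrative sentence-initial cues; the full inventory is in the paper.
CUES = {
    "well": re.compile(r"^\s*well\b", re.I),
    "oh":   re.compile(r"^\s*oh\b", re.I),
}

def single_cue_posts(posts):
    """Group posts that match exactly one cue, so per-cue sarcasm rates
    (e.g. ~44% for "well") can be estimated after annotation."""
    grouped = {name: [] for name in CUES}
    for post in posts:
        hits = [name for name, rx in CUES.items() if rx.search(post)]
        if len(hits) == 1:
            grouped[hits[0]].append(post)
    return grouped
```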
Using these observations, we begin constructing our subcorpora, one for rhetorical questions and one for hyperbole. The way we gather more data for these is that we observe that they're used both sarcastically and not sarcastically for argumentation, and we use a "middle of post" heuristic to estimate whether a question is actually used rhetorically or not: if a speaker asks a question and then continues on with their turn, they're not giving the listener a chance to actually respond, so it's a question that, at least in the view of the writer, doesn't require an answer from someone else.
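A rough approximation of that middle-of-post check, my own reading of the heuristic rather than the authors' exact code:

```python
def is_rhetorical_question(post, min_tail=10):
    """Heuristic: a "?" followed by a substantial same-turn continuation
    suggests the writer was not waiting for an answer."""
    idx = post.find("?")
    return idx != -1 and len(post[idx + 1:].strip()) >= min_tail
```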
We do a little pilot annotation and find that seventy-five percent of the rhetorical questions we gather this way are in fact annotated as rhetorical. We then annotate these new posts, ending up with 851 posts per class. These are things like "do you wish to not have a logical debate? alrighty then, god bless you anyway", or "proof that you can't prove that? ... given anything but insult", et cetera, posts where someone asks a question and goes on with their turn.
The second subcorpus we look at is hyperbole. Hyperbole exaggerates a situation, and we use intensifiers to capture these sorts of instances and get more annotations. Colston and O'Brien describe this sort of situational scale, the contrast effect I was talking about earlier: hyperbole can shift an utterance across the scale, shifting something to extremely positive and away from literal, or to extremely negative and away from literal, and intensifiers serve exactly this purpose. Something like "wow, i'm so amazed by your comeback skills", or "i'm just so impressed by your intellectual argument", things like that.
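And a similar sketch for retrieving hyperbole candidates via intensifiers; the word list is illustrative, not the one used in the paper:

```python
import re

# Illustrative intensifier cues preceding another word.
INTENSIFIER_RE = re.compile(r"\b(so|totally|absolutely|amazingly)\s+\w+", re.I)

def hyperbole_candidates(posts):
    """Posts containing an intensifier pattern, taken as hyperbole
    candidates to send out for annotation."""
    return [p for p in posts if INTENSIFIER_RE.search(p)]
```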
As for the statistics of our final corpus, we get around 6,500 posts for our generic sarcasm corpus, and then the rhetorical questions and hyperbole subcorpora with this distribution; more information on the dataset is available in the paper.
To validate the quality of our corpus, we do simple experiments using very simple features, bag-of-words and word2vec features, noting that previous work has achieved about seventy percent with more complex features. We end up with results that are higher than that. We do a segmented set of experiments where we test at different dataset sizes, and we see that our F-measures continue to increase, up to our current peak of seventy-four, with these simple features. So that warrants expanding our dataset even more.
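For a sense of what that validation looks like, here is a minimal version of the experiment using scikit-learn, which is my choice for the sketch; the paper's exact features and classifier setup may differ:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def f1_at_sizes(posts, labels, sizes=(1000, 2000, 4000, 6000)):
    """Cross-validated macro-F1 of a bag-of-words model at growing
    dataset sizes; a still-rising curve suggests more data would help."""
    model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    return {n: cross_val_score(model, posts[:n], labels[:n],
                               scoring="f1_macro", cv=5).mean()
            for n in sizes if n <= len(posts)}
```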
We also run our weakly supervised experiments with AutoSlog-TS again, to see what sorts of precisions we can get now for bootstrapping, and we see much higher precisions than we were getting before, at recall that is reasonable for bootstrapping, so that's good news as well. Now we can expand our method to be weakly supervised and gather more data more quickly. And these are the numbers of new patterns that we learned, patterns we never searched for in the original data; we're learning a lot of new patterns that we didn't originally search for, for all of the datasets.
Then, quickly, some linguistic analysis. We aim to characterize the differences between our datasets, again using some AutoSlog-TS instantiations. In our generic data, we see these creative sorts of instantiations in the sarcastic posts, whereas the not-sarcastic posts again have this highly technical, jargon-heavy sort of terminology. For the rhetorical questions, we observe a lot of the same properties for the not-sarcastic class, but for the sarcastic class we observe a lot of attacks on basic human abilities in these debate forum dialogues; people say things like "can you read", "can you write", "do you understand". We went through some of the dependency parses of these sorts of questions and found a lot of things that relate to basic human ability: people are attacking people, not really attacking their argument, and that's very prevalent in our debate forums data. And finally, for hyperbole, we find that the adjective and adverb patterns are really common, even though we didn't search for these originally in our regex experiments, and things like contrast by exclusion are used; the examples of hyperbole that we pick up are really interesting.
In conclusion, we develop a large-scale, highly reliable corpus of sarcasm; we reduce annotation cost and effort by skewing the distribution, avoiding having to annotate huge pools of data; we operationalize lexico-syntactic cues for rhetorical questions and hyperbole; and we verify the quality of our corpus empirically and qualitatively. For future directions, we want to do more feature engineering and model selection based on our linguistic observations, develop more generalizable models of the different categories of sarcasm that we haven't looked at, and explore the characteristics of our lower-agreement data to see if there's anything interesting there as well. Thanks.

Questions?
First of all, we began without looking at those categories, right? We started with this really generic sarcasm, and definitely it's a long tail; there are a lot of different exaggerations, and that's definitely a problem. Initially we were just targeting sarcasm in general, but it's interesting to get into the more refined categories and look at how those are different. And yes, there are also different sorts of things we could look at; understatement is quite prevalent as well. It doesn't only exist in debate forums, it's just quite pronounced in the forums, so it would be good to look at that too.
Right, so the question is about the word2vec features: do we train the word2vec model on our corpus, or do we use an existing model? We did both. The results we're reporting are actually with the Google News pretrained vectors, which correlate reasonably well with our debate forums data. We have used our own trained model; it didn't perform as well, probably because of the smaller amount of data, whereas Google News is trained on a huge amount of data. So that's definitely worth exploring further in the future as well.
Right, so I didn't mention the numbers here, and there's more detail in our paper, but our levels of agreement were about seventy percent for each of the tasks, and they were actually better for the smaller tasks, which are a little more constrained than generic sarcasm. No, that's actually agreement with the majority label. So agreement is actually better for the subcategories than for the generic sarcasm task; it's pretty hard to get high-agreement annotations there.
I was wondering about the idea of contrast. You said somewhere that sarcasm highlights the fact that there is some contrast between the literal thing and what you think of that element, and also this idea that it conveys a meaning that is non-literal, right? So I was thinking about a possible connection with metaphor, and with the task of metaphor detection. Here you are focusing on trying to find patterns that can capture the contrast, but for instance in some work in metaphor detection, the goal is to capture contrast, right, what makes a particular use different from the literal use. Since the sarcastic intended meaning can actually be far from the regular use, I was wondering, and it's a very open question, whether you had thought about the task in these terms.

That's really interesting, so looking at maybe trying to measure how far away something is on a sort of contrast scale; that would definitely be interesting. We haven't done that explicitly, but the different intensifiers can have different effects, so it's kind of trying to map it across the scale.
Other questions?

A question: when you're doing the mining of the data and you're identifying different phrases that are more associated with sarcasm and non-sarcasm, did you do things to make sure that the dataset was not biased toward containing those kinds of phrases? So that later, if someone wanted to build an automated system to detect sarcasm and non-sarcasm, they wouldn't just read your paper and go after these phrases, because they were used to construct the corpus.

Right, so our generic sarcasm corpus was a random sample, so none of that is sampled in any biased way. For the rhetorical questions and hyperbole, we would select those posts, but the posts actually contain all sorts of other cues, and it's important to note that if we ever selected on a cue, it would exist in both the sarcastic and not-sarcastic classes. So it's not like you would only find it in one, and that's what made it interesting: those cues are used in both sorts of situations, so it wouldn't be biased that way.