0:00:16 | hi everybody |
---|
0:00:18 | so |
---|
0:00:18 | creating and characterizing a diverse corpus of sarcasm in dialogue |
---|
0:00:24 | i want to start by explaining why we study sarcasm |
---|
0:00:28 | and then the need for a large-scale corpus of sarcasm |
---|
0:00:32 | different examples of sarcasm in the wild |
---|
0:00:35 | followed by how we build our corpus some experimental results and linguistic analysis and then |
---|
0:00:40 | conclusions |
---|
0:00:42 | so why study sarcasm |
---|
0:00:44 | well as we all kind of know it's creative complex and diverse here are |
---|
0:00:49 | some examples |
---|
0:00:50 | things like this or missing the point |
---|
0:00:53 | i love it when you bash people for stating opinions and not facts then you |
---|
0:00:56 | turn around to do the same thing |
---|
0:01:00 | and even more complex my pyramidal tinfoil hat is an antenna for knowledge and truth |
---|
0:01:05 | it reflects idiocy and sends it into deep space |
---|
0:01:08 | as we can see |
---|
0:01:10 | it's very creative it's very diverse |
---|
0:01:12 | and |
---|
0:01:13 | it gets more and more ambiguous and complex |
---|
0:01:15 | a very long tail problem |
---|
0:01:19 | so further motivation is that it's very prevalent estimated at around ten percent in debate forum |
---|
0:01:26 | dialogue which is kind of our domain of interest |
---|
0:01:30 | and this sort of dialogue is very different from traditional mediums like independent tweets or |
---|
0:01:34 | reviews for products things like that |
---|
0:01:38 | so it's very interesting to our group |
---|
0:01:41 | also part of the motivation is that things like sentiment analysis systems are thwarted by |
---|
0:01:45 | misleading sarcastic posts people |
---|
0:01:47 | being sarcastic saying something is really great about their product when it's actually very misleading |
---|
0:01:53 | also for question answering systems it's important to know when things are not sarcastic to |
---|
0:01:57 | trust that it's good data right so it's also important to differentiate between |
---|
0:02:01 | the classes sometimes you wanna look at the not sarcastic posts sometimes you care about |
---|
0:02:05 | the sarcastic ones |
---|
0:02:07 | so some examples of sarcasm in the wild |
---|
0:02:12 | so sarcasm is clearly not a unitary phenomenon gibbs in two thousand developed a taxonomy of |
---|
0:02:19 | five different categories of sarcasm in conversations between friends |
---|
0:02:23 | so he talks about sarcasm as speaking positively to convey negative intent |
---|
0:02:28 | this is kind of a generally accepted way |
---|
0:02:31 | to define sarcasm |
---|
0:02:33 | but he also defines different categories where sarcasm is prevalent things like rhetorical questions so |
---|
0:02:38 | somebody asking a question implying a humorous or critical assertion |
---|
0:02:42 | things like hyperbole expressing a non-literal meaning by exaggeration |
---|
0:02:46 | on the other side of the scale understatement so underplaying the reality of a |
---|
0:02:50 | situation |
---|
0:02:52 | and jocularity so humoring and teasing in humorous ways |
---|
0:02:56 | so this is a little bit more fine grained |
---|
0:02:59 | as a taxonomy for sarcasm |
---|
0:03:03 | and it's kind of |
---|
0:03:04 | accepted that people use the term sarcasm to mean all of these things as like |
---|
0:03:07 | a big umbrella term for anything that could be sarcastic |
---|
0:03:11 | but theoretical models posit that there is often a contrast between what is |
---|
0:03:16 | said |
---|
0:03:17 | and a literal description of the actual situation |
---|
0:03:19 | so that's a very common thing that characterizes much of sarcasm in different domains |
---|
0:03:25 | so no previous work has really operationalized these different categories that gibbs |
---|
0:03:30 | and other work have defined |
---|
0:03:34 | so that's kind of the focus of our corpus building |
---|
0:03:37 | so we explore in great detail rhetorical questions and hyperbole as two very prevalent |
---|
0:03:44 | subcategories of sarcasm in our online debate |
---|
0:03:46 | forums they're very prevalent in our debate forums and they can be used in fact sarcastically or |
---|
0:03:50 | not sarcastically so it's an interesting binary |
---|
0:03:53 | classification question |
---|
0:03:55 | so to kind of showcase why that's true here are examples of rhetorical questions |
---|
0:04:02 | that in the top row are used sarcastically and in the bottom row not sarcastically |
---|
0:04:06 | so |
---|
0:04:07 | something like then what do you call a politician who ran such measures a liberal |
---|
0:04:11 | yes it's 'cause you're republican and you're a conservative after all |
---|
0:04:15 | whereas without proof we would certainly show that an animal adapted to blah more of |
---|
0:04:18 | like an informative sort of thing |
---|
0:04:20 | so rhetorical questions exist in both categories |
---|
0:04:25 | similarly for hyperbole |
---|
0:04:27 | something like thank you for |
---|
0:04:29 | making my point better than i ever could |
---|
0:04:31 | or again i'm astonished by the fact that you think i will do this |
---|
0:04:35 | so there's kind of different ways that you can use these categories |
---|
0:04:42 | with sarcastic or not sarcastic intent |
---|
0:04:46 | so kind of going into why do we need a large |
---|
0:04:49 | scale corpus of sarcasm |
---|
0:04:52 | first of all like i said creativity and diversity make it difficult to model generalizations |
---|
0:04:58 | and subjectivity makes it very difficult to get high agreement annotation and we see that |
---|
0:05:02 | from lots of previous work on sarcasm |
---|
0:05:04 | people often use hashtags like #sarcasm or use you know positive or negative |
---|
0:05:11 | sentiment in different mediums to try to |
---|
0:05:15 | highlight where sarcasm exists |
---|
0:05:17 | because it's very difficult to get high agreement annotations |
---|
0:05:20 | and these annotations are costly and they require kind of expert workers |
---|
0:05:25 | so for example in an out of the blue context something like got your sosa |
---|
0:05:29 | think simple found i think i love you it's hard to tell if that's really |
---|
0:05:33 | sarcastic right |
---|
0:05:34 | out of the blue |
---|
0:05:36 | something like humans are nominal mammal that the fact it you just this in the |
---|
0:05:40 | real schools |
---|
0:05:41 | very subtle we don't know right |
---|
0:05:44 | so it's pretty hard to ask people to do these sorts of annotations you have |
---|
0:05:48 | to be a little bit clever about it that's kind of what we try to |
---|
0:05:51 | do |
---|
0:05:52 | so we need a way to get more labeled data in the short term to study |
---|
0:05:56 | sarcasm |
---|
0:05:57 | to allow for better linguistic generalisations |
---|
0:06:00 | and more powerful classifiers in the long term that's kind of the premise of our corpus |
---|
0:06:04 | building stage |
---|
0:06:06 | how do we do it |
---|
0:06:08 | so we do bootstrapping |
---|
0:06:10 | we begin by replicating lukin and walker's bootstrapping setup from twenty thirteen |
---|
0:06:14 | and the idea behind this is that |
---|
0:06:17 | you begin with a small set of annotated sarcastic and not sarcastic posts |
---|
0:06:21 | and use some kind of linguistic pattern extractor to find |
---|
0:06:26 | cues that you think are highly |
---|
0:06:28 | precise indicators of sarcasm and not sarcasm in the data |
---|
0:06:32 | once you have these sorts of cues you can go out against huge sets of |
---|
0:06:35 | unannotated data and look for those cues |
---|
0:06:37 | and anything that matches we're gonna call the bootstrap data |
---|
0:06:41 | drop it back in the original annotated data and then kind of iteratively expand your |
---|
0:06:45 | data set that way |
---|
0:06:47 | that's kind of the premise that we use |
---|
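a minimal python sketch of this bootstrapping loop, with a toy unigram cue learner standing in for a real pattern learner like autoslog-ts; all names, cutoffs, and the round count are illustrative assumptions, not the authors' exact setup:

```python
from collections import defaultdict

def extract_patterns(labeled, min_freq=2):
    """Toy cue learner: score each unigram by its precision per class.
    A stand-in for a real pattern learner such as AutoSlog-TS."""
    counts = defaultdict(lambda: defaultdict(int))
    for text, label in labeled:
        for tok in set(text.lower().split()):
            counts[tok][label] += 1
    cues = {}
    for tok, by_label in counts.items():
        total = sum(by_label.values())
        label, freq = max(by_label.items(), key=lambda kv: kv[1])
        if total >= min_freq:
            cues[tok] = (label, freq / total)   # (class, precision)
    return cues

def bootstrap(seed_labeled, unlabeled, rounds=3, min_precision=0.8):
    """Iteratively grow a labeled set from high-precision cues."""
    labeled = list(seed_labeled)                # (text, label) pairs
    for _ in range(rounds):
        # 1. learn candidate cues from the current labeled data
        cues = extract_patterns(labeled)
        # 2. keep only cues precise enough to trust as silver labels
        trusted = {c: lab for c, (lab, prec) in cues.items()
                   if prec >= min_precision}
        # 3. match trusted cues against the unlabeled pool and fold
        #    any hits back into the labeled data
        still_unlabeled = []
        for post in unlabeled:
            toks = set(post.lower().split())
            hit = next((lab for c, lab in trusted.items() if c in toks), None)
            if hit is not None:
                labeled.append((post, hit))     # bootstrapped example
            else:
                still_unlabeled.append(post)
        unlabeled = still_unlabeled
    return labeled
```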
0:06:48 | but really the crux of this is that |
---|
0:06:51 | to do good bootstrapping we need this |
---|
0:06:53 | portion right here |
---|
0:06:55 | which requires the high-precision linguistic patterns to be really good we need really good high |
---|
0:06:59 | precision patterns so we try to get them |
---|
0:07:04 | using the linguistic pattern extractor autoslog-ts |
---|
0:07:08 | so autoslog-ts developed by riloff in ninety six is a weakly supervised pattern |
---|
0:07:11 | learner |
---|
0:07:13 | and we use it to extract lexico-syntactic patterns highly associated with both sarcastic and not |
---|
0:07:18 | sarcastic utterances |
---|
0:07:19 | so the way that works is that it has a bunch of pattern templates that |
---|
0:07:23 | are defined so things like |
---|
0:07:25 | some sort of a subject followed by a passive verb phrase et cetera |
---|
0:07:29 | and it uses these patterns to then find instantiations in the text and then ranks |
---|
0:07:33 | these different instantiations based on probability of occurrence in a certain class and frequency of |
---|
0:07:38 | occurrence |
---|
0:07:39 | so something like if you had the sentence in your data there are millions of |
---|
0:07:43 | people saying all sorts of stupid things about the president |
---|
0:07:46 | and you ran autoslog-ts it would match this |
---|
0:07:49 | for example it would match the noun phrase preposition |
---|
0:07:53 | noun phrase pattern |
---|
0:07:55 | millions of people |
---|
0:07:57 | and then if this pattern was very frequent and highly |
---|
0:08:00 | probable in sarcasm then it would float up to the top of our |
---|
0:08:03 | ranked list |
---|
0:08:06 | so we do this |
---|
0:08:09 | and give each extraction pattern a frequency theta f and a probability theta p |
---|
0:08:13 | and we classify a post as belonging to a class if it has at least n of |
---|
0:08:17 | those patterns present |
---|
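a hedged sketch of this ranking and the "at least n patterns" rule; the theta cutoffs are illustrative values, and substring matching stands in for the real lexico-syntactic template matching:

```python
from collections import defaultdict

def rank_patterns(match_counts, theta_f=3, theta_p=0.7):
    """match_counts: {pattern: {"sarc": k, "notsarc": m}} counted over the
    labeled posts. Keep patterns above both the frequency cutoff theta_f
    and the class-conditional probability cutoff theta_p."""
    ranked = {}
    for pattern, by_class in match_counts.items():
        freq = sum(by_class.values())
        label, hits = max(by_class.items(), key=lambda kv: kv[1])
        if freq >= theta_f and hits / freq >= theta_p:
            ranked[pattern] = label
    return ranked

def classify(post, ranked, n=2):
    """Assign a class only if the post instantiates at least n of that
    class's patterns (substring match simplifies template matching)."""
    hits = defaultdict(int)
    for pattern, label in ranked.items():
        if pattern in post.lower():
            hits[label] += 1
    if not hits:
        return None
    label = max(hits, key=hits.get)
    return label if hits[label] >= n else None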
0:08:20 | so in the first round what we observe |
---|
0:08:22 | looking at the small sample data |
---|
0:08:24 | is |
---|
0:08:25 | so here are some examples so things like above your head and get over it occur in |
---|
0:08:30 | sarcastic posts |
---|
0:08:31 | with these frequencies and probabilities of association |
---|
0:08:34 | and things like |
---|
0:08:35 | natural selection big bang theory are probable in not sarcastic posts |
---|
0:08:40 | and just to kind of summarize |
---|
0:08:43 | we find that the not sarcastic class contains |
---|
0:08:45 | a lot of very technical jargon scientific language topic specific things |
---|
0:08:49 | and then we can get |
---|
0:08:50 | high precision when classifying posts based on just these templates |
---|
0:08:55 | it's up to about eighty percent |
---|
0:08:57 | whereas the sarcastic class as you can see is much more varied not high precision |
---|
0:09:01 | around thirty percent |
---|
0:09:03 | and so it's difficult to do bootstrapping |
---|
0:09:06 | on data where the precision of these patterns is relatively low |
---|
0:09:11 | so |
---|
0:09:13 | we decided to make use of this high precision not sarcastic set of patterns that |
---|
0:09:17 | we can collect |
---|
0:09:18 | to actually expand our data trying to find posts that would be good to get |
---|
0:09:23 | annotated |
---|
0:09:24 | that we think would have a higher probability than ten percent of being sarcastic |
---|
0:09:28 | based on that original estimate from a sample of debate forums data |
---|
0:09:32 | so using a pool of thirty k posts we filter out those that we think are |
---|
0:09:36 | not sarcastic |
---|
0:09:38 | so posts that contain any of those not sarcastic patterns that we identified |
---|
0:09:43 | and we end up with about eleven k posts that we believe have a higher likelihood |
---|
0:09:47 | of being sarcastic and we put those out for annotation on mechanical turk |
---|
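a small sketch of this filtering step, assuming substring patterns; the pattern list is illustrative, not the learned set:

```python
# drop any post that matches a high-precision not-sarcastic pattern so the
# remaining pool is more likely to be sarcastic and cheaper to annotate
NOT_SARC_PATTERNS = ["natural selection", "big bang"]  # e.g. technical/scientific jargon

def likely_sarcastic_pool(posts, patterns=NOT_SARC_PATTERNS):
    """Keep only posts containing none of the not-sarcastic patterns."""
    return [post for post in posts
            if not any(pat in post.lower() for pat in patterns)]

# e.g. filtering a ~30k pool down to the ~11k posts sent out for annotation:
# annotation_pool = likely_sarcastic_pool(raw_posts)
```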
0:09:51 | and the way the annotation task looks is they get a definition of |
---|
0:09:54 | sarcasm and an example of responses that contain sarcasm |
---|
0:09:58 | and don't contain sarcasm |
---|
0:10:00 | and then we show them a quote response pair so this is like a dialogic |
---|
0:10:03 | pair where we have a dialogic parent and the response and we ask them to |
---|
0:10:07 | identify sarcasm in the response |
---|
0:10:10 | so that's what our annotators are seeing |
---|
0:10:14 | and then using this method we're able to skew the distribution of sarcasm from |
---|
0:10:18 | ten percent up to thirty one percent |
---|
0:10:21 | so kind of getting that pool of eleven k posts annotated |
---|
0:10:26 | depending on where we set our agreement threshold we're able to skew this distribution quite |
---|
0:10:30 | high |
---|
0:10:31 | so here from nineteen to twenty three percent using this relatively conservative |
---|
0:10:35 | threshold of six out of nine annotators agreeing |
---|
0:10:38 | that a post is sarcastic |
---|
0:10:40 | since it's so subjective and diverse we wanna make sure that our |
---|
0:10:44 | annotations are |
---|
0:10:45 | clean |
---|
0:10:46 | so that's why we use a relatively high threshold |
---|
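a tiny sketch of that agreement threshold, assuming nine binary judgments per post; the symmetric rule on the not sarcastic side is an assumption for illustration:

```python
def gold_label(votes, threshold=6):
    """votes: list of nine binary judgments (1 = sarcastic).
    Return a label only when agreement clears the threshold."""
    if sum(votes) >= threshold:
        return "sarcastic"
    if len(votes) - sum(votes) >= threshold:
        return "not_sarcastic"
    return None  # low-agreement post, set aside

print(gold_label([1, 1, 1, 1, 1, 1, 0, 0, 0]))  # -> sarcastic
```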
0:10:52 | so having more data |
---|
0:10:54 | means we can do better at the bootstrapping task but we |
---|
0:10:58 | still observe some of the same trends |
---|
0:11:00 | so highly precise not sarcastic patterns less precise sarcastic ones |
---|
0:11:05 | but we're still not quite at the point we want to be at for |
---|
0:11:08 | bootstrapping |
---|
0:11:09 | so |
---|
0:11:10 | kind of given |
---|
0:11:12 | the diversity of the data we decide to revisit that |
---|
0:11:15 | categorization i talked about earlier |
---|
0:11:18 | so sarcasm rhetorical questions hyperbole understatement jocularity |
---|
0:11:23 | so we make this observation that some of these lexico-syntactic cues are frequently used sarcastically |
---|
0:11:30 | so for example |
---|
0:11:31 | i |
---|
0:11:32 | well |
---|
0:11:32 | let's all copper that great argument revolution as well |
---|
0:11:35 | well |
---|
0:11:37 | the what's your plan how to |
---|
0:11:40 | how to realistic my friend |
---|
0:11:43 | interesting someone hijacked your account role |
---|
0:11:46 | central |
---|
0:11:47 | so pretty funny and really a combination of words expel an arm mean to expel |
---|
0:11:53 | arms louse use the creative genius |
---|
0:11:55 | so kind of these different terms terms like this |
---|
0:12:00 | are pretty prevalent in sarcastic posts and we try to make use of this observation |
---|
0:12:05 | in our data collection |
---|
0:12:08 | so the way we do that is |
---|
0:12:12 | we develop regexes to search for different patterns in our unannotated data |
---|
0:12:16 | so we get annotations for different things that we think are quite prevalent in the data |
---|
0:12:20 | things like well |
---|
0:12:22 | and things like |
---|
0:12:24 | all of these ones pretty much fantastic et cetera |
---|
0:12:28 | and we find that we're able to get again distributions that are much higher than |
---|
0:12:31 | ten percent searching for posts that only contain a single cue so it's interesting to |
---|
0:12:37 | note that just a single cue can have such a large distribution of sarcasm so |
---|
0:12:41 | something like well |
---|
0:12:42 | used forty four percent of the time in |
---|
0:12:44 | sarcastic posts |
---|
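a rough sketch of this cue search, assuming regexes anchored the way a turn-initial "well" or "oh" would be; the cue list and anchoring are illustrative assumptions:

```python
import re

CUES = {
    "well":   re.compile(r"^\s*well\b", re.IGNORECASE),   # turn-initial "well"
    "oh":     re.compile(r"^\s*oh\b", re.IGNORECASE),     # turn-initial "oh"
    "really": re.compile(r"\breally\b", re.IGNORECASE),
}

def posts_with_single_cue(posts):
    """Group unannotated posts by which single cue they contain."""
    by_cue = {name: [] for name in CUES}
    for post in posts:
        hits = [name for name, rx in CUES.items() if rx.search(post)]
        if len(hits) == 1:            # keep posts matching exactly one cue
            by_cue[hits[0]].append(post)
    return by_cue
```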
0:12:46 | so using these observations we begin constructing our sub corpora |
---|
0:12:51 | one for rhetorical questions and one for hyperbole |
---|
0:12:55 | and the way we gather more data for this is that we observe that they're |
---|
0:12:58 | used both sarcastically and not sarcastically for argumentation |
---|
0:13:02 | and we use this middle of post heuristic to estimate whether |
---|
0:13:07 | questions are actually used rhetorically or not |
---|
0:13:09 | and so when a speaker |
---|
0:13:11 | asks a question then continues on with their turn they're not giving a chance |
---|
0:13:15 | for |
---|
0:13:17 | the listener to actually respond and so it's a question at least that doesn't require |
---|
0:13:21 | an answer from someone else |
---|
0:13:23 | in the view of the writer |
---|
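a rough sketch of the middle-of-post heuristic, assuming a simple question mark split; the minimum tail length is an illustrative parameter:

```python
def rhetorical_question_candidates(post, min_tail_words=4):
    """Return questions the writer follows with more of their own text,
    i.e. questions sitting in the middle of the post, not at its end."""
    candidates = []
    segments = post.split("?")
    for i, seg in enumerate(segments[:-1]):        # each seg ends at a "?"
        question = seg.strip() + "?"
        tail = "?".join(segments[i + 1:]).strip()  # what the writer says next
        if len(tail.split()) >= min_tail_words:    # question sits mid-post
            candidates.append(question)
    return candidates

print(rhetorical_question_candidates(
    "do you wish to not have a logical debate? alrighty then god bless you anyway."))
# -> ['do you wish to not have a logical debate?']
```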
0:13:25 | so we do a little pilot annotation and find that seventy five percent of these rhetorical |
---|
0:13:29 | questions that we gather in this way are in fact |
---|
0:13:34 | annotated to be |
---|
0:13:35 | rhetorical |
---|
0:13:36 | and we do annotations of these new posts ending up with eight hundred fifty one |
---|
0:13:40 | posts per class so something like do you wish to not have a logical debate |
---|
0:13:45 | alrighty then god bless you anyway |
---|
0:13:47 | proof that you can't prove that i got |
---|
0:13:49 | and given anything but insult et cetera so these are things where someone in |
---|
0:13:53 | the same post was asking questions and going on with their turn |
---|
0:13:58 | the second subcorpus we look at is hyperbole so hyperbole exaggerates a situation we use intensifiers |
---|
0:14:03 | to capture these sorts of instances and we can get more annotations so colston |
---|
0:14:08 | and o'brien cite this sort of situational scale this contrast effect i was |
---|
0:14:13 | talking about earlier |
---|
0:14:14 | so hyperbole can shift utterances across the scale so shift something into extremely positive |
---|
0:14:21 | and away from literal and also into extremely negative and away from literal and so |
---|
0:14:27 | intensifiers kind of serve this purpose |
---|
0:14:28 | so something like wow i'm so amazed by your comeback skills |
---|
0:14:33 | or do go on i'm so impressed by your intellectual argument |
---|
0:14:35 | things like that |
---|
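a minimal sketch of the intensifier-based capture, assuming a hand-picked intensifier list (an illustrative assumption, not the paper's lexicon):

```python
import re

INTENSIFIERS = re.compile(
    r"\b(so|totally|absolutely|amazingly|incredibly|wow)\b", re.IGNORECASE)

def hyperbole_candidates(posts):
    """Keep posts whose wording is shifted toward a scale extreme."""
    return [p for p in posts if INTENSIFIERS.search(p)]

print(hyperbole_candidates([
    "wow i'm so amazed by your comeback skills",   # intensified: candidate
    "the committee meets on tuesday",              # no intensifier: dropped
]))
```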
0:14:38 | so for the statistics for our final corpus we get around six thousand |
---|
0:14:43 | five hundred posts for our generic sarcasm corpus |
---|
0:14:46 | and then rhetorical questions and hyperbole with this distribution and more information on the dataset |
---|
0:14:51 | is available there |
---|
0:14:53 | it's in the paper |
---|
0:14:55 | so to kind of validate the quality of our corpus |
---|
0:15:01 | we do simple experiments using very simple features bag of words and word2vec features |
---|
0:15:06 | noting previous work has achieved about seventy percent with more complex features |
---|
0:15:11 | and we end up with results that are higher than that so we |
---|
0:15:16 | do this kind of |
---|
0:15:20 | segmented set of experiments where we test at different dataset sizes |
---|
0:15:23 | and we see that our f-measures continue to increase our peak right now is |
---|
0:15:27 | seventy four with these simple features |
---|
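a sketch of this validation setup using scikit-learn, assuming a logistic regression over bag-of-words n-grams; the model choice, sizes, and cross-validation scheme are illustrative, not the authors' exact experiments:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def f1_at_sizes(texts, labels, sizes=(1000, 2000, 4000, 6500)):
    """Cross-validated F1 with unigram+bigram counts at each corpus size
    (labels: 1 = sarcastic, 0 = not sarcastic)."""
    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    scores = {}
    for n in sizes:
        n = min(n, len(texts))
        scores[n] = cross_val_score(model, texts[:n], labels[:n],
                                    cv=5, scoring="f1").mean()
    return scores  # rising F1 with corpus size suggests more data would help
```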
0:15:29 | so that warrants you know expanding our dataset even more |
---|
0:15:33 | also we do again our weakly supervised experiments with autoslog-ts to see what |
---|
0:15:38 | sorts of precisions we can get now for bootstrapping |
---|
0:15:41 | and we see much higher precisions than we were getting before at reasonable recall |
---|
0:15:45 | for bootstrapping so that's good news as well |
---|
0:15:47 | so now we could expand our method to be weakly supervised and gather more data more |
---|
0:15:51 | quickly |
---|
0:15:53 | and these are the numbers of new patterns that we learned so patterns that we |
---|
0:15:56 | never searched for in the original data |
---|
0:15:59 | so we're learning a lot of new patterns that we didn't originally search |
---|
0:16:02 | for |
---|
0:16:03 | for all of the datasets |
---|
0:16:05 | and then some linguistic analysis quickly |
---|
0:16:05 | so we aim to characterize the differences between our datasets so again using some autoslog |
---|
0:16:14 | instantiations on our generic data we see these |
---|
0:16:20 | creative sorts of different instantiations for sarcastic posts whereas again the not sarcastic posts have |
---|
0:16:26 | these highly |
---|
0:16:28 | technical jargon sorts of terminologies |
---|
0:16:31 | for the rhetorical questions we observe a lot of the same properties for the not |
---|
0:16:36 | sarcastic class |
---|
0:16:38 | but for the sarcastic class we observe that |
---|
0:16:40 | there's a lot of attacks on basic human abilities right in these debate forum dialogues |
---|
0:16:44 | people say things like can you read can you write |
---|
0:16:46 | do you understand |
---|
0:16:48 | so we kind of went through looking at some of the dependency parses on these |
---|
0:16:52 | sorts of questions |
---|
0:16:53 | and just found a lot of things that really relate to basic human ability so |
---|
0:16:56 | people are attacking people |
---|
0:16:58 | not really attacking their argument that's very prevalent in our debate forums data |
---|
0:17:03 | and finally for hyperbole we find that the adjective and adverb patterns are really common |
---|
0:17:09 | even though we didn't search for these originally in our regex experiments |
---|
0:17:13 | so |
---|
0:17:14 | there are things like contrast by exclusion used |
---|
0:17:24 | and the examples of hyperbole that we pick up are really interesting |
---|
0:17:29 | so in conclusion we develop a large-scale highly reliable corpus of sarcasm we reduced annotation |
---|
0:17:35 | cost and effort by skewing the distribution and avoiding having to annotate huge pools of |
---|
0:17:39 | data |
---|
0:17:40 | and we operationalize lexico-syntactic cues for rhetorical questions and hyperbole |
---|
0:17:45 | and verify the quality of our corpus empirically and qualitatively |
---|
0:17:49 | for future directions we wanna do more feature engineering and more model selection based on our |
---|
0:17:54 | linguistic observations |
---|
0:17:56 | develop more generalisable models of different categories of sarcasm that we haven't looked at |
---|
0:18:01 | and explore characteristics of our lower agreement data see if there's anything interesting there as |
---|
0:18:05 | well |
---|
0:18:06 | thanks |
---|
0:18:15 | questions |
---|
0:18:35 | so first of all we began with not looking at those categories |
---|
0:18:39 | right so we start with this really generic sarcasm so definitely it's kind of |
---|
0:18:44 | a long tail right so there's a lot of different exaggerations |
---|
0:18:48 | that's definitely the problem |
---|
0:18:49 | we began initially with just sarcasm in general but it's kind of interesting |
---|
0:18:55 | to get into the more refined categories and look at how those are different and |
---|
0:19:00 | yes there's also different sorts of things that we could look at understatement is |
---|
0:19:06 | quite prevalent as well |
---|
0:19:07 | so it doesn't only exist in debate forums it's just quite pronounced in the |
---|
0:19:11 | forums so |
---|
0:19:12 | it'd be good to look at there |
---|
0:19:27 | right so the question is about the word2vec features so |
---|
0:19:31 | did we train them |
---|
0:19:34 | did we train the word2vec model on our corpus or did we use an existing model |
---|
0:19:37 | so we did both the results that we're reporting are actually on the google |
---|
0:19:40 | news trained vectors which kind of |
---|
0:19:43 | correlates with our data as well |
---|
0:19:45 | the debate forums |
---|
0:19:47 | we have used our own trained model it didn't perform as well as this probably |
---|
0:19:51 | because of the smaller amount of data |
---|
0:19:53 | and the google news model is trained on a huge amount of data so that's definitely |
---|
0:19:55 | worth exploring in the future as well |
---|
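a sketch of the word2vec features being discussed, assuming gensim and averaged google news vectors; the file path is a placeholder and the averaging scheme is an assumption:

```python
import numpy as np
from gensim.models import KeyedVectors

# pretrained google news vectors (~3M words, 300 dims); path is illustrative
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def post_vector(post):
    """Mean word vector of the in-vocabulary tokens (zeros if none)."""
    vecs = [w2v[tok] for tok in post.lower().split() if tok in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)
```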
0:20:12 | right so actually i didn't mention the numbers here there's more detail in |
---|
0:20:17 | our paper but our levels of agreement were about seventy percent for each of |
---|
0:20:22 | the tasks and they were actually better for the smaller tasks |
---|
0:20:26 | where what counts as sarcasm is a little bit more constrained |
---|
0:20:29 | i think it's |
---|
0:20:31 | no that's actually agreement with the majority label so just |
---|
0:20:36 | and |
---|
0:20:37 | so it's actually better for the subcategories in fact than the generic |
---|
0:20:42 | sarcasm task it's pretty hard to |
---|
0:20:45 | to get high agreement annotations there |
---|
0:20:52 | so i was wondering about the idea of contrast right so you said |
---|
0:20:59 | somewhere that it highlights the fact that there is some contrast |
---|
0:21:02 | between the literal thing and what you think of that element and so i guess |
---|
0:21:06 | that |
---|
0:21:08 | and also this idea that there is a meaning that is non |
---|
0:21:12 | literal right yes so i was thinking about the possible connection with metaphor and |
---|
0:21:18 | with the task of metaphor detection right and so here you are focusing on trying |
---|
0:21:23 | to find patterns that characterize sarcasm |
---|
0:21:27 | but for instance in some work in metaphor detection the goal is to |
---|
0:21:31 | to |
---|
0:21:32 | capture contrast right so what makes a particular use different from the literal use |
---|
0:21:37 | so by looking at how the sarcastic intended meaning can actually be far from the regular |
---|
0:21:44 | use i was wondering so it's a very open question i was wondering |
---|
0:21:47 | whether you had thought about the task in |
---|
0:21:51 | in these terms |
---|
0:21:53 | that's really interesting so looking at kind of |
---|
0:21:57 | maybe trying to measure how far away it is on sort of a contrast scale that would |
---|
0:22:01 | definitely be interesting we haven't |
---|
0:22:02 | done that explicitly but i mean |
---|
0:22:04 | like the different intensifiers can have different effects so it's kind of |
---|
0:22:08 | trying to map it across the scale |
---|
0:22:13 | other questions |
---|
0:22:19 | a question |
---|
0:22:21 | when you're doing the mining of the data |
---|
0:22:24 | and you're identifying different |
---|
0:22:27 | phrases that are more associated with sarcasm and non sarcasm |
---|
0:22:32 | did you do things to make sure that the dataset was not biased you know |
---|
0:22:37 | towards utilizing those kinds of phrases |
---|
0:22:40 | so that if down the line someone wanted to build an automated system to detect |
---|
0:22:44 | sarcasm and non sarcasm they would just |
---|
0:22:47 | read your paper and then go after these phrases 'cause this was used to |
---|
0:22:50 | construct the corpus |
---|
0:22:51 | right so for our generic sarcasm corpus that was a random sample |
---|
0:22:55 | so all of that is not sampled in any way then for the rhetorical questions and hyperbole |
---|
0:23:01 | we would select those posts but |
---|
0:23:04 | the posts actually contain all sorts of other cues and it's important to note that |
---|
0:23:09 | if we ever selected a cue it would exist in both sarcastic and not sarcastic |
---|
0:23:12 | posts |
---|
0:23:13 | so it's not like you would only find them in one and that's kind of |
---|
0:23:16 | what made it interesting that you can find those cues used in both sorts of |
---|
0:23:20 | situations so it wouldn't be biased that way |
---|