0:00:17 | We are dealing with the problem of document summarization. |
---|
0:00:21 | The goal of summarization is finding the most important bits of information |
---|
0:00:26 | from either a single document, such as a news story or a voicemail, |
---|
0:00:31 | or multiple documents, such as reviews of a product or news stories about an event, |
---|
0:00:36 | or spoken documents, |
---|
0:00:37 | such as broadcast news, broadcast conversations, lectures, or meetings. |
---|
0:00:42 | The main issue we are tackling is the information overload problem: |
---|
0:00:47 | there are a variety of sources from which we get information, |
---|
0:00:51 | these are usually redundant, |
---|
0:00:52 | and we only have a limited time to process all this information. |
---|
0:00:56 | Also, the information is not necessarily in the optimal order; sometimes we |
---|
0:01:01 | read a paper and realize that we should have read another paper before this one |
---|
0:01:05 | to be able to understand it. |
---|
0:01:08 | So we are working on both speech and text summarization. |
---|
0:01:12 | NIST has been organizing summarization evaluations, providing researchers a good framework |
---|
0:01:19 | for summarization research, and these include |
---|
0:01:21 | multi-document summarization, |
---|
0:01:24 | which has been going on for the past ten years. |
---|
0:01:29 | Researchers are provided with a set of documents paired with corresponding human summaries, |
---|
0:01:36 | and I'll be showing results on the NIST data to be comparable with the previous research. |
---|
0:01:41 | On the related work side, people have treated summarization as a classification problem, |
---|
0:01:48 | and we are also doing the same. |
---|
0:01:50 | Usually in these approaches the original document sentences don't have a category; all |
---|
0:01:55 | we have is the human summaries. |
---|
0:01:57 | So the first step is assigning a category to each document sentence, such as summary sentence |
---|
0:02:02 | or non-summary sentence. |
---|
0:02:04 | Most of the previous work has used word-based similarity measures between the document |
---|
0:02:09 | sentences and the summary, |
---|
0:02:10 | then assigned labels to the sentences, |
---|
0:02:13 | and then done binary classification with features such as the sentence length, its position in the |
---|
0:02:20 | document, and so on. |
---|
0:02:22 | The main issue with such approaches is that a word-based similarity measure usually fails to capture the semantic |
---|
0:02:29 | relatedness between the sentences and the summaries. |
---|
0:02:32 | So in addition to the summarization-as-classification approach, we use generative models, |
---|
0:02:37 | and then use their |
---|
0:02:39 | latent concepts to figure out the similarities between the sentences and the documents, so that we can tackle |
---|
0:02:44 | this problem. |
---|
0:02:45 | Generative models have been used by others as well for summarization; for example, |
---|
0:02:50 | Haghighi and Vanderwende have used hierarchical Latent Dirichlet Allocation, |
---|
0:02:55 | and we have used Latent Dirichlet Allocation based models as well. Our work here is |
---|
0:03:00 | based on our previous work, |
---|
0:03:02 | which is SumLDA, or S-LDA. |
---|
0:03:05 | That is a semi-supervised extractive summarization method, |
---|
0:03:10 | and it uses |
---|
0:03:13 | a supervised version of LDA to cluster the documents and summaries into |
---|
0:03:17 | topics, |
---|
0:03:18 | and then uses that clustering for classification; that is the S-LDA approach. |
---|
0:03:24 | The main assumption is that |
---|
0:03:26 | there are two types of concepts in the documents: generic concepts and specific ones. |
---|
0:03:31 | The generic ones are the ones that are usually included in the summary, so that's the main assumption, |
---|
0:03:36 | and the |
---|
0:03:38 | specific ones are the ones that are usually specific to each individual document. |
---|
0:03:44 | At a very high level the process is as follows: we have a set of documents |
---|
0:03:48 | and the corresponding human summaries, |
---|
0:03:50 | and we use SumLDA, which I'll describe next, to |
---|
0:03:54 | assign each latent variable, each latent concept, |
---|
0:03:57 | to the generic or the specific set, using some supervision from the human summaries. |
---|
0:04:02 | Then we go back and look at the original training document sentences, |
---|
0:04:07 | and we mark the ones that mainly contain specific concepts as negative examples, and the ones |
---|
0:04:12 | that contain generic concepts as positive examples. |
---|
0:04:15 | Now we train a classifier, and at inference time we use that classifier to |
---|
0:04:20 | decide if a sentence |
---|
0:04:21 | should be included in the summary or not. |
---|
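As a rough illustration of the step just described, transferring the generic/specific topic labels down to the training sentences and then training a binary classifier, here is a minimal sketch. The data structures, the majority rule, and the classifier choice are assumptions made for illustration; they are not the authors' implementation.

```python
# Minimal sketch of labeling training sentences from topic labels (assumed
# inputs, not the authors' code): each sentence comes with the topic ids
# assigned to its words, and each topic is already flagged generic/specific.
from collections import Counter
from sklearn.linear_model import LogisticRegression

def label_sentences(sentences, topic_is_generic):
    """sentences: list of (feature_vector, per_word_topic_ids);
    topic_is_generic: dict topic_id -> bool from the supervised topic model."""
    X, y = [], []
    for features, topic_ids in sentences:
        counts = Counter(topic_ids)
        n_generic = sum(c for t, c in counts.items() if topic_is_generic[t])
        # Positive example if the sentence is dominated by generic concepts,
        # negative if it is dominated by document-specific ones.
        y.append(1 if n_generic > len(topic_ids) / 2 else 0)
        X.append(features)
    return X, y

def train_sentence_classifier(sentences, topic_is_generic):
    X, y = label_sentences(sentences, topic_is_generic)
    # Any binary classifier works; logistic regression is used here as a stand-in.
    return LogisticRegression(max_iter=1000).fit(X, y)
```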
0:04:25 | So while this has been shown to work, there is a cost, or rather a |
---|
0:04:31 | suboptimality, in transferring labels from the topics to the sentences. |
---|
0:04:36 | So in this work we are basically looking at the topics directly, |
---|
0:04:40 | and now we are trying to classify the latent variables themselves instead of |
---|
0:04:44 | the sentences, |
---|
0:04:46 | and then learn to distinguish, in terms of those latent variables, which ones would be useful when we are trying to |
---|
0:04:51 | summarize a new document set. |
---|
0:04:53 | So it works like this: we train a classifier |
---|
0:04:56 | to distinguish the two types of topics, |
---|
0:04:58 | and then at inference time we use regular Latent Dirichlet Allocation, |
---|
0:05:02 | find the latent topics, and then use the classifier to determine which ones should be in |
---|
0:05:08 | the summary |
---|
0:05:09 | and which ones should not, |
---|
0:05:11 | and pick the sentences that include the generic concepts. |
---|
0:05:16 | So here is more detail. |
---|
0:05:17 | I believe this morning's keynote speech already introduced LDA, |
---|
0:05:22 | and our approach is an extension of LDA for the summarization task. |
---|
0:05:26 | LDA is a generative model, and it allows us to |
---|
0:05:30 | explain a set of observations |
---|
0:05:33 | by unobserved, or hidden, or latent groups, |
---|
0:05:36 | and this explains why some parts of the data are similar. |
---|
0:05:41 | The assumption is that each document is a mixture of a small number of topics, |
---|
0:05:45 | and each word occurrence is attributable to one of the document's topics, |
---|
0:05:49 | so each word is sampled from |
---|
0:05:52 | a topic. |
---|
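For readers who did not see that introduction, here is a minimal sketch of the generative story just described: each document draws a topic mixture, each word position draws a topic from that mixture, and the word is then drawn from that topic's word distribution. All sizes and hyperparameters below are illustrative assumptions.

```python
# Toy LDA generative process; sizes and hyperparameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
K, V, DOC_LEN = 10, 5000, 200                  # topics, vocabulary, words/doc
alpha, beta = 0.1, 0.01                        # Dirichlet hyperparameters
phi = rng.dirichlet(np.full(V, beta), size=K)  # per-topic word distributions

def generate_document():
    theta = rng.dirichlet(np.full(K, alpha))   # this document's topic mixture
    words = []
    for _ in range(DOC_LEN):
        z = rng.choice(K, p=theta)             # sample a topic for this position
        words.append(rng.choice(V, p=phi[z]))  # sample a word from that topic
    return words
```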
0:05:53 | What we do in SumLDA is also very similar to how the LDA generation |
---|
0:05:58 | works. |
---|
0:05:59 | In the case of standard LDA it was unigrams, but |
---|
0:06:01 | here we are also looking at unigrams and bigrams. |
---|
0:06:04 | When we are looking at a unit, we check whether it appears in the |
---|
0:06:08 | human summary or not. |
---|
0:06:09 | If it already appears in a human summary, it is constrained: we have two sets |
---|
0:06:13 | of topics, the generic concepts and the specific concepts, |
---|
0:06:14 | and we assume that such a unit should be generated, should be sampled, from the generic concepts instead |
---|
0:06:19 | of the specific ones; |
---|
0:06:21 | and if it is not in a summary, then it could be generated from any topic. |
---|
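A rough sketch of how such a constraint could be enforced inside a topic sampler is shown below. The assumption that topics 0..G-1 are the generic ones and the rest are specific is made only for illustration; it is not the paper's actual parameterization.

```python
# Constrained topic sampling sketch: units (unigrams/bigrams) that occur in a
# human summary may only be assigned generic topics; other units may take any.
import numpy as np

rng = np.random.default_rng(0)

def allowed_topics(unit, summary_units, K, G):
    """Topics 0..G-1 are assumed generic, G..K-1 specific (illustrative)."""
    return np.arange(G) if unit in summary_units else np.arange(K)

def sample_topic(unit, summary_units, topic_weights, K, G):
    """topic_weights: unnormalized sampling weights over all K topics."""
    cand = allowed_topics(unit, summary_units, K, G)
    p = topic_weights[cand]
    p = p / p.sum()
    return int(rng.choice(cand, p=p))
```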
0:06:26 | So then we proceed with CVC. Once we have these two sets of topics, |
---|
0:06:31 | generic and specific, |
---|
0:06:33 | we extract features for every single topic. |
---|
0:06:36 | The first thing you do is basically find the most frequent unigrams and bigrams in |
---|
0:06:41 | the document set, |
---|
0:06:42 | and then you use these to |
---|
0:06:44 | generate a set of features for every topic. |
---|
0:06:47 | The first set of features has |
---|
0:06:49 | as many features as |
---|
0:06:51 | the number of these frequent terms: |
---|
0:06:53 | each feature is the probability of |
---|
0:06:55 | the topic cluster appearing with that |
---|
0:06:58 | frequent term. |
---|
0:06:59 | You also look at how many of the frequent terms each topic cluster |
---|
0:07:04 | includes, so that's the other feature: you basically use a threshold to determine inclusion, count the terms, |
---|
0:07:10 | and then normalize by the number of frequent terms. |
---|
0:07:13 | And we use maximum entropy classification. |
---|
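The per-topic features and the classifier could look roughly like the sketch below. The input structures, the probability threshold, and the use of scikit-learn's logistic regression in place of a dedicated maximum entropy package are all illustrative assumptions.

```python
# Sketch of per-topic features: (1) probability of the topic generating each
# frequent unigram/bigram, (2) fraction of frequent terms the topic "contains"
# (probability above a threshold). Threshold and structures are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def topic_features(topic_word_prob, frequent_terms, term_index, threshold=1e-3):
    """topic_word_prob: word-probability vector of one topic (assumed to cover
    the unigram/bigram vocabulary); term_index: term -> vocabulary index."""
    probs = [topic_word_prob[term_index[t]] for t in frequent_terms]
    contains = sum(p > threshold for p in probs) / len(frequent_terms)
    return np.array(probs + [contains])

def train_topic_classifier(X, y):
    # y[k] = 1 for generic topics, 0 for specific ones; logistic regression
    # plays the role of the maximum entropy classifier here.
    return LogisticRegression(max_iter=1000).fit(X, y)
```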
0:07:17 | At inference time, when a new set of documents is given, |
---|
0:07:20 | since we don't have any summaries, we run just the usual LDA model on each document set. |
---|
0:07:26 | Then we use the classifier that we trained in the previous step |
---|
0:07:29 | to estimate labels for the K topics |
---|
0:07:32 | as generic or specific. |
---|
0:07:34 | Then, using these, we compute the sentence scores |
---|
0:07:37 | and decide if a sentence |
---|
0:07:39 | should be included in the summary or not. |
---|
0:07:42 | In addition to this, previous work has shown that unigram and bigram frequencies |
---|
0:07:47 | are useful in determining |
---|
0:07:49 | if a sentence should be included in a summary or not, |
---|
0:07:51 | so we are merging those scores with the scores based on CVC. |
---|
0:07:56 | This is how we compute the scores. Basically, |
---|
0:08:00 | for every sentence, in this first part |
---|
0:08:03 | we look at the number of |
---|
0:08:06 | generic topics it contains, and normalize it by K. |
---|
0:08:10 | We also look at the S-LDA score, and interpolate the two of them. |
---|
0:08:15 | Then, as the next step, we interpolate the CVC score with the unigram and bigram |
---|
0:08:21 | score; |
---|
0:08:22 | basically the unigram and bigram scores are the normalized counts of the high-frequency unigrams or bigrams that the |
---|
0:08:29 | sentence under consideration contains. |
---|
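A minimal sketch of the interpolated sentence score is given below, assuming equal interpolation weights and simple helper inputs; the paper's exact coefficients and normalizations are not reproduced here.

```python
# Sketch of the combined sentence score: the CVC part (generic topics in the
# sentence, normalized by K) is interpolated with the S-LDA score, and that
# combination is interpolated with a unigram/bigram frequency score.
def ngram_frequency_score(sentence_ngrams, high_freq_ngrams):
    """Normalized count of high-frequency unigrams/bigrams in the sentence."""
    if not sentence_ngrams:
        return 0.0
    hits = sum(1 for g in sentence_ngrams if g in high_freq_ngrams)
    return hits / len(sentence_ngrams)

def sentence_score(sent_topics, generic_topics, K,
                   slda_score, ngram_score, lam1=0.5, lam2=0.5):
    cvc = sum(1 for t in set(sent_topics) if t in generic_topics) / K
    combined = lam1 * cvc + (1 - lam1) * slda_score      # CVC + S-LDA
    return lam2 * combined + (1 - lam2) * ngram_score    # + n-gram frequency
```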
0:08:45 | Since we need to select a subset of these sentences based on their scores, we apply a greedy algorithm. |
---|
0:08:51 | First we order all the sentences according to their scores, then start from the highest-scoring one: |
---|
0:08:57 | we take the highest one, |
---|
0:08:59 | then we start looking at the rest of the sentences. |
---|
0:09:03 | We add the next sentence to the summary only if it is not redundant with the sentences |
---|
0:09:07 | that are already in the summary. |
---|
0:09:10 | This is very similar |
---|
0:09:12 | to the MMR approach, but not exactly the same: |
---|
0:09:16 | we skip the sentences that have a lot of redundancy with the already |
---|
0:09:21 | generated summary, |
---|
0:09:22 | and we stop when the summary length is satisfied. |
---|
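A minimal sketch of this greedy, redundancy-aware selection follows. The word-overlap redundancy test and its threshold are assumptions standing in for whatever similarity measure the system actually uses.

```python
# Greedy selection sketch: take sentences in decreasing score order, skip any
# that overlap too much with the summary built so far, respect the budget.
def greedy_select(sentences, scores, length_budget=250, max_overlap=0.5):
    """sentences: list of token lists; scores: parallel list of floats."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    summary, summary_words, total_len = [], set(), 0
    for i in order:
        toks = sentences[i]
        if total_len + len(toks) > length_budget:
            continue
        overlap = len(set(toks) & summary_words) / max(len(set(toks)), 1)
        if summary and overlap > max_overlap:
            continue                      # too redundant with current summary
        summary.append(toks)
        summary_words.update(toks)
        total_len += len(toks)
    return summary
```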
0:09:27 | We evaluate the performance of summarization using ROUGE scores, just like the previous papers. |
---|
0:09:33 | We look at ROUGE-1, ROUGE-2, and ROUGE-SU4. |
---|
0:09:37 | ROUGE-1 and ROUGE-2 basically compute the unigram and bigram overlap between the |
---|
0:09:43 | human and system summaries, and ROUGE-SU4 looks at |
---|
0:09:47 | skip |
---|
0:09:48 | bigrams, up to a distance of four. |
---|
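For intuition only, here is a simplified, recall-oriented ROUGE-n computation; the official ROUGE toolkit additionally handles stemming, stopword options, multiple references, and jackknifing, so its scores will differ from this sketch.

```python
# Simplified ROUGE-n recall: clipped n-gram overlap divided by the number of
# n-grams in the reference summary.
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(system_tokens, reference_tokens, n=2):
    ref = ngram_counts(reference_tokens, n)
    sys = ngram_counts(system_tokens, n)
    if not ref:
        return 0.0
    overlap = sum(min(count, sys[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())
```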
0:09:51 | For training we use the DUC 2005 and 2006 |
---|
0:09:55 | data sets. |
---|
0:09:56 | There are in total 100 document sets in this data, |
---|
0:10:00 | and each |
---|
0:10:01 | document set contains 25 news articles, |
---|
0:10:04 | which is about 80,000 sentences. |
---|
0:10:06 | For testing we use the DUC 2007 |
---|
0:10:10 | data set, which is 45 document sets, |
---|
0:10:14 | each again with 25 news articles, and about 25,000 sentences. |
---|
0:10:19 | We use these data sets because we want to be able to compare our results with most of |
---|
0:10:25 | the previous work that uses them for evaluation. |
---|
0:10:30 | The form of the task, |
---|
0:10:31 | actually the NIST evaluation of the task, has changed |
---|
0:10:34 | after 2007, so it is not directly comparable after that. |
---|
0:10:39 | The goal is to create 250-word summaries from each document set. |
---|
0:10:45 | So here are the results. For the baseline we just use the cosine similarity to mark the |
---|
0:10:50 | initial set of sentences, |
---|
0:10:52 | similar to the previous work, |
---|
0:10:54 | but, different from the previous work, here we are only using features that are based |
---|
0:10:59 | on the words, |
---|
0:11:00 | so it is a bit weaker than the previous work. |
---|
0:11:04 | Another baseline that we are comparing with is |
---|
0:11:09 | a classification-based summarization system that was the top performer in |
---|
0:11:14 | that evaluation, |
---|
0:11:15 | but it uses |
---|
0:11:16 | much more sophisticated features than the ones we are using. |
---|
0:11:20 | The other one is HIERSUM; this is the one that uses a hierarchical LDA based |
---|
0:11:25 | generative method. |
---|
0:11:27 | The way they form the summaries is basically that they use hLDA |
---|
0:11:32 | to find the topics, |
---|
0:11:34 | and then they are picking the sentences; |
---|
0:11:36 | they try to keep the topic distribution of the original documents |
---|
0:11:40 | while they are forming the summary, and they use the KL divergence measure to make sure that |
---|
0:11:44 | the summary topics |
---|
0:11:46 | are not significantly different from the original document topics. |
---|
0:11:50 | In terms of ROUGE-1 and ROUGE-2 |
---|
0:11:53 | scores, both S-LDA and CVC perform |
---|
0:11:56 | significantly better than all the other approaches, |
---|
0:11:59 | and CVC does a little bit better than S-LDA, but it is also using the S-LDA score. |
---|
0:12:04 | In terms of ROUGE-SU4 we are about the same, so we are in the ballpark. |
---|
0:12:07 | One of the main reasons is that some of the previous works are actually optimizing according to bigrams, since they are |
---|
0:12:12 | using both unigrams and bigrams, |
---|
0:12:15 | but we didn't have access to the HIERSUM summaries, so we don't know exactly what the |
---|
0:12:19 | reason is. |
---|
0:12:22 | So in conclusion, we are trying to learn a summary content distribution |
---|
0:12:27 | from the |
---|
0:12:28 | provided document sets, |
---|
0:12:34 | and we use the human summaries to have some supervision, and we are finding the |
---|
0:12:39 | generic and the specific concepts. |
---|
0:12:42 | We have shown improvements in terms of ROUGE scores, |
---|
0:12:45 | and there are a few things that we think can be improved. |
---|
0:12:49 | So there are a few issues; one of them I forgot to include here. One of |
---|
0:12:53 | them is that we are not really competing on features, we are only computing word-based features, so future work is to actually improve |
---|
0:12:58 | the feature set |
---|
0:12:59 | that we are using. |
---|
0:13:00 | The other future work, I think, is based on the sentence selection part. |
---|
0:13:06 | In our previous work we have shown that |
---|
0:13:09 | you can actually do an exact search using integer linear programming, |
---|
0:13:12 | and that ILP-based system was among the best performing systems in the |
---|
0:13:18 | recent TAC evaluations, |
---|
0:13:20 | so it can actually easily be adapted here. That's also part of |
---|
0:13:24 | the future work, because we are also finding concepts, and the idea |
---|
0:13:27 | is then picking the sentences that cover the generic concepts to be included in the summary. |
---|
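The concept-coverage ILP alluded to here can be sketched with the PuLP modeling library as below; the variables, weights, and length budget are illustrative assumptions rather than the formulation used in that system.

```python
# Sketch of an ILP that picks sentences to maximize the total weight of the
# (generic) concepts they cover, subject to a length budget.
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum

def ilp_summary(sent_len, sent_concepts, concept_weight, budget=250):
    """sent_len[i]: length of sentence i; sent_concepts[i]: set of concept ids
    in sentence i; concept_weight[c]: weight of concept c (e.g. classifier score)."""
    prob = LpProblem("summary", LpMaximize)
    s = {i: LpVariable(f"s_{i}", cat=LpBinary) for i in sent_len}
    c = {j: LpVariable(f"c_{j}", cat=LpBinary) for j in concept_weight}
    prob += lpSum(concept_weight[j] * c[j] for j in concept_weight)   # objective
    prob += lpSum(sent_len[i] * s[i] for i in sent_len) <= budget     # length
    for j in concept_weight:
        # A concept counts as covered only if a selected sentence contains it.
        prob += c[j] <= lpSum(s[i] for i in sent_len if j in sent_concepts[i])
    prob.solve()
    return [i for i in sent_len if s[i].value() == 1]
```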
0:13:31 | Furthermore, we have also proposed a hierarchical topic model |
---|
0:13:34 | for summarization, |
---|
0:13:35 | and that is a direction we are moving towards. |
---|
0:13:40 | Thank you. |
---|
0:13:45 | [Session chair remarks, largely inaudible.] |
---|
0:14:08 | [Audience question, largely inaudible: asking whether a possible simple solution would be to directly cover the generic concepts.] |
---|
0:14:20 | We did not directly take all of the generic concepts here, |
---|
0:14:24 | but in the ILP framework actually we could do that: |
---|
0:14:30 | we could try to maximize the generic concepts that are covered in the summary, |
---|
0:14:37 | but we didn't have time to try that. |
---|
0:14:43 | [Audience question, inaudible.] |
---|
0:15:08 | Alright. [Partially inaudible] The information we have is just the human summaries, just the human summaries. |
---|
0:15:17 | [Follow-up audience question, inaudible.] |
---|
0:15:40 | Well, actually, previous work has shown that even position-based features, just taking the first sentences from most of the |
---|
0:15:44 | documents, really do quite well, |
---|
0:15:48 | but of course it's not [inaudible]. |
---|