0:00:17 | We are dealing with the problem of document summarization. |
---|
0:00:21 | The goal of summarization is finding the most important bits of information |
---|
0:00:26 | from either a single document, such as a news story or a voicemail, |
---|
0:00:31 | or multiple documents, such as reviews of a product or news stories about an event, |
---|
0:00:36 | or spoken documents, |
---|
0:00:37 | such as broadcast news, broadcast conversations, lectures, or meetings. |
---|
0:00:42 | The main issue we are tackling is the information overload problem: |
---|
0:00:47 | there are a variety of sources from which we get information, |
---|
0:00:51 | these are usually redundant, |
---|
0:00:52 | and we only have a limited time to process all this information. |
---|
0:00:56 | Also, the information is not necessarily in the optimal order; sometimes we |
---|
0:01:01 | read a paper and realize that we should have read another paper before this one |
---|
0:01:05 | to be able to understand it. |
---|
0:01:08 | So we are working on both speech and text summarization. |
---|
0:01:12 | NIST has been organizing summarization evaluations, providing researchers a good framework |
---|
0:01:19 | for summarization research, and these include |
---|
0:01:21 | multi-document summarization, |
---|
0:01:24 | which has been going on for the past ten years. |
---|
0:01:29 | Researchers are provided with a set of documents paired with corresponding human summaries, |
---|
0:01:36 | and I'll be showing results on the NIST data to be comparable with the previous research. |
---|
0:01:41 | On the related work side, people have treated summarization as a classification problem, |
---|
0:01:48 | and we are also doing the same. |
---|
0:01:50 | Usually in these approaches the original document sentences don't have a category; all |
---|
0:01:55 | we have is the human summaries. |
---|
0:01:57 | So the first step is assigning a category to each document sentence, such as summary sentence |
---|
0:02:02 | or non-summary sentence. |
---|
0:02:04 | Most of the previous work has used word-based similarity measures between the document |
---|
0:02:09 | sentences and the summary, |
---|
0:02:10 | then assigned labels to the sentences, |
---|
0:02:13 | and then done binary classification with features such as the sentence length, its position in the |
---|
0:02:20 | document, and so on. |
---|
0:02:22 | The main issue with such approaches is that a word-based similarity measure usually fails to capture the semantic |
---|
0:02:29 | relatedness between the sentences and the summaries. |
---|
0:02:32 | So in addition to the summarization-as-classification approach, we use generative models, |
---|
0:02:37 | and then use their |
---|
0:02:39 | latent concepts to figure out the similarities between the sentences and the documents, so that we can tackle |
---|
0:02:44 | this problem. |
---|
0:02:45 | Generative models have been used by others as well for summarization; for example, |
---|
0:02:50 | Haghighi and Vanderwende have used hierarchical Latent Dirichlet Allocation, |
---|
0:02:55 | and we have used Latent Dirichlet Allocation based models as well. Our work here is |
---|
0:03:00 | based on our previous work, |
---|
0:03:02 | which is SumLDA, or S-LDA. |
---|
0:03:05 | That is a semi-supervised extractive summarization method, |
---|
0:03:10 | and it uses |
---|
0:03:13 | a supervised version of LDA to cluster the documents and summaries into |
---|
0:03:17 | topics, |
---|
0:03:18 | and then uses that clustering for classification; that is the S-LDA approach. |
---|
0:03:24 | The main assumption is that |
---|
0:03:26 | there are two types of concepts in the documents: generic concepts and specific ones. |
---|
0:03:31 | The generic ones are the ones that are usually included in the summary, so that's the main assumption, |
---|
0:03:36 | and the |
---|
0:03:38 | specific ones are the ones that are usually specific to each individual document. |
---|
0:03:44 | At a very high level the process is as follows: we have a set of documents |
---|
0:03:48 | and the corresponding human summaries, |
---|
0:03:50 | and we use SumLDA, which I'll describe next, to |
---|
0:03:54 | assign each latent variable, each latent concept, |
---|
0:03:57 | to the generic or the specific set, using some supervision from the human summaries. |
---|
0:04:02 | Then we go back and look at the original training document sentences, |
---|
0:04:07 | and we mark the ones that mainly contain specific concepts as negative examples, and the ones |
---|
0:04:12 | that contain generic concepts as positive examples. |
---|
0:04:15 | Now we train a classifier, and at inference time we use that classifier to |
---|
0:04:20 | decide if a sentence |
---|
0:04:21 | should be included in the summary or not. |
---|
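As a rough illustration of the step just described, transferring the generic/specific topic labels down to the training sentences and then training a binary classifier, here is a minimal sketch. The data structures, the majority rule, and the classifier choice are assumptions made for illustration; they are not the authors' implementation.

```python
# Minimal sketch of labeling training sentences from topic labels (assumed
# inputs, not the authors' code): each sentence comes with the topic ids
# assigned to its words, and each topic is already flagged generic/specific.
from collections import Counter
from sklearn.linear_model import LogisticRegression

def label_sentences(sentences, topic_is_generic):
    """sentences: list of (feature_vector, per_word_topic_ids);
    topic_is_generic: dict topic_id -> bool from the supervised topic model."""
    X, y = [], []
    for features, topic_ids in sentences:
        counts = Counter(topic_ids)
        n_generic = sum(c for t, c in counts.items() if topic_is_generic[t])
        # Positive example if the sentence is dominated by generic concepts,
        # negative if it is dominated by document-specific ones.
        y.append(1 if n_generic > len(topic_ids) / 2 else 0)
        X.append(features)
    return X, y

def train_sentence_classifier(sentences, topic_is_generic):
    X, y = label_sentences(sentences, topic_is_generic)
    # Any binary classifier works; logistic regression is used here as a stand-in.
    return LogisticRegression(max_iter=1000).fit(X, y)
```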
0:04:25 | So while this has been shown to work, there is a cost, or rather a |
---|
0:04:31 | suboptimality, in transferring labels from the topics to the sentences. |
---|
0:04:36 | So in this work we are basically looking at the topics directly, |
---|
0:04:40 | and now we are trying to classify the latent variables themselves instead of |
---|
0:04:44 | the sentences, |
---|
0:04:46 | and then learn to distinguish, in terms of those latent variables, which ones would be useful when we are trying to |
---|
0:04:51 | summarize a new document set. |
---|
0:04:53 | So it works like this: we train a classifier |
---|
0:04:56 | to distinguish the two types of topics, |
---|
0:04:58 | and then at inference time we use regular Latent Dirichlet Allocation, |
---|
0:05:02 | find the latent topics, and then use the classifier to determine which ones should be in |
---|
0:05:08 | the summary |
---|
0:05:09 | and which ones should not, |
---|
0:05:11 | and pick the sentences that include the generic concepts. |
---|
0:05:16 | So here is more detail. |
---|
0:05:17 | I believe this morning's keynote speech already introduced LDA, |
---|
0:05:22 | and our approach is an extension of LDA for the summarization task. |
---|
0:05:26 | LDA is a generative model, and it allows us to |
---|
0:05:30 | explain a set of observations |
---|
0:05:33 | by unobserved, or hidden, or latent groups, |
---|
0:05:36 | and this explains why some parts of the data are similar. |
---|
0:05:41 | The assumption is that each document is a mixture of a small number of topics, |
---|
0:05:45 | and each word occurrence is attributable to one of the document's topics, |
---|
0:05:49 | so each word is sampled from |
---|
0:05:52 | a topic. |
---|
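For readers who did not see that introduction, here is a minimal sketch of the generative story just described: each document draws a topic mixture, each word position draws a topic from that mixture, and the word is then drawn from that topic's word distribution. All sizes and hyperparameters below are illustrative assumptions.

```python
# Toy LDA generative process; sizes and hyperparameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
K, V, DOC_LEN = 10, 5000, 200                  # topics, vocabulary, words/doc
alpha, beta = 0.1, 0.01                        # Dirichlet hyperparameters
phi = rng.dirichlet(np.full(V, beta), size=K)  # per-topic word distributions

def generate_document():
    theta = rng.dirichlet(np.full(K, alpha))   # this document's topic mixture
    words = []
    for _ in range(DOC_LEN):
        z = rng.choice(K, p=theta)             # sample a topic for this position
        words.append(rng.choice(V, p=phi[z]))  # sample a word from that topic
    return words
```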
0:05:53 | What we do in SumLDA is also very similar to how the LDA generation |
---|
0:05:58 | works. |
---|
0:05:59 | In the case of standard LDA it was unigrams, but |
---|
0:06:01 | here we are also looking at unigrams and bigrams. |
---|
0:06:04 | When we are looking at a unit, we check whether it appears in the |
---|
0:06:08 | human summary or not. |
---|
0:06:09 | If it already appears in a human summary, it is constrained: we have two sets |
---|
0:06:13 | of topics, the generic concepts and the specific concepts, |
---|
0:06:14 | and we assume that such a unit should be generated, should be sampled, from the generic concepts instead |
---|
0:06:19 | of the specific ones; |
---|
0:06:21 | and if it is not in a summary, then it could be generated from any topic. |
---|
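A rough sketch of how such a constraint could be enforced inside a topic sampler is shown below. The assumption that topics 0..G-1 are the generic ones and the rest are specific is made only for illustration; it is not the paper's actual parameterization.

```python
# Constrained topic sampling sketch: units (unigrams/bigrams) that occur in a
# human summary may only be assigned generic topics; other units may take any.
import numpy as np

rng = np.random.default_rng(0)

def allowed_topics(unit, summary_units, K, G):
    """Topics 0..G-1 are assumed generic, G..K-1 specific (illustrative)."""
    return np.arange(G) if unit in summary_units else np.arange(K)

def sample_topic(unit, summary_units, topic_weights, K, G):
    """topic_weights: unnormalized sampling weights over all K topics."""
    cand = allowed_topics(unit, summary_units, K, G)
    p = topic_weights[cand]
    p = p / p.sum()
    return int(rng.choice(cand, p=p))
```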
0:06:26 | So then we proceed with CVC. Once we have these two sets of topics, |
---|
0:06:31 | generic and specific, |
---|
0:06:33 | we extract features for every single topic. |
---|
0:06:36 | The first thing you do is basically find the most frequent unigrams and bigrams in |
---|
0:06:41 | the document set, |
---|
0:06:42 | and then you use these to |
---|
0:06:44 | generate a set of features for every topic. |
---|
0:06:47 | The first set of features has |
---|
0:06:49 | as many features as |
---|
0:06:51 | the number of these frequent terms: |
---|
0:06:53 | each feature is the probability of |
---|
0:06:55 | the topic cluster appearing with that |
---|
0:06:58 | frequent term. |
---|
0:06:59 | You also look at how many of the frequent terms each topic cluster |
---|
0:07:04 | includes, so that's the other feature: you basically use a threshold to determine inclusion, count the terms, |
---|
0:07:10 | and then normalize by the number of frequent terms. |
---|
0:07:13 | And we use maximum entropy classification. |
---|
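The per-topic features and the classifier could look roughly like the sketch below. The input structures, the probability threshold, and the use of scikit-learn's logistic regression in place of a dedicated maximum entropy package are all illustrative assumptions.

```python
# Sketch of per-topic features: (1) probability of the topic generating each
# frequent unigram/bigram, (2) fraction of frequent terms the topic "contains"
# (probability above a threshold). Threshold and structures are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def topic_features(topic_word_prob, frequent_terms, term_index, threshold=1e-3):
    """topic_word_prob: word-probability vector of one topic (assumed to cover
    the unigram/bigram vocabulary); term_index: term -> vocabulary index."""
    probs = [topic_word_prob[term_index[t]] for t in frequent_terms]
    contains = sum(p > threshold for p in probs) / len(frequent_terms)
    return np.array(probs + [contains])

def train_topic_classifier(X, y):
    # y[k] = 1 for generic topics, 0 for specific ones; logistic regression
    # plays the role of the maximum entropy classifier here.
    return LogisticRegression(max_iter=1000).fit(X, y)
```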
0:07:17 | At inference time, when a new set of documents is given, |
---|
0:07:20 | since we don't have any summaries, we run just the usual LDA model on each document set. |
---|
0:07:26 | Then we use the classifier that we trained in the previous step |
---|
0:07:29 | to estimate labels for the K topics |
---|
0:07:32 | as generic or specific. |
---|
0:07:34 | Then, using these, we compute the sentence scores |
---|
0:07:37 | and decide if a sentence |
---|
0:07:39 | should be included in the summary or not. |
---|
0:07:42 | In addition to this, previous work has shown that unigram and bigram frequencies |
---|
0:07:47 | are useful in determining |
---|
0:07:49 | if a sentence should be included in a summary or not, |
---|
0:07:51 | so we are merging those scores with the scores based on CVC. |
---|
0:07:56 | This is how we compute the scores. Basically, |
---|
0:08:00 | for every sentence, in this first part |
---|
0:08:03 | we look at the number of |
---|
0:08:06 | generic topics it contains, and normalize it by K. |
---|
0:08:10 | We also look at the S-LDA score, and interpolate the two of them. |
---|
0:08:15 | Then, as the next step, we interpolate the CVC score with the unigram and bigram |
---|
0:08:21 | score; |
---|
0:08:22 | basically the unigram and bigram scores are the normalized counts of the high-frequency unigrams or bigrams that the |
---|
0:08:29 | sentence under consideration contains. |
---|
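A minimal sketch of the interpolated sentence score is given below, assuming equal interpolation weights and simple helper inputs; the paper's exact coefficients and normalizations are not reproduced here.

```python
# Sketch of the combined sentence score: the CVC part (generic topics in the
# sentence, normalized by K) is interpolated with the S-LDA score, and that
# combination is interpolated with a unigram/bigram frequency score.
def ngram_frequency_score(sentence_ngrams, high_freq_ngrams):
    """Normalized count of high-frequency unigrams/bigrams in the sentence."""
    if not sentence_ngrams:
        return 0.0
    hits = sum(1 for g in sentence_ngrams if g in high_freq_ngrams)
    return hits / len(sentence_ngrams)

def sentence_score(sent_topics, generic_topics, K,
                   slda_score, ngram_score, lam1=0.5, lam2=0.5):
    cvc = sum(1 for t in set(sent_topics) if t in generic_topics) / K
    combined = lam1 * cvc + (1 - lam1) * slda_score      # CVC + S-LDA
    return lam2 * combined + (1 - lam2) * ngram_score    # + n-gram frequency
```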
0:08:45 | Since we need to select a subset of these sentences based on their scores, we apply a greedy algorithm. |
---|
0:08:51 | First we order all the sentences according to their scores, then start from the highest-scoring one: |
---|
0:08:57 | we take the highest one, |
---|
0:08:59 | then we start looking at the rest of the sentences. |
---|
0:09:03 | We add the next sentence to the summary only if it is not redundant with the sentences |
---|
0:09:07 | that are already in the summary. |
---|
0:09:10 | This is very similar |
---|
0:09:12 | to the MMR approach, but not exactly the same: |
---|
0:09:16 | we skip the sentences that have a lot of redundancy with the already |
---|
0:09:21 | generated summary, |
---|
0:09:22 | and we stop when the summary length is satisfied. |
---|
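A minimal sketch of this greedy, redundancy-aware selection follows. The word-overlap redundancy test and its threshold are assumptions standing in for whatever similarity measure the system actually uses.

```python
# Greedy selection sketch: take sentences in decreasing score order, skip any
# that overlap too much with the summary built so far, respect the budget.
def greedy_select(sentences, scores, length_budget=250, max_overlap=0.5):
    """sentences: list of token lists; scores: parallel list of floats."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    summary, summary_words, total_len = [], set(), 0
    for i in order:
        toks = sentences[i]
        if total_len + len(toks) > length_budget:
            continue
        overlap = len(set(toks) & summary_words) / max(len(set(toks)), 1)
        if summary and overlap > max_overlap:
            continue                      # too redundant with current summary
        summary.append(toks)
        summary_words.update(toks)
        total_len += len(toks)
    return summary
```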
0:09:27 | We evaluate the performance of summarization using ROUGE scores, just like the previous papers. |
---|
0:09:33 | We look at ROUGE-1, ROUGE-2, and ROUGE-SU4. |
---|
0:09:37 | ROUGE-1 and ROUGE-2 basically compute the unigram and bigram overlap between the |
---|
0:09:43 | human and system summaries, and ROUGE-SU4 looks at |
---|
0:09:47 | skip |
---|
0:09:48 | bigrams, up to a distance of four. |
---|
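For intuition only, here is a simplified, recall-oriented ROUGE-n computation; the official ROUGE toolkit additionally handles stemming, stopword options, multiple references, and jackknifing, so its scores will differ from this sketch.

```python
# Simplified ROUGE-n recall: clipped n-gram overlap divided by the number of
# n-grams in the reference summary.
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(system_tokens, reference_tokens, n=2):
    ref = ngram_counts(reference_tokens, n)
    sys = ngram_counts(system_tokens, n)
    if not ref:
        return 0.0
    overlap = sum(min(count, sys[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())
```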
0:09:51 | For training we use the DUC 2005 and 2006 |
---|
0:09:55 | data sets. |
---|
0:09:56 | There are in total 100 document sets in this data, |
---|
0:10:00 | and each |
---|
0:10:01 | document set contains 25 news articles, |
---|
0:10:04 | which is about 80,000 sentences. |
---|
0:10:06 | For testing we use the DUC 2007 |
---|
0:10:10 | data set, which is 45 document sets, |
---|
0:10:14 | each again with 25 news articles, and about 25,000 sentences. |
---|
0:10:19 | We use these data sets because we want to be able to compare our results with most of |
---|
0:10:25 | the previous work that uses them for evaluation. |
---|
0:10:30 | The form of the task, |
---|
0:10:31 | actually the NIST evaluation of the task, has changed |
---|
0:10:34 | after 2007, so it is not directly comparable after that. |
---|
0:10:39 | The goal is to create 250-word summaries from each document set. |
---|
0:10:45 | So here are the results. For the baseline we just use the cosine similarity to mark the |
---|
0:10:50 | initial set of sentences, |
---|
0:10:52 | similar to the previous work, |
---|
0:10:54 | but, different from the previous work, here we are only using features that are based |
---|
0:10:59 | on the words, |
---|
0:11:00 | so it is a bit weaker than the previous work. |
---|
0:11:04 | Another baseline that we are comparing with is |
---|
0:11:09 | a classification-based summarization system that was the top performer in |
---|
0:11:14 | that evaluation, |
---|
0:11:15 | but it uses |
---|
0:11:16 | much more sophisticated features than the ones we are using. |
---|
0:11:20 | The other one is HIERSUM; this is the one that uses a hierarchical LDA based |
---|
0:11:25 | generative method. |
---|
0:11:27 | The way they form the summaries is basically that they use hLDA |
---|
0:11:32 | to find the topics, |
---|
0:11:34 | and then they are picking the sentences; |
---|
0:11:36 | they try to keep the topic distribution of the original documents |
---|
0:11:40 | while they are forming the summary, and they use the KL divergence measure to make sure that |
---|
0:11:44 | the summary topics |
---|
0:11:46 | are not significantly different from the original document topics. |
---|
0:11:50 | In terms of ROUGE-1 and ROUGE-2 |
---|
0:11:53 | scores, both S-LDA and CVC perform |
---|
0:11:56 | significantly better than all the other approaches, |
---|
0:11:59 | and CVC does a little bit better than S-LDA, but it is also using the S-LDA score. |
---|
0:12:04 | In terms of ROUGE-SU4 we are about the same, so we are in the ballpark. |
---|
0:12:07 | One of the main reasons is that some of the previous works are actually optimizing according to bigrams, since they are |
---|
0:12:12 | using both unigrams and bigrams, |
---|
0:12:15 | but we didn't have access to the HIERSUM summaries, so we don't know exactly what the |
---|
0:12:19 | reason is. |
---|
0:12:22 | So in conclusion, we are trying to learn a summary content distribution |
---|
0:12:27 | from the |
---|
0:12:28 | provided document sets, |
---|
0:12:34 | and we use the human summaries to have some supervision, and we are finding the |
---|
0:12:39 | generic and the specific concepts. |
---|
0:12:42 | We have shown improvements in terms of ROUGE scores, |
---|
0:12:45 | and there are a few things that we think can be improved. |
---|
0:12:49 | So there are a few issues; one of them I forgot to include here. One of |
---|
0:12:53 | them is that we are not really competing on features, we are only computing word-based features, so future work is to actually improve |
---|
0:12:58 | the feature set |
---|
0:12:59 | that we are using. |
---|
0:13:00 | The other future work, I think, is based on the sentence selection part. |
---|
0:13:06 | In our previous work we have shown that |
---|
0:13:09 | you can actually do an exact search using integer linear programming, |
---|
0:13:12 | and that ILP-based system was among the best performing systems in the |
---|
0:13:18 | recent TAC evaluations, |
---|
0:13:20 | so it can actually easily be adapted here. That's also part of |
---|
0:13:24 | the future work, because we are also finding concepts, and the idea |
---|
0:13:27 | is then picking the sentences that cover the generic concepts to be included in the summary. |
---|
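The concept-coverage ILP alluded to here can be sketched with the PuLP modeling library as below; the variables, weights, and length budget are illustrative assumptions rather than the formulation used in that system.

```python
# Sketch of an ILP that picks sentences to maximize the total weight of the
# (generic) concepts they cover, subject to a length budget.
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum

def ilp_summary(sent_len, sent_concepts, concept_weight, budget=250):
    """sent_len[i]: length of sentence i; sent_concepts[i]: set of concept ids
    in sentence i; concept_weight[c]: weight of concept c (e.g. classifier score)."""
    prob = LpProblem("summary", LpMaximize)
    s = {i: LpVariable(f"s_{i}", cat=LpBinary) for i in sent_len}
    c = {j: LpVariable(f"c_{j}", cat=LpBinary) for j in concept_weight}
    prob += lpSum(concept_weight[j] * c[j] for j in concept_weight)   # objective
    prob += lpSum(sent_len[i] * s[i] for i in sent_len) <= budget     # length
    for j in concept_weight:
        # A concept counts as covered only if a selected sentence contains it.
        prob += c[j] <= lpSum(s[i] for i in sent_len if j in sent_concepts[i])
    prob.solve()
    return [i for i in sent_len if s[i].value() == 1]
```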
0:13:31 | Furthermore, we have also proposed a hierarchical topic model |
---|
0:13:34 | for summarization, |
---|
0:13:35 | and that is a direction we are moving towards. |
---|
0:13:40 | Thank you. |
---|
0:13:45 | [Session chair remarks, largely inaudible.] |
---|
0:14:08 | [Audience question, largely inaudible: asking whether a possible simple solution would be to directly cover the generic concepts.] |
---|
0:14:20 | We did not directly take all of the generic concepts here, |
---|
0:14:24 | but in the ILP framework actually we could do that: |
---|
0:14:30 | we could try to maximize the generic concepts that are covered in the summary, |
---|
0:14:37 | but we didn't have time to try that. |
---|
0:14:43 | [Audience question, inaudible.] |
---|
0:15:08 | Alright. [Partially inaudible] The information we have is just the human summaries, just the human summaries. |
---|
0:15:17 | [Follow-up audience question, inaudible.] |
---|
0:15:40 | Well, actually, previous work has shown that even position-based features, just taking the first sentences from most of the |
---|
0:15:44 | documents, really do quite well, |
---|
0:15:48 | but of course it's not [inaudible]. |
---|