Multi-document summarization based on cluster using non-negative matrix factorization

Sun Park, Ju Hong Lee, Deok Hwan Kim, Chan Min Ahn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

24 Scopus citations

Abstract

In this paper, a new summarization method, which uses non-negative matrix factorization (NMF) and K-means clustering, is introduced to extract meaningful sentences from multiple documents. The proposed method can improve the quality of document summaries in two ways: the inherent semantics of the documents are well reflected by the semantic features calculated by NMF, and the sentences most relevant to the given topic are extracted efficiently by using the semantic variables derived by NMF. In addition, it uses K-means clustering to remove noise, so that biased inherent semantics of the documents are not reflected in the summaries. We perform detailed experiments with the well-known DUC test dataset. The experimental results demonstrate that the proposed method performs better than other methods using LSA, K-means, and NMF.
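As a rough illustration of the NMF step described in the abstract, the sketch below factorizes a term-by-sentence count matrix A into non-negative factors W (columns read as semantic features) and H (rows read as semantic variables), then ranks sentences by a weighted sum of their semantic variables. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the whitespace tokenizer, the multiplicative update rule, and the scoring formula are all illustrative choices, and the K-means noise-removal step and the topic query are omitted.

```python
import random

def term_sentence_matrix(sentences):
    """Build a term-by-sentence count matrix (rows: terms, columns: sentences)."""
    tokens = [s.lower().split() for s in sentences]  # naive tokenizer (assumption)
    vocab = sorted({t for toks in tokens for t in toks})
    return [[float(toks.count(term)) for toks in tokens] for term in vocab]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def nmf(A, r, iters=200, seed=0, eps=1e-9):
    """Approximate A (m x n) as W (m x r) times H (r x n), all entries >= 0.

    Uses multiplicative updates for the squared Frobenius error; this is one
    common way to compute NMF and is assumed here for illustration.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(r)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(r)]
             for i in range(m)]
    return W, H

def rank_sentences(H):
    """Return sentence indices, highest score first.

    Each semantic feature is weighted by its share of the total mass in H;
    a sentence's score is the weighted sum of its semantic variables.
    This scoring rule is an assumption made for the sketch.
    """
    total = sum(sum(row) for row in H) or 1.0
    weights = [sum(row) / total for row in H]
    n = len(H[0])
    scores = [sum(w * row[j] for w, row in zip(weights, H)) for j in range(n)]
    return sorted(range(n), key=lambda j: scores[j], reverse=True)

if __name__ == "__main__":
    docs = [
        "matrix factorization finds latent features",
        "clustering groups similar sentences",
        "non-negative matrix factorization keeps features additive",
        "the weather was pleasant yesterday",
    ]
    A = term_sentence_matrix(docs)
    W, H = nmf(A, r=2)
    for j in rank_sentences(H)[:2]:
        print(docs[j])
```

Because W and H stay non-negative, each sentence is expressed as an additive combination of semantic features, which is what makes the rows of H usable as per-sentence relevance signals. A fuller pipeline along the lines of the abstract would first cluster the sentences and discard noisy ones before factorizing.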

Original language: English
Title of host publication: SOFSEM 2007
Subtitle of host publication: Theory and Practice of Computer Science - 33rd Conference on Current Trends in Theory and Practice of Computer Science, Proceedings
Publisher: Springer Verlag
Pages: 761-770
Number of pages: 10
ISBN (Print): 9783540695066
DOIs
State: Published - 2007
Event: 33rd Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2007 - Harrachov, Czech Republic
Duration: 20 Jan 2007 to 26 Jan 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4362 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 33rd Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2007
Country/Territory: Czech Republic
City: Harrachov
Period: 20/01/07 to 26/01/07
