Selecting labels for news document clusters

Authors: Krishnaprasad Thirunarayan, Trivikram Immaneni, Mastan Vali Shaik

DOI: 10.1007/978-3-540-73351-5_11

Abstract: This work deals with the determination of meaningful and terse cluster labels for news document clusters. We analyze a number of alternatives for selecting headlines and/or sentences of documents in a document cluster (obtained as the result of an entity-event-duration query), and formalize an approach to extracting a short phrase from well-supported headlines/sentences of the cluster that can serve as the cluster label. Our technique maps a sentence into a set of significant stems that approximates its semantics, for comparison. Eventually, the cluster label is extracted from a selected headline/sentence as a contiguous sequence of words, resuscitating the word-sequencing information lost in the formalization of semantic equivalence.
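The abstract describes the pipeline only at a high level: reduce each headline/sentence to a set of significant stems, pick a well-supported headline, then cut a contiguous word span out of it as the label. The sketch below illustrates that general idea and is not the authors' algorithm. Its assumptions are all hypothetical: a tiny hand-picked stopword list, a crude suffix stripper standing in for the Porter stemmer (it is not Porter's algorithm), a "support" score that counts how many other sentences share each stem, and a majority threshold for deciding which words belong in the label.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "or",
             "is", "are", "was", "were", "as", "at", "by", "with", "from", "after"}


def normalize(word):
    """Lowercase a whitespace-delimited word and drop punctuation."""
    return "".join(re.findall(r"[a-z0-9]", word.lower()))


def crude_stem(word):
    """Rough suffix stripping; a simplified stand-in for the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def significant_stems(sentence):
    """Approximate a sentence's meaning by its set of non-stopword stems."""
    stems = set()
    for word in sentence.split():
        norm = normalize(word)
        if norm and norm not in STOPWORDS:
            stems.add(crude_stem(norm))
    return stems


def most_supported(sentences):
    """Pick the sentence whose stems appear in the most *other* sentences."""
    stem_sets = [significant_stems(s) for s in sentences]
    doc_freq = Counter(stem for stems in stem_sets for stem in stems)

    def support(i):
        return sum(doc_freq[stem] - 1 for stem in stem_sets[i])

    best = max(range(len(sentences)), key=support)  # first maximum wins ties
    return sentences[best], doc_freq


def extract_label(sentence, doc_freq, min_support):
    """Return the contiguous word span from the first to the last word whose
    stem occurs in at least min_support sentences; this restores word order."""
    words = sentence.split()
    keys = []
    for word in words:
        norm = normalize(word)
        keys.append(crude_stem(norm) if norm and norm not in STOPWORDS else None)
    hits = [i for i, k in enumerate(keys) if k and doc_freq[k] >= min_support]
    if not hits:
        return sentence
    return " ".join(words[hits[0]: hits[-1] + 1]).strip(",.;:!?")


def label_cluster(sentences):
    """Label a cluster using a (hypothetical) majority-support threshold."""
    best, doc_freq = most_supported(sentences)
    majority = len(sentences) // 2 + 1
    return extract_label(best, doc_freq, majority)


if __name__ == "__main__":
    cluster = [
        "Hurricane Katrina floods New Orleans",
        "Floods in New Orleans after Hurricane Katrina",
        "Katrina: New Orleans levees fail",
        "Stock markets rally on Monday",
    ]
    print(label_cluster(cluster))
```

On the toy cluster in the `__main__` block, the first headline is the best supported, and only "katrina", "new" and "orlean" occur in a majority (3 of 4) of sentences, so the label is the span "Katrina floods New Orleans". Note that this support score favors longer sentences; a length-normalized score is an equally plausible design choice.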

References (13)
Benjamin C. M. Fung, Ke Wang, Martin Ester. Hierarchical Document Clustering. Encyclopedia of Data Warehousing and Mining, pp. 555-559 (2009). DOI: 10.4018/978-1-59140-557-3.CH105
Martin Ester, Byron J. Gao. Cluster Description Formats, Problems and Algorithms. SIAM International Conference on Data Mining, pp. 464-468 (2006).
P. Ferragina, A. Gulli. The anatomy of a hierarchical clustering engine for Web-page, news and book snippets. International Conference on Data Mining, pp. 395-398 (2004). DOI: 10.1109/ICDM.2004.10027
Gianna M. Del Corso, Antonio Gullí, Francesco Romani. Ranking a stream of news. The Web Conference, pp. 97-106 (2005). DOI: 10.1145/1060745.1060764
A. Gulli. The anatomy of a news search engine. The Web Conference, pp. 880-881 (2005). DOI: 10.1145/1062745.1062778
S. Osinski, D. Weiss. A concept-driven algorithm for clustering search results. IEEE Intelligent Systems, vol. 20, pp. 48-54 (2005). DOI: 10.1109/MIS.2005.38
Florian Beil, Martin Ester, Xiaowei Xu. Frequent term-based text clustering. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), pp. 436-442 (2002). DOI: 10.1145/775047.775110
M. F. Porter. An algorithm for suffix stripping. Program: Electronic Library and Information Systems, vol. 40, pp. 313-316 (1997). DOI: 10.1108/EB046814
Paolo Ferragina, Antonio Gulli. A personalized search engine based on web-snippet hierarchical clustering. The Web Conference, vol. 38, pp. 801-810 (2005). DOI: 10.1145/1062745.1062760
Daniel M. Dunlavy, John Conroy, Dianne P. O'Leary. QCS. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Demonstrations (NAACL '03), pp. 11-12 (2003). DOI: 10.3115/1073427.1073433