WordNet improves text document clustering

Authors: Andreas Hotho, Steffen Staab, Gerd Stumme

DOI:

Keywords:

Abstract: Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. The bag-of-words representation used for these methods is often unsatisfactory, as it ignores relationships between terms that do not co-occur literally. To deal with this problem, we integrate background knowledge, in our application WordNet, into the processing of text documents. We cluster the documents with a standard partitional algorithm. Our experimental evaluation on Reuters newsfeeds compares the clustering results with pre-categorizations of the news. In the experiments, improvements compared to the baseline can be shown for many interesting tasks.
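The limitation named in the abstract, that a bag-of-words vector cannot relate terms that never co-occur literally, can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOY_HYPERNYMS` table is a hypothetical hand-written stand-in for WordNet lookups, the `concept:` prefix and all function names are assumptions made for illustration, and the clustering step itself is omitted. The sketch only shows how adding concept counts to a term vector can give two documents a non-zero similarity, which a partitional algorithm could then exploit.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy stand-in for WordNet hypernym lookups (not real WordNet data).
TOY_HYPERNYMS = {
    "beef": "meat",
    "pork": "meat",
    "gold": "metal",
    "silver": "metal",
}


def bag_of_words(text):
    """Plain term-frequency vector over lowercase whitespace tokens."""
    return Counter(text.lower().split())


def add_concepts(bow, hypernyms=TOY_HYPERNYMS):
    """Return a copy of the vector with one extra concept count per mapped term."""
    enriched = Counter(bow)
    for term, count in bow.items():
        concept = hypernyms.get(term)
        if concept is not None:
            # Prefix concept features so they cannot collide with literal terms.
            enriched["concept:" + concept] += count
    return enriched


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc_a = bag_of_words("beef exports rise")
doc_b = bag_of_words("pork imports fall")

# The two documents share no literal term, so the plain similarity is zero.
plain_similarity = cosine(doc_a, doc_b)

# After enrichment both vectors contain the shared feature "concept:meat".
enriched_similarity = cosine(add_concepts(doc_a), add_concepts(doc_b))

print(plain_similarity, enriched_similarity)
```

In a fuller pipeline the toy table would be replaced by real WordNet lookups, and the enriched vectors would be passed to a partitional clustering algorithm such as k-means.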

References (23)
Andreas Hotho, Steffen Staab, Gerd Stumme. Text Clustering Based on Background Knowledge. (2003)
Gianni Amati, Claudio Carpineto, Giovanni Romano (Fondazione Ugo Bordoni). FUB at TREC-10 Web track: A probabilistic framework for topic relevance term weighting. Text Retrieval Conference, pp. 182-191 (2001)
Andreas Hotho, Steffen Staab, Gerd Stumme. Explaining Text Clustering Results Using Semantic Structures. European Conference on Principles of Data Mining and Knowledge Discovery, pp. 217-228 (2003). DOI: 10.1007/978-3-540-39804-2_21
Bernhard Ganter, Rudolf Wille, C. Franzke. Formal Concept Analysis: Mathematical Foundations. (1998)
L Alfonso Urena-López, Manuel Buenaga, Jose M Gomez. Integrating Linguistic Resources in TC through WSD. Computers and the Humanities, vol. 35, pp. 215-230 (2001). DOI: 10.1023/A:1002632712378
Belén Díaz-Agudo, Manuel de Buenaga Rodríguez, José María Gómez Hidalgo. Using WordNet to Complement Training Information in Text Categorization. arXiv: Computation and Language, pp. 353- (1997)
George Karypis, Michael Steinbach, Vipin Kumar. A Comparison of Document Clustering Techniques. (2000)
Nancy Ide, Jean Véronis. Introduction to the special issue on word sense disambiguation: the state of the art. Computational Linguistics, vol. 24, pp. 2-40 (1998)
Gerard Salton. Automatic text processing: the transformation, analysis, and retrieval of information by computer. Addison-Wesley Longman Publishing Co., Inc. (1989)
Eneko Agirre, German Rigau. Word sense disambiguation using Conceptual Density. International Conference on Computational Linguistics, pp. 16-22 (1996). DOI: 10.3115/992628.992635