Deriving concept hierarchies from text

Authors: Mark Sanderson, Bruce Croft

DOI: 10.1145/312624.312679

Abstract: This paper presents a means of automatically deriving a hierarchical organization of concepts from a set of documents without the use of training data or standard clustering techniques. Instead, salient words and phrases extracted from the documents are organized hierarchically using a type of co-occurrence known as subsumption. The resulting structure is displayed as a series of hierarchical menus. When generated from a set of retrieved documents, a user browsing the menus is provided with a detailed overview of their content in a manner distinct from existing overview and summarization techniques. The methods used to build the structure are simple, but appear to be effective: a small-scale user study reveals that the generated hierarchy possesses properties expected of such a structure, in that general terms are placed at the top levels, leading to related and more specific terms below. The formation and presentation of the hierarchy is described along with the user study and some other informal evaluations. The organization of information into a concept hierarchy derived automatically from the information itself is undoubtedly one goal of information retrieval. Were this goal achieved, the hierarchy would form a structure somewhat like existing manually constructed subject hierarchies, such as the Library of Congress categories or the Dewey Decimal system. The only difference would be that the categories are customized to the set of documents itself. For example, from a collection of media articles, the category "Entertainment" might be near the top level; below it, (amongst others) one might find "Movies", a type of entertainment; below that, there could be "Actors & Actresses", an aspect of movies. As can be seen, the arrangement provides an overview of the topic structure of those articles.
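
The abstract names subsumption as the co-occurrence test but does not spell it out. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that a term x subsumes a term y when nearly every document containing y also contains x, while the reverse does not hold, written as P(x|y) >= 0.8 and P(y|x) < 1. The 0.8 threshold, the function names, and the toy documents are all assumptions made for illustration.

```python
def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            index.setdefault(term, set()).add(doc_id)
    return index


def subsumes(index, x, y, threshold=0.8):
    """Assumed test: x subsumes y if P(x|y) >= threshold and P(y|x) < 1."""
    docs_x = index.get(x, set())
    docs_y = index.get(y, set())
    if not docs_x or not docs_y:
        return False
    both = len(docs_x & docs_y)
    p_x_given_y = both / len(docs_y)  # share of y's documents that also contain x
    p_y_given_x = both / len(docs_x)  # share of x's documents that also contain y
    return p_x_given_y >= threshold and p_y_given_x < 1.0


def subsumption_pairs(docs, threshold=0.8):
    """Return every (parent, child) pair where parent subsumes child."""
    index = build_index(docs)
    terms = sorted(index)
    return [
        (x, y)
        for x in terms
        for y in terms
        if x != y and subsumes(index, x, y, threshold)
    ]


# Toy collection (hypothetical): each document is a list of extracted terms.
docs = [
    ["entertainment", "movies", "actors"],
    ["entertainment", "movies"],
    ["entertainment", "music"],
    ["entertainment"],
]
pairs = subsumption_pairs(docs)
```

On this toy collection the general term "entertainment" ends up as a parent of "movies", and "movies" as a parent of "actors", which mirrors the example in the abstract. The sketch also reports the indirect pair ("entertainment", "actors"); turning the pairs into menus would need a further step that keeps only the closest parent of each term.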

References (15)
François Bourdoncle. LiveTopics: recherche visuelle d'information sur l'Internet. RIAO '97: Computer-Assisted Information Searching on Internet, Volume 2, pp. 651-654 (1997).
William A. Woods. Conceptual Indexing: A Better Way to Organize Knowledge. Sun Microsystems, Inc. (1997).
Jinxi Xu, W. Bruce Croft. Query Expansion Using Local and Global Document Analysis. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 51, pp. 4-11 (1996). DOI: 10.1145/3130348.3130364
R.H. Thompson, W.B. Croft. Support for browsing in an intelligent text retrieval system. International Journal of Human-Computer Studies / International Journal of Man-Machine Studies, vol. 30, pp. 639-668 (1989). DOI: 10.1016/S0020-7373(89)80014-8
Marti A. Hearst, Jan O. Pedersen. Reexamining the cluster hypothesis: scatter/gather on retrieval results. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 76-84 (1996). DOI: 10.1145/243199.243216
George A. Miller. WordNet. Communications of the ACM, vol. 38, pp. 39-41 (1995). DOI: 10.1145/219717.219748
David Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics, pp. 189-196 (1995). DOI: 10.3115/981658.981684
Karen Sparck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, vol. 60, pp. 493-502 (1972). DOI: 10.1108/EB026526