Domain-independent automatic keyphrase indexing with small training sets

Authors: Olena Medelyan, Ian H. Witten

DOI: 10.1002/ASI.V59:7

Abstract: Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents. © 2008 Wiley Periodicals, Inc.

References (27)
George Hripcsak, Adam S Rothschild. Agreement, the F-Measure, and Reliability in Information Retrieval. Journal of the American Medical Informatics Association, vol. 12, pp. 296-298 (2005). DOI: 10.1197/JAMIA.M1733
Percy Nohama, Kornél Markó, Udo Hahn, Philipp Daumke, Stefan Schulz. Interlingual indexing across different languages. RIAO '04: Coupling approaches, coupling media and coupling languages for information retrieval, pp. 82-99 (2004).
Ken Barker, Nadia Cornacchia. Using Noun Phrase Heads to Extract Document Keyphrases. Lecture Notes in Computer Science, pp. 40-52 (2000). DOI: 10.1007/3-540-45486-1_4
N. Fuhr, G. E. Knorz. Retrieval test evaluation of a rule based automatic indexing (AIR/PHYS). International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 391-408 (1984). DOI: 10.5555/636805.636831
Sabrina Tiun, Rosni Abdullah, Tang Enya Kong. Automatic Topic Identification Using Ontology Hierarchy. International Conference on Computational Linguistics, pp. 444-453 (2001). DOI: 10.1007/3-540-44686-9_43
Keki B. Irani, Usama M. Fayyad. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. International Joint Conference on Artificial Intelligence, vol. 2, pp. 1022-1027 (1993).
Anette Hulth. Combining Machine Learning and Natural Language Processing for Automatic Keyword Extraction. Institutionen för data- och systemvetenskap (tills m KTH) (2004).
Christian Plaunt, Barbara A. Norgard. An association-based method for automatic indexing with a controlled vocabulary. Journal of the Association for Information Science and Technology, vol. 49, pp. 888-902 (1998). DOI: 10.1002/(SICI)1097-4571(199808)49:10<888::AID-ASI5>3.0.CO;2-Y