Topic indexing with Wikipedia

Authors: Olena Medelyan, David N. Milne, Ian H. Witten

Abstract: Wikipedia article names can be used as a controlled vocabulary for identifying the main topics in a document. Wikipedia's 2M articles cover the terminology of nearly any document collection, which permits controlled indexing in the absence of manually created vocabularies. We combine state-of-the-art strategies for automatic controlled indexing with Wikipedia's unique property: it is a richly hyperlinked encyclopedia. We evaluate the scheme by comparing automatically assigned topics with those chosen manually by human indexers. Analysis of indexing consistency shows that our algorithm outperforms some human subjects.
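
The abstract does not say how indexing consistency is computed. As a minimal sketch, assuming a Rolling-style measure (twice the number of shared topics divided by the total number of topics the two indexers assigned), the comparison between an algorithm and a human indexer could look like the following. The function name, the case-insensitive matching, and the example topic lists are illustrative assumptions, not taken from the paper.

```python
def rolling_consistency(topics_a, topics_b):
    """Inter-indexer consistency between two topic sets: 2C / (A + B).

    C is the number of topics both indexers chose; A and B are the sizes
    of each indexer's topic set. Topics are matched case-insensitively
    after trimming whitespace (an assumption made for this sketch).
    """
    a = {t.strip().lower() for t in topics_a}
    b = {t.strip().lower() for t in topics_b}
    if not a and not b:
        # Neither indexer assigned anything; treat as no measurable agreement.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical topic assignments for one document.
human_topics = ["Machine learning", "Wikipedia", "Index (publishing)"]
algorithm_topics = ["wikipedia", "Machine learning", "Thesaurus"]

score = rolling_consistency(human_topics, algorithm_topics)
print(f"consistency = {score:.2f}")
```

Averaging this score over every human indexer of a document gives one figure per indexer, human or automatic, which is the kind of comparison the abstract's final sentence refers to.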
