Authors: Olena Medelyan, David N. Milne, Ian H. Witten
DOI:
Keywords:
Abstract: Wikipedia article names can be utilized as a controlled vocabulary for identifying the main topics in a document. Wikipedia’s 2M articles cover the terminology of nearly any document collection, which permits indexing in the absence of manually created vocabularies. We combine state-of-the-art strategies for automatic indexing with Wikipedia’s unique property: a richly hyperlinked encyclopedia. We evaluate the scheme by comparing automatically assigned topics with those chosen by human indexers. Analysis of indexing consistency shows that our algorithm outperforms some human subjects.