Domain-specific keyphrase extraction

Authors: Craig G. Nevill-Manning, Gordon W. Paynter, Carl Gutwin, Ian H. Witten, Eibe Frank

Abstract: Keyphrases are an important means of document summarization, clustering, and topic search. Only a small minority of documents have author-assigned keyphrases, and manually assigning keyphrases to existing documents is very laborious. Therefore it is highly desirable to automate the keyphrase extraction process. This paper shows that a simple procedure for keyphrase extraction based on the naive Bayes learning scheme performs comparably to the state of the art. It goes on to explain how this procedure's performance can be boosted by automatically tailoring the extraction process to the particular document collection at hand. Results on a large collection of technical reports in computer science show that the quality of the extracted keyphrases improves significantly when domain-specific information is exploited.
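
The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading: a naive Bayes classifier that labels candidate phrases as keyphrase or not, optionally extended with a domain-specific feature learned from the training collection. All of these are assumptions made for illustration, not the authors' published method: the feature set (a TF×IDF score, the relative position of a phrase's first occurrence, and a hypothetical `keyphrase_freq`-style count of training documents in which the phrase was a keyphrase), equal-frequency binning in place of whatever discretization the paper uses, the stopword list, the absence of stemming, and every name (`candidates`, `features`, `cut_points`, `KeyphraseNB`). Keyphrases are assumed to be given as lowercase, space-separated strings.

```python
import bisect
import math
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "is",
             "are", "with", "by", "this", "that", "it", "as", "be", "we"}


def candidates(text, max_len=3):
    """Map each candidate phrase (1 to max_len words, no stopword at
    either end) to the word positions where it occurs."""
    words = re.findall(r"[a-z]+", text.lower())
    positions = defaultdict(list)
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            gram = words[i:i + n]
            if len(gram) < n or gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            positions[" ".join(gram)].append(i)
    return positions, len(words)


def features(text, df, n_docs, kf=None, exclude=frozenset()):
    """Return {phrase: feature tuple}. Assumed features: TF*IDF, relative
    first occurrence and, if kf is given, the domain-specific count of
    training documents where the phrase was a keyphrase (minus this
    document's own keyphrases, to avoid label leakage during training)."""
    positions, size = candidates(text)
    feats = {}
    for phrase, pos in positions.items():
        tf = len(pos) / size
        idf = -math.log2((df.get(phrase, 0) + 1) / (n_docs + 1))
        row = (tf * idf, pos[0] / size)
        if kf is not None:
            row += (kf.get(phrase, 0) - (phrase in exclude),)
        feats[phrase] = row
    return feats


def cut_points(values, bins=3):
    """Equal-frequency cut points (a simple stand-in for supervised
    discretization)."""
    s = sorted(values)
    return [s[len(s) * k // bins] for k in range(1, bins)]


class KeyphraseNB:
    """Naive Bayes over discretized features; class = 'is a keyphrase'."""

    def fit(self, docs, keyphrases, use_kf=False):
        self.n = len(docs)
        self.df = Counter()
        for d in docs:
            self.df.update(candidates(d)[0].keys())
        self.kf = (Counter(p for ks in keyphrases for p in set(ks))
                   if use_kf else None)
        rows, labels = [], []
        for d, ks in zip(docs, keyphrases):
            for phrase, row in features(d, self.df, self.n, self.kf,
                                        set(ks)).items():
                rows.append(row)
                labels.append(phrase in ks)
        self.cuts = [cut_points(col) for col in zip(*rows)]
        self.prior = Counter(labels)
        self.counts = {c: [Counter() for _ in self.cuts] for c in (True, False)}
        for row, y in zip(rows, labels):
            for j, b in enumerate(self._bins(row)):
                self.counts[y][j][b] += 1
        return self

    def _bins(self, row):
        return [bisect.bisect_right(cs, v) for v, cs in zip(row, self.cuts)]

    def prob_key(self, row):
        """Posterior P(keyphrase | features) with Laplace smoothing."""
        total = sum(self.prior.values())
        logp = {}
        for c in (True, False):
            lp = math.log((self.prior[c] + 1) / (total + 2))
            for j, b in enumerate(self._bins(row)):
                k = len(self.cuts[j]) + 1  # number of bins for feature j
                lp += math.log((self.counts[c][j][b] + 1) / (self.prior[c] + k))
            logp[c] = lp
        m = max(logp.values())
        yes, no = math.exp(logp[True] - m), math.exp(logp[False] - m)
        return yes / (yes + no)

    def extract(self, text, top_n=5):
        """Rank the candidates of a new document by P(keyphrase)."""
        feats = features(text, self.df, self.n, self.kf)
        ranked = sorted(feats, key=lambda p: self.prob_key(feats[p]),
                        reverse=True)
        return ranked[:top_n]
```

For example, `KeyphraseNB().fit(train_texts, train_keyphrases, use_kf=True).extract(new_text)` would return the five highest-scoring candidates. Fitting a second model with `use_kf=False` on the same data gives a rough, small-scale analogue of the comparison the abstract describes between a generic extractor and one tailored to the collection at hand.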
