The effect of feature representation on MEDLINE document classification.

Authors: Meliha Yetisgen-Yildiz, Wanda Pratt

Abstract: This work explores the effect of text representation techniques on the overall performance of medical text classification. To accomplish this goal, we developed a text classification system that supports a very basic word representation (bag-of-words) and a more complex phrase representation (bag-of-phrases). We also combined both representations (hybrid) for further analysis. Our system extracts phrases from text by incorporating a knowledge base and natural language processing techniques. We conducted experiments to evaluate the effects of the different representations by measuring the change in classification performance on MEDLINE documents from the OHSUMED dataset. We measured performance with information retrieval metrics: precision (p), recall (r), and F1-score (F1). In our experiments, the hybrid approach (p=0.87, r=0.46, F1=0.60) achieved better performance than bag-of-words (p=0.85, r=0.44, F1=0.58) and bag-of-phrases (r=0.42, F1=0.57).
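The three representations and the evaluation metrics named in the abstract can be illustrated with a small Python sketch. This is not the authors' system: the abstract does not say which knowledge base or NLP pipeline was used, so `PHRASE_LEXICON` is a hypothetical stand-in, phrases are found by a simple greedy longest-match over lowercased tokens, and the hybrid vector is assumed to be the union of word and phrase features with `w:`/`p:` prefixes keeping the two namespaces apart. The metric helpers use the standard definitions of precision, recall, and F1, and the last lines check that the reported F1 values follow from the reported p and r.

```python
import re
from collections import Counter

# Hypothetical phrase lexicon: a stand-in for the (unspecified) knowledge base.
PHRASE_LEXICON = {"myocardial infarction", "heart failure", "blood pressure"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text):
    """Bag-of-words: count every single token."""
    return Counter(tokenize(text))


def bag_of_phrases(text, lexicon=PHRASE_LEXICON, max_len=3):
    """Bag-of-phrases: count multi-word lexicon entries, greedy longest match first."""
    tokens = tokenize(text)
    phrases = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-word phrases.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                phrases[candidate] += 1
                i += n
                break
        else:
            i += 1  # no phrase starts here; move to the next token
    return phrases


def hybrid(text):
    """Hybrid: union of word and phrase features, prefixed to avoid collisions."""
    features = Counter({"w:" + w: c for w, c in bag_of_words(text).items()})
    features.update({"p:" + p: c for p, c in bag_of_phrases(text).items()})
    return features


def precision_recall_f1(tp, fp, fn):
    """Standard IR metrics from true positive, false positive, false negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r, f1_from_pr(p, r)


def f1_from_pr(p, r):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    doc = "History of myocardial infarction; blood pressure elevated."
    print(hybrid(doc))
    # The (p, r) pairs reported in the abstract reproduce the reported F1 values.
    print(f"{f1_from_pr(0.87, 0.46):.2f}")  # 0.60 (hybrid)
    print(f"{f1_from_pr(0.85, 0.44):.2f}")  # 0.58 (bag-of-words)
```

Prefixing the features keeps a word such as `heart` distinct from the phrase `heart failure` in the hybrid vector, so a classifier that accepts sparse count features can weight them separately.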
