Text Categorization of Biomedical Data Sets Using Graph Kernels and a Controlled Vocabulary

Authors: Said Bleik, Meenakshi Mishra, Jun Huan, Min Song

DOI: 10.1109/TCBB.2013.16

Abstract: Recently, graph representations of text have been showing improved performance over conventional bag-of-words representations in text categorization applications. In this paper, we present a graph-based representation for biomedical articles and use graph kernels to classify those articles into high-level categories. In our representation, common biomedical concepts and semantic relationships are identified with the help of an existing ontology and are used to build a rich graph structure that provides a consistent feature set and preserves additional semantic information that could improve a classifier's performance. We attempt to classify the graphs using both a set-based graph kernel that is capable of dealing with the disconnected nature of the graphs and a simple linear kernel. Finally, we report the results comparing the classification performance of the kernel classifiers to common text-based classifiers.
