Identifying Gene Function Descriptions by Probability-based Sentence Selection.

Authors: Nihar Sheth, Kazuhiro Seki, Javed Mostafa

Abstract: This paper proposes an approach to the secondary task in the TREC Genomics Track. We regard the task as identification of sentences describing gene functions (i.e., GeneRIFs) and propose a method considering two factors: topicality and relevance. The former refers to the topicality of a sentence and is measured based on location information and word frequencies in the article. The latter refers to relevance to GeneRIF and is measured based on the vocabulary used in GeneRIFs. We formalize a probabilistic model combining these features. Our method is evaluated on a test set of 139 MEDLINE abstracts, and the results demonstrate that (a) function words in the input could help identify gene function descriptions, (b) there is a peculiar vocabulary used in GeneRIFs, and (c) the method shows the highest predictive power for this particular task despite its simplicity. Additionally, we examine some alternative methods in comparison with our method.
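
The two factors named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulas, so everything below is an assumption made for illustration: the function names `tokenize` and `score_sentences`, the use of averaged log-probabilities, the add-one smoothing, and the uniform default position prior are all hypothetical choices, not the authors' model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_sentences(sentences, generif_vocab_counts, position_prior=None):
    """Return one log-score per sentence; higher means more GeneRIF-like.

    Assumed (hypothetical) scoring, not taken from the paper:
      topicality = mean log relative frequency of the sentence's words
                   within the article, plus a log position prior
      relevance  = mean log-probability of the words under an add-one
                   smoothed unigram model of GeneRIF vocabulary
    """
    tokens = [tokenize(s) for s in sentences]
    article_counts = Counter(t for sent in tokens for t in sent)
    article_total = sum(article_counts.values())
    vocab_total = sum(generif_vocab_counts.values())
    vocab_size = len(set(article_counts) | set(generif_vocab_counts))

    n = len(sentences)
    if position_prior is None:
        # Uniform prior over sentence positions (all values must be > 0).
        position_prior = [1.0 / n] * n

    scores = []
    for i, sent in enumerate(tokens):
        if not sent:
            scores.append(float("-inf"))
            continue
        topicality = sum(
            math.log(article_counts[t] / article_total) for t in sent
        ) / len(sent)
        topicality += math.log(position_prior[i])
        relevance = sum(
            math.log((generif_vocab_counts.get(t, 0) + 1)
                     / (vocab_total + vocab_size))
            for t in sent
        ) / len(sent)
        scores.append(topicality + relevance)
    return scores
```

Under these assumptions, the sentence with the highest score would be the one selected as the candidate gene function description; a non-uniform `position_prior` could be passed to favor, for example, sentences near the end of an abstract.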
