A Feature Generation Algorithm for Sequences with Application to Splice-Site Prediction

Authors: Rezarta Islamaj, Lise Getoor, W. John Wilbur

DOI: 10.1007/11871637_55

Abstract: In this paper we present a new approach to feature selection for sequence data. We identify general feature categories and give construction algorithms for each of them. We show how they can be integrated in a system that tightly couples feature construction and feature selection. This process, which we refer to as feature generation, allows us to systematically search a large space of potential features. We demonstrate the effectiveness of our approach for an important component of the gene finding problem, splice-site prediction. We show that predictive models built using our feature generation algorithm achieve a significant improvement in accuracy over existing, state-of-the-art approaches.

References (19)
Won Kim, W. John Wilbur. DNA splice site detection: a comparison of specific and general methods. American Medical Informatics Association Annual Symposium, pp. 390-394 (2002).
Ron Kohavi, George H. John. The Wrapper Approach. Springer, Boston, MA, pp. 33-50 (1998). DOI: 10.1007/978-1-4615-5725-8_3
Tong Zhang, Frank J. Oles. Text Categorization Based on Regularized Linear Classification Methods. Information Retrieval, vol. 4, pp. 5-31 (2001). DOI: 10.1023/A:1011441423217
Huan Liu, Lei Yu. Feature selection for high-dimensional data: a fast correlation-based filter solution. International Conference on Machine Learning, pp. 856-863 (2003).
Mehran Sahami, Daphne Koller. Toward optimal feature selection. International Conference on Machine Learning, pp. 284-292 (1996).
MTW, Huan Liu, Hiroshi Motoda. Feature Extraction, Construction and Selection: A Data Mining Perspective. Journal of the American Statistical Association, vol. 94, pp. 1390- (1998). DOI: 10.2307/2669967
Huiqing Liu, Limsoon Wong. Data mining tools for biological sequences. Journal of Bioinformatics and Computational Biology, vol. 1, pp. 139-167 (2003). DOI: 10.1142/S0219720003000216
Gene Yeo, Christopher B. Burge. Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals. Journal of Computational Biology, vol. 11, pp. 377-394 (2004). DOI: 10.1089/1066527041410418
Avrim L. Blum, Pat Langley. Selection of relevant features and examples in machine learning. Artificial Intelligence, vol. 97, pp. 245-271 (1997). DOI: 10.1016/S0004-3702(97)00063-5
Sven Degroeve, Yvan Saeys, Bernard De Baets, Pierre Rouzé, Yves Van de Peer. SpliceMachine: predicting splice sites from high-dimensional local context representations. Bioinformatics, vol. 21, pp. 1332-1338 (2005). DOI: 10.1093/BIOINFORMATICS/BTI166