Label noise in subtype discrimination of class C G protein-coupled receptors: A systematic approach to the analysis of classification errors

Authors: Caroline König, Martha I Cárdenas, Jesús Giraldo, René Alquézar, Alfredo Vellido

DOI: 10.1186/S12859-015-0731-9

Keywords: Peptide sequence; Computational biology; Class (philosophy); Phylogenetic tree; DNA microarray; G protein-coupled receptor; Relevance (information retrieval); Support vector machine; Proteomics; Noise; Data mining; Computer science

Abstract: Background: The characterization of proteins in families and subfamilies, at different levels, entails the definition and use of class labels. When the adscription of a protein to a family is uncertain, or even wrong, this becomes an instance of what has come to be known as a label noise problem. Label noise has a potentially negative effect on any quantitative analysis of proteins that depends on label information. This study investigates class C of G protein-coupled receptors, which are cell membrane proteins of relevance both to biology in general and pharmacology in particular. Their supervised classification into subtypes, based on primary sequence data, is hampered by label noise. The latter may stem from a combination of expert knowledge limitations and the lack of a clear correspondence between labels that mostly reflect GPCR functionality and the representations of the protein primary sequences.

References (61)
Raúl Cruz-Barbosa, Alfredo Vellido, Jesús Giraldo. The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors. Medical & Biological Engineering & Computing, vol. 53, pp. 137-149 (2015). DOI: 10.1007/S11517-014-1218-Y
M. A. Aizerman. Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning. Automation and Remote Control, vol. 25, pp. 821-837 (1964).
Thomas G. Dietterich. Ensemble Methods in Machine Learning. Multiple Classifier Systems, pp. 1-15 (2000). DOI: 10.1007/3-540-45014-9_1
Dragan Gamberger, Borut Sluban, Nada Lavrač. Advances in Class Noise Detection. European Conference on Artificial Intelligence, pp. 1105-1106 (2010). DOI: 10.3233/978-1-60750-606-5-1105
Caroline König, Raúl Cruz-Barbosa, René Alquézar, Alfredo Vellido. SVM-Based Classification of Class C GPCRs from Alignment-Free Physicochemical Transformations of Their Sequences. International Conference on Image Analysis and Processing, pp. 336-343 (2013). DOI: 10.1007/978-3-642-41190-8_36
André L. B. Miranda, Luís Paulo F. Garcia, André C. P. L. F. Carvalho, Ana C. Lorena. Use of Classification Algorithms in Noise Detection and Elimination. Hybrid Artificial Intelligence Systems, pp. 417-424 (2009). DOI: 10.1007/978-3-642-02319-4_50
Carla E. Brodley, Mark A. Friedl. Identifying mislabeled training data. Journal of Artificial Intelligence Research, vol. 11, pp. 131-167 (1999). DOI: 10.1613/JAIR.606
Paulo AS Nuin, Zhouzhi Wang, Elisabeth RM Tillier. The accuracy of several multiple sequence alignment programs for proteins. BMC Bioinformatics, vol. 7, p. 471 (2006). DOI: 10.1186/1471-2105-7-471
Kaushala Jayawardana, Sarah-Jane Schramm, Lauren Haydu, John F. Thompson, Richard A. Scolyer, Graham J. Mann, Samuel Müller, Jean Yee Hwa Yang. Determination of prognosis in metastatic melanoma through integration of clinico-pathologic, mutation, mRNA, microRNA, and protein information. International Journal of Cancer, vol. 136, pp. 863-874 (2015). DOI: 10.1002/IJC.29047
Da-Fei Feng, Russell F. Doolittle. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution, vol. 25, pp. 351-360 (1987). DOI: 10.1007/BF02603120