Authors: Caroline König, Martha I Cárdenas, Jesús Giraldo, René Alquézar, Alfredo Vellido
DOI: 10.1186/S12859-015-0731-9
Keywords: Peptide sequence, Computational biology, Class (philosophy), Phylogenetic tree, DNA microarray, G protein-coupled receptor, Relevance (information retrieval), Support vector machine, Proteomics, Noise, Data mining, Computer science
Abstract: Background: The characterization of proteins in families and subfamilies, at different levels, entails the definition and use of class labels. When the assignment of a protein to a family is uncertain, or even wrong, this becomes an instance of what has come to be known as the label noise problem. Label noise has a potentially negative effect on any quantitative analysis of proteins that depends on label information. This study investigates class C G protein-coupled receptors (GPCRs), which are cell membrane proteins of relevance both to biology in general and to pharmacology in particular. Their supervised classification into subtypes, based on primary sequence data, is hampered by label noise. The latter may stem from a combination of expert knowledge limitations and the lack of a clear correspondence between labels that mostly reflect GPCR functionality and the representations of the protein primary sequences.