The Role of Balanced Training and Testing Data Sets for Binary Classifiers in Bioinformatics

Authors: Qiong Wei, Roland L. Dunbrack

DOI: 10.1371/JOURNAL.PONE.0067863

Keywords: Bioinformatics; Matthews correlation coefficient; Receiver operating characteristic; Undersampling; Data point; Binary number; Test data; Classifier (UML); Biology; Binary classification; General Biochemistry, Genetics and Molecular Biology; General Agricultural and Biological Sciences; General Medicine

Abstract: … of category i having a positive value and the data set with an … of the testing data sets; train_30 is just a bit higher for testing … Some authors have tried to estimate the percentage of SNPs …

References (80)
Charles Elkan, "The foundations of cost-sensitive learning," International Joint Conference on Artificial Intelligence, pp. 973-978 (2001)
Richard J. Dobson, Patricia B. Munroe, Mark J. Caulfield, Mansoor A. S. Saqi, "Predicting deleterious nsSNPs: an analysis of sequence and structural attributes," BMC Bioinformatics, vol. 7, p. 217 (2006), 10.1186/1471-2105-7-217
Zhi-Hua Zhou, "Cost-sensitive learning," Modeling Decisions for Artificial Intelligence, pp. 17-18 (2011), 10.1007/978-3-642-22589-5_2
Thorsten Joachims, "Making large-scale support vector machine learning practical," Advances in Kernel Methods, pp. 169-184 (1999)
I. Tomek, "Two Modifications of CNN," Systems, Man, and Cybernetics, vol. 6, pp. 769-772 (1976)