Classifier Evaluation with Missing Negative Class Labels

Authors: Andrew K. Rider, Reid A. Johnson, Darcy A. Davis, T. Ryan Hoens, Nitesh V. Chawla

DOI: 10.1007/978-3-642-41398-8_33

Keywords: Machine learning, Mathematics, Artificial intelligence, Data mining, Classifier (UML)

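Abstract: The concept of a negative class does not apply to many problems for which classification is increasingly utilized. In this study we investigate the reliability of evaluation metrics when the negative class contains an unknown proportion of mislabeled positive class instances. We examine how evaluation metrics can inform us about potential systematic biases in the data. We provide a motivating case study and a general framework for approaching evaluation when the negative class contains mislabeled positive class instances. We show that the behavior of evaluation metrics is unstable in the presence of uncertainty in class labels and that the stability of evaluation metrics depends on the kind of bias in the data. Finally, we show that the type and amount of bias present in data can have a significant effect on the ranking of evaluation metrics and the degree to which they over- or underestimate the true performance of classifiers.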
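The setting described in the abstract can be illustrated with a small sketch. This is not the authors' framework or data: the labels, predictions, and the helper `precision_recall` below are hypothetical, chosen only to show how scoring against observed labels can differ from scoring against the true labels when some positive instances are recorded as negative.

```python
def precision_recall(labels, predictions):
    """Return (precision, recall) for binary labels and predictions (1 = positive)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: ten instances, six of which are truly positive.
true_labels     = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# Assumption: two true positives (positions 4 and 5) are recorded as negative.
observed_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical classifier output.
predictions     = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]

true_pr = precision_recall(true_labels, predictions)
observed_pr = precision_recall(observed_labels, predictions)

print("against true labels:    ", true_pr)
print("against observed labels:", observed_pr)
```

In this toy case the classifier's correct predictions on the two mislabeled positives are counted as false positives, so the precision measured against the observed labels is lower than the precision measured against the true labels. Other kinds of label bias could shift a metric in the opposite direction.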
