The Problem of Cross-Validation: Averaging and Bias, Repetition and Significance

Authors: David M. W. Powers, Adham Atyabi

DOI: 10.1109/SCET.2012.6342143

Abstract: Cross-Validation (CV) is the primary mechanism used in Machine Learning to control generalization error in the absence of sufficiently large quantities of marked up (tagged or labelled) data to undertake independent testing, training and validation (including early stopping, feature selection, parameter tuning, boosting and/or fusion). Repeated Cross-Validation (RCV) is used to try to further improve the accuracy of our performance estimates, including compensating for outliers. Typically a Machine Learning researcher will compare a new target algorithm against a wide range of competing algorithms on a wide range of standard datasets. The combination of many training folds, many CV repetitions, many algorithms and parameterizations, and many training sets adds up to a very large number of data points to compare, and a massive multiple testing problem that is quadratic in the number of individual test combinations. Research in Machine Learning sometimes involves basic significance testing, or provides confidence intervals, but seldom addresses the multiple testing problem whereby the assumption of p<.05 means that we expect a spurious "significant" result in 1 in 20 pairs. This paper defines and explores a protocol that reduces the scale of repeated CV whilst providing a principled way to control the erosion of significance due to multiple testing.
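The scale of the multiple testing problem described in the abstract can be illustrated with a short calculation. The sketch below is not the protocol the paper proposes (the abstract does not specify it); it only counts pairwise comparisons and applies the standard Bonferroni correction as a stand-in. The function names and the figure of 20 systems are hypothetical, and the family-wise error formula assumes independent tests, which CV results on shared data generally are not.

```python
from math import comb


def pairwise_tests(n_systems: int) -> int:
    """Number of distinct pairs among n_systems results: n*(n-1)/2, i.e. quadratic in n."""
    return comb(n_systems, 2)


def expected_spurious(n_tests: int, alpha: float = 0.05) -> float:
    """Expected count of false 'significant' results if every null hypothesis is true."""
    return n_tests * alpha


def familywise_error(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive, ASSUMING the tests are independent."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error at or below alpha (Bonferroni)."""
    return alpha / n_tests


# Hypothetical setting: 20 algorithm/parameterization combinations compared pairwise.
n_systems = 20
n_tests = pairwise_tests(n_systems)

print("pairwise tests:", n_tests)
print("expected spurious results at p<.05:", expected_spurious(n_tests))
print("family-wise error (independence assumed):", round(familywise_error(n_tests), 4))
print("Bonferroni per-test threshold:", bonferroni_alpha(n_tests))
```

With 20 systems there are 190 pairs, so at p<.05 roughly 9.5 spurious "significant" differences would be expected even if no system truly differed. Bonferroni is conservative; it is used here only because it is the simplest widely known correction.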
