The Problem of Cross-Validation: Averaging and Bias, Repetition and Significance

DOI: 10.1109/SCET.2012.6342143


Abstract: Cross-Validation (CV) is the primary mechanism used in Machine Learning to control generalization error in the absence of sufficiently large quantities of marked up (tagged or labelled) data to undertake independent testing, training and validation (including early stopping, feature selection, parameter tuning, boosting and/or fusion). Repeated Cross-Validation (RCV) is used to try to further improve the accuracy of our performance estimates, including compensating for outliers. Typically a researcher will compare a new target algorithm against a wide range of competing algorithms on standard datasets. The combination of many folds, many CV repetitions, many parameterizations, and many datasets adds up to a very large number of points to compare, and a massive multiple testing problem that is quadratic in the number of individual test combinations. Research sometimes involves basic significance testing, or provides confidence intervals, but seldom addresses the multiple testing problem whereby the assumption of p<.05 means that we expect a spurious "significant" result in 1 in 20 pairs. This paper defines and explores a protocol that reduces the scale of repeated CV whilst providing a principled way to control the erosion of significance due to multiple testing.
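The arithmetic behind the multiple testing problem can be sketched briefly. The example below is an illustration only and is not the protocol the paper defines: it counts the pairwise comparisons among a hypothetical number of test combinations, computes how many spurious "significant" results would be expected at p<.05 if no real differences existed, and applies the standard Bonferroni correction as one well-known (and conservative) way to adjust the threshold. The function names and the value `n_combinations = 10` are assumptions made for this sketch.

```python
from math import comb


def pairwise_tests(n_combinations: int) -> int:
    """Number of distinct pairs among n_combinations results (grows quadratically)."""
    return comb(n_combinations, 2)


def expected_spurious(n_tests: int, alpha: float = 0.05) -> float:
    """Expected count of false positives when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha.

    Bonferroni is used here only as a familiar reference point; it is an
    assumption of this sketch, not the method described in the abstract.
    """
    return alpha / n_tests


if __name__ == "__main__":
    n_combinations = 10  # hypothetical count of algorithm/parameter/dataset combinations
    n_tests = pairwise_tests(n_combinations)
    print("pairwise tests:", n_tests)
    print("expected spurious results at p<.05:", expected_spurious(n_tests))
    print("Bonferroni per-test threshold:", bonferroni_alpha(n_tests))
```

With only ten combinations there are already 45 pairs, so more than two spurious results would be expected at the uncorrected threshold; the count of pairs, and with it the problem, grows quadratically as folds, repetitions, and datasets are added.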

