Dataset decay and the problem of sequential analyses on open datasets.

Authors: William Hedley Thompson, Jessey Wright, Patrick G Bissett, Russell A Poldrack

DOI: 10.7554/ELIFE.53498

Abstract: Open data allows researchers to explore pre-existing datasets in new ways. However, if many researchers reuse the same dataset, multiple statistical testing may increase false positives. Here we demonstrate that sequential hypothesis testing on the same dataset by multiple researchers can inflate error rates. We go on to discuss a number of correction procedures that can reduce the number of false positives, and the challenges associated with these correction procedures.
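The sketch below roughly illustrates the abstract's central claim. Suppose many researchers each test one hypothesis on the same dataset at level alpha, and every one of those hypotheses is a true null. The chance that at least one test is a false positive (the familywise error rate) then grows quickly with the number of tests m. The sketch assumes the tests are independent. Tests run on a single shared dataset are usually correlated, so this is a simplification. The `spending_threshold` schedule is a hypothetical illustration: it halves alpha for each successive test. It shows one way a sequential correction can cap the error rate without knowing m in advance. It is not the specific procedure evaluated in the paper.

```python
import random


def fwer_uncorrected(alpha: float, m: int) -> float:
    """Probability of at least one false positive across m independent
    tests of true null hypotheses, each run at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def spending_threshold(alpha: float, k: int) -> float:
    """Hypothetical alpha-spending rule (illustrative assumption, not the
    paper's method): the k-th sequential test (k >= 1) is run at
    alpha / 2**k. The thresholds sum to less than alpha however many
    tests are eventually run."""
    return alpha / 2 ** k


def simulate(alpha: float, m: int, n_sims: int = 20000, seed: int = 0):
    """Monte Carlo estimate of the familywise error rate, with and without
    the spending rule, when every null hypothesis is true.
    Under a true null, each p-value is uniform on [0, 1]; tests are
    assumed independent for simplicity."""
    rng = random.Random(seed)
    hits_raw = 0    # simulated datasets with >= 1 false positive, uncorrected
    hits_spent = 0  # same, using the sequential spending thresholds
    for _ in range(n_sims):
        pvals = [rng.random() for _ in range(m)]
        if any(p < alpha for p in pvals):
            hits_raw += 1
        if any(p < spending_threshold(alpha, k)
               for k, p in enumerate(pvals, start=1)):
            hits_spent += 1
    return hits_raw / n_sims, hits_spent / n_sims


if __name__ == "__main__":
    alpha, m = 0.05, 20
    # Analytic value: 1 - 0.95**20, roughly 0.64
    print(f"analytic uncorrected FWER: {fwer_uncorrected(alpha, m):.3f}")
    raw, spent = simulate(alpha, m)
    print(f"simulated uncorrected: {raw:.3f}, with spending rule: {spent:.3f}")
```

With 20 sequential tests at alpha = 0.05, the uncorrected familywise error rate is about 0.64. The halving schedule keeps it below 0.05. The cost is that later tests face ever stricter thresholds, which lowers their power. This is one instance of the challenges that such correction procedures raise.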
