Tales of Huffman: An Exercise in Dealing with Messy Data

Author: Robert H. Carver

Abstract: Statistics education reformers have for years called for the use of real data in teaching introductory statistics (Ballman, 1997; Garfield et al., 2004; Hogg, 1991). Instructors now have ready access to cases, textbook problems and other exercises with accompanying well-documented sets of real or realistic data. On-line portals and libraries provide a huge array of data keyed variously to substantive topics and statistical techniques suitable for students. The vast majority of these datasets tend to have already been cleaned up by their preparers. As enriching as these resources are, relatively few of them offer students first-hand experience with the essential messiness of "real" data. There is a good case to be made that data cleaning and preparation belong in introductory courses (Burger & Leopold, 2001). Certainly, the problems of missing, dirty, and incomplete data are important topics within the field (Hoyle, 1971; Rubin, 1976; Wagner, 2002). Using data from the Wright Brothers' 1904 experiments, this exercise leads intermediate students through the process of data preparation, illustrating five common steps of data cleaning: standardizing the format of records, deciding how to treat ambiguously recorded data, conversion of measurements to a single standard unit, detecting and resolving issues of outliers, and imputation of missing data.

References (11)
George E. P. Box, Patrick Y. T. Liu. Statistics as a Catalyst to Learning by Scientific Method Part I—An Example. Journal of Quality Technology, vol. 31, pp. 1-15 (1999). DOI: 10.1080/00224065.1999.11979889
Karla Ballman. Greater Emphasis on Variation in an Introductory Statistics Course. Journal of Statistics Education, vol. 5 (1997). DOI: 10.1080/10691898.1997.11910529
Orville Wright, Wilbur Wright, Peter L. Jakab, Rick Young. The Published Writings of Wilbur and Orville Wright (1953).
Leonard S. Hobbs. The Wright Brothers' Engines and Their Design (2020).
Sidney Addelman. Statistics for Experimenters (1978).
Robert V. Hogg. Statistical Education: Improvements are Badly Needed. The American Statistician, vol. 45, pp. 342-343 (1991). DOI: 10.1080/00031305.1991.10475832
Donald B. Rubin. Inference and Missing Data. Biometrika, vol. 63, pp. 581-592 (1976). DOI: 10.1093/BIOMET/63.3.581
M. H. Hoyle. Spoilt Data-An Introduction and Bibliography. Journal of the Royal Statistical Society, Series A (General), vol. 134, pp. 429-439 (1971). DOI: 10.2307/2344243