Filling the Missing Data of Air Pollutant Concentration Using Single Imputation Methods

Authors: Norazian Mohamed Noor, A.S. Yahaya, N.A. Ramli, Mohd Mustafa Al Bakri Abdullah

DOI: 10.4028/WWW.SCIENTIFIC.NET/AMM.754-755.923

Keywords: Mean absolute error, Mean squared error, Nearest neighbour, Mathematics, Imputation (statistics), Missing data, Statistics, Linear interpolation

Abstract: Hourly PM10 concentrations measured at eight monitoring stations in Peninsular Malaysia in 2006 were used to generate simulated missing data. Gap lengths were limited to 12 hours, since the actual pattern of missingness is considered short. Gaps were generated at two percentages, 5 % and 15 %. Several single imputation methods were used to fill the simulated gaps: linear interpolation (LI), nearest neighbour (NN), mean above below (MAB), daily mean (DM), 12-hour mean (12M), 6-hour mean (6M), row mean (RM) and previous year (PY). In addition, multiple imputation (MI) was conducted for comparison with the single imputation methods. Performance was evaluated using four statistical criteria: mean absolute error, root mean squared error, prediction accuracy and index of agreement. The results show that 6M performs comparably to LI, which suggests that a smaller averaging time gives better prediction. The other methods predict the missing data well, except for PY. RM and MI perform moderately, with performance increasing at the higher fraction of missing data, whereas LR is the worst method at both percentages.
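The simulate-then-score procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy series, and the exact definitions of the three methods shown (LI, NN, MAB) are assumptions made for illustration. The index of agreement is assumed to follow Willmott's common form, and prediction accuracy is left out because the abstract does not define it.

```python
import numpy as np

def linear_interpolation(x):
    """LI: fill NaNs on a straight line between the surrounding observed values."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(x.size)
    obs = ~np.isnan(x)
    return np.interp(idx, idx[obs], x[obs])

def nearest_neighbour(x):
    """NN: fill each NaN with the closest observed value (ties go to the earlier one)."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(x.size)
    obs_idx = idx[~np.isnan(x)]
    nearest = obs_idx[np.abs(idx[:, None] - obs_idx[None, :]).argmin(axis=1)]
    return x[nearest]

def mean_above_below(x):
    """MAB: fill each NaN with the mean of the last observed value before the gap
    and the first observed value after it (assumed definition)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    idx = np.arange(x.size)
    obs_idx = idx[~np.isnan(x)]
    for i in idx[np.isnan(x)]:
        before = obs_idx[obs_idx < i]
        after = obs_idx[obs_idx > i]
        neighbours = []
        if before.size:
            neighbours.append(x[before[-1]])
        if after.size:
            neighbours.append(x[after[0]])
        out[i] = np.mean(neighbours)
    return out

def mae(pred, obs):
    """Mean absolute error."""
    return float(np.mean(np.abs(pred - obs)))

def rmse(pred, obs):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def index_of_agreement(pred, obs):
    """Willmott's index of agreement (assumed form), between 0 and 1."""
    obs_mean = np.mean(obs)
    num = np.sum((pred - obs) ** 2)
    den = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return float(1.0 - num / den)

# Toy hourly series (hypothetical values, not data from the paper).
truth = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
gap = np.array([False, False, True, True, False, False])  # simulated 2-hour gap
series = np.where(gap, np.nan, truth)

# Fill the gap with each method and score only the simulated-missing positions.
for name, method in [("LI", linear_interpolation),
                     ("NN", nearest_neighbour),
                     ("MAB", mean_above_below)]:
    filled = method(series)
    print(name, mae(filled[gap], truth[gap]), rmse(filled[gap], truth[gap]),
          index_of_agreement(filled[gap], truth[gap]))
```

Scoring only the positions that were deliberately removed keeps the comparison fair, since every method leaves the observed values untouched. On a perfectly linear toy series LI recovers the gap exactly, which is why the example is useful only for checking the mechanics, not for ranking methods.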
