Small values in big data: The continuing need for appropriate metadata

Authors: Craig A. Stow, Katherine E. Webster, Tyler Wagner, Noah Lottig, Patricia A. Soranno

DOI: 10.1016/J.ECOINF.2018.03.002

Abstract: Compiling data from disparate sources to address pressing ecological issues is increasingly common. Many datasets contain left-censored data (observations below an analytical detection limit). Studies from single and typically small datasets show that common approaches for handling censored data, e.g., deletion or substituting fixed values, result in systematic biases. However, no studies have explored the degree to which the documentation and presence of censored data influence outcomes from large, multi-sourced datasets. We describe left-censored data in a lake water quality database assembled from 74 sources and illustrate the challenges of dealing with small values in big data, including detection limits that are absent, range widely, and show trends over time. We show that substitutions of censored data can also bias analyses using 'big data' datasets, and that censored data can be effectively handled with modern quantitative approaches, but such approaches rely on accurate metadata that describe the treatment of censored data from each source.
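The abstract's contrast between deletion, fixed-value substitution, and likelihood-based handling of left-censored data can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes lognormal concentrations, a single known detection limit, and hypothetical parameter values (`TRUE_MU`, `TRUE_SIGMA`, `LIMIT`), and it uses a simple dependency-free pattern search in place of a proper optimizer. The helper `censored_lognormal_mle` is likewise a hypothetical name introduced for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def censored_lognormal_mle(detected, n_censored, limit):
    """Estimate mu and sigma of log-concentration from left-censored data."""
    logs = [math.log(x) for x in detected]
    log_limit = math.log(limit)
    std_normal = NormalDist()

    def neg_log_lik(mu, sigma):
        # Detected values contribute the normal density of their logs
        # (the constant term is dropped because it does not affect the fit).
        ll = sum(-math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2 for y in logs)
        # Each censored value contributes P(log X < log limit).
        p = std_normal.cdf((log_limit - mu) / sigma)
        ll += n_censored * math.log(max(p, 1e-300))
        return -ll

    # Start from the detected-only estimates, then refine with a shrinking
    # pattern search over (mu, sigma).
    mu, sigma = mean(logs), stdev(logs)
    best = neg_log_lik(mu, sigma)
    step = 0.5
    while step > 1e-6:
        improved = False
        for d_mu, d_sigma in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand_mu, cand_sigma = mu + d_mu, sigma + d_sigma
            if cand_sigma <= 0:
                continue
            value = neg_log_lik(cand_mu, cand_sigma)
            if value < best:
                mu, sigma, best, improved = cand_mu, cand_sigma, value, True
        if not improved:
            step /= 2
    return mu, sigma


random.seed(1)
TRUE_MU, TRUE_SIGMA, LIMIT = 0.0, 1.0, 1.0  # hypothetical, for illustration only
values = [random.lognormvariate(TRUE_MU, TRUE_SIGMA) for _ in range(5000)]
detected = [v for v in values if v >= LIMIT]
n_censored = len(values) - len(detected)

detected_logs = [math.log(v) for v in detected]
# Substitution: every nondetect is replaced by half the detection limit.
substituted_logs = detected_logs + [math.log(LIMIT / 2)] * n_censored

deletion_mu, deletion_sigma = mean(detected_logs), stdev(detected_logs)
substitution_mu, substitution_sigma = mean(substituted_logs), stdev(substituted_logs)
mle_mu, mle_sigma = censored_lognormal_mle(detected, n_censored, LIMIT)

for name, mu, sigma in [
    ("deletion", deletion_mu, deletion_sigma),
    ("substitution (DL/2)", substitution_mu, substitution_sigma),
    ("censored likelihood", mle_mu, mle_sigma),
]:
    print(f"{name:<22} mu={mu:+.3f} sigma={sigma:.3f}")
```

Under these assumptions, deletion shifts the estimated log-mean upward and substitution understates the spread, while the censored likelihood uses only the count of nondetects and the detection limit. That last point is why the fit depends on the limit being recorded for each source, which is the metadata requirement the abstract emphasizes.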

References (24)
Tyler Wagner, Patricia A. Soranno, Katherine E. Webster, Kendra Spence Cheruvelil. Landscape drivers of regional variation in the relationship between total phosphorus and chlorophyll in lakes. Freshwater Biology, vol. 56, pp. 1811-1824 (2011). DOI: 10.1111/J.1365-2427.2011.02621.X
YoonKyung Cha, Seok Soon Park, Kyunghyun Kim, Myeongseop Byeon, Craig A. Stow. Probabilistic prediction of cyanobacteria abundance in a Korean reservoir using a Bayesian Poisson model. Water Resources Research, vol. 50, pp. 2518-2532 (2014). DOI: 10.1002/2013WR014372
A. H. El-Shaarawi, D. M. Dolan. Maximum likelihood estimation of water quality concentrations from censored data. Canadian Journal of Fisheries and Aquatic Sciences, vol. 46, pp. 1033-1039 (1989). DOI: 10.1139/F89-134
W. Foreman, J. Gray, A. Chalmers, P.J. Phillips, C. Schubert, D. Argue, I. Fisher, E.T. Furlong. Concentrations of hormones, pharmaceuticals and other micropollutants in groundwater affected by septic systems in New England and New York. Science of The Total Environment, vol. 512, pp. 43-54 (2015). DOI: 10.1016/J.SCITOTENV.2014.12.067
Ronald C. Antweiler, Howard E. Taylor. Evaluation of statistical treatments of left-censored environmental data using coincident uncensored data sets: I. Summary statistics. Environmental Science & Technology, vol. 42, pp. 3732-3738 (2008). DOI: 10.1021/ES071301C
Jacob Carstensen. Censored data regression: Statistical methods for analyzing Secchi transparency in shallow systems. Limnology and Oceanography: Methods, vol. 8, pp. 376-385 (2010). DOI: 10.4319/LOM.2010.8.376
Robert J. Gilliom, Dennis R. Helsel. Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research, vol. 22, pp. 135-146 (1986). DOI: 10.1029/WR022I002P00135
Jian Yun, Song S. Qian. A Hierarchical Model for Estimating Long-Term Trend of Atrazine Concentration in the Surface Water of the Contiguous U.S. JAWRA Journal of the American Water Resources Association, vol. 51, pp. 1128-1137 (2015). DOI: 10.1111/JAWR.12284
Dennis R. Helsel. More than obvious: better methods for interpreting nondetect data. Environmental Science & Technology, vol. 39 (2005). DOI: 10.1021/ES053368A