Nonparametric Generation of Synthetic Data for Small Geographic Areas

Authors: Joseph W. Sakshaug, Trivellore E. Raghunathan

DOI: 10.1007/978-3-319-11257-2_17

Abstract: Computing and releasing statistics for small geographic areas is a common task for many statistical agencies, but releasing public-use microdata for these areas is much less common due to data confidentiality concerns. Accessing the restricted microdata is usually only possible within a research data center (RDC). This arrangement is inconvenient for researchers who must travel large distances and, in some cases, pay a sizeable usage fee to access the nearest RDC. An alternative dissemination method that has been explored is to release synthetic data. In general, synthetic data consists of imputed values drawn from a predictive model based on the observed data. Data confidentiality is preserved because no actual data are released. The predictive model is typically a standard, parametric distribution, but often key variables of interest do not follow strict parametric forms. In this paper, we apply a nonparametric method for generating synthetic data for continuous variables collected in small geographic areas. The method is evaluated using data from the 2005-2007 American Community Survey. The analytic validity of the synthetic data is assessed by comparing (baseline) inferences obtained with those

References (30)
D. V. Lindley, A. F. M. Smith. Bayes Estimates for the Linear Model. Journal of the Royal Statistical Society: Series B (Methodological), vol. 34, pp. 1-18 (1972). DOI: 10.1111/J.2517-6161.1972.TB00885.X
Jerome P. Reiter, Gregory Caiola. Random Forests for Generating Partially Synthetic, Categorical Data. Transactions on Data Privacy, vol. 3, pp. 27-42 (2010). DOI: 10.5555/1747335.1747337
Jerome P. Reiter, D. B. Rubin, Trivellore E. Raghunathan. Multiple Imputation for Statistical Disclosure Limitation. Journal of Official Statistics, vol. 19, pp. 1- (2003).
Jin-Mann S. Lin. Small Area Estimation (2003).
J. P. Reiter. Using CART to Generate Partially Synthetic Public Use Microdata. Journal of Official Statistics, vol. 21, pp. 441-462 (2005).
Xiao-Li Meng. Multiple-Imputation Inferences with Uncongenial Sources of Input. Statistical Science, vol. 9, pp. 538-558 (1994). DOI: 10.1214/SS/1177010269
Joseph W. Sakshaug, Trivellore E. Raghunathan. Synthetic Data for Small Area Estimation. Privacy in Statistical Databases, pp. 162-173 (2010). DOI: 10.1007/978-3-642-15838-4_15
Frank E. Harrell, Jr. Ordinal Logistic Regression. Springer, New York, NY, pp. 311-325 (2001). DOI: 10.1007/978-1-4757-3462-1_13
Donald B. Rubin, Nathaniel Schenker. Multiple Imputation for Interval Estimation from Simple Random Samples with Ignorable Nonresponse. Journal of the American Statistical Association, vol. 81, pp. 366-374 (1986). DOI: 10.1080/01621459.1986.10478280