Enriching data imputation with extensive similarity neighbors

Authors: Shaoxu Song, Aoqian Zhang, Lei Chen, Jianmin Wang

Abstract: Incomplete information often occurs in many database applications, e.g., in data integration, data cleaning, or data exchange. The idea of data imputation is to fill in the missing values with the values of neighbors that share the same information. Such neighbors can be identified either with certainty, by editing rules, or statistically, by relational dependency networks. Unfortunately, owing to data sparsity, the number of neighbors (identified w.r.t. value equality) is rather limited, especially in the presence of data values with variances. In this paper, we argue for extensively enriching the neighbors by similarity, with tolerance to small variations. More fillings can thus be acquired that the aforesaid equality neighbors fail to reveal. To fill more missing values, we study the problem of maximizing the imputation. Our major contributions include (1) an NP-hardness analysis of solving and approximating the problem, (2) exact algorithms for tackling the problem, and (3) efficient approximation with performance guarantees. Experiments on real and synthetic data sets demonstrate that the filling accuracy can be improved.
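
The contrast the abstract draws between equality neighbors and similarity neighbors can be illustrated with a toy sketch. This is not the authors' method and does not address the maximization problem. The Levenshtein distance, the threshold `max_dist`, the majority vote among neighbors, the function names `edit_distance` and `impute`, and the sample rows are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Optional

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def impute(rows, target, key, attr, max_dist=0) -> Optional[str]:
    """Hypothetical helper: fill rows[target][attr] by majority vote over
    neighbors whose `key` value is within `max_dist` edits of the target's.
    max_dist=0 reduces to value-equality neighbors."""
    probe = rows[target][key]
    votes = Counter(
        r[attr] for i, r in enumerate(rows)
        if i != target and r[attr] is not None
        and edit_distance(r[key], probe) <= max_dist
    )
    if not votes:
        return None  # no neighbor found, value stays missing
    return votes.most_common(1)[0][0]

# Illustrative data: the third row's city has a small variation ("-" vs " ").
rows = [
    {"city": "New York", "state": "NY"},
    {"city": "Boston", "state": "MA"},
    {"city": "New-York", "state": None},
]
print(impute(rows, 2, "city", "state"))              # equality only: None
print(impute(rows, 2, "city", "state", max_dist=1))  # similarity: NY
```

With exact matching, the variant spelling "New-York" has no neighbor and the value stays missing. Tolerating one edit admits "New York" as a neighbor, so a filling is found.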
