Enriching data imputation with extensive similarity neighbors

Authors: Shaoxu Song, Aoqian Zhang, Lei Chen, Jianmin Wang

DOI: 10.14778/2809974.2809989

Abstract: Incomplete information often occurs in many database applications, e.g., in data integration, data cleaning or data exchange. The idea of data imputation is to fill the missing data with the values of its neighbors who share the same information. Such neighbors could either be identified certainly by editing rules or statistically by relational dependency networks. Unfortunately, owing to data sparsity, the number of neighbors (identified w.r.t. value equality) is rather limited, especially in the presence of data values with variances. In this paper, we argue to extensively enrich similarity neighbors by similarity rules with tolerance to small variations. More fillings can thus be acquired that the aforesaid equality neighbors fail to reveal. To fill more missing values, we study the problem of maximizing the missing data imputation. Our major contributions include (1) the NP-hardness analysis on solving and approximating the problem, (2) exact algorithms for tackling the problem, and (3) efficient approximation with performance guarantees. Experiments on real and synthetic data sets demonstrate that the filling accuracy can be improved.
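The contrast the abstract draws between equality neighbors and similarity neighbors can be illustrated with a small sketch. This is not the paper's algorithm: it is a greedy nearest-donor fill, and the record layout, the `street` and `zip` attributes, the edit-distance measure, and the `tolerance` parameter are all assumptions chosen for illustration. With `tolerance=0` only equality neighbors qualify; a small positive tolerance admits neighbors whose values differ by minor variations.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def impute(records: List[Record], key: str, fill: str, tolerance: int) -> List[Record]:
    """Return copies of `records` with missing `fill` values taken from the
    closest donor whose `key` value is within `tolerance` edits.

    Only records that were complete in the input act as donors, so a filled
    value is never propagated further. (Hypothetical helper, for illustration.)
    """
    donors = [r for r in records if r[key] is not None and r[fill] is not None]
    result = []
    for original in records:
        r = dict(original)  # leave the input untouched
        if r[fill] is None and r[key] is not None:
            candidates = [d for d in donors
                          if edit_distance(r[key], d[key]) <= tolerance]
            if candidates:
                best = min(candidates,
                           key=lambda d: edit_distance(r[key], d[key]))
                r[fill] = best[fill]
        result.append(r)
    return result


# Made-up example data: the second record differs from the first by one character.
records = [
    {"street": "12 West Main St", "zip": "10001"},
    {"street": "12 West Main St.", "zip": None},
    {"street": "48 Oak Avenue", "zip": "20002"},
    {"street": "7 Pine Road", "zip": None},
]

strict = impute(records, "street", "zip", tolerance=0)  # equality neighbors only
loose = impute(records, "street", "zip", tolerance=2)   # similarity neighbors
```

In this toy data, the equality-only pass leaves the second record unfilled because of the trailing period, while the tolerant pass fills it from the first record. The fourth record has no sufficiently similar donor and stays missing in both passes.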

References (19)
Shichao Zhang, Jilian Zhang, Xiaofeng Zhu, Yongsong Qin, Chengqi Zhang. Missing value imputation based on data clustering. Transactions on Computational Science, vol. 1, pp. 128-138 (2008). DOI: 10.1007/978-3-540-79299-4_7
Wenfei Fan, Xibei Jia, Jianzhong Li, Shuai Ma. Reasoning about record matching rules. Proceedings of the VLDB Endowment, vol. 2, pp. 407-418 (2009). DOI: 10.14778/1687627.1687674
Gonzalo Navarro. A guided tour to approximate string matching. ACM Computing Surveys, vol. 33, pp. 31-88 (2001). DOI: 10.1145/375360.375365
Shaoxu Song, Lei Chen, Hong Cheng. Parameter-Free Determination of Distance Thresholds for Metric Distance Constraints. 2012 IEEE 28th International Conference on Data Engineering, pp. 846-857 (2012). DOI: 10.1109/ICDE.2012.46
Xu Chu, I. F. Ilyas, P. Papotti. Holistic data cleaning: Putting violations into context. International Conference on Data Engineering, pp. 458-469 (2013). DOI: 10.1109/ICDE.2013.6544847
Roderick J. A. Little, Donald B. Rubin. Statistical Analysis with Missing Data (1987).
Solmaz Kolahi, Laks V. S. Lakshmanan. On approximating optimum repairs for functional dependency violations. Proceedings of the 12th International Conference on Database Theory - ICDT '09, pp. 53-62 (2009). DOI: 10.1145/1514894.1514901
Sen Wu, Xiaodong Feng, Yushan Han, Qiang Wang. Missing categorical data imputation approach based on similarity. Systems, Man and Cybernetics, pp. 2827-2832 (2012). DOI: 10.1109/ICSMC.2012.6378177
Jiannan Wang, Nan Tang. Towards dependable data repairing with fixing rules. International Conference on Management of Data, pp. 457-468 (2014). DOI: 10.1145/2588555.2610494
Leonid Libkin, Limsoon Wong. Semantic representations and query languages for or-sets. Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems - PODS '93, pp. 37-48 (1993). DOI: 10.1145/153850.153854