Automatic Web-based relational data imputation

Authors: Hailong Liu, Zhanhuai Li, Qun Chen, Zhaoqiang Chen

DOI: 10.1007/S11704-016-6319-3

Abstract: Data incompleteness is one of the most important data quality problems in enterprise information systems. Most existing imputation techniques merely deduce approximate values for incomplete attributes by means of specific rules or mathematical methods. Unfortunately, an approximation may be far from the truth. Furthermore, when the observed data are inadequate, these techniques do not work well. The World Wide Web (WWW) has become an important and widely used information source, and several recent works have shown that using Web data can augment databases. In this paper, we propose a Web-based relational data imputation framework, which tries to automatically retrieve real values from the WWW for incomplete attributes. We try to take full advantage of the relations among different kinds of objects, based on the idea that things of the same kind must have the same kinds of relations with their relatives in the world. Our proposed techniques consist of two automatic query formulation algorithms and a graph-based candidate extraction model. Evaluations on high-quality datasets and a poor-quality dataset demonstrate the effectiveness of our approaches.