Big Data Cleaning

Author: Nan Tang

DOI: 10.1007/978-3-319-11116-2_2

Abstract: Data cleaning is, in fact, a lively subject that has played an important part in the history of data management and data analytics, and it is still undergoing rapid development. Moreover, data cleaning is considered a main challenge in the era of big data, due to the increasing volume, velocity and variety of data in many applications. This paper aims to provide an overview of recent work on different aspects of data cleaning: error detection methods, data repairing algorithms, and a generalized data cleaning system. It also includes some discussion of our efforts on data cleaning methods from the perspective of big data, in terms of volume, velocity and variety.
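
Among the error detection methods the abstract mentions, a common starting point is checking a table against a functional dependency (FD) X → Y: any two tuples that agree on every attribute in X but disagree on some attribute in Y form a violation. The following is a minimal Python sketch, assuming an FD is the only rule type and rows are plain dictionaries; the function name `fd_violations` and the zip/city sample data are hypothetical, and this is not the paper's or NADEEF's implementation.

```python
from collections import defaultdict
from itertools import combinations


def fd_violations(rows, lhs, rhs):
    """Return pairs of row indices that violate the FD lhs -> rhs.

    Two rows violate the dependency when they agree on every attribute
    in `lhs` but disagree on at least one attribute in `rhs`.
    """
    # Bucket row indices by their left-hand-side values, so only rows
    # that share the same LHS values are ever compared.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in lhs)
        groups[key].append(i)

    violations = []
    for ids in groups.values():
        # Within a bucket, every pair with a differing RHS value conflicts.
        for i, j in combinations(ids, 2):
            if any(rows[i][a] != rows[j][a] for a in rhs):
                violations.append((i, j))
    return violations


# Hypothetical sample data: the FD zip -> city should hold.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
    {"zip": "10001", "city": "New York"},
]

print(fd_violations(rows, ["zip"], ["city"]))  # [(0, 1), (1, 3)]
```

Grouping by the left-hand side avoids comparing every pair of tuples in the whole table, although pairs are still enumerated within each group, which is quadratic in the group size. Detection only flags conflicting tuples; deciding which value to change is the job of a separate repairing step.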

References (31)
Leopoldo Bertossi, Jan Chomicki. Query Answering in Inconsistent Databases. Logics for Emerging Applications of Databases, pp. 43-83 (2004). DOI: 10.1007/978-3-642-18690-5_2
Leopoldo Bertossi, Solmaz Kolahi, Laks V. S. Lakshmanan. Data Cleaning and Query Answering with Matching Dependencies and Matching Functions. Proceedings of the 14th International Conference on Database Theory (ICDT '11), pp. 268-279 (2011). DOI: 10.1145/1938551.1938585
George Beskales, Gautam Das, Ahmed K. Elmagarmid, Ihab F. Ilyas, Felix Naumann, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Nan Tang. The Data Analytics Group at the Qatar Computing Research Institute. International Conference on Management of Data, vol. 41, pp. 33-38 (2013). DOI: 10.1145/2430456.2430466
Wenfei Fan, Floris Geerts, Nan Tang, Wenyuan Yu. Inferring Data Currency and Consistency for Conflict Resolution. International Conference on Data Engineering, pp. 470-481 (2013). DOI: 10.1109/ICDE.2013.6544848
Amr Ebaid, Ahmed Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, Si Yin. NADEEF. Proceedings of the VLDB Endowment, vol. 6, pp. 1218-1221 (2013). DOI: 10.14778/2536274.2536280
Ahmed Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, Si Yin. NADEEF/ER: Generic and Interactive Entity Resolution. International Conference on Management of Data, pp. 1071-1074 (2014). DOI: 10.1145/2588555.2594511
Wenfei Fan, Jianzhong Li, Nan Tang, Wenyuan Yu. Incremental Detection of Inconsistencies in Distributed Data. 2012 IEEE 28th International Conference on Data Engineering, pp. 318-329 (2012). DOI: 10.1109/ICDE.2012.82
I. P. Fellegi, D. Holt. A Systematic Approach to Automatic Edit and Imputation. Journal of the American Statistical Association, vol. 71, pp. 17-35 (1976). DOI: 10.1080/01621459.1976.10481472
Xu Chu, I. F. Ilyas, P. Papotti. Holistic Data Cleaning: Putting Violations into Context. International Conference on Data Engineering, pp. 458-469 (2013). DOI: 10.1109/ICDE.2013.6544847
Michele Dallachiesa, Amr Ebaid, Ahmed Eldawy, Ahmed Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Nan Tang. NADEEF: A Commodity Data Cleaning System. International Conference on Management of Data, pp. 541-552 (2013). DOI: 10.1145/2463676.2465327