Data Cleaning: Problems and Current Approaches.

Authors: Erhard Rahm, Hong Hai Do

Keywords: Data science, Computer science, Process (engineering), Data warehouse, Data quality, Database design, Data modeling, Current (fluid), Data cleansing

Abstract: We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouses, data cleaning is a major part of the so-called ETL process. We also discuss current tool support for data cleaning.

References (34)
Thomas Bergstraesser, Philip A. Bernstein, Meta-Data Support for Data Transformations Using Microsoft Repository. IEEE Data(base) Engineering Bulletin, vol. 22, pp. 9-14 (1999)
Alvaro E. Monge, Matching Algorithms within a Duplicate Detection System. IEEE Data(base) Engineering Bulletin, vol. 23, pp. 14-20 (2000)
Usama M. Fayyad, Mining Databases: Towards Algorithms for Knowledge Discovery. IEEE Data(base) Engineering Bulletin, vol. 21, pp. 39-48 (1998)
Michael Stonebraker, Joseph Hellerstein, Rick Caccia, Open enterprise data integration (1999)
Joseph Hellerstein, Vijayshankar Raman, Potter's Wheel: An interactive framework for data cleaning (2000)
Edward L. Wimmers, Renée J. Miller, Peter M. Schwarz, Mary Tork Roth, Laura M. Haas, B. Niswonger, Transforming Heterogeneous Data with Database Middleware: Beyond Integration. IEEE Data(base) Engineering Bulletin, vol. 22, pp. 31-36 (1999)
Erhard Rahm, Hong Hai Do, On Metadata Interoperability in Data Warehouses (2000)
Pedro M. Domingos, AnHai Doan, Alon Y. Levy, Learning Source Description for Data Integration. WebDB (Informal Proceedings), pp. 81-86 (2000)
Joseph Hellerstein, Vijayshankar Raman, Potter's Wheel: An Interactive Framework for Data Transformation and Cleaning. Very Large Data Bases (2001)
Ramakrishnan Srikant, Rakesh Agrawal, Fast Algorithms for Mining Association Rules in Large Databases. Very Large Data Bases, pp. 487-499 (1994)