An Effective Multi-Layer Model for Controlling the Quality of Data

Authors: C.K.-S. Leung, M.A.F. Mateo, A.J. Nadler

DOI: 10.1109/IDEAS.2007.12

Keywords: Data integrity, Data warehouse, Data modeling, Data efficiency, Data consistency, Computer science, Data mining, Data validation, Data quality, Data stream mining

Abstract: Data mining aims to search for implicit, previously unknown, and potentially useful information that might be embedded in the data. As the well-known saying goes, "garbage in, garbage out". Hence, to get meaningful data mining results, a clean set of data is essential. In this paper, we propose an effective model for controlling the quality of data. Specifically, the three-layer model focuses on the validity and consistency of data. To elaborate, the internal layer ensures that the observed data are valid and that their values fall within reasonable ranges. The temporal layer ensures that the data are consistent with their temporal behaviour. The spatial layer ensures that the data are consistent with their spatial neighbours. A case study applying our proposed model to real-life weather data for an agricultural application shows the effectiveness of the model in improving data quality, thus leading to better data mining results. It is important to note that the model is not confined to weather or agricultural applications. We also discuss how the model can be effectively applied to control the quality of data in some other situations.
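The three layers described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the thresholds, and the use of a neighbour median are all assumptions chosen for illustration, with daily temperature readings in degrees Celsius as the example data.

```python
from statistics import median

# Hypothetical thresholds for temperature readings in degrees Celsius.
# None of these values come from the paper.
VALID_RANGE = (-60.0, 60.0)
MAX_STEP = 15.0           # largest plausible change from the previous reading
MAX_NEIGHBOUR_GAP = 10.0  # largest plausible gap from the neighbours' median


def internal_ok(value, valid_range=VALID_RANGE):
    """Internal layer: the value falls within a reasonable range."""
    low, high = valid_range
    return low <= value <= high


def temporal_ok(value, previous, max_step=MAX_STEP):
    """Temporal layer: the value is consistent with the previous reading."""
    return previous is None or abs(value - previous) <= max_step


def spatial_ok(value, neighbours, max_gap=MAX_NEIGHBOUR_GAP):
    """Spatial layer: the value is consistent with nearby stations."""
    return not neighbours or abs(value - median(neighbours)) <= max_gap


def check(value, previous=None, neighbours=()):
    """Return the names of the layers that flag the value as suspect."""
    flags = []
    if not internal_ok(value):
        flags.append("internal")
    if not temporal_ok(value, previous):
        flags.append("temporal")
    if not spatial_ok(value, neighbours):
        flags.append("spatial")
    return flags
```

For example, `check(40.0, previous=38.0, neighbours=[20.0, 22.0, 21.0])` passes the internal and temporal checks but is flagged by the spatial check, because the reading lies far from the median of its neighbours. Keeping the layers as separate functions lets each threshold be tuned for a different domain without touching the others.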
