DEFINITION AND ANALYSIS OF POPULATION-BASED DATA COMPLETENESS MEASUREMENT

Authors: Nurul A Emran, Suzanne Embury

Keywords: Mathematics, Software, Data mining, Population, Data quality, Completeness (statistics), Data set, Population based data, Missing data, Functional dependency

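Abstract: Poor quality data, such as data with errors or missing values, cause negative consequences in many application domains. An important aspect of data quality is completeness. One problem in data completeness is the problem of missing individuals in data sets. Within a data set, the individuals refer to the real world entities whose information is recorded. So far, however, there has been little discussion in completeness studies about how missing individuals are assessed. In this thesis, we propose the notion of population-based completeness (PBC) that deals with the missing individuals problem, with the aim of investigating what is required to measure PBC and to identify what is needed to support PBC measurements in practice. To achieve these aims, we analyse the elements of PBC and the requirements for PBC measurement, resulting in a definition of the PBC elements and the PBC measurement formula. We propose an architecture for PBC measurement systems and determine the technical requirements of PBC systems in terms of software and hardware components. An analysis of the technical issues that arise in implementing PBC makes a contribution to the understanding of the feasibility of PBC measurements to provide accurate measurement results. Further exploration of a particular issue that was discovered in the analysis showed that when measuring PBC across multiple databases, data from those databases need to be integrated and materialised. Unfortunately, this requirement may lead to a large internal store for the PBC system that is impractical to maintain. We propose an approach to test the hypothesis that the available storage space can be optimised by materialising only partial information from the contributing databases, while retaining the accuracy of the PBC measurements. Our approach involves substituting some of the attributes from the contributing databases with smaller alternatives, by exploiting the approximate functional dependencies (AFDs) that can be discovered within each local database. An analysis of the space-accuracy trade-offs of the approach leads to the development of an algorithm to assess candidate alternative attributes in terms of space-saving and accuracy (of PBC measurement). The result of several case studies conducted for proxy assessment contributes to an understanding of the space-accuracy trade-offs offered by the proxies. A better understanding of dealing with the missing individuals problem is achieved through the proposal and investigation of PBC,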
