Estimating the Quality of Data in Relational Databases.

Authors: Amihai Motro, Igor Rakov

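Abstract: With more and more electronic information sources becoming widely available, the issue of the quality of these often-competing sources has become germane. We propose a standard for rating information sources with respect to their quality. An important consideration is that the quality of information sources often varies considerably when specific areas within these sources are considered. This implies that the assignment of a single rating of quality to an information source is usually unsatisfactory. Of course, to the user the overall quality of an information source may not be as important as the quality of the specific information that this user is extracting from the source. Therefore, methods must be developed that will derive reliable estimates of the quality of the information provided to users from the quality specifications that have been assigned to the sources. Our work here bears on all these concerns. We describe an approach that uses dual quality measures that gauge the distance of the information in a database from the truth. We then combine manual verification with statistical methods to arrive at useful estimates of the quality of databases. We consider the variance in quality by isolating areas of databases that are homogeneous with respect to quality, and then estimating the quality of each separate area. These composite estimates may be regarded as quality specifications that will be affixed to the database. Finally, we show how to derive quality estimates for individual queries from such quality specifications. This work was supported in part by DARPA grants N0014-92-J-4038 and N0060-96-D-3202.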
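
The abstract's pipeline (dual measures, sample-based estimation, per-area estimates combined into one specification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two measures are assumed to be the share of stored tuples that are true (called `soundness` here) and the share of true tuples that are stored (`completeness`), the sample estimate is a plain proportion of manually verified tuples, and the hypothetical `combine_areas` helper uses a size-weighted average.

```python
from fractions import Fraction

def soundness(stored: set, truth: set) -> Fraction:
    """Assumed measure: fraction of stored tuples that also appear in the truth."""
    return Fraction(len(stored & truth), len(stored))

def completeness(stored: set, truth: set) -> Fraction:
    """Assumed measure: fraction of true tuples that appear in the database."""
    return Fraction(len(stored & truth), len(truth))

def estimate_from_sample(sample, is_correct) -> float:
    """Estimate a measure as the proportion of sampled tuples that pass manual verification."""
    checks = [bool(is_correct(row)) for row in sample]
    return sum(checks) / len(checks)

def combine_areas(areas) -> float:
    """Size-weighted average of per-area estimates; areas is a list of (size, estimate)."""
    total = sum(size for size, _ in areas)
    return sum(size * est for size, est in areas) / total

# Toy data (hypothetical): one stored tuple is wrong and one true tuple is missing.
stored = {("alice", 30), ("bob", 41), ("carol", 25)}
truth = {("alice", 30), ("bob", 40), ("carol", 25), ("dave", 52)}

s = soundness(stored, truth)          # 2 of 3 stored tuples are true
c = completeness(stored, truth)       # 2 of 4 true tuples are stored
sampled = estimate_from_sample(sorted(stored), lambda row: row in truth)
overall = combine_areas([(100, 0.9), (300, 0.5)])
```

In practice the truth is not available as a set, which is why the sketch separates the exact definitions from the sample-based estimate: only the sampled tuples need manual verification.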

References (12)
M. P. Reddy, Richard Y. Wang. Estimating data accuracy in a federated database environment. International Conference on Information Systems, pp. 115-134 (1995). DOI: 10.1007/3-540-60584-3_27
Kamran Parsaye, Mark Chignell. Intelligent database tools & applications. John Wiley & Sons, Inc. (1993)
Richard A Olshen, Charles J Stone, Leo Breiman, Jerome H Friedman. Classification and regression trees (1983)
Gerard Salton, Michael J. McGill. Introduction to Modern Information Retrieval (1983)
Richard Y. Wang, M. P. Reddy, Henry B. Kon. Toward Quality Data: An Attribute-Based Approach (2014)
Amihai Motro. Integrity = validity + completeness. ACM Transactions on Database Systems, vol. 14, pp. 480-502 (1989). DOI: 10.1145/76902.76904
Nabil Kamel, Roger King. Exploiting data-distribution patterns in modeling tuple selectivities in a database. Information Sciences, vol. 69, pp. 27-53 (1993). DOI: 10.1016/0020-0255(93)90038-N
Frank Olken, Doron Rotem. Random sampling from databases: a survey. Statistics and Computing, vol. 5, pp. 25-42 (1995). DOI: 10.1007/BF00140664
Christopher Fox, Anany Levitin, Thomas Redman. The notion of data and its quality dimensions. Information Processing and Management, vol. 30, pp. 9-19 (1994). DOI: 10.1016/0306-4573(94)90020-5
Amihai Motro. Panorama: a database system that annotates its answers to queries with their properties. Intelligent Information Systems, vol. 7, pp. 51-73 (1996). DOI: 10.1007/BF00125522