Query-Oriented Data Cleaning with Oracles

Authors: Moria Bergman, Tova Milo, Slava Novgorodov, Wang-Chiew Tan

DOI: 10.1145/2723372.2737786

Abstract: As key decisions are often made based on information contained in a database, it is important for the database to be as complete and correct as possible. For this reason, many data cleaning tools have been developed to automatically resolve inconsistencies in databases. However, data cleaning tools provide only best-effort results and usually cannot eradicate all errors that may exist in a database. Even more importantly, existing data cleaning tools do not typically address the problem of determining what information is missing from a database. To overcome the limitations of existing data cleaning techniques, we present QOCO, a novel query-oriented system for cleaning data with oracles. Under this framework, incorrect (resp. missing) tuples are removed from (added to) the result of a query through edits that are applied to the underlying database, where the edits are derived by interacting with domain experts, which we model as oracle crowds. We show that the problem of determining minimal interactions with oracle crowds to derive database edits for removing (adding) incorrect (missing) tuples to the result of a query is NP-hard in general, and we present heuristic algorithms that interact with oracle crowds. Finally, we implement our algorithms in our prototype system QOCO and show that it is effective and efficient through a comprehensive suite of experiments.
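To make the removal half of this framework concrete, the following is a minimal sketch, not the QOCO algorithm itself. It assumes a toy query `Q(country) :- teams(id, country), games(id, year)`, a hypothetical `oracle` callable that answers whether a single base tuple is true, and a simple greedy heuristic that asks first about the tuple occurring in the most derivations (witnesses) of the wrong answer. All relation names, data values, and function names are illustrative assumptions.

```python
from collections import Counter


def witnesses(teams, games, country):
    """Return every pair of base tuples whose join derives `country`."""
    return [
        {("teams", t), ("games", g)}
        for t in teams
        for g in games
        if t[0] == g[0] and t[1] == country
    ]


def remove_wrong_answer(teams, games, country, oracle):
    """Greedily ask the oracle about base tuples until every witness of the
    wrong answer contains a tuple known to be false (and thus deleted)."""
    remaining = witnesses(teams, games, country)
    verified = set()   # tuples the oracle confirmed as true
    deletions = []     # tuples the oracle rejected; these are the edits
    questions = 0
    while remaining:
        # Count how many unbroken witnesses each unverified tuple occurs in.
        counts = Counter(f for w in remaining for f in w if f not in verified)
        if not counts:
            break  # every tuple was confirmed, so the answer cannot be removed
        # Ask about the most frequent tuple; sorting makes ties deterministic.
        fact = max(sorted(counts), key=counts.get)
        questions += 1
        if oracle(fact):
            verified.add(fact)
        else:
            deletions.append(fact)
            remaining = [w for w in remaining if fact not in w]
    return deletions, questions


# Toy database (hypothetical data): "Atlantis" is a wrong answer of Q.
teams = {("ATL", "Atlantis"), ("ESP", "Spain")}
games = {("ATL", 2010), ("ATL", 2014), ("ESP", 2010)}

# The oracle is simulated here by a set of tuples assumed to be true.
truth = {("teams", ("ESP", "Spain")), ("games", ("ESP", 2010))}
deletions, questions = remove_wrong_answer(
    teams, games, "Atlantis", lambda fact: fact in truth
)
print(deletions, questions)
```

In this toy run the tuple `("ATL", "Atlantis")` appears in both witnesses, so a single question is enough to break both derivations. Finding the fewest questions in general is the problem the abstract states to be NP-hard, which is why a heuristic ordering is used in this sketch.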

References (63)
Jie Xu, Dmitri V. Kalashnikov, Sharad Mehrotra. Query Aware Determinization of Uncertain Objects. IEEE Transactions on Knowledge and Data Engineering, vol. 27, pp. 207-221 (2015). DOI: 10.1109/TKDE.2013.170
Hyunjung Park, Jennifer Widom. CrowdFill: collecting structured data from the crowd. International Conference on Management of Data, pp. 577-588 (2014). DOI: 10.1145/2588555.2610503
Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum. Yago: a core of semantic knowledge. The Web Conference, pp. 697-706 (2007). DOI: 10.1145/1242572.1242667
Quoc Trung Tran, Chee-Yong Chan. How to ConQueR why-not questions. Proceedings of the 2010 International Conference on Management of Data (SIGMOD '10), pp. 15-26 (2010). DOI: 10.1145/1807167.1807172
Aditya G. Parameswaran, Hector Garcia-Molina, Hyunjung Park, Neoklis Polyzotis, Aditya Ramesh, Jennifer Widom. CrowdScreen. Proceedings of the 2012 International Conference on Management of Data (SIGMOD '12), pp. 361-372 (2012). DOI: 10.1145/2213836.2213878
Umeshwar Dayal, Philip A. Bernstein. On the correct translation of update operations on relational views. ACM Transactions on Database Systems, vol. 7, pp. 381-416 (1982). DOI: 10.1145/319732.319740
Melanie Herschel, Mauricio A. Hernández. Explaining missing answers to SPJUA queries. Proceedings of the VLDB Endowment, vol. 3, pp. 185-196 (2010). DOI: 10.14778/1920841.1920869
Steven Euijong Whang, Peter Lofgren, Hector Garcia-Molina. Question selection for crowd entity resolution. Proceedings of the VLDB Endowment, vol. 6, pp. 349-360 (2013). DOI: 10.14778/2536336.2536337
Robert McCann, Warren Shen, AnHai Doan. Matching Schemas in Online Communities: A Web 2.0 Approach. International Conference on Data Engineering, pp. 110-119 (2008). DOI: 10.1109/ICDE.2008.4497419
Jiannan Wang, Tim Kraska, Michael J. Franklin, Jianhua Feng. CrowdER. Proceedings of the VLDB Endowment, vol. 5, pp. 1483-1494 (2012). DOI: 10.14778/2350229.2350263