CrowdCleaner: Data cleaning for multi-version data on the web via crowdsourcing

Authors: Yongxin Tong, Caleb Chen Cao, Chen Jason Zhang, Yatao Li, Lei Chen

DOI: 10.1109/ICDE.2014.6816736

Abstract: Multi-version data is often among the information of greatest concern on the Web, since this type of data is usually updated frequently. Even though some integration systems try to maintain the latest version, the maintained multi-version data often includes inaccurate and invalid information due to integration or update delay errors. In this demo, we present CrowdCleaner, a smart data cleaning system for multi-version data on the Web, which uses crowdsourcing-based approaches to detect and repair errors that cannot be solved by traditional techniques. In particular, CrowdCleaner blends active and passive crowdsourcing methods to rectify errors in the data. We demonstrate the following four facilities provided by CrowdCleaner: (1) an error-monitor, which finds out which items (e.g., submission date, price of real estate, etc.) are wrong versions according to reports from the crowds, and which belongs to the passive crowdsourcing strategy; (2) a task-manager, which allocates tasks to human workers intelligently; (3) a smart-decision-maker, which identifies which answer from the crowds is correct using active crowdsourcing methods; and (4) a whom-to-ask-finder, which discovers which users (or human workers) should be the most credible according to their answer records.
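
The abstract does not say how the smart-decision-maker or the whom-to-ask-finder compute their results. As a rough illustration only, the sketch below assumes a worker's credibility is the fraction of past answers that were accepted as correct, and that the decision is a credibility-weighted vote. The function names, the neutral score of 0.5 for workers without history, and the sample dates are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def credibility(records):
    """Fraction of a worker's past answers that were accepted as correct.

    `records` is a list of booleans (a hypothetical answer history).
    Workers with no history get a neutral score of 0.5 (an assumption).
    """
    if not records:
        return 0.5
    return sum(records) / len(records)


def decide(answers, history):
    """Pick the value with the largest credibility-weighted vote.

    `answers` maps worker id -> proposed value for one data item.
    `history` maps worker id -> list of booleans (past correctness).
    """
    scores = defaultdict(float)
    for worker, value in answers.items():
        scores[value] += credibility(history.get(worker, []))
    # On a tie, the value that was seen first wins.
    return max(scores, key=scores.get)


# Hypothetical answer records for three workers.
history = {
    "w1": [True, True, True, False],     # credibility 0.75
    "w2": [False, False, True],          # credibility 1/3
    "w3": [False, False, False, True],   # credibility 0.25
}
# Hypothetical answers for one item, e.g. a submission date.
answers = {"w1": "2014-03-31", "w2": "2014-04-15", "w3": "2014-04-15"}
print(decide(answers, history))  # prints 2014-03-31
```

In this example the single more credible worker outweighs the two less credible ones (0.75 against roughly 0.58), which is the kind of outcome a plain majority vote would miss.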

References (11)
Zachary Ives, Alon Halevy, AnHai Doan. Principles of Data Integration (2012).
Lu Jiang, Zhaohui Wu, Qian Feng, Jun Liu, Qinghua Zheng. Efficient deep web crawling using reinforcement learning. Knowledge Discovery and Data Mining, pp. 428-439 (2010). DOI: 10.1007/978-3-642-13657-3_46
Stephen Guo, Aditya Parameswaran, Hector Garcia-Molina. So who won? Proceedings of the 2012 International Conference on Management of Data (SIGMOD '12), pp. 385-396 (2012). DOI: 10.1145/2213836.2213880
Caleb Chen Cao, Yongxin Tong, Lei Chen, H. V. Jagadish. WiseMarket: a new paradigm for managing wisdom of online social users. Knowledge Discovery and Data Mining, pp. 455-463 (2013). DOI: 10.1145/2487575.2487642
Aditya G. Parameswaran, Hector Garcia-Molina, Hyunjung Park, Neoklis Polyzotis, Aditya Ramesh, Jennifer Widom. CrowdScreen. Proceedings of the 2012 International Conference on Management of Data (SIGMOD '12), pp. 361-372 (2012). DOI: 10.1145/2213836.2213878
Xin Luna Dong, Laure Berti-Equille, Divesh Srivastava. Integrating conflicting data. Proceedings of the VLDB Endowment, vol. 2, pp. 550-561 (2009). DOI: 10.14778/1687627.1687690
Wangchao Le, Feifei Li, Yufei Tao, Robert Christensen. Optimal splitters for temporal and multi-version databases. International Conference on Management of Data, pp. 109-120 (2013). DOI: 10.1145/2463676.2465310
Caleb Chen Cao, Jieying She, Yongxin Tong, Lei Chen. Whom to ask? Proceedings of the VLDB Endowment, vol. 5, pp. 1495-1506 (2012). DOI: 10.14778/2350229.2350264
Chen Jason Zhang, Lei Chen, H. V. Jagadish, Chen Caleb Cao. Reducing uncertainty of schema matching via crowdsourcing. Proceedings of the VLDB Endowment, vol. 6, pp. 757-768 (2013). DOI: 10.14778/2536360.2536374
Adam Marcus, Eugene Wu, David Karger, Samuel Madden, Robert Miller. Human-powered sorts and joins. Proceedings of the VLDB Endowment, vol. 5, pp. 13-24 (2011). DOI: 10.14778/2047485.2047487