CrowdScreen

Authors: Aditya G. Parameswaran, Hector Garcia-Molina, Hyunjung Park, Neoklis Polyzotis, Aditya Ramesh

DOI: 10.1145/2213836.2213878

Keywords: Set (abstract data type), Computer science, Artificial intelligence, Heuristics, Probabilistic analysis of algorithms, Large set (Ramsey theory), State (computer science), Machine learning, Algorithm, Variety (cybernetics), Theoretical computer science, Crowdsourcing

Abstract: Given a large set of data items, we consider the problem of filtering them based on a set of properties that can be verified by humans. This problem is commonplace in crowdsourcing applications, and yet, to our knowledge, no one has considered the formal optimization of this problem. (Typical solutions use heuristics to solve the problem.) We formally state a few different variants of this problem. We develop deterministic and probabilistic algorithms to optimize the expected cost (i.e., number of questions) and expected error. We experimentally show that our algorithms provide definite gains with respect to other strategies. Our algorithms can be applied in a variety of crowdsourcing scenarios and can form an integral part of any query processor that uses human computation.
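To make the two quantities named in the abstract concrete, the following is a minimal sketch of how expected cost (number of questions) and expected error might be computed for one simple filtering strategy: an early-stopping majority vote. This is not the paper's algorithm. The function name `evaluate_majority_strategy`, the parameters `selectivity`, `false_pos`, and `false_neg`, and the assumption that every human answer is independent with fixed error rates are all illustrative assumptions that do not appear in the abstract.

```python
from functools import lru_cache


def evaluate_majority_strategy(m, selectivity, false_pos, false_neg):
    """Return (expected_cost, expected_error) per item for an
    early-stopping majority vote over at most m questions (m odd).

    Assumed model (hypothetical, for illustration only):
      selectivity: prior probability that an item truly passes the filter
      false_pos:   probability a human answers YES for a failing item
      false_neg:   probability a human answers NO for a passing item
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    need = m // 2 + 1  # answers of one kind needed to stop and decide

    def walk(p_yes):
        # For an item on which each answer is YES with probability p_yes,
        # return (expected number of questions, probability item is passed).
        @lru_cache(maxsize=None)
        def go(yes, no):
            if yes == need:   # majority YES reached: pass the item
                return 0.0, 1.0
            if no == need:    # majority NO reached: reject the item
                return 0.0, 0.0
            cost_y, pass_y = go(yes + 1, no)
            cost_n, pass_n = go(yes, no + 1)
            cost = 1.0 + p_yes * cost_y + (1.0 - p_yes) * cost_n
            passed = p_yes * pass_y + (1.0 - p_yes) * pass_n
            return cost, passed

        return go(0, 0)

    cost_true, pass_true = walk(1.0 - false_neg)  # item truly passes
    cost_false, pass_false = walk(false_pos)      # item truly fails

    expected_cost = selectivity * cost_true + (1.0 - selectivity) * cost_false
    # An error is rejecting a passing item or passing a failing item.
    expected_error = (selectivity * (1.0 - pass_true)
                      + (1.0 - selectivity) * pass_false)
    return expected_cost, expected_error


if __name__ == "__main__":
    for m in (1, 3, 5):
        cost, err = evaluate_majority_strategy(m, 0.5, 0.1, 0.2)
        print(f"m={m}: expected cost={cost:.3f}, expected error={err:.4f}")
```

Under these assumptions, comparing the output for several values of `m` shows the trade-off the abstract refers to: asking more questions per item raises expected cost and lowers expected error.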

References (21)
Robert C. Miller, Samuel R. Madden, Eugene Wu, Adam Marcus, David R. Karger. Crowdsourced Databases: Query Processing with People. Conference on Innovative Data Systems Research, pp. 211-214 (2011).
Neoklis Polyzotis, Aditya G. Parameswaran. Answering Queries using Humans, Algorithms and Databases. Conference on Innovative Data Systems Research, pp. 160-166 (2011).
Eytan Bakshy, Jake M. Hofman, Winter A. Mason, Duncan J. Watts. Everyone's an influencer: quantifying influence on twitter. Web Search and Data Mining, pp. 65-74 (2011). DOI: 10.1145/1935826.1935845
Rion Snow, Brendan O'Connor, Daniel Jurafsky, Andrew Y. Ng. Cheap and fast---but is it good? Proceedings of the Conference on Empirical Methods in Natural Language Processing - EMNLP '08, pp. 254-263 (2008). DOI: 10.3115/1613715.1613751
Vikas C. Raykar, Shipeng Yu, Linda H. Zhao, Anna Jerebko, Charles Florin, Gerardo Hermosillo Valadez, Luca Bogoni, Linda Moy. Supervised learning from multiple experts. Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09, pp. 889-896 (2009). DOI: 10.1145/1553374.1553488
Anhai Doan, Raghu Ramakrishnan, Alon Y. Halevy. Crowdsourcing systems on the World-Wide Web. Communications of the ACM, vol. 54, pp. 86-96 (2011). DOI: 10.1145/1924421.1924442
Robert McCann, Warren Shen, AnHai Doan. Matching Schemas in Online Communities: A Web 2.0 Approach. International Conference on Data Engineering, pp. 110-119 (2008). DOI: 10.1109/ICDE.2008.4497419
Adam Marcus, Eugene Wu, David R. Karger, Samuel Madden, Robert C. Miller. Demonstration of Qurk. Proceedings of the 2011 international conference on Management of data - SIGMOD '11, pp. 1315-1318 (2011). DOI: 10.1145/1989323.1989486
Aditya Parameswaran, Anish Das Sarma, Hector Garcia-Molina, Neoklis Polyzotis, Jennifer Widom. Human-assisted graph search. Proceedings of the VLDB Endowment, vol. 4, pp. 267-278 (2011). DOI: 10.14778/1952376.1952377
Victor S. Sheng, Foster Provost, Panagiotis G. Ipeirotis. Get another label? improving data quality and data mining using multiple, noisy labelers. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD 08, pp. 614-622 (2008). DOI: 10.1145/1401890.1401965