Authors: Jingru Yang, Ju Fan, Zhewei Wei, Guoliang Li, Tongyu Liu
Keywords:
Abstract: Large-scale data annotation is indispensable for many applications, such as machine learning and data integration. However, existing solutions either incur expensive cost for large datasets or produce noisy results. This paper introduces a cost-effective annotation approach, and focuses on the labeling rule generation problem, which aims to generate high-quality rules that largely reduce annotation cost while preserving quality. To address the problem, we first generate candidate rules, and then devise a game-based crowdsourcing approach, CROWDGAME, to select high-quality rules by considering coverage and precision. CROWDGAME employs two groups of crowd workers: one group answers rule validation tasks (whether a rule is valid) to play the role of rule generator, while the other group answers tuple checking tasks (whether an annotated label of a tuple is correct) to play the role of rule refuter. We let the two groups play a two-player game: the generator identifies rules with high coverage and precision, while the refuter tries to refute its opponent by checking tuples that provide enough evidence to reject rules covering erroneous tuples. This paper studies several challenges in CROWDGAME. The first is to balance the trade-off between coverage and precision; we define an annotation loss with respect to these two factors. The second is rule precision estimation; we utilize Bayesian estimation to combine evidence from both kinds of tasks. The third is to fulfill the game-based framework by minimizing the loss; we introduce a minimax strategy and develop efficient task selection algorithms. We conduct experiments on entity matching and relation extraction, and the results show that our method outperforms state-of-the-art solutions.
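The abstract names two concrete ingredients: a Bayesian (Beta-Bernoulli) estimate of each rule's precision from crowd votes, and a loss that trades off coverage against precision, minimized by the generator and maximized by the refuter. The sketch below illustrates both ideas under illustrative assumptions; the `Rule` fields, the `gamma` weight, and the exact loss form are hypothetical simplifications, not the paper's definitions.

```python
# Illustrative sketch only: a Beta-posterior precision estimate per rule and
# a coverage/precision trade-off loss. Field names and weights are assumed.
from dataclasses import dataclass

@dataclass
class Rule:
    covered: int        # number of tuples this rule covers
    correct_votes: int  # crowd answers saying the rule's labels are correct
    wrong_votes: int    # crowd answers saying the labels are wrong

def precision_estimate(rule, alpha=1.0, beta=1.0):
    """Posterior mean of a Beta(alpha, beta) prior updated with crowd votes."""
    return (rule.correct_votes + alpha) / (
        rule.correct_votes + rule.wrong_votes + alpha + beta)

def annotation_loss(rules, total_tuples, gamma=0.5):
    """gamma weighs coverage loss against precision loss (the trade-off)."""
    covered = sum(r.covered for r in rules)
    coverage_loss = 1.0 - min(covered, total_tuples) / total_tuples
    if covered == 0:
        return gamma * coverage_loss  # no rules selected: pure coverage loss
    # Coverage-weighted expected error rate of the selected rules.
    precision_loss = sum(
        r.covered * (1.0 - precision_estimate(r)) for r in rules) / covered
    return gamma * coverage_loss + (1.0 - gamma) * precision_loss
```

In the game, the generator would pick rules lowering `annotation_loss`, while the refuter would pick tuples to check whose answers most increase `wrong_votes` on weak rules, pushing the estimated loss up.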