Synthetic QA Corpora Generation with Roundtrip Consistency

Authors: Chris Alberti, Daniel Andor, Emily Pitler, Jacob Devlin, Michael Collins

DOI: 10.18653/V1/P19-1620

Keywords: Question answering; Computer science; Question generation; Artificial intelligence; Natural language processing; Consistency (database systems)

Abstract: We introduce a novel method of generating synthetic question answering corpora by combining models of question generation and answer extraction, and by filtering the results to ensure roundtrip consistency. By pretraining on the resulting corpora we obtain significant improvements on SQuAD2 and NQ, establishing a new state-of-the-art on the latter. Our synthetic data generation models, for both question generation and answer extraction, can be fully reproduced by finetuning a publicly available BERT model on the extractive subsets of SQuAD2 and NQ. We also describe a more powerful variant that does full sequence-to-sequence generation, obtaining exact match and F1 at less than 0.1% and 0.4% from human performance on SQuAD2.
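The roundtrip filtering described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released code: the three callables (extract_answers, generate_question, answer_question) are hypothetical stand-ins for the answer-extraction, question-generation, and extractive QA models, and exact string match is used here as a simple proxy for the paper's answer-consistency check.

```python
# Minimal sketch of roundtrip-consistency filtering for synthetic QA data.
# Assumptions: the three model interfaces below are illustrative placeholders,
# not the actual implementation from Alberti et al. (2019).

from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class QAExample:
    context: str
    question: str
    answer: str


def roundtrip_filter(
    contexts: Iterable[str],
    extract_answers: Callable[[str], List[str]],   # p(a | c): candidate answer spans
    generate_question: Callable[[str, str], str],  # p(q | c, a): question generation
    answer_question: Callable[[str, str], str],    # p(a' | c, q): extractive QA model
) -> List[QAExample]:
    """Keep only (context, question, answer) triples for which the QA model,
    given the generated question, recovers the original answer span."""
    kept: List[QAExample] = []
    for context in contexts:
        for answer in extract_answers(context):
            question = generate_question(context, answer)
            predicted = answer_question(context, question)
            # Roundtrip consistency check: discard the synthetic pair unless
            # the predicted answer matches the answer used to generate the question.
            if predicted.strip().lower() == answer.strip().lower():
                kept.append(QAExample(context, question, answer))
    return kept
```

The filtered triples can then be used as a synthetic pretraining corpus before finetuning on a human-annotated dataset such as SQuAD2 or NQ.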
