Authors: Chris Alberti, Daniel Andor, Emily Pitler, Jacob Devlin, Michael Collins
DOI: 10.18653/V1/P19-1620
Keywords: Question answering, Computer science, Question generation, Artificial intelligence, Natural language processing, Consistency (database systems)
Abstract: We introduce a novel method of generating synthetic question answering corpora by combining models of question generation and answer extraction, and by filtering the results to ensure roundtrip consistency. By pretraining on the resulting corpora we obtain significant improvements on SQuAD2 and NQ, establishing a new state of the art on the latter. Our synthetic data generation models, for both question generation and answer extraction, can be fully reproduced by finetuning a publicly available BERT model on the extractive subsets of SQuAD2 and NQ. We also describe a more powerful variant that does full sequence-to-sequence pretraining for question generation, obtaining exact match and F1 at less than 0.1% and 0.4% from human performance on SQuAD2.
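The roundtrip consistency filter mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes three pluggable models (an answer extractor, a question generator, and a QA model) and assumes an exact string match as the consistency check. The names `roundtrip_filter`, `toy_extract`, `toy_generate`, and `toy_answer` are hypothetical; the toy functions only stand in for the finetuned BERT models described in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Example:
    context: str
    question: str
    answer: str


def roundtrip_filter(
    contexts: Iterable[str],
    extract_answer: Callable[[str], str],          # hypothetical: context -> answer span
    generate_question: Callable[[str, str], str],  # hypothetical: (context, answer) -> question
    answer_question: Callable[[str, str], str],    # hypothetical: (context, question) -> answer
) -> List[Example]:
    """Keep only synthetic (context, question, answer) triples whose
    generated question is answered with the original answer."""
    kept: List[Example] = []
    for context in contexts:
        answer = extract_answer(context)                  # 1. pick a candidate answer
        question = generate_question(context, answer)     # 2. generate a question for it
        roundtrip = answer_question(context, question)    # 3. answer the generated question
        if roundtrip == answer:                           # 4. assumed check: exact match
            kept.append(Example(context, question, answer))
    return kept


# Hypothetical toy stand-ins, used only to make the sketch runnable.
def toy_extract(context: str) -> str:
    # Treats the final word of the context as the answer span.
    return context.rstrip(".").split()[-1]


def toy_generate(context: str, answer: str) -> str:
    # Masks the answer with "what" to form a question.
    return context.rstrip(".").replace(answer, "what") + "?"


def toy_answer(context: str, question: str) -> str:
    # Ignores the question and returns the last capitalized word, if any.
    capitalized = [w for w in context.rstrip(".").split() if w[0].isupper()]
    return capitalized[-1] if capitalized else ""


if __name__ == "__main__":
    corpus = ["Paris is the capital of France.", "The sky is blue."]
    for ex in roundtrip_filter(corpus, toy_extract, toy_generate, toy_answer):
        print(ex)
```

With these toy models, only the first context survives: the QA stand-in recovers "France" for it, but returns "The" instead of "blue" for the second, so that triple is discarded as inconsistent. Keeping the three models as plain callables makes it easy to swap in real extractive or sequence-to-sequence models without changing the filtering logic.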