An Exploration of Data Augmentation and RNN Architectures for Question Ranking in Community Question Answering

Authors: Razvan C. Bunescu, Charles Chen

Abstract: The automation of tasks in community question answering (cQA) is dominated by machine learning approaches, whose performance is often limited by the number of training examples. Starting from a neural sequence approach with attention, we explore the impact of two data augmentation techniques on ranking performance: a method that swaps reference questions with their paraphrases, and the use of examples automatically selected from external datasets. Both methods are shown to lead to substantial gains in accuracy over a strong baseline. Further improvements are obtained by changing the model architecture to mirror the structure seen in the data.
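
The first augmentation technique, swapping a reference question with its paraphrases, can be illustrated with a minimal sketch. This is not the authors' implementation: the `RankingGroup` structure, the `swap_augment` function, and the example questions are all hypothetical, and the sketch assumes each training group consists of a reference question, a list of its paraphrases, and a list of other candidate questions to rank.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RankingGroup:
    """Hypothetical container for one ranking example (an assumption, not from the paper)."""
    reference: str          # the question that candidates are ranked against
    paraphrases: List[str]  # questions assumed equivalent in meaning to the reference
    candidates: List[str]   # other questions to be ranked for the reference


def swap_augment(group: RankingGroup) -> List[RankingGroup]:
    """Return the original group plus one new group per paraphrase.

    In each new group a paraphrase takes the role of the reference question,
    and the old reference joins the remaining paraphrases.
    """
    augmented = [group]
    for i, paraphrase in enumerate(group.paraphrases):
        remaining = group.paraphrases[:i] + group.paraphrases[i + 1:]
        augmented.append(
            RankingGroup(
                reference=paraphrase,
                paraphrases=[group.reference] + remaining,
                candidates=list(group.candidates),  # copy so groups do not share a list
            )
        )
    return augmented


# Made-up example data, for illustration only.
group = RankingGroup(
    reference="How do I reset my router?",
    paraphrases=["What is the way to reset a router?", "How can a router be reset?"],
    candidates=["How do I update router firmware?"],
)
for g in swap_augment(group):
    print(g.reference)
```

Under these assumptions, a group with k paraphrases yields k + 1 training groups, which is one way the number of training examples could grow without any new annotation.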
