Attention Is All You Need

Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin

DOI:

Keywords:

Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
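The abstract describes the Transformer as built entirely on attention mechanisms. For orientation, below is a minimal NumPy sketch of scaled dot-product attention, the core operation the paper defines as softmax(QK^T / sqrt(d_k)) V; the shapes, variable names, and toy inputs are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (len_q, d_k) queries, K: (len_k, d_k) keys, V: (len_k, d_v) values.
    mask: optional boolean (len_q, len_k) array; True marks positions to hide.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # (len_q, len_k) similarity scores
    if mask is not None:
        scores = np.where(mask, -1e9, scores)    # block attention to masked positions
    # Numerically stable row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                           # (len_q, d_v) weighted sum of values

# Toy usage with random queries, keys, and values (hypothetical sizes).
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 16))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 16)
```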
