Authors: Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar, Yin-Wen Chang
DOI:
Keywords: Transformer (machine learning model), Theoretical computer science, Quadratic equation, Attention model, Pairwise comparison, Computer science
Abstract: Recently, Transformer networks have redefined the state of the art in many NLP tasks. However, these models suffer from quadratic computational cost in the input sequence …
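For context on the quadratic cost the abstract refers to: standard self-attention compares every query position with every key position, producing an n x n score matrix for a sequence of length n. The minimal NumPy sketch below illustrates where that O(n^2) term comes from; all names and sizes are illustrative and not taken from the paper.

```python
import numpy as np

def self_attention_weights(X, Wq, Wk):
    """Pairwise attention weights: an n x n matrix, hence O(n^2) in sequence length n."""
    Q = X @ Wq                                 # (n, d_k) queries
    K = X @ Wk                                 # (n, d_k) keys
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (n, n) pairwise dot products
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax over keys
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Toy usage: n = 8 tokens, model dim 16, head dim 4 (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))
A = self_attention_weights(X, rng.normal(size=(16, 4)), rng.normal(size=(16, 4)))
print(A.shape)  # (8, 8): compute and memory grow quadratically with n
```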