O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers.

Authors: Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar, Yin-Wen Chang


Keywords: Transformer (machine learning model), Theoretical computer science, Quadratic equation, Attention model, Pairwise comparison, Computer science

Abstract: Recently, Transformer networks have redefined the state of the art in many NLP tasks. However, these models suffer from quadratic computational cost in the input sequence …
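To make the quadratic-cost claim in the abstract concrete, here is a minimal sketch (not from the paper) counting attended token pairs under full self-attention versus a sparse pattern with O(n) connections; the sliding-window pattern and the width parameter `w` are illustrative assumptions, not the paper's specific construction.

```python
# Hypothetical sketch: attended-pair counts for full vs. sparse attention.
# Full self-attention lets every token attend to every token (n^2 pairs);
# a sliding window of half-width w gives at most (2*w + 1) pairs per token,
# i.e. O(n) total connections.

def full_attention_pairs(n: int) -> int:
    # Every token attends to all n tokens: quadratic in n.
    return n * n

def window_attention_pairs(n: int, w: int) -> int:
    # Token i attends to positions [i - w, i + w], clipped to [0, n - 1].
    return sum(min(i + w, n - 1) - max(i - w, 0) + 1 for i in range(n))

if __name__ == "__main__":
    n, w = 1024, 4
    print(full_attention_pairs(n))       # grows as n^2
    print(window_attention_pairs(n, w))  # grows linearly in n
```

Doubling `n` quadruples the full-attention count but only roughly doubles the windowed count, which is the gap the paper's sparse constructions exploit.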
