Authors: Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar, Yin-Wen Chang
DOI:
Keywords: Transformer (machine learning model), Theoretical computer science, Quadratic equation, Attention model, Pairwise comparison, Computer science
Abstract: Recently, Transformer networks have redefined the state of the art in many NLP tasks. However, these models suffer from quadratic computational cost in the input sequence …
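For context on the quadratic cost the abstract refers to: standard self-attention compares every query position with every key position, producing an n x n score matrix for a sequence of length n. The minimal NumPy sketch below illustrates where that O(n^2) term comes from; all names and sizes are illustrative and not taken from the paper.

```python
import numpy as np

def self_attention_weights(X, Wq, Wk):
    """Pairwise attention weights: an n x n matrix, hence O(n^2) in sequence length n."""
    Q = X @ Wq                                 # (n, d_k) queries
    K = X @ Wk                                 # (n, d_k) keys
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (n, n) pairwise dot products
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax over keys
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Toy usage: n = 8 tokens, model dim 16, head dim 4 (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))
A = self_attention_weights(X, rng.normal(size=(16, 4)), rng.normal(size=(16, 4)))
print(A.shape)  # (8, 8): compute and memory grow quadratically with n
```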