Authors: Zhong Zhang, Chongming Gao, Cong Xu, Rui Miao, Qinli Yang
DOI: 10.18653/V1/2020.FINDINGS-EMNLP.46
Keywords:
Abstract: Weight tying is now a common setting in many language generation tasks such as language modeling and machine translation. However, a recent study reveals a potential flaw in weight tying: the learned word embeddings are likely to degenerate and lie in a narrow cone when training a language model. The authors call it the representation degeneration problem and propose a cosine regularization to solve it. Nevertheless, we prove that the cosine regularization is insufficient to solve the problem, as degeneration can still happen under certain conditions. In this paper, we revisit the representation degeneration problem and theoretically analyze the limitations of the previously proposed solution. Afterward, we propose an alternative method called Laplacian regularization to tackle the problem. Experiments on language modeling demonstrate the effectiveness of the proposed Laplacian regularization.
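The cosine regularization the abstract refers to is commonly formulated as the mean pairwise cosine similarity of the embedding rows, added to the training loss to push word vectors apart. A minimal NumPy sketch, assuming that standard formulation (the exact variant used in the cited work may differ):

```python
import numpy as np

def cosine_regularization(W: np.ndarray) -> float:
    """Mean pairwise cosine similarity of embedding rows.

    A sketch of the cosine regularizer discussed in the abstract:
    minimizing this term discourages all embeddings from collapsing
    into a narrow cone (the representation degeneration problem).
    `W` is assumed to be a (vocab_size, dim) embedding matrix.
    """
    W_norm = W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-length rows
    sim = W_norm @ W_norm.T                                # pairwise cosines
    n = W.shape[0]
    # Average over off-diagonal entries only (exclude self-similarity).
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))

# Hypothetical usage during training:
#   total_loss = cross_entropy + lam * cosine_regularization(embedding_weights)
```

The paper's point is that this penalty alone cannot always prevent degeneration, motivating the Laplacian regularization it proposes instead.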