Revisiting Representation Degeneration Problem in Language Modeling

Authors: Zhong Zhang, Chongming Gao, Cong Xu, Rui Miao, Qinli Yang

DOI: 10.18653/V1/2020.FINDINGS-EMNLP.46

Abstract: Weight tying is now a common setting in many language generation tasks such as language modeling and machine translation. However, a recent study reveals a potential flaw in weight tying. They find that the learned word embeddings are likely to degenerate and lie in a narrow cone when training a language model. They call it the representation degeneration problem and propose a cosine regularization to solve it. Nevertheless, we prove that the cosine regularization is insufficient to solve the problem, as degeneration can still happen under certain conditions. In this paper, we revisit the representation degeneration problem and theoretically analyze the limitations of the previously proposed solution. Afterward, we propose an alternative regularization method called Laplacian regularization to tackle the problem. Experiments on language modeling demonstrate the effectiveness of the proposed Laplacian regularization.
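The cosine regularization criticized in the abstract (from the earlier ICLR 2019 work) penalizes the average pairwise cosine similarity of the word embeddings, since degenerated embeddings that lie in a narrow cone all point in similar directions. A minimal NumPy sketch of such a pairwise-cosine penalty, assuming an embedding matrix `E` of shape `(V, d)` (the function name and interface are illustrative, not taken from the paper):

```python
import numpy as np

def cosine_regularizer(E: np.ndarray) -> float:
    """Average pairwise cosine similarity of the rows of E.

    E: (V, d) word-embedding matrix. A value near 1 means the
    embeddings cluster in a narrow cone (degeneration); adding
    lambda * cosine_regularizer(E) to the training loss pushes
    them apart.
    """
    E_hat = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-norm rows
    S = E_hat @ E_hat.T                                   # cosine matrix
    V = E.shape[0]
    # Exclude the diagonal: self-similarity is always 1.
    return float((S.sum() - np.trace(S)) / (V * (V - 1)))
```

For mutually orthogonal embeddings the penalty is 0; for embeddings that all share one direction it is 1, the degenerate case the regularizer is meant to discourage. The paper argues this penalty alone is insufficient, motivating its Laplacian alternative.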

References (26)
Tomas Mikolov, Martin Karafiát, Sanjeev Khudanpur, Jan Cernocký, Lukás Burget. Recurrent neural network based language model. INTERSPEECH, pp. 1045-1048 (2010).
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel, Yoshua Bengio. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. International Conference on Machine Learning, pp. 2048-2057 (2015).
Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. Show and Tell: A Neural Image Caption Generator. Computer Vision and Pattern Recognition, pp. 3156-3164 (2015). DOI: 10.1109/CVPR.2015.7298935
Steven C. H. Hoi, Wei Liu, Shih-Fu Chang. Semi-supervised distance metric learning for collaborative image retrieval and clustering. ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 6, p. 18 (2010). DOI: 10.1145/1823746.1823752
Deng Cai, Xiaofei He, Yuxiao Hu, Jiawei Han, Thomas Huang. Learning a Spatially Smooth Subspace for Face Recognition. Computer Vision and Pattern Recognition, pp. 1-7 (2007). DOI: 10.1109/CVPR.2007.383054
Yasuhiro Fujiwara, Sekitoshi Kanai, Shuichi Adachi, Yuki Yamanaka. Sigsoftmax: Reanalysis of the Softmax Bottleneck. Neural Information Processing Systems, vol. 31, pp. 284-294 (2018).
Zellig S. Harris. Distributional Structure. WORD, vol. 10, pp. 146-162 (1954). DOI: 10.1080/00437956.1954.11659520
Sho Takase, Jun Suzuki, Masaaki Nagata. Direct Output Connection for a High-Rank Language Model. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4599-4609 (2018). DOI: 10.18653/V1/D18-1489
Tie-Yan Liu, Tao Qin, Liwei Wang, Di He, Xu Tan, Jun Gao. Representation Degeneration Problem in Training Natural Language Generation Models. International Conference on Learning Representations (2019).
Dilin Wang, Qiang Liu, ChengYue Gong. Improving Neural Language Modeling via Adversarial Training. International Conference on Machine Learning, pp. 6555-6565 (2019).