Abstract: Real-world web videos often contain cues to supplement visual information for generating natural language descriptions. In this paper, we propose a sequence-to-sequence model …
