The role of image representations in vision to language tasks

Authors: Pranava Madhyastha, Josiah Wang, Lucia Specia

DOI: 10.1017/S1351324918000116

Abstract: Tasks that require modeling of both language and visual information, such as image captioning, have become very popular in recent years. Most state-of-the-art approaches …

References (80)
Margaret Mitchell, Jesse Dodge, Amit Goyal, Kota Yamaguchi, Karl Stratos, Xufeng Han, Alyssa Mensch, Alexander Berg, Tamara Berg, Hal Daumé III. Midge: Generating Image Descriptions From Computer Vision Detections. Conference of the European Chapter of the Association for Computational Linguistics, pp. 747-756 (2012)
M. Hodosh, P. Young, J. Hockenmaier. Framing image description as a ranking task: data, models and evaluation metrics. Journal of Artificial Intelligence Research, vol. 47, pp. 853-899 (2013). DOI: 10.1613/JAIR.3994
M. Grubinger. The IAPR Benchmark: A New Evaluation Resource for Visual Information Systems. Language Resources and Evaluation (2006)
Tomas Mikolov, Martin Karafiát, Sanjeev Khudanpur, Jan Cernocký, Lukás Burget. Recurrent neural network based language model. Conference of the International Speech Communication Association, pp. 1045-1048 (2010)
Ilya Sutskever, Wojciech Zaremba, Oriol Vinyals. Recurrent Neural Network Regularization. arXiv: Neural and Evolutionary Computing (2014)
Girish Kulkarni, Tamara L. Berg, Yejin Choi, Siming Li, Alexander C. Berg. Composing Simple Image Descriptions using Web-scale N-grams. Conference on Computational Natural Language Learning, pp. 220-228 (2011)
Nasrin Mostafazadeh, Lucy Vanderwende, Margaret Mitchell, Francis Ferraro, Jacob Devlin, Michel Galley, Ting-Hao Huang. A Survey of Current Datasets for Vision and Language Research. arXiv: Computation and Language (2015)
Pavel Zemcík, Michal Hradis, Martin Kolár. Technical Report: Image Captioning with Semantically Similar Images. arXiv: Computer Vision and Pattern Recognition (2015)
Yiannis Aloimonos, Hal Daumé III, Yezhou Yang, Ching Teo. Corpus-Guided Sentence Generation of Natural Images. Empirical Methods in Natural Language Processing, pp. 444-454 (2011)
Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. Show and tell: A neural image caption generator. Computer Vision and Pattern Recognition, pp. 3156-3164 (2015). DOI: 10.1109/CVPR.2015.7298935