A Unified Generation-Retrieval Framework for Image Captioning

Authors: Chunpu Xu, Wei Zhao, Min Yang, Xiang Ao, Wangrong Cheng

DOI: 10.1145/3357384.3358105

Keywords:

Abstract: Recent image captioning models are typically built on either generation-based or retrieval-based approaches. Both kinds of methods have their advantages but are limited by their disadvantages. In this paper, we propose a Unified Generation-Retrieval framework for Image Captioning (UGRIC) using adversarial learning. Unlike previous methods, the proposed UGRIC model leverages the informative contents of the N-best response candidates provided by the retrieval-based method to enhance the generation-based method. In addition, to further improve the informativeness of the generated caption, we employ a copying mechanism that chooses words from the retrieved candidate captions and puts them into the proper positions of the output sequence. Experiments on the MSCOCO dataset demonstrate the effectiveness of UGRIC across various evaluation metrics. Code and data are available at: http://tinyurl.com/y6z2x6ho.
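The abstract combines two steps: retrieving N-best candidate captions for an image, then copying words from those candidates into the generated caption. The Python sketch below illustrates both steps under stated assumptions; it is not the UGRIC implementation, and it omits the adversarial training entirely. Retrieval here is plain cosine similarity over hypothetical image embeddings. The copy step is a generic pointer-generator-style mixture, swapped in for illustration, in which a gate `p_gen` blends the decoder's vocabulary distribution with attention weights over tokens of the retrieved captions at a single decoding step. The names `retrieve_n_best`, `copy_mixture`, and `p_gen`, along with the toy embeddings and probabilities, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_n_best(query: Sequence[float],
                    corpus: List[Tuple[Sequence[float], str]],
                    n: int) -> List[str]:
    """Hypothetical retriever: return the n captions whose image
    embeddings are most similar to the query image embedding."""
    ranked = sorted(corpus, key=lambda item: cosine(query, item[0]), reverse=True)
    return [caption for _, caption in ranked[:n]]


def copy_mixture(p_vocab: Dict[str, float],
                 copy_attention: List[float],
                 candidate_tokens: List[str],
                 p_gen: float) -> Dict[str, float]:
    """Assumed pointer-generator-style mix for one decoding step:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of copy attention
    on candidate positions whose token is w)."""
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(copy_attention) != len(candidate_tokens):
        raise ValueError("need one attention weight per candidate token")
    final = {word: p_gen * prob for word, prob in p_vocab.items()}
    for token, weight in zip(candidate_tokens, copy_attention):
        # Copied words may lie outside the generator's vocabulary.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


if __name__ == "__main__":
    # Toy 2-d "image embeddings" paired with captions (hypothetical data).
    corpus = [
        ([1.0, 0.0], "a dog runs on grass"),
        ([0.0, 1.0], "a plate of food"),
        ([0.9, 0.1], "a brown dog plays outside"),
    ]
    print(retrieve_n_best([1.0, 0.0], corpus, n=2))
    # ['a dog runs on grass', 'a brown dog plays outside']

    mixed = copy_mixture({"a": 0.5, "dog": 0.3, "cat": 0.2},
                         copy_attention=[0.25, 0.75],
                         candidate_tokens=["a", "frisbee"],
                         p_gen=0.6)
    print(mixed)
```

In this toy run, "frisbee" has no probability under the generator's vocabulary but still receives probability mass through the copy term, which is the property a copying mechanism is meant to provide. Repeating the mixture at every decoding step is what lets copied words land at specific positions of the output sequence.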
