VITAL: A Visual Interpretation on Text with Adversarial Learning for Image Labeling

Authors: Chunxia Xiao, Tao Hu, Chengjiang Long, Leheng Zhang

Abstract: In this paper, we propose a novel way to interpret text information by extracting visual feature representations from multiple high-resolution, photo-realistic synthetic images generated by a text-to-image Generative Adversarial Network (GAN), in order to improve the performance of image labeling. First, we design a stacked Generative Multi-Adversarial Network (GMAN), StackGMAN++, a modified version of the current state-of-the-art text-to-image GAN, StackGAN++, to generate multiple synthetic images with various prior noises conditioned on a text. We then extract deep visual features from the generated synthetic images to explore the underlying visual concepts for the text. Finally, we combine the image-level visual feature, the text-level feature, and the visual features based on the synthetic images to predict labels for images. We conduct experiments on two benchmark datasets, and the experimental results clearly demonstrate the efficacy of our proposed approach.
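
The final fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three feature types are combined, so everything here is an assumption made for illustration: the synthetic-image features are mean-pooled, the three vectors are concatenated, and a linear layer with independent sigmoid outputs scores each label. The function names (`mean_pool`, `fuse`, `predict_labels`), the toy feature values, and the weights are all hypothetical and do not come from the paper.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length feature vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse(image_feat, text_feat, synthetic_feats):
    """Concatenate image, text, and pooled synthetic-image features.

    Assumption: mean pooling over the synthetic images followed by
    concatenation; the paper's actual fusion scheme may differ.
    """
    return image_feat + text_feat + mean_pool(synthetic_feats)


def predict_labels(fused, weights, biases, threshold=0.5):
    """Score each label with an independent sigmoid (multi-label setting).

    Returns the indices of labels whose score reaches the threshold,
    together with the raw scores.
    """
    scores = []
    for w, b in zip(weights, biases):
        z = sum(wi * xi for wi, xi in zip(w, fused)) + b
        scores.append(1.0 / (1.0 + math.exp(-z)))
    labels = [i for i, s in enumerate(scores) if s >= threshold]
    return labels, scores


# Toy inputs (hypothetical values, tiny dimensions for readability).
image_feat = [0.2, 0.8]                      # feature of the real image
text_feat = [0.5]                            # feature of the text
synthetic_feats = [[1.0, 0.0],               # one feature vector per
                   [0.0, 1.0],               # synthetic image generated
                   [0.5, 0.5]]               # from a different noise sample

fused = fuse(image_feat, text_feat, synthetic_feats)

# Hypothetical classifier parameters for two labels.
weights = [[1.0] * 5, [-1.0] * 5]
biases = [0.0, 0.0]

labels, scores = predict_labels(fused, weights, biases)
print(len(fused), labels)
```

Mean pooling keeps the fused vector the same length regardless of how many synthetic images are generated per text, which is one simple way to handle a variable number of noise samples.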
