VITAL: A Visual Interpretation on Text with Adversarial Learning for Image Labeling

Authors: Chunxia Xiao, Tao Hu, Chengjiang Long, Leheng Zhang

DOI:

Keywords:

Abstract: In this paper, we propose a novel way to interpret text information by extracting visual feature representations from multiple high-resolution, photo-realistic synthetic images generated by a text-to-image Generative Adversarial Network (GAN), in order to improve the performance of image labeling. First, we design a stacked Generative Multi-Adversarial Network (GMAN), StackGMAN++, a modified version of the current state-of-the-art text-to-image GAN, StackGAN++, to generate multiple synthetic images with various prior noises conditioned on a text. We then extract deep visual features from the generated synthetic images to explore the underlying visual concepts for the text. Finally, we combine the image-level visual feature, the text-level feature, and the visual features based on the synthetic images to predict labels for images. We conduct experiments on two benchmark datasets, and the experimental results clearly demonstrate the efficacy of the proposed approach.
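The final step of the abstract, combining three feature sources to predict labels, can be illustrated with a minimal sketch. The abstract does not say how the features are fused or classified, so everything below is an assumption made for illustration: mean pooling over the synthetic-image features, plain concatenation as the fusion step, and a linear layer with a sigmoid for multi-label scoring. The function names (`mean_pool`, `fuse_features`, `predict_labels`) are hypothetical and do not come from the paper.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length feature vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse_features(image_feat, text_feat, synthetic_feats):
    """Concatenate the three feature sources into one vector.

    Assumption: the features of the K synthetic images are mean-pooled
    before concatenation; the paper may combine them differently.
    """
    return list(image_feat) + list(text_feat) + mean_pool(synthetic_feats)


def predict_labels(fused, weights, biases, threshold=0.5):
    """Score each label with a linear layer followed by a sigmoid.

    Assumption: one weight row and one bias per label (multi-label setup).
    Returns (decisions, scores).
    """
    scores = []
    for weight_row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(weight_row, fused)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return [score >= threshold for score in scores], scores


# Toy usage with made-up numbers (not data from the paper).
image_feat = [0.2, 0.7]                  # feature of the real image
text_feat = [0.1]                        # feature of the accompanying text
synthetic_feats = [[0.4, 0.0],           # features of two generated images
                   [0.6, 0.2]]
fused = fuse_features(image_feat, text_feat, synthetic_feats)
weights = [[1.0, -1.0, 0.5, 0.3, 0.0],   # one row per candidate label
           [-0.5, 0.2, 0.0, 1.0, 1.0]]
decisions, scores = predict_labels(fused, weights, biases=[0.0, 0.0])
```

Concatenation keeps each source's contribution separate, so a downstream classifier can weight the real-image, text, and synthetic-image evidence independently. That is one plausible reading of "combine ... together", not a confirmed detail of the method.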

