Authors: Chunxia Xiao, Tao Hu, Chengjiang Long, Leheng Zhang
DOI:
Keywords:
Abstract: In this paper, we propose a novel way to interpret text information by extracting visual feature representations from multiple high-resolution and photo-realistic synthetic images generated by a Text-to-image Generative Adversarial Network (GAN), so as to improve the performance of image labeling. Firstly, we design a stacked Generative Multi-Adversarial Network (GMAN), StackGMAN++, a modified version of the current state-of-the-art GAN, StackGAN++, to generate multiple synthetic images with various prior noises conditioned on text. We then extract deep features from the synthetic images to explore the underlying visual concepts of the text. Finally, we combine the image-level feature, the text-level feature, and the features based on the synthetic images together to predict labels for images. We conduct experiments on two benchmark datasets, and the experimental results clearly demonstrate the efficacy of our proposed approach.
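The final labeling step described above fuses three feature sources before prediction. The sketch below illustrates one plausible reading of that fusion as feature concatenation followed by a multi-label linear classifier; the dimensions, the averaging over synthetic-image features, and the sigmoid classifier are all illustrative assumptions, since the abstract does not specify the exact fusion or prediction scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper's actual values are not given here.
D_IMG, D_TXT, D_SYN, N_LABELS = 512, 300, 512, 20
N_SYNTH = 4  # assumed number of GAN-generated images per text description


def fuse_features(img_feat, txt_feat, synth_feats):
    """Combine image-level, text-level, and synthetic-image features.

    synth_feats has shape (N_SYNTH, D_SYN): deep features extracted from
    the GAN-generated images. Averaging them into a single vector is an
    assumption; the abstract only says the features are "combined".
    """
    synth_avg = synth_feats.mean(axis=0)
    return np.concatenate([img_feat, txt_feat, synth_avg])


def predict_labels(fused, weights, bias, threshold=0.5):
    """Multi-label prediction: sigmoid over a linear layer, then threshold."""
    logits = fused @ weights + bias
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold


# Toy random inputs standing in for real extracted features.
img_feat = rng.standard_normal(D_IMG)
txt_feat = rng.standard_normal(D_TXT)
synth_feats = rng.standard_normal((N_SYNTH, D_SYN))
weights = rng.standard_normal((D_IMG + D_TXT + D_SYN, N_LABELS)) * 0.01
bias = np.zeros(N_LABELS)

fused = fuse_features(img_feat, txt_feat, synth_feats)
labels = predict_labels(fused, weights, bias)
```

Concatenation keeps the three modalities separable for the classifier; a learned fusion (e.g. attention over the synthetic-image features) would be a natural alternative.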