ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering

Authors: Ram Nevatia, Liang-Chieh Chen, Kan Chen, Haoyuan Gao, Jiang Wang

DOI:

Keywords: Natural language processing, Question answering, Convolutional neural network, Feature (computer vision), Deep learning, Computer science, Natural language, Image (mathematics), Benchmark (computing), Semantics, Artificial intelligence, Machine learning

Abstract: We propose a novel attention-based deep learning architecture for the visual question answering (VQA) task. Given an image and an image-related natural language question, VQA …

References (30)
Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method. arXiv: Learning (2012).
Nathan Silberman, Derek Hoiem, Pushmeet Kohli, Rob Fergus. Indoor Segmentation and Support Inference from RGBD Images. Computer Vision – ECCV 2012, pp. 746-760 (2012). DOI: 10.1007/978-3-642-33715-4_54
Volodymyr Mnih, Koray Kavukcuoglu, Jimmy Ba. Multiple Object Recognition with Visual Attention. arXiv: Learning (2014).
Zhiheng Huang, Junhua Mao, Haoyuan Gao, Lei Wang, Wei Xu, Jie Zhou. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering. arXiv: Computer Vision and Pattern Recognition (2015).
Karen Simonyan, Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Vision and Pattern Recognition (2014).
Esteban Real, Pierre Sermanet, Andrea Frome. Attention for Fine-Grained Categorization. arXiv: Computer Vision and Pattern Recognition (2014).
Fei Sha, Changshui Zhang, Runpeng Cui, Kun Fu, Junqi Jin. Aligning Where to See and What to Tell: Image Caption with Region-Based Attention and Scene Factorization. arXiv: Computer Vision and Pattern Recognition (2015).
Andrej Karpathy, Li Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions. Computer Vision and Pattern Recognition, pp. 3128-3137 (2015). DOI: 10.1109/CVPR.2015.7298932
Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Trevor Darrell, Kate Saenko. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. Computer Vision and Pattern Recognition, pp. 2625-2634 (2015). DOI: 10.1109/CVPR.2015.7298878
Benjamin Klein, Lior Wolf, Yehuda Afek. A Dynamic Convolutional Layer for Short Range Weather Prediction. Computer Vision and Pattern Recognition, pp. 4840-4848 (2015). DOI: 10.1109/CVPR.2015.7299117