Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering

Authors: Arun Mallya, Svetlana Lazebnik

Keywords: Computer science, Learning models, Context (language use), Multiple choice, Task (project management), Network model, Question answering, Transfer (computing), Machine learning, Object (computer science), Artificial intelligence

Abstract: This paper proposes deep convolutional network models that utilize local and global context to make human activity label predictions in still images, achieving state-of-the-art performance on two recent datasets with hundreds of labels each. We use multiple instance learning to handle the lack of supervision at the level of individual person instances, and a weighted loss to handle unbalanced training data. Further, we show how specialized features trained on these datasets can be used to improve accuracy on the Visual Question Answering (VQA) task, in the form of multiple-choice fill-in-the-blank questions (Visual Madlibs). Specifically, we tackle two types of questions, on person activity and on person-object relationships, and show improvements over generic features trained on the ImageNet classification task.
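The abstract names two training devices without spelling them out. Below is a minimal sketch of both, written under stated assumptions and not taken from the authors' code. We assume the multiple instance learning step max-pools per-person logits into an image-level logit (an image counts as positive for a label if at least one detected person is), and we assume the weighted loss is a sigmoid cross-entropy in which positive labels are scaled by a hypothetical factor `pos_weight`. The names `mil_image_scores`, `weighted_bce`, and `pos_weight` are illustrative only.

```python
import math
from typing import List, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mil_image_scores(person_logits: Sequence[Sequence[float]]) -> List[float]:
    """Max-pool per-person logits into image-level logits (assumed MIL rule).

    person_logits[i][c] is the logit for label c on person box i. The image
    is treated as positive for label c if any person instance is, so the
    image-level logit is the maximum over instances.
    """
    if not person_logits:
        raise ValueError("need at least one person instance")
    num_labels = len(person_logits[0])
    return [max(person[c] for person in person_logits) for c in range(num_labels)]


def weighted_bce(logits: Sequence[float], labels: Sequence[int], pos_weight: float) -> float:
    """Mean sigmoid cross-entropy with positive labels scaled by pos_weight.

    For a logit x: -log(sigmoid(x)) == softplus(-x) and
    -log(1 - sigmoid(x)) == softplus(x).
    """
    if len(logits) != len(labels) or not logits:
        raise ValueError("logits and labels must be non-empty and equal length")
    total = 0.0
    for x, y in zip(logits, labels):
        if y == 1:
            total += pos_weight * softplus(-x)  # missed positives cost more
        else:
            total += softplus(x)
    return total / len(logits)


if __name__ == "__main__":
    # Two detected people, three action labels; only image-level labels are known.
    people = [[2.0, -1.0, -3.0],
              [-0.5, 0.5, -2.0]]
    image_logits = mil_image_scores(people)  # [2.0, 0.5, -2.0]
    labels = [1, 0, 0]
    print(image_logits)
    print(weighted_bce(image_logits, labels, pos_weight=10.0))
```

With `pos_weight` greater than 1, a missed positive costs more than a false positive, which counteracts the fact that, with hundreds of labels, most labels in any given image are negative.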

References (38)

Pulkit Agrawal, Ross Girshick, Jitendra Malik. Analyzing the Performance of Multilayer Neural Networks for Object Recognition. European Conference on Computer Vision (ECCV), pp. 329-344, 2014. doi:10.1007/978-3-319-10584-0_22

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems (NIPS), vol. 25, pp. 1097-1105, 2012.

Paul A. Viola, John C. Platt, Cha Zhang. Multiple Instance Boosting for Object Detection. Advances in Neural Information Processing Systems (NIPS), vol. 18, pp. 1417-1424, 2005.

Subhransu Maji, Lubomir Bourdev, Jitendra Malik. Action Recognition from a Distributed Representation of Pose and Appearance. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3177-3184, 2011. doi:10.1109/CVPR.2011.5995631

Licheng Yu, Eunbyung Park, Alexander C. Berg, Tamara L. Berg. Visual Madlibs: Fill in the Blank Description Generation and Question Answering. IEEE International Conference on Computer Vision (ICCV), pp. 2461-2469, 2015. doi:10.1109/ICCV.2015.283

Yu-Wei Chao, Zhan Wang, Yugeng He, Jiaxuan Wang, Jia Deng. HICO: A Benchmark for Recognizing Human-Object Interactions in Images. IEEE International Conference on Computer Vision (ICCV), pp. 1017-1025, 2015. doi:10.1109/ICCV.2015.122

Gil Levi, Tal Hassner. Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns. International Conference on Multimodal Interaction (ICMI), pp. 503-510, 2015. doi:10.1145/2818346.2830587

Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein. Deep Compositional Question Answering with Neural Module Networks. arXiv preprint (Computer Vision and Pattern Recognition), 2015.

Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick. Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2874-2883, 2016. doi:10.1109/CVPR.2016.314

Allan Jabri, Armand Joulin, Laurens van der Maaten. Revisiting Visual Question Answering Baselines. European Conference on Computer Vision (ECCV), pp. 727-739, 2016. doi:10.1007/978-3-319-46484-8_44
