Attend and Interact: Higher-Order Object Interactions for Video Understanding

Authors: Zsolt Kira, Iain Melvin, Hans Peter Graf, Ghassan AlRegib, Chih-Yao Ma

DOI:

Keywords:

Abstract: Human actions often involve complex interactions across several inter-related objects in the scene. However, existing approaches to fine-grained video understanding or visual …

References (54)
Amir Roshan Zamir, Khurram Soomro, Mubarak Shah. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild. arXiv: Computer Vision and Pattern Recognition (2012).
Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, Juan Carlos Niebles. ActivityNet: A large-scale video benchmark for human activity understanding. Computer Vision and Pattern Recognition, pp. 961-970 (2015). DOI: 10.1109/CVPR.2015.7298698
Zhiheng Huang, Haonan Yu, Yi Yang, Jiang Wang, Wei Xu. Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks. arXiv: Computer Vision and Pattern Recognition (2015).
Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Li Fei-Fei. Image retrieval using scene graphs. Computer Vision and Pattern Recognition, pp. 3668-3678 (2015). DOI: 10.1109/CVPR.2015.7298990
Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu. BLEU. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL '02, pp. 311-318 (2001). DOI: 10.3115/1073083.1073135
Lorenzo Torresani, Manohar Paluri, Du Tran, Rob Fergus, Lubomir D. Bourdev. C3D: Generic Features for Video Analysis (2014).
Alon Lavie, Satanjeev Banerjee. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. Meeting of the Association for Computational Linguistics, pp. 65-72 (2005).
H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, T. Serre. HMDB: A large video database for human motion recognition. International Conference on Computer Vision, pp. 2556-2563 (2011). DOI: 10.1109/ICCV.2011.6126543
Kate Saenko, Subhashini Venugopalan, Jeff Donahue, Raymond Mooney, Marcus Rohrbach, Huijuan Xu. Translating Videos to Natural Language Using Deep Recurrent Neural Networks. arXiv: Computer Vision and Pattern Recognition (2014).
Andrej Karpathy, Rahul Sukthankar, Li Fei-Fei, Thomas Leung, Sanketh Shetty, George Toderici. Large-scale Video Classification with Convolutional Neural Networks (2014).