Classification and Temporal Localization for Human-Human Interactions

Authors: Ngoc Nguyen, Atsuo Yoshitaka

DOI: 10.1109/BIGMM.2016.87

Abstract: Recognition of human-human interactions is one of the most important topics, since it has great scientific importance and many potential practical applications such as surveillance and automatic video indexing. Previous approaches have concentrated only on classification and put less effort into the temporal localization of human interactions. In addition, they rely on hand-designed features (e.g. SIFT, HOG), or on poses and joints, to model human interactions. A disadvantage is that it is difficult and time consuming to extend these features to different datasets in the real world. In this paper, we approach the problem of human interaction classification and temporal localization with unsupervised feature learning. Motivated by the well-known Independent Subspace Analysis (ISA) in natural image statistics and by the convolution technique, we introduce a three-layer convolutional ISA network to learn hierarchical invariant features from videos. Using the learned network, we build a bag-of-features (BOF) representation for human interactions. We then apply a Support Vector Machine (SVM) to classify the interactions, and employ a sliding window technique to localize them temporally. We also evaluate the performance on sequences of the UT-Interaction dataset and the Hollywood dataset. The encouraging results show that our approach is able to learn features which are effective to represent complex activities in realistic environments. Although the results are insufficient for practical applications, this is a first step for further research.
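The sliding window step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the window length, stride, threshold, or merging rule, so all of those are assumptions here. The `score_window` callable is a hypothetical stand-in for the BOF + SVM confidence of one interaction class on a window of frames, and `toy_score` is invented purely so that the example runs.

```python
from typing import Callable, List, Tuple


def sliding_window_localize(
    n_frames: int,
    score_window: Callable[[int, int], float],
    window: int,
    stride: int,
    threshold: float,
) -> List[Tuple[int, int, float]]:
    """Return merged (start, end, score) intervals; `end` is exclusive.

    `score_window(start, end)` is a hypothetical classifier score for the
    frames in [start, end), e.g. an SVM decision value on a BOF histogram.
    """
    if n_frames <= 0:
        return []

    # Score every window and keep those at or above the threshold.
    detections = []
    for start in range(0, max(n_frames - window, 0) + 1, stride):
        end = min(start + window, n_frames)
        score = score_window(start, end)
        if score >= threshold:
            detections.append((start, end, score))

    # Merge windows that touch or overlap into one temporal segment
    # (assumed merging rule, keeping the highest score in each segment).
    merged: List[Tuple[int, int, float]] = []
    for start, end, score in detections:
        if merged and start <= merged[-1][1]:
            prev_start, prev_end, prev_score = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), max(prev_score, score))
        else:
            merged.append((start, end, score))
    return merged


def toy_score(start: int, end: int, truth: Tuple[int, int] = (40, 80)) -> float:
    """Toy scorer: fraction of the window inside a made-up true interval."""
    overlap = max(0, min(end, truth[1]) - max(start, truth[0]))
    return overlap / (end - start)


segments = sliding_window_localize(
    n_frames=120, score_window=toy_score, window=20, stride=10, threshold=0.75
)
print(segments)
```

With the toy scorer, only the windows lying fully inside frames 40 to 80 pass the threshold, and merging them yields a single segment covering that interval. In a real pipeline the scorer would be replaced by the trained classifier, and the threshold would need tuning on validation data.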
