Authors: Ngoc Nguyen, Atsuo Yoshitaka
Keywords:
Abstract: Recognition of human-human interactions is one of the most important topics in video understanding, since it has great scientific importance and many potential practical applications such as surveillance and automatic video indexing. Previous approaches have concentrated only on classification and put less effort into the localization of human interactions. In addition, they rely on hand-designed features (e.g. SIFT, HOG) or on the poses of joints to model interactions. A disadvantage of these approaches is that it is difficult and time consuming to extend these features to different datasets in the real world. In this paper, we approach the problem of human interaction classification and temporal localization with unsupervised feature learning. Motivated by the well-known Independent Subspace Analysis (ISA) in natural image statistics and by the convolution technique, we introduce a three-layer convolutional ISA network to learn hierarchical invariant features from videos. Using the learned network, we build a bag-of-features (BOF) representation for videos. We then apply a Support Vector Machine (SVM) to classify interactions, and employ a sliding window technique to localize interactions temporally. We also evaluate the performance on sequences of the UT-Interaction dataset and the Hollywood dataset. The encouraging results show that our network is able to learn features which are effective to represent complex activities in realistic environments. Although the results are insufficient for practical applications, this work is a first step for further research.
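The sliding window localization step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each frame has already been quantized into visual-word indices (for example, from learned features), and it uses a hypothetical linear scorer (`weights`, `bias`) as a stand-in for a trained SVM decision function. All function names, parameters, and the toy data are assumptions made for illustration.

```python
import numpy as np


def bof_histogram(word_ids, vocab_size):
    """Return an L1-normalised bag-of-features histogram of visual-word ids."""
    hist = np.bincount(word_ids, minlength=vocab_size).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


def sliding_window_localize(frame_words, vocab_size, weights, bias,
                            window, stride, threshold=0.0):
    """Score every temporal window and keep those above the threshold.

    frame_words: list of 1-D integer arrays, one array of word ids per frame.
    Returns a list of (start_frame, end_frame_exclusive, score) tuples.
    """
    detections = []
    n_frames = len(frame_words)
    for start in range(0, n_frames - window + 1, stride):
        end = start + window
        # Pool the visual words of all frames that fall inside the window.
        words = np.concatenate(frame_words[start:end])
        # Hypothetical linear decision function standing in for an SVM.
        score = float(bof_histogram(words, vocab_size) @ weights + bias)
        if score > threshold:
            detections.append((start, end, score))
    return detections


# Toy data (assumed): 12 frames, one visual word per frame.
# Frames 4..7 contain word 2, which the toy scorer treats as "interaction".
frame_words = [np.array([0], dtype=int)] * 4 \
    + [np.array([2], dtype=int)] * 4 \
    + [np.array([0], dtype=int)] * 4
weights = np.array([-1.0, 0.0, 1.0])

detections = sliding_window_localize(
    frame_words, vocab_size=3, weights=weights, bias=0.0,
    window=4, stride=2,
)
print(detections)  # [(4, 8, 1.0)]
```

In this toy run only the window covering frames 4 to 7 scores above the threshold. A practical system would typically also merge overlapping detections and search over several window lengths.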