Authors: Mehrsan Javan Roshtkhari, Martin D. Levine
DOI: 10.1109/CRV.2012.32
Keywords:
Abstract: This paper presents a novel action matching method based on a hierarchical codebook of local spatio-temporal video volumes (STVs). Given a single example of an activity as a query video, the proposed method finds similar videos to the query in a video dataset. It is based on the bag of video words (BOV) representation and does not require prior knowledge about actions, background subtraction, motion estimation or tracking. It is also robust to spatial and temporal scale changes, as well as to some deformations. The algorithm yields a compact subset of salient code words of STVs for the query video, and then the likelihood of similarity between the query video and all STVs in the target video is measured using a probabilistic inference mechanism. The hierarchy is achieved by initially constructing a codebook of STVs, while considering the uncertainty in the codebook construction, which is always ignored in current versions of the BOV approach. At the second level of the hierarchy, a large contextual region containing many STVs (Ensemble of STVs) is considered in order to construct a probabilistic model of STVs and their spatio-temporal compositions. At the third level of the hierarchy, a codebook is formed for the ensembles of STVs based on their contextual similarities. The latter are the proposed labels (code words) for the actions being exhibited in the video. Finally, at the highest level of the hierarchy, the salient labels for the actions are selected by analyzing the high level code words assigned to each image pixel as a function of time. The algorithm was applied to three available video datasets for action recognition with different complexities (KTH, Weizmann, and MSR II) and the results were superior to other approaches, especially in the cases of a single training example and cross-dataset action recognition.
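Illustrative sketch (not the authors' implementation): the first level of the hierarchy, a codebook of STVs that keeps assignment uncertainty, can be approximated with a soft-assignment bag of video words. Everything below is an assumption made for illustration: raw pixel intensities stand in for the STV descriptor, plain k-means stands in for the codebook construction, Gaussian soft weights stand in for the uncertainty model, and histogram intersection stands in for the probabilistic inference step. The function names (`extract_stvs`, `build_codebook`, `soft_assign`, `bov_histogram`, `histogram_intersection`) and the parameters `size`, `stride`, `k`, and `sigma` are hypothetical.

```python
import numpy as np

def extract_stvs(video, size=(4, 8, 8), stride=(2, 4, 4)):
    """Densely sample volumes from a (T, H, W) array; flatten each into a vector."""
    T, H, W = video.shape
    dt, dh, dw = size
    st, sh, sw = stride
    vols = [video[t:t + dt, y:y + dh, x:x + dw].ravel()
            for t in range(0, T - dt + 1, st)
            for y in range(0, H - dh + 1, sh)
            for x in range(0, W - dw + 1, sw)]
    return np.asarray(vols, dtype=float)

def build_codebook(descriptors, k, iters=10, seed=0):
    """Plain k-means, used here only as a stand-in clustering step."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].copy()
    for _ in range(iters):
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return centers

def soft_assign(descriptors, centers, sigma=1.0):
    """Spread each STV over all code words instead of forcing one hard label."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Subtract the row minimum so the largest weight is exp(0) = 1 (no underflow to all zeros).
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)

def bov_histogram(video, centers, sigma=1.0):
    """Normalized soft bag-of-video-words histogram for one video."""
    hist = soft_assign(extract_stvs(video), centers, sigma).sum(axis=0)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return float(np.minimum(h1, h2).sum())
```

A query video and a target video would each be passed through `bov_histogram` with a shared codebook and compared with `histogram_intersection`. The higher levels described in the abstract (ensembles of STVs, contextual code words, salient label selection) are not covered by this sketch.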