BBN VISER TRECVID 2011 Multimedia Event Detection System

Authors: I-Hong Jhuo, Stavros Tsakalidis, Shiv N. Vitaladevuni, Vasant Manohar, Pradeep Natarajan

Abstract: We describe the Raytheon BBN (BBN) VISER system that is designed to detect events of interest in multimedia data. We also present a comprehensive analysis of the different modules of that system in the context of the MED 2011 task. The VISER system incorporates a large set of low-level features that capture appearance, color, motion, audio, and audio-visual co-occurrence patterns in videos. For the low-level features, we rigorously analyzed several coding and pooling strategies, and used state-of-the-art spatio-temporal pooling strategies to model relationships between different features. The system also uses high-level (i.e., semantic) visual information obtained from detecting scene, object, and action concepts. Furthermore, the VISER system exploits multimodal information by analyzing available spoken and videotext content using BBN's Byblos automatic speech recognition (ASR) and video text recognition systems. These diverse streams of information are combined into a single, fixed dimensional vector for each video. We explored two different combination strategies: early fusion and late fusion. Early fusion was implemented through a fast kernel-based fusion framework, and late fusion was performed using both Bayesian model combination (BAYCOM) and an innovative weighted-average framework. Consistent with the previous MED’10 evaluation, low-level visual features exhibit strong performance and form the basis of our system. However, high-level information from speech, video-text, and object detection provides consistent and significant performance improvements. Overall, BBN’s VISER system exhibited the best performance among all submitted systems, with an average ANDC score of 0.46 across the 10 MED’11 test events when the threshold was optimized for the NDC score, and a missed detection rate below 30% when the threshold was optimized to minimize missed detections at a 6% false alarm rate.
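As a rough illustration of the weighted-average late fusion and the threshold-based error rates mentioned in the abstract, the sketch below averages per-stream detection scores and reports the missed detection and false alarm rates at a chosen threshold. It is not the BBN implementation: the function names, stream names, weights, scores, and labels are all hypothetical, and the abstract does not say how the actual weights were chosen.

```python
from typing import Dict, List, Tuple


def late_fusion(stream_scores: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-stream scores, giving one fused score per video."""
    total = sum(weights[name] for name in stream_scores)
    n_videos = len(next(iter(stream_scores.values())))
    fused = []
    for i in range(n_videos):
        weighted = sum(weights[name] * scores[i]
                       for name, scores in stream_scores.items())
        fused.append(weighted / total)
    return fused


def error_rates(scores: List[float], labels: List[int],
                threshold: float) -> Tuple[float, float]:
    """Return (missed detection rate, false alarm rate) at a decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # A positive video scored below the threshold is a missed detection.
    misses = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    # A negative video scored at or above the threshold is a false alarm.
    false_alarms = sum(1 for s, y in zip(scores, labels)
                       if y == 0 and s >= threshold)
    return misses / positives, false_alarms / negatives


# Hypothetical toy data: two streams, four videos, two of them positive.
stream_scores = {
    "visual": [0.9, 0.2, 0.6, 0.1],
    "speech": [0.7, 0.4, 0.2, 0.3],
}
weights = {"visual": 3.0, "speech": 1.0}  # assumed weights, not from the paper
labels = [1, 0, 1, 0]

fused = late_fusion(stream_scores, weights)
p_miss, p_fa = error_rates(fused, labels, threshold=0.6)
print(fused, p_miss, p_fa)
```

Sweeping the threshold over the fused scores trades missed detections against false alarms, which is the kind of operating-point choice the abstract refers to when it reports results at a 6% false alarm rate.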
