Straight to the Point: Fast-Forwarding Videos via Reinforcement Learning Using Textual Data

Authors: Washington De Souza Ramos, Michel M. Silva, Edson R. Araujo, Leandro Soriano Marcolino, Erickson R. Nascimento

DOI: 10.1109/CVPR42600.2020.01094

Abstract: The rapid increase in the amount of published visual data and the limited time of users bring the demand for processing untrimmed videos to produce shorter versions that convey the same information. Despite the remarkable progress that has been made by summarization methods, most of them can only select a few frames or skims, which creates visual gaps and breaks the video context. In this paper, we present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos. Our approach can adaptively select frames that are not relevant to convey the information without creating gaps in the final video. Our agent is textually and visually oriented to select which frames to remove to shrink the input video. Additionally, we propose a novel network, called Visually-guided Document Attention Network (VDAN), able to generate a highly discriminative embedding space to represent both textual and visual data. Our experiments show that our method achieves the best performance in terms of F1 Score and coverage at the video segment level.
