Authors: Washington De Souza Ramos, Michel M Silva, Edson R Araujo, Leandro Soriano Marcolino, Erickson R Nascimento
DOI: 10.1109/CVPR42600.2020.01094
Keywords:
Abstract: The rapid increase in the amount of published visual data and the limited time of users create a demand for processing untrimmed videos to produce shorter versions that convey the same information. Despite the remarkable progress made by summarization methods, most of them can only select a few frames or skims, which creates visual gaps and breaks the video context. In this paper, we present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos. Our approach adaptively selects frames that are not relevant to convey the information, without creating gaps in the final video. The agent is textually and visually oriented to decide which frames to remove to shrink the input video. Additionally, we propose a novel network, called Visually-guided Document Attention Network (VDAN), able to generate a highly discriminative embedding space to represent both textual and visual data. Our experiments show that our method achieves the best performance in terms of F1 Score and coverage at the video segment level.
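To make the two components described above concrete, here is a minimal PyTorch sketch, not the authors' released code: a toy dual encoder standing in for VDAN's joint text/visual embedding space, plus a frame-selection routine that replaces the paper's learned RL agent with a simple cosine-similarity heuristic. All class names, feature dimensions, and the keep_ratio parameter are hypothetical illustrations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    """Toy stand-in for VDAN: projects text features and frame
    features into a shared embedding space (dimensions are
    hypothetical, not from the paper)."""
    def __init__(self, text_dim=300, img_dim=2048, embed_dim=128):
        super().__init__()
        self.text_proj = nn.Sequential(
            nn.Linear(text_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim))
        self.img_proj = nn.Sequential(
            nn.Linear(img_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim))

    def forward(self, text_feats, img_feats):
        # L2-normalize so cosine similarity reduces to a dot product
        e_text = F.normalize(self.text_proj(text_feats), dim=-1)
        e_img = F.normalize(self.img_proj(img_feats), dim=-1)
        return e_text, e_img

def fast_forward(doc_feat, frame_feats, encoder, keep_ratio=0.3):
    """Greedy frame selection: keep the frames whose embeddings are
    most similar to the document embedding. This is a heuristic
    simplification; the paper's agent learns skip decisions via a
    reinforcement learning reward instead."""
    with torch.no_grad():
        e_doc, e_frames = encoder(doc_feat, frame_feats)
        scores = e_frames @ e_doc.squeeze(0)   # cosine similarity per frame
        k = max(1, int(keep_ratio * len(scores)))
        keep = torch.topk(scores, k).indices.sort().values
    return keep  # indices of retained frames, in temporal order

if __name__ == "__main__":
    enc = DualEncoder()
    doc = torch.randn(1, 300)        # e.g., pooled word embeddings of the recipe text
    frames = torch.randn(500, 2048)  # e.g., per-frame CNN features
    print(fast_forward(doc, frames, enc)[:10])
```

In the actual method, the selection policy is trained with reinforcement learning so that the retained frames both match the instructional text and keep the output video free of temporal gaps; the heuristic above only illustrates how a joint embedding space makes frame relevance scoring straightforward.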