Authors: Sen Li, Jianwu Fang, Hongke Xu, Jianru Xue
DOI: 10.1109/TCSVT.2020.2984783
Keywords:
Abstract: Future frame prediction in video is one of the most important problems in computer vision, and it is useful for a range of practical applications, such as intention or anomaly detection. However, this task is challenging because of the complex dynamic evolution of the scene. The difficulty lies in how to model the inherent spatio-temporal correlation between frames and how to pose an adaptive and flexible framework for large motion change and appearance variation. In this paper, we construct a deep multi-branch mask network (DMMNet) which adaptively fuses the advantages of optical flow warping and RGB pixel synthesizing methods, i.e., the two common kinds of approaches for this task. In the procedure of DMMNet, we add a mask layer to each branch to adjust the magnitude of the estimated optical flow and the weight of the pixels predicted by synthesizing, respectively. In other words, we provide a more adaptive masking fusion on prediction. Exhaustive experiments on the Caltech pedestrian and UCF101 datasets show that the proposed method can obtain favorable performance compared with state-of-the-art methods. In addition, we also put our model into the anomaly detection problem, and its superiority is verified on the UCSD dataset.
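To make the masked fusion idea concrete, below is a minimal sketch (not the authors' released code) of how a learned per-pixel mask could blend the outputs of an optical-flow-warping branch and an RGB-synthesis branch. The module name `MaskFusion`, the small convolutional head, and its layer sizes are illustrative assumptions; only the general idea of adaptively weighting the two kinds of predictions comes from the abstract.

```python
# Hypothetical sketch of mask-based fusion of two prediction branches.
# Inputs are assumed to be a flow-warped frame and a synthesized frame,
# each of shape (N, 3, H, W), produced elsewhere in the network.
import torch
import torch.nn as nn


class MaskFusion(nn.Module):
    """Predicts a soft per-pixel mask and blends two candidate frames."""

    def __init__(self, in_channels: int = 6):
        super().__init__()
        # Small conv head over the concatenated candidates; the layer
        # sizes here are assumptions, not taken from the paper.
        self.mask_head = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, warped_frame: torch.Tensor,
                synthesized_frame: torch.Tensor) -> torch.Tensor:
        # Soft mask in [0, 1]: values near 1 favor the flow-warping branch,
        # values near 0 favor the RGB-synthesis branch, per pixel.
        x = torch.cat([warped_frame, synthesized_frame], dim=1)
        m = torch.sigmoid(self.mask_head(x))
        return m * warped_frame + (1.0 - m) * synthesized_frame


# Usage with random tensors standing in for the two branch outputs.
warped = torch.rand(2, 3, 128, 160)
synthesized = torch.rand(2, 3, 128, 160)
pred = MaskFusion()(warped, synthesized)
print(pred.shape)  # torch.Size([2, 3, 128, 160])
```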