Authors: Eunbyung Park, Xufeng Han, Tamara L. Berg, Alexander C. Berg
DOI: 10.1109/WACV.2016.7477589
Keywords:
Abstract: Although deep convolutional neural networks (CNNs) have shown remarkable results for feature learning and prediction tasks, many recent studies have demonstrated improved performance by incorporating additional handcrafted features or by fusing predictions from multiple CNNs. Usually, these combinations are implemented via feature concatenation or by averaging output prediction scores from several CNNs. In this paper, we present new approaches for combining different sources of knowledge in deep learning. First, we propose feature amplification, where we use an auxiliary, hand-crafted feature (e.g. optical flow) to perform spatially varying soft-gating on intermediate CNN feature maps. Second, we present a spatially varying multiplicative fusion method for combining multiple CNNs trained on different sources that results in robust prediction by amplifying or suppressing the feature activations based on their agreement. We test these methods in the context of action recognition, where information from spatial and temporal cues is useful, obtaining results that are comparable with state-of-the-art methods and outperform methods using only CNNs and optical flow features.
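
The two ideas named in the abstract, soft-gating of CNN feature maps by an auxiliary feature and multiplicative fusion of two networks' activations, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `soft_gate` and `multiplicative_fusion`, the choice of a sigmoid over mean-centred flow magnitude as the gate, and the toy 1 x 2 x 2 inputs are all hypothetical, and the paper's networks may compute the gate and the fusion differently.

```python
import math


def soft_gate(feature_maps, flow_magnitude):
    """Scale every channel of a C x H x W feature map by a per-pixel gate in (0, 1).

    Assumption: the gate is a sigmoid of the mean-centred flow magnitude.
    """
    flat = [v for row in flow_magnitude for v in row]
    mean = sum(flat) / len(flat)
    gate = [
        [1.0 / (1.0 + math.exp(-(v - mean))) for v in row]
        for row in flow_magnitude
    ]
    # The same H x W gate is applied to each channel (spatially varying gating).
    return [
        [[x * g for x, g in zip(f_row, g_row)] for f_row, g_row in zip(channel, gate)]
        for channel in feature_maps
    ]


def multiplicative_fusion(maps_a, maps_b):
    """Element-wise product of two equally shaped C x H x W feature maps."""
    return [
        [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(maps_a, maps_b)
    ]


# Toy inputs (hypothetical): one channel, 2 x 2 spatial grid.
spatial = [[[1.0, 2.0], [3.0, 4.0]]]    # activations from an appearance network
temporal = [[[0.5, 2.0], [0.0, 1.0]]]   # activations from a motion network
flow = [[0.0, 1.0], [2.0, 3.0]]         # optical-flow magnitude per pixel

gated = soft_gate(spatial, flow)
fused = multiplicative_fusion(spatial, temporal)
```

In this toy setup, locations where both streams respond strongly produce a large product, while a zero in either stream suppresses the fused activation, which is one simple reading of "amplifying or suppressing the feature activations based on their agreement".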