LAMV: Learning to Align and Match Videos with Kernelized Temporal Layers

Authors: Lorenzo Baraldi, Matthijs Douze, Rita Cucchiara, Herve Jegou

DOI: 10.1109/CVPR.2018.00814

Abstract: This paper considers a learnable approach for comparing and aligning videos. Our architecture builds upon and revisits temporal match kernels within neural networks: we propose a new temporal layer that finds temporal alignments by maximizing the scores between two sequences of vectors, according to a time-sensitive similarity metric parametrized in the Fourier domain. We learn this layer with a temporal proposal strategy, in which we minimize a triplet loss that takes into account both the localization accuracy and the recognition rate. We evaluate our approach on video alignment, copy detection and event retrieval. Our approach outperforms the state of the art on temporal video alignment and video copy detection datasets in comparable setups. It also attains the best reported results for particular event search, while precisely aligning videos.
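As an illustration of the idea of finding a temporal alignment by maximizing a score computed in the Fourier domain, the following is a minimal sketch using plain cross-correlation of frame descriptors. It is not the layer proposed in the paper, which learns a kernelized, time-sensitive similarity; the function name `best_offset`, the descriptor shapes, and the use of NumPy are assumptions made for this example only.

```python
import numpy as np


def best_offset(x, y):
    """Find the shift delta that maximizes sum_t <x[t], y[t + delta]>.

    x: array of shape (Tx, D), one D-dimensional descriptor per frame.
    y: array of shape (Ty, D).
    Returns (delta, score). Illustrative only: plain cross-correlation,
    not the learned temporal kernel described in the abstract.
    """
    tx, ty = x.shape[0], y.shape[0]
    # Zero-pad to Tx + Ty - 1 so circular correlation equals linear correlation.
    n = tx + ty - 1
    fx = np.fft.rfft(x, n=n, axis=0)
    fy = np.fft.rfft(y, n=n, axis=0)
    # Correlation theorem: corr[k] = sum_t x[t] * y[t + k] has spectrum conj(X) * Y.
    # Summing over the descriptor dimension turns products into dot products.
    corr = np.fft.irfft((np.conj(fx) * fy).sum(axis=1), n=n)
    k = int(np.argmax(corr))
    # Indices 0..Ty-1 are non-negative shifts; the rest wrap to negative shifts.
    delta = k if k < ty else k - n
    return delta, float(corr[k])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.standard_normal((50, 8))   # synthetic "database" video
    clip = video[12:32]                    # query clip cut from frame 12
    print(best_offset(clip, video))
```

The FFT evaluates the score for every candidate shift at once, which is the general reason Fourier-domain formulations are attractive for temporal alignment; a learned method would replace the raw dot product with a trained similarity.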

References (32)
João F. Henriques, Rui Caseiro, Pedro Martins, Jorge Batista. Exploiting the circulant structure of tracking-by-detection with kernels. European Conference on Computer Vision, pp. 702-715 (2012). DOI: 10.1007/978-3-642-33765-9_50
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri. Learning Spatiotemporal Features with 3D Convolutional Networks. 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4489-4497 (2015). DOI: 10.1109/ICCV.2015.510
Herve Jegou, Matthijs Douze, Cordelia Schmid. Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search. European Conference on Computer Vision, vol. 5302, pp. 304-317 (2008). DOI: 10.1007/978-3-540-88682-2_24
Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville. Describing Videos by Exploiting Temporal Structure. 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4507-4515 (2015). DOI: 10.1109/ICCV.2015.512
Giorgos Tolias, Teddy Furon, Hervé Jégou. Orientation Covariant Aggregation of Local Descriptors with Embeddings. European Conference on Computer Vision, pp. 382-397 (2014). DOI: 10.1007/978-3-319-10599-4_25
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. Computer Vision – ECCV 2014, pp. 740-755 (2014). DOI: 10.1007/978-3-319-10602-1_48
Basura Fernando, Efstratios Gavves, M. Jose Oramas, Amir Ghodrati, Tinne Tuytelaars. Modeling video evolution for action recognition. Computer Vision and Pattern Recognition, pp. 5378-5387 (2015). DOI: 10.1109/CVPR.2015.7299176
Andrei Bursuc, Giorgos Tolias, Hervé Jégou. Kernel Local Descriptors with Implicit Rotation Matching. International Conference on Multimedia Retrieval, pp. 595-598 (2015). DOI: 10.1145/2671188.2749379
Sébastien Poullot, Shunsuke Tsukatani, Anh Phuong Nguyen, Hervé Jégou, Shin'Ichi Satoh. Temporal Matching Kernel with Explicit Feature Maps. ACM Multimedia, pp. 381-390 (2015). DOI: 10.1145/2733373.2806228
Matthijs Douze, Jerome Revaud, Cordelia Schmid, Herve Jegou. Stable Hyper-pooling and Query Expansion for Event Detection. International Conference on Computer Vision, pp. 1825-1832 (2013). DOI: 10.1109/ICCV.2013.229