Authors: Lorenzo Baraldi, Matthijs Douze, Rita Cucchiara, Herve Jegou
Keywords:
Abstract: This paper considers a learnable approach for comparing and aligning videos. Our architecture builds upon and revisits temporal match kernels within neural networks: we propose a new temporal layer that finds temporal alignments by maximizing the scores between two sequences of vectors, according to a time-sensitive similarity metric parametrized in the Fourier domain. We learn this layer with a temporal proposal strategy, in which we minimize a triplet loss that takes into account both the localization accuracy and the recognition rate. We evaluate our approach on video alignment, copy detection and event retrieval. Our approach outperforms the state of the art on temporal video alignment and video copy detection datasets in comparable setups. It also attains the best reported results for particular event search, while precisely aligning videos.
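
The core idea of finding an alignment by maximizing a score between two sequences of vectors can be illustrated with a minimal sketch. It is not the paper's layer: it assumes plain dot-product similarity between frame descriptors and searches offsets by brute force in the time domain, whereas the abstract describes a learned, time-sensitive metric parametrized in the Fourier domain. The names `alignment_scores`, `best_alignment` and `max_offset` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two frame descriptors."""
    return sum(x * y for x, y in zip(a, b))


def alignment_scores(
    query: List[Vector], database: List[Vector], max_offset: int
) -> Dict[int, float]:
    """Score every candidate offset delta in [-max_offset, max_offset].

    score[delta] = sum over t of <query[t], database[t + delta]>,
    skipping frames that fall outside the database sequence.
    (Assumption: unweighted dot products; the paper learns the metric.)
    """
    scores: Dict[int, float] = {}
    for delta in range(-max_offset, max_offset + 1):
        total = 0.0
        for t, q in enumerate(query):
            u = t + delta
            if 0 <= u < len(database):
                total += dot(q, database[u])
        scores[delta] = total
    return scores


def best_alignment(
    query: List[Vector], database: List[Vector], max_offset: int
) -> Tuple[int, float]:
    """Return the offset with the highest score, and that score."""
    scores = alignment_scores(query, database, max_offset)
    best = max(scores, key=lambda d: scores[d])
    return best, scores[best]


# Toy data: six one-hot "frame descriptors"; the query is frames 2..4.
database = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
query = database[2:5]

offset, score = best_alignment(query, database, max_offset=3)
print(offset, score)  # prints "2 3.0"
```

In this toy case the query is an exact excerpt starting at frame 2, so only the offset 2 accumulates a non-zero score. The brute-force loop costs time proportional to the number of offsets times the query length; computing such correlations in the Fourier domain, as the abstract describes, is one way to avoid scoring each offset separately.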