Spotting Audio-Visual Inconsistencies (SAVI) in Manipulated Video

Authors: Robert Bolles, J. Brian Burns, Martin Graciarena, Andreas Kathol, Aaron Lawson

DOI: 10.1109/CVPRW.2017.238

Keywords: Visualization, Identity (object-oriented programming), Speech recognition, Computer science, Semantic mapping, Spotting, Focus (computing), Feature extraction, Face (geometry), Computer vision, Artificial intelligence, Audio mining

Abstract: This paper is part of a larger effort to detect manipulations of video by searching for and combining the evidence of multiple types of inconsistencies between the audio and visual channels. Here, we focus on inconsistencies between the types of scenes detected in the audio and visual modalities (e.g., audio indoor, small room versus visual outdoor, urban), and on inconsistencies in speaker identity tracking over a video, given audio speaker features and visual face features (e.g., a voice change, but no talking-face change). The scene inconsistency task was complicated by mismatches between the categories used in current audio and visual scene collections. To deal with this, we employed a novel semantic mapping method. The speaker identity inconsistency process was challenged by the complexity of comparing face tracks and audio speech clusters, requiring a method for fusing these two sources. Our progress on both tasks was demonstrated on collections of tampered videos.
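The abstract names two consistency checks but does not detail the semantic mapping or fusion methods. The following is a minimal, hypothetical sketch of both checks, assuming each modality has already produced labels. Scene labels from different audio and visual taxonomies are compared through cosine similarity of label embeddings (the toy vectors in `EMBED` are invented stand-ins for word vectors), and a voice change is flagged when no talking-face change occurs within a small frame tolerance. All names (`EMBED`, `map_label`, `scene_inconsistent`, `unmatched_voice_changes`, `tol`) are illustrative assumptions, not the authors' method.

```python
import math

# HYPOTHETICAL toy label embeddings standing in for word vectors;
# real vectors would come from a pretrained embedding model.
EMBED = {
    "indoor_small_room": [0.9, 0.1, 0.0],
    "bedroom":           [0.8, 0.2, 0.1],
    "outdoor_urban":     [0.1, 0.9, 0.2],
    "street":            [0.2, 0.8, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def map_label(label, target_labels):
    """Map a label from one taxonomy to the most similar label in another."""
    return max(target_labels, key=lambda t: cosine(EMBED[label], EMBED[t]))

def scene_inconsistent(audio_label, visual_label, visual_taxonomy):
    """Flag a scene inconsistency when the audio label's nearest visual
    category differs from the category the visual detector reported."""
    return map_label(audio_label, visual_taxonomy) != visual_label

def change_points(labels):
    """Frame indices where the per-frame label differs from the previous frame."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

def unmatched_voice_changes(voice, face, tol=1):
    """Voice (speaker-cluster) change points with no talking-face change
    within `tol` frames. Assumes both sequences are aligned per frame."""
    face_cps = change_points(face)
    return [c for c in change_points(voice)
            if not any(abs(c - f) <= tol for f in face_cps)]
```

For example, with the per-frame voice labels `AAABBB` and a face track that never changes, `unmatched_voice_changes` reports frame 3 as a voice change with no matching face change. A real system would also need to associate speaker clusters with face tracks, which this sketch omits.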
