Authors: Robert Bolles, J. Brian Burns, Martin Graciarena, Andreas Kathol, Aaron Lawson
Keywords: Visualization, Identity (object-oriented programming), Speech recognition, Computer science, Semantic mapping, Spotting, Focus (computing), Feature extraction, Face (geometry), Computer vision, Artificial intelligence, Audio mining
Abstract: This paper is part of a larger effort to detect manipulations of video by searching for and combining the evidence of multiple types of inconsistencies between the audio and visual channels. Here, we focus on inconsistencies between the type of scenes detected in the audio and visual modalities (e.g., audio indoor, small room versus visual outdoor, urban), and inconsistencies in speaker identity tracking over a video given audio speaker features and visual face features (e.g., a voice change, but no talking face change). The scene inconsistency task was complicated by mismatches in the categories used in current visual scene and audio scene collections. To deal with this, we employed a novel semantic mapping method. The speaker identity inconsistency process was challenged by the complexity of comparing face tracks and audio speech clusters, requiring a method of fusing these two sources. Our progress on both tasks was demonstrated on collections of tampered videos.