Authors: Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha
Keywords: Computer science, Audio visual, Similarity (psychology), Speech recognition, Artificial intelligence, Affective computing, Network architecture, Modalities, Metric (mathematics), Deep learning, Triplet loss
Abstract: We present a learning-based method for detecting real and fake deepfake multimedia content. To maximize information for learning, we extract and analyze the similarity between the two audio and visual modalities from within the same video. Additionally, we extract and compare affective cues corresponding to perceived emotion from the two modalities within a video to infer whether the input video is "real" or "fake". We propose a deep learning network, inspired by the Siamese network architecture and the triplet loss. To validate our model, we report the AUC metric on two large-scale deepfake detection datasets, DeepFake-TIMIT Dataset and DFDC. We compare our approach with several SOTA deepfake detection methods and report per-video AUC of 84.4% on the DFDC and 96.6% on the DF-TIMIT datasets, respectively. To the best of our knowledge, ours is the first approach that simultaneously exploits audio and visual modalities and also the perceived emotions from the two modalities for deepfake detection.
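
The abstract mentions a Siamese-style network trained with a triplet loss. As a rough illustration of that idea, the sketch below implements the generic textbook triplet margin loss on plain Python lists. It is not the authors' exact formulation: the Euclidean distance, the margin value, and the names `euclidean`, `triplet_loss`, `real_face`, `real_speech`, and `fake_speech` are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (an assumed form, not the paper's exact loss).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings: a face embedding from a real video (anchor),
# the speech embedding from the same real video (positive), and a speech
# embedding from a manipulated video (negative).
real_face = [0.9, 0.1]
real_speech = [0.8, 0.2]
fake_speech = [0.1, 0.9]

loss = triplet_loss(real_face, real_speech, fake_speech, margin=1.0)
print(f"triplet loss: {loss:.4f}")
```

Under this assumed setup, minimizing the loss pulls the two modalities of a real video together in embedding space and pushes a mismatched (fake) modality away, so a distance threshold between modality embeddings could then serve as a real/fake score.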