Authors: Dhruv Jhamb, David Chan, John F Canny, Avideh Zakhor
Abstract: Within the field of artificial intelligence (AI), multimodal learning has become a popular approach to a wide range of tasks, in part due to the abundance of available data from different modalities. While combining modalities increases the amount of information available to models, data incompleteness diminishes these gains: in many multimodal datasets, as in the real world, not every sample contains all modalities. Prior work has used generative modeling approaches, such as autoencoders and GANs, to reconstruct the missing modality. Simpler approaches have also served as baselines, such as zero padding, in which the feature representation of the missing modality is filled with zeros. The purpose of this project is to investigate commonly used approaches for handling missing modalities by measuring their performance on a downstream task, which allows us to assess how robust these approaches are when modalities are missing at test time. The project will also investigate modality reconstruction, namely which approaches reconstruct the missing modality better than others, and whether these trends hold across different datasets. The downstream task we focus on is emotion recognition, using the multimodal datasets RAVDESS, eNTERFACE’05, and CMU-MOSI. We choose emotion recognition because it is an important prerequisite for future AI systems, being a central part of natural human-computer interaction. Emotion recognition can be used for tasks such as security measures, HR assistance, customer …
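For concreteness, below is a minimal sketch of the zero-padding baseline described in the abstract, assuming each sample stores one fixed-size feature vector per modality. The modality names and feature dimensions here are illustrative placeholders, not the actual representations used with RAVDESS, eNTERFACE’05, or CMU-MOSI.

```python
import numpy as np

# Illustrative per-modality feature dimensions (hypothetical values,
# not the dimensions used in the datasets named in the abstract).
FEATURE_DIMS = {"audio": 128, "video": 256, "text": 300}

def zero_pad_missing(sample: dict) -> dict:
    """Replace any missing modality with an all-zero feature vector.

    `sample` maps a modality name to its feature vector (np.ndarray),
    or to None when that modality is absent at test time.
    """
    completed = {}
    for modality, dim in FEATURE_DIMS.items():
        feats = sample.get(modality)
        # Zero padding: substitute a zero vector of the expected size
        # so the downstream model always sees a complete input.
        completed[modality] = feats if feats is not None else np.zeros(dim)
    return completed

# Example: a test sample whose video stream is missing.
sample = {
    "audio": np.random.randn(128),
    "video": None,
    "text": np.random.randn(300),
}
completed = zero_pad_missing(sample)
print({name: vec.shape for name, vec in completed.items()})
```

Reconstruction-based approaches (e.g., the autoencoders and GANs mentioned above) would replace the zero vector with a generated feature vector conditioned on the observed modalities; the zero-padding baseline serves as the simplest point of comparison.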