Attention Based Multi-Modal Fusion Architecture for Open-Ended Video Question Answering Systems

Authors: Sumedh Pendurkar, Sameer Kolpekwar, Shreyas Dhoot, Yashodhara V. Haribhakta, Biplab Banerjee

DOI: 10.1016/J.PROCS.2020.04.047

Abstract: Open-ended video question answering is a very challenging problem with widespread real-life applications. Existing methods tend to focus on single-word video question answering systems, which cannot easily be extended to produce open-ended answers. In this paper, we propose using an architecture popularly used for captioning to solve the problem of open-ended video question answering. To generate good answers, the model is required to understand each frame separately as well as how to link information from different frames to generate the answer. The model also needs to keep the different modalities in mind and adapt itself accordingly while processing videos and questions. We propose an attention-based multimodal fusion architecture (AMF-VQA) that uses an attention mechanism at every time step of the output word. Such an architecture allows outputting […]. The proposed architecture is flexible: one can simply add other modalities, such as audio features or captions, to the existing architecture and fine-tune it to improve results if these new features are available.
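The abstract describes attention applied at every output time step over features drawn from several modalities. What follows is a minimal, stdlib-only Python sketch of that idea under stated assumptions. It uses plain dot-product attention and assumes the features of every modality are already projected to the decoder state's dimension. It also assumes a two-stage scheme: attention over time within each modality, then attention across the per-modality summaries. The function names (`attend`, `fuse_step`) and the two-stage design are hypothetical illustrations, not the authors' AMF-VQA implementation.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Dot-product attention: weight each key by softmax(query . key) and sum.

    Keys double as values here, which is a simplifying assumption.
    """
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]


def fuse_step(decoder_state: Vector, modalities: Dict[str, List[Vector]]) -> Vector:
    """One decoding time step (hypothetical sketch).

    1. Temporal attention over the frames/tokens of each modality.
    2. Attention across the per-modality context vectors (the fusion step).
    The fused vector would condition the prediction of the next answer word.
    """
    contexts = [attend(decoder_state, feats) for feats in modalities.values()]
    return attend(decoder_state, contexts)


if __name__ == "__main__":
    state = [1.0, 0.0]  # toy decoder hidden state at the current time step
    features = {
        "appearance": [[1.0, 0.0], [0.0, 1.0]],  # two video frames
        "motion": [[0.5, 0.5]],                  # one clip-level motion feature
        "question": [[0.0, 1.0], [1.0, 1.0]],    # two question-token embeddings
    }
    print(fuse_step(state, features))
```

Because the modalities are passed as a dictionary, adding audio features or captions in this sketch only means adding another key, which mirrors the flexibility the abstract claims. A trained model would additionally learn projection matrices for the queries, keys, and values, and feed the fused context into the decoder that emits the next word.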
