Translating Videos to Natural Language Using Deep Recurrent Neural Networks

Authors: Kate Saenko, Subhashini Venugopalan, Jeff Donahue, Raymond Mooney, Marcus Rohrbach

Keywords: Object (computer science), Artificial neural network, Computer science, Recurrent neural network, Artificial intelligence, Natural language, Natural language processing, Vocabulary, Deep learning, Sentence, Symbol grounding, Machine learning

Abstract: Solving the visual symbol grounding problem has long been a goal of artificial intelligence. The field appears to be advancing closer to this goal with recent breakthroughs in deep …
