Jointly modeling deep video and compositional text to bridge vision and language in a unified framework

Authors: Ran Xu, Caiming Xiong, Wei Chen, Jason J. Corso

Abstract: … represent the child nodes, we reconstruct the child nodes … to build the video-language space and compare video retrieval/text … deep video feature and average of word vector to learn the …

References (37)
Abhinav Gupta, Praveen Srinivasan, Jianbo Shi, Larry S. Davis. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos. Computer Vision and Pattern Recognition, pp. 2012-2019 (2009), 10.1109/CVPR.2009.5206492
Ted Pedersen, Siddharth Patwardhan, Jason Michelizzi. WordNet::Similarity. Demonstration Papers at HLT-NAACL 2004, pp. 38-41 (2004), 10.3115/1614025.1614037
Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, Kate Saenko. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition. International Conference on Computer Vision, pp. 2712-2719 (2013), 10.1109/ICCV.2013.337
Richard Socher, Andrej Karpathy, Quoc V. Le, Christopher D. Manning, Andrew Y. Ng. Grounded Compositional Semantics for Finding and Describing Images with Sentences. Transactions of the Association for Computational Linguistics, vol. 2, pp. 207-218 (2014), 10.1162/TACL_A_00177
Ting Yao, Tao Mei, Chong-Wah Ngo, Shipeng Li. Annotation for free: video tagging by mining user search behavior. ACM Multimedia, pp. 977-986 (2013), 10.1145/2502081.2502085
Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond Mooney, Kate Saenko, Sergio Guadarrama. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge. National Conference on Artificial Intelligence, pp. 10-19 (2013)
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeff Dean. Distributed Representations of Words and Phrases and their Compositionality. Neural Information Processing Systems, vol. 26, pp. 3111-3119 (2013)
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. Neural Information Processing Systems, vol. 25, pp. 1097-1105 (2012)
David Chen, William B. Dolan. Collecting Highly Parallel Data for Paraphrase Evaluation. Meeting of the Association for Computational Linguistics, pp. 190-200 (2011)
Vignesh Ramanathan, Percy Liang, Li Fei-Fei. Video Event Understanding Using Natural Language Descriptions. International Conference on Computer Vision, pp. 905-912 (2013), 10.1109/ICCV.2013.117