Composing Simple Image Descriptions using Web-scale N-grams

Authors: Girish Kulkarni, Tamara L. Berg, Yejin Choi, Siming Li, Alexander C. Berg

Abstract: Studying natural language, and especially how people describe the world around them, can help us better understand the visual world. In turn, it can also help us in the quest to generate natural language that describes this world in a human manner. We present a simple yet effective approach to automatically compose image descriptions given computer vision based inputs and using web-scale n-grams. Unlike most previous work that summarizes or retrieves pre-existing text relevant to an image, our method composes sentences entirely from scratch. Experimental results indicate that it is viable to generate simple textual descriptions that are pertinent to the specific content of an image, while permitting creativity in the description, making for more human-like annotations than previous approaches.
