Baby talk: Understanding and generating simple image descriptions

Authors: Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Yejin Choi

DOI: 10.1109/CVPR.2011.5995466

Keywords: Natural language; Computer science; Image (mathematics); Baby talk; Simple (philosophy); Natural language processing; Text mining; Parsing; Artificial intelligence

Abstract: We posit that visually descriptive language offers computer vision researchers both information about the world and information about how people describe the world. The potential benefit from this source is made more significant by the enormous amount of language data easily available today. We present a system that automatically generates natural language descriptions of images by exploiting both statistics gleaned from parsing large quantities of text and recognition algorithms from computer vision. The system is very effective at producing relevant sentences for images, and it generates descriptions that are notably more true to the specific image content than previous work.
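To make the abstract's pipeline concrete, below is a minimal illustrative sketch of template-based caption generation from visual detections, in the spirit of combining recognition outputs with simple linguistic structure. The detection tuples, relation format, and helper functions here are hypothetical placeholders for illustration only, not the paper's actual detectors or its inference procedure.

```python
# Toy template-filling caption generator over hypothetical detections.
# Detections are (attribute, noun) pairs; relations are (preposition, i, j)
# triples linking two detections by index. Everything here is an assumption
# for illustration, not the system described in the paper.

from typing import List, Tuple

Detection = Tuple[str, str]          # (attribute, object noun)
Relation = Tuple[str, int, int]      # (preposition, index of obj1, index of obj2)


def article(word: str) -> str:
    """Pick an indefinite article with a simple vowel heuristic."""
    return "an" if word[0].lower() in "aeiou" else "a"


def noun_phrase(det: Detection) -> str:
    """Render an (attribute, noun) pair as e.g. 'a brown cow'."""
    attr, noun = det
    phrase = f"{attr} {noun}".strip()
    return f"{article(phrase)} {phrase}"


def generate_caption(dets: List[Detection], rels: List[Relation]) -> str:
    """Fill a fixed 'X is PREP Y' template for each pairwise relation."""
    clauses = [
        f"{noun_phrase(dets[i])} is {prep} {noun_phrase(dets[j])}"
        for prep, i, j in rels
    ]
    return (", and ".join(clauses) + ".").capitalize()


if __name__ == "__main__":
    dets = [("brown", "cow"), ("green", "grass")]
    rels = [("standing on", 0, 1)]
    print(generate_caption(dets, rels))
    # -> "A brown cow is standing on a green grass."
```

The sketch only shows why parsed text statistics matter: choosing plausible attributes and prepositions for detected objects is a language-modeling problem, which the simple heuristics above do not address.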
