Corpus-Guided Sentence Generation of Natural Images

Authors: Yiannis Aloimonos, Hal Daumé III, Yezhou Yang, Ching Teo

DOI:

Keywords:

Abstract: We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The inputs are initial noisy estimates of the objects and scenes detected in the image using state-of-the-art trained detectors. As predicting actions from still images directly is unreliable, we use a language model trained on the English Gigaword corpus to obtain their estimates, together with probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters on an HMM that models the sentence generation process, with hidden nodes as sentence components and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces readable and descriptive sentences compared to naive strategies that use vision alone.
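The decoding idea in the abstract can be illustrated with a small Viterbi sketch. This is not the authors' implementation: the chain layout (noun, verb, scene, preposition), the function name `viterbi`, and every probability below are hypothetical toy values chosen for illustration. Detector scores act as emissions on the noun and scene slots, verb and preposition slots have no visual evidence (emission treated as 1.0), and the transition table stands in for corpus-derived co-occurrence statistics.

```python
from typing import Dict, List, Tuple


def viterbi(
    candidates: List[List[str]],
    emission: List[Dict[str, float]],
    transition: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> Tuple[List[str], float]:
    """Return the most probable label sequence and its (unnormalised) score.

    candidates[t] lists the labels allowed in slot t.
    emission[t] maps a label to its detector score; missing labels score 1.0,
    which models slots with no visual evidence (verbs, prepositions).
    transition maps (previous_label, label) to a corpus-style probability;
    unseen pairs fall back to `floor`.
    """
    # best[label] = (score, path) of the best partial path ending in `label`
    best = {c: (emission[0].get(c, 1.0), [c]) for c in candidates[0]}
    for t in range(1, len(candidates)):
        nxt = {}
        for cur in candidates[t]:
            score, path = max(
                (
                    (p * transition.get((prev, cur), floor), pth)
                    for prev, (p, pth) in best.items()
                ),
                key=lambda item: item[0],
            )
            nxt[cur] = (score * emission[t].get(cur, 1.0), path + [cur])
        best = nxt
    score, path = max(best.values(), key=lambda item: item[0])
    return path, score


# Toy problem: all labels and numbers are made up for illustration.
slots = [["person", "dog"], ["ride", "eat"], ["street", "kitchen"], ["on", "in"]]
detections = [
    {"person": 0.6, "dog": 0.4},      # noisy object detector scores
    {},                               # verbs: no direct visual evidence
    {"street": 0.7, "kitchen": 0.3},  # noisy scene detector scores
    {},                               # prepositions: no direct visual evidence
]
corpus = {
    ("person", "ride"): 0.7, ("person", "eat"): 0.3,
    ("dog", "ride"): 0.1, ("dog", "eat"): 0.9,
    ("ride", "street"): 0.8, ("ride", "kitchen"): 0.2,
    ("eat", "street"): 0.3, ("eat", "kitchen"): 0.7,
    ("street", "on"): 0.6, ("street", "in"): 0.4,
    ("kitchen", "on"): 0.1, ("kitchen", "in"): 0.9,
}

path, prob = viterbi(slots, detections, corpus)
print(path, round(prob, 5))  # ['person', 'ride', 'street', 'on'] 0.14112
```

In this toy setting the verb "ride" is never observed in the image; it is selected only because the assumed corpus statistics make it the best link between the detected noun and the detected scene, which is the role the abstract assigns to the language model.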
