Authors: Girish Kulkarni, Tamara L. Berg, Yejin Choi, Siming Li, Alexander C. Berg
DOI:
Keywords:
Abstract: Studying natural language, and especially how people describe the world around them, can help us better understand the visual world. In turn, it can also help us in the quest to generate natural language that describes this world in a human manner. We present a simple yet effective approach to automatically compose image descriptions given computer vision based inputs, using web-scale n-grams. Unlike most previous work that summarizes or retrieves pre-existing text relevant to an image, our method composes sentences entirely from scratch. Experimental results indicate that it is viable to generate simple textual descriptions that are pertinent to the specific content of an image, while permitting creativity in the description -- making for more human-like annotations than previous approaches.