Authors: Kiwon Yun, Yifan Peng, Dimitris Samaras, Gregory J. Zelinsky, Tamara L. Berg
Keywords:
Abstract: We posit that user behavior during natural viewing of images contains an abundance of information about the content of images, as well as information related to user intent and user-defined content importance. In this paper, we conduct experiments to better understand the relationship between images, the eye movements people make while viewing images, and how people construct natural language to describe images. We explore these relationships in the context of two commonly used computer vision datasets. We then further relate human cues with the outputs of current visual recognition systems and demonstrate prototype applications for gaze-enabled detection and annotation.