Abstract: Attention over an observed image or natural sentence is performed by locating the region or position of interest for pattern classification. The attention parameter is treated as a latent variable, which is indirectly estimated by minimizing the classification loss. Under such a mechanism, the target information may not be correctly identified. Therefore, in addition to the classification error, we can directly minimize the attention reconstruction error when supporting data are available. Our idea is to learn how to attend through this so-called supportive attention. A new mechanism is developed to conduct attentive learning with translation invariance and is applied to image captioning, where the supportive attention is derived to help generate captions from an input image. Moreover, this paper presents an association network that not only implements word-to-image attention but also carries out image-to-image attention via self-attention, so that the relations between image and text are sufficiently represented. Experiments on the MS-COCO captioning task show the benefit of the proposed attention mechanisms with a key-value memory network.
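To make the attention operations named above concrete, the following is a minimal NumPy sketch of generic scaled dot-product attention over a key-value memory. It is an illustration under assumed shapes, not the authors' exact model: the function `kv_attention` and the toy region/word embeddings are hypothetical, showing how a word query can attend over image-region keys (word-to-image attention) and how each region can attend over all regions (image-to-image self-attention).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kv_attention(query, keys, values):
    """Scaled dot-product attention over a key-value memory (hypothetical sketch).

    query:  (d,)     e.g. a word embedding (word-to-image attention)
                     or one image-region feature (image-to-image self-attention)
    keys:   (n, d)   memory keys, e.g. image-region features
    values: (n, dv)  memory values read out by the attention weights
    Returns the context vector (dv,) and the attention weights (n,).
    """
    scores = keys @ query / np.sqrt(query.shape[0])   # (n,) similarity scores
    weights = softmax(scores)                         # attention distribution over memory slots
    return weights @ values, weights                  # weighted read-out

# Toy example: 4 image regions with 8-dim features (assumed dimensions).
rng = np.random.default_rng(0)
regions = rng.standard_normal((4, 8))
word = rng.standard_normal(8)

# Word-to-image attention: a word queries the image regions.
ctx, w = kv_attention(word, regions, regions)

# Image-to-image self-attention: each region queries all regions.
self_ctx = np.stack([kv_attention(r, regions, regions)[0] for r in regions])
```

The attention weights form a probability distribution over memory slots, which is what lets the mechanism "locate the region of interest" described in the abstract.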