The Costs and Benefits of Goal-Directed Attention in Deep Convolutional Neural Networks

Authors: Brett D. Roads, Bradley C. Love, Xiaoliang Luo

DOI: 10.1007/S42113-021-00098-Y

Keywords: Cognitive neuroscience of visual object recognition; False alarm; Sensitivity (control systems); Computer science; Mechanism (biology); Task (project management); Transfer of learning; Process (engineering); Convolutional neural network; Artificial intelligence; Machine learning

Abstract: People deploy top-down, goal-directed attention to accomplish tasks, such as finding lost keys. By tuning the visual system to relevant information sources, object recognition can become more efficient (a benefit) and more biased toward the target (a potential cost). Motivated by selective attention in categorisation models, we developed a goal-directed attention mechanism that can process naturalistic (photographic) stimuli. Our attention mechanism can be incorporated into any existing deep convolutional neural network (DCNN). The processing stages in DCNNs have been related to the ventral visual stream. In that light, our attentional mechanism incorporates top-down influences from prefrontal cortex (PFC) to support goal-directed behaviour. Akin to how attention weights in categorisation models warp representational spaces, we introduce a layer of attention weights to the mid-level of a DCNN that amplify or attenuate activity to further a goal. We evaluated the attentional mechanism using photographic stimuli, varying the attentional target. We found that increasing goal-directed attention has benefits (increasing hit rates) and costs (increasing false alarm rates). At a moderate level, attention improves sensitivity (i.e. increases $d'$) at only a moderate increase in bias for tasks involving standard images, blended images and natural adversarial images chosen to fool DCNNs. These results suggest that goal-directed attention can reconfigure general-purpose DCNNs to better suit the current task goal, much like PFC modulates activity along the ventral stream. In addition to being more parsimonious and brain consistent, the mid-level attention approach performed better than a standard machine learning approach for transfer learning, namely retraining the final network layer to accommodate the new task.
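
The abstract weighs hit rates against false alarm rates and summarises the trade-off as sensitivity ($d'$) and bias. As a minimal sketch, assuming the standard equal-variance Gaussian signal detection model (the abstract does not state how the authors computed these measures), both quantities can be derived from a hit rate and a false alarm rate. The function name `sdt_measures` and the example rates are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    Assumption: both rates lie strictly between 0 and 1; NormalDist.inv_cdf
    raises StatisticsError at exactly 0 or 1.
    """
    z = NormalDist().inv_cdf          # inverse CDF of the standard normal
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa            # sensitivity: separation of signal and noise
    criterion = -0.5 * (z_hit + z_fa) # bias: negative values mean a liberal "yes" tendency
    return d_prime, criterion


# Hypothetical rates for illustration only (not results from the paper).
for label, hits, false_alarms in [("weak attention", 0.60, 0.10),
                                  ("moderate attention", 0.85, 0.20)]:
    d_prime, criterion = sdt_measures(hits, false_alarms)
    print(f"{label}: d' = {d_prime:.2f}, criterion = {criterion:.2f}")
```

Under this model, a manipulation that raises hits and false alarms together can still increase $d'$, provided hits rise faster in z-units than false alarms do, while the criterion shifts toward reporting the target more often.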
