Explore and Explain: Self-supervised Navigation and Recounting

Authors: Silvia Cascianelli, Rita Cucchiara, Lorenzo Baraldi, Marcella Cornia, Federico Landi

Abstract: Embodied AI has recently been gaining attention because it aims to foster the development of autonomous and intelligent agents. In this paper, we devise a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees along the path. In this context, the agent needs to navigate the environment driven by an exploration goal, select proper moments for description, and output natural language descriptions of relevant objects and scenes. Our model integrates a self-supervised exploration module with penalty and a fully-attentive captioning model for explanation. We also investigate different policies for selecting proper moments for explanation, driven by information coming from both the environment and the navigation. Experiments are conducted on photorealistic environments from the Matterport3D dataset and investigate the navigation and explanation capabilities of the agent, as well as the role of their interactions.
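The abstract only names the components, so the following is a rough illustration and not the authors' implementation. It assumes a generic curiosity-style formulation, in which the intrinsic reward is the squared prediction error of a forward model on the next observation's features. It also assumes a hypothetical fixed penalty applied when the agent fails to move, and a hypothetical speaker policy that triggers a description once enough objects are detected and enough steps have passed since the last one. All function names, parameters, and thresholds are assumptions made for illustration.

```python
from typing import Sequence


def curiosity_reward(predicted: Sequence[float], actual: Sequence[float],
                     scale: float = 0.5) -> float:
    """Assumed intrinsic reward: scaled squared L2 error between the forward
    model's predicted next-state features and the observed ones."""
    if len(predicted) != len(actual):
        raise ValueError("feature vectors must have the same length")
    return scale * sum((p - a) ** 2 for p, a in zip(predicted, actual))


def penalized_reward(predicted: Sequence[float], actual: Sequence[float],
                     moved: bool, penalty: float = 1.0,
                     scale: float = 0.5) -> float:
    """Curiosity reward minus a hypothetical fixed penalty applied when the
    agent did not change position (e.g. after bumping into a wall)."""
    reward = curiosity_reward(predicted, actual, scale)
    if not moved:
        reward -= penalty
    return reward


def should_describe(num_detected_objects: int, steps_since_last: int,
                    min_objects: int = 3, min_gap: int = 5) -> bool:
    """Hypothetical speaker policy: describe the scene only when enough
    objects are visible and enough steps have passed since the last caption."""
    return num_detected_objects >= min_objects and steps_since_last >= min_gap


if __name__ == "__main__":
    # Forward model missed one feature by 1.0 and the agent moved.
    print(penalized_reward([0.0, 1.0], [1.0, 1.0], moved=True))   # 0.5
    # Same prediction error, but the agent stayed in place.
    print(penalized_reward([0.0, 1.0], [1.0, 1.0], moved=False))  # -0.5
    print(should_describe(num_detected_objects=4, steps_since_last=6))   # True
    print(should_describe(num_detected_objects=2, steps_since_last=10))  # False
```

Separating the exploration reward from the speaker policy mirrors the abstract's split between navigating and selecting moments for description. A policy that also uses navigation signals could add further conditions to `should_describe`.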