Authors: Xiaodan Liang, Liang Lin, Bailing Li, Guanbin Li, Qingxing Cao
DOI:
Keywords:
Abstract: … The key to this task is the capability of co-reasoning over both the image and language domains. However, most previous methods [21, 20, 16] work in a black-box manner, i.e., …