Authors: Qian Lou, Yen-Chang Hsu, Burak Uzkent, Ting Hua, Yilin Shen
DOI:
Keywords:
Abstract: … MDETR model size; the multi-modal transformer is a transformer with hidden size 256. Further investigating the text encoder and the multi-modal transformer, we find that the Linear …