MeMViT: Memory-augmented multiscale vision transformer for efficient long-term video recognition

Authors: Chao-Yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong

DOI:

Keywords:

Abstract: … The techniques presented in this paper are general and applicable to other transformer-based video models. We hope MeMViT will be useful for future long-term video modeling …
