Token Adaptive Vision Transformer with Efficient Deployment for Fine-Grained Image Recognition

Authors: Chonghan Lee, Rita Brugarolas Brufau, Ke Ding, Vijaykrishnan Narayanan

Abstract: Fine-grained Visual Classification (FGVC) aims to distinguish object classes belonging to the same category, e.g., different bird species or vehicle models. The task is more challenging than ordinary image classification because the inter-class differences are subtle. Recent work has proposed deep learning models based on the vision transformer (ViT) architecture, whose self-attention mechanism locates important regions of the objects and derives global information. However, deploying these models on resource-restricted Internet of Things (IoT) devices is challenging because of their high computational cost and memory footprint. Energy and power consumption also vary across IoT devices. To improve inference efficiency, previous approaches require manually designing the model architecture and training a separate model for each computational budget. In this work, we propose Token Adaptive Vision Transformer …
