Authors: Carlos Guestrin, Arvind Krishnamurthy, Tianqi Chen, Thierry Moreau, Luis Ceze
DOI:
Keywords:
Abstract: Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility. Changes in algorithms, models, operators, or numerical systems threaten the viability of specialized hardware accelerators. We propose VTA, a programmable deep learning architecture template designed to be extensible in the face of evolving workloads. VTA achieves this flexibility via a parametrizable architecture, a two-level ISA, and a JIT compiler. The two-level ISA is based on (1) a task-ISA that explicitly orchestrates concurrent compute and memory tasks and (2) a microcode-ISA which implements a wide variety of operators with single-cycle tensor-tensor operations. Next, we propose a runtime system equipped with a JIT compiler for flexible code generation and heterogeneous execution that enables effective use of the VTA architecture. VTA is integrated and open-sourced into Apache TVM, a state-of-the-art deep learning compilation stack that provides flexibility for diverse models and divergent hardware backends. We propose a flow that performs design space exploration to generate a customized hardware architecture and software operator library that can be leveraged by mainstream frameworks. We demonstrate our approach by deploying optimized deep learning models used for object classification and style transfer on edge-class FPGAs.
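To make the two-level ISA idea concrete, the following is a minimal Python sketch of the split the abstract describes: a coarse task-ISA that orchestrates load/compute/store tasks, and a microcode level that performs tensor-tensor operations. All opcode names, encodings, and helper functions here are invented for illustration and are not VTA's actual instruction set.

```python
def micro_add(a, b):
    # Hypothetical microcode op: element-wise tensor-tensor addition,
    # standing in for a single-cycle ALU/GEMM operation.
    return [x + y for x, y in zip(a, b)]

# Microcode-ISA: a table of tensor-tensor kernels (illustrative only).
MICROCODE = {"ADD": micro_add}

def run_task_program(tasks, memory):
    """Interpret a task-ISA program; each task is (opcode, args)."""
    scratchpad = {}  # models on-chip buffers
    for op, args in tasks:
        if op == "LOAD":        # DRAM -> scratchpad
            dst, src = args
            scratchpad[dst] = memory[src]
        elif op == "COMPUTE":   # dispatch a microcode kernel
            kernel, dst, a, b = args
            scratchpad[dst] = MICROCODE[kernel](scratchpad[a], scratchpad[b])
        elif op == "STORE":     # scratchpad -> DRAM
            dst, src = args
            memory[dst] = scratchpad[src]
    return memory

memory = {"in0": [1, 2, 3], "in1": [10, 20, 30]}
program = [
    ("LOAD", ("a", "in0")),
    ("LOAD", ("b", "in1")),
    ("COMPUTE", ("ADD", "c", "a", "b")),
    ("STORE", ("out", "c")),
]
run_task_program(program, memory)
print(memory["out"])  # → [11, 22, 33]
```

The real hardware additionally overlaps the LOAD, COMPUTE, and STORE stages via explicit dependency tracking, which this sequential sketch omits.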