Authors: Mansour Al Ghanim, Muhammad Santriaji, Qian Lou, Yan Solihin
DOI:
Keywords:
Abstract: Transformer-based language models demonstrate exceptional performance in Natural Language Processing (NLP) tasks but remain susceptible to backdoor attacks that rely on hidden input triggers. Trojan injection via hardware bit-flips presents a significant challenge for contemporary language models. However, previous research overlooks practical hardware considerations, such as DRAM and cache memory structures, resulting in unrealistic attacks that require manipulating an excessive number of parameters and bits. In this paper, we present TrojBits, a novel approach that requires minimal bit-flips to effectively insert Trojans into real-world Transformer language model systems. This is achieved through a three-module framework designed to efficiently target Transformer-based language models, consisting of Vulnerable Parameters Ranking (VPR), Hardware-aware Attack Optimization (HAO), and …