Thieves on Sesame Street! Model Extraction of BERT-based APIs

Authors: Nicolas Papernot, Mohit Iyyer, Ankur P. Parikh, Gaurav Singh Tomar, Kalpesh Krishna

Abstract: We study the problem of model extraction in natural language processing, in which an adversary with only query access to a victim model attempts to reconstruct a local copy of that model. Assuming that both the adversary and the victim model fine-tune a large pretrained language model such as BERT (Devlin et al., 2019), we show that the adversary does not need any real training data to successfully mount the attack. In fact, the attacker need not even use grammatical or semantically meaningful queries: we show that random sequences of words coupled with task-specific heuristics form effective queries for model extraction on a diverse set of NLP tasks, including natural language inference and question answering. Our work thus highlights an exploit only made feasible by the shift towards transfer learning methods within the NLP community: for a query budget of a few hundred dollars, an attacker can extract a model that performs only slightly worse than the victim model. Finally, we study two defense strategies against model extraction (membership classification and API watermarking), which, while successful against some adversaries, can also be circumvented by more clever ones.
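
The query-collection step described in the abstract can be illustrated with a minimal sketch: sample random word sequences, send them to a black-box victim, and keep the (query, output) pairs as training data for the attacker's own model. Everything named here is an assumption made for illustration: `toy_victim` is a hypothetical stand-in for a real API, the vocabulary is a toy list, and the task-specific heuristics mentioned in the abstract are omitted. It is not the authors' implementation.

```python
import random
from typing import Callable, List, Tuple


def random_query(vocab: List[str], length: int, rng: random.Random) -> str:
    """Build a nonsensical query from `length` words drawn uniformly from `vocab`."""
    return " ".join(rng.choice(vocab) for _ in range(length))


def build_extraction_set(
    victim: Callable[[str], int],
    vocab: List[str],
    n_queries: int,
    max_len: int = 10,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Query the black-box `victim` with random word sequences and record its labels."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_queries):
        query = random_query(vocab, rng.randint(1, max_len), rng)
        dataset.append((query, victim(query)))  # only query access is used
    return dataset


def toy_victim(text: str) -> int:
    """Hypothetical victim: a trivial sentiment rule standing in for a remote API."""
    return int("good" in text.split())


# Toy vocabulary (assumption); a real attack would draw from a large word list.
vocab = ["good", "bad", "movie", "plot", "the", "a", "was"]
pairs = build_extraction_set(toy_victim, vocab, n_queries=100)
```

In the setting the abstract describes, the attacker would then fine-tune its own pretrained model on `pairs`; that step is left out here because it depends on the chosen model and training library.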
