Thieves on Sesame Street! Model Extraction of BERT-based APIs

Authors: Nicolas Papernot, Mohit Iyyer, Ankur P. Parikh, Gaurav Singh Tomar, Kalpesh Krishna

Abstract: We study the problem of model extraction in natural language processing, in which an adversary with only query access to a victim model attempts to reconstruct a local copy of that model. Assuming that both the adversary and the victim model fine-tune a large pretrained language model such as BERT (Devlin et al., 2019), we show that the adversary does not need any real training data to successfully mount the attack. In fact, the attacker need not even use grammatical or semantically meaningful queries: we show that random sequences of words coupled with task-specific heuristics form effective queries for model extraction on a diverse set of NLP tasks, including natural language inference and question answering. Our work thus highlights an exploit only made feasible by the shift towards transfer learning methods within the NLP community: for a query budget of a few hundred dollars, an attacker can extract a model that performs only slightly worse than the victim model. Finally, we study two defense strategies against model extraction, membership classification and API watermarking, which while successful against some adversaries can also be circumvented by more clever ones.
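The query-collection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the vocabulary, the `toy_victim` function (a keyword rule standing in for a black-box BERT-based API), and the helper names `random_query` and `build_extraction_set` are all hypothetical, and the task-specific heuristics mentioned in the abstract are not implemented here.

```python
import random
from typing import Callable, List, Tuple


def random_query(vocab: List[str], length: int, rng: random.Random) -> str:
    """Sample `length` words uniformly at random; the result is usually nonsensical."""
    return " ".join(rng.choice(vocab) for _ in range(length))


def build_extraction_set(
    victim: Callable[[str], str],
    vocab: List[str],
    n_queries: int,
    length: int = 8,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Label random queries with the victim's outputs to form a training set."""
    rng = random.Random(seed)
    queries = [random_query(vocab, length, rng) for _ in range(n_queries)]
    # Each query costs one API call; the victim's answer becomes the label.
    return [(q, victim(q)) for q in queries]


def toy_victim(text: str) -> str:
    """Hypothetical stand-in for a black-box sentiment API (not a real BERT model)."""
    return "positive" if "good" in text.split() else "negative"


# Assumed toy vocabulary; a real attack would draw from a much larger word list.
vocab = ["good", "bad", "movie", "the", "plot", "was", "actor", "boring"]
dataset = build_extraction_set(toy_victim, vocab, n_queries=5)
```

The attacker would then fine-tune their own pretrained model on `dataset`; that training step is omitted from this sketch.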
