DNRTI: A Large-Scale Dataset for Named Entity Recognition in Threat Intelligence

Authors: Ning Li, Zhengwei Jiang, Xuren Wang, Shengqin Ao, Mengbo Xiong

DOI: 10.1109/TRUSTCOM50675.2020.00252

Abstract: Named entity recognition is an important and challenging problem in natural language processing. Although the past decade has witnessed major advances in many fields, such successes have been slow to reach the network security field, not only because the data in this field is highly specialized, but also because it contains sensitive information. To advance named entity recognition research in this field, we introduce a large-scale Dataset for Named Entity Recognition in Threat Intelligence (DNRTI). To this end, we collect more than 300 pieces of threat intelligence. DNRTI is annotated entirely by experts in threat intelligence interpretation using 13 object categories, and the fully annotated dataset contains 175,220 words. We build a baseline to evaluate several deep learning models on DNRTI. Experiments demonstrate that DNRTI is well representative and quite challenging.
