A discriminative random sampling strategy with individual-author feature selection for writeprint recognition of Chinese texts

Authors: Zhi Liu, Sanya Liu, Lin Liu, Meng Wang, Jianwen Sun

DOI: 10.1080/1206212X.2016.1160643

Abstract: Automatic authorship recognition has become a novel technique for investigating cybercrimes. The challenge of the research is that a huge number of features exist in a moderate-sized corpus, which causes the curse of over-training. Besides, it is hard to distinguish between potential authors by only a single feature set. In this paper, we propose a random sampling style ensemble method with individual-author feature selection to exploit the high-dimensional feature space. The method randomly picks writing-style features on each individual-author feature set (IAFS) partitioned from the whole feature set. The IAFSs are heuristically selected in the training of each author. Then, multiple base classifiers (BCs) are formed from the sampled feature sets. Finally, all BCs are fused to get the final decision. Experimental results on real-life Chinese forum data verify the robustness of the method compared with conventional methods. We also analyze the diversity of the algorith...
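The pipeline outlined in the abstract (per-author feature selection, random sampling within each IAFS, base classifiers, fusion) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how IAFSs are scored, which base classifier is used, or how BCs are fused. Here the scoring heuristic (gap between an author's mean feature value and everyone else's), the nearest-centroid base classifier, the majority-vote fusion, and all function names, parameters, and toy data are hypothetical.

```python
import random
from collections import Counter


def mean(values):
    return sum(values) / len(values)


def select_iafs(X, y, author, k):
    """Assumed heuristic: keep the k features whose mean for `author`
    differs most from the mean over all other authors."""
    own = [x for x, label in zip(X, y) if label == author]
    rest = [x for x, label in zip(X, y) if label != author]
    scores = []
    for j in range(len(X[0])):
        gap = abs(mean([x[j] for x in own]) - mean([x[j] for x in rest]))
        scores.append((gap, j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def train_centroid_classifier(X, y, features):
    """Assumed base classifier: nearest centroid restricted to `features`."""
    centroids = {}
    for author in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == author]
        centroids[author] = [mean([r[j] for r in rows]) for j in features]

    def predict(x):
        def dist(author):
            return sum((x[j] - c) ** 2
                       for j, c in zip(features, centroids[author]))
        return min(centroids, key=dist)

    return predict


def train_ensemble(X, y, k=3, sample_size=2, samples_per_author=5, seed=0):
    """Build one IAFS per author, then draw random feature subsets from it;
    each subset yields one base classifier."""
    rng = random.Random(seed)
    classifiers = []
    for author in sorted(set(y)):
        iafs = select_iafs(X, y, author, k)
        for _ in range(samples_per_author):
            subset = rng.sample(iafs, min(sample_size, len(iafs)))
            classifiers.append(train_centroid_classifier(X, y, subset))
    return classifiers


def predict_ensemble(classifiers, x):
    """Assumed fusion rule: simple majority vote over all base classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): 4 writing-style features, 3 authors, 2 texts each.
X = [[5, 0, 1, 0], [6, 1, 0, 0],
     [0, 5, 0, 1], [1, 6, 1, 0],
     [0, 0, 5, 5], [1, 0, 6, 6]]
y = ["a", "a", "b", "b", "c", "c"]

ensemble = train_ensemble(X, y)
print(predict_ensemble(ensemble, [5, 1, 0, 0]))
```

Each author contributes `samples_per_author` base classifiers, so diversity comes both from the differing IAFSs and from the random subsets drawn within them.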
