A discriminative random sampling strategy with individual-author feature selection for writeprint recognition of Chinese texts

Authors: Zhi Liu, Sanya Liu, Lin Liu, Meng Wang, Jianwen Sun

DOI: 10.1080/1206212X.2016.1160643

Abstract: Automatic authorship recognition has become a novel technique for investigating cybercrimes. The challenge for this research is that a huge number of features exist in a moderate-sized corpus, which causes the curse of dimensionality and over-training. Besides, it is hard to distinguish between potential authors using only a single feature set. In this paper, we propose a discriminative random sampling style ensemble method with individual-author feature selection to exploit the high-dimensional feature space. The method randomly picks writing-style features on each individual-author feature set (IAFS) partitioned from the whole feature set; the IAFSs are heuristically selected in training for each author. Then, multiple base classifiers (BCs) are formed on the sampled feature sets. Finally, all BCs are fused to get the final decision. Experimental results on real-life Chinese forum data verify the robustness of the method compared to conventional methods. We also analyze the diversity of the algorith...
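
The pipeline outlined in the abstract (per-author feature sets, random sampling from each set, several base classifiers, fusion) can be illustrated with a minimal Python sketch. The abstract does not state the selection heuristic, the base classifier, or the fusion rule, so everything specific here is an assumption made for illustration: IAFSs are chosen by the gap between an author's mean feature value and the mean over the other authors, the base classifier is a nearest-centroid rule, and fusion is a majority vote. All function names, parameters (`k`, `m`, `n_classifiers`), and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

def select_iafs(X, y, author, k):
    """Assumed heuristic: keep the k features whose mean for `author`
    differs most from the mean over all other authors."""
    own = [x for x, a in zip(X, y) if a == author]
    rest = [x for x, a in zip(X, y) if a != author]
    def mean(rows, j):
        return sum(r[j] for r in rows) / len(rows)
    n_feat = len(X[0])
    score = [abs(mean(own, j) - mean(rest, j)) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: -score[j])[:k]

def train_centroids(X, y, feats):
    """Assumed base classifier: one centroid per author on a feature subset."""
    centroids = {}
    for author in sorted(set(y)):
        rows = [x for x, a in zip(X, y) if a == author]
        centroids[author] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return centroids

def predict_one(centroids, feats, x):
    """Return the author whose centroid is closest on the sampled features."""
    def dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(feats, c))
    return min(sorted(centroids), key=lambda a: dist(centroids[a]))

def fit_ensemble(X, y, k=3, m=2, n_classifiers=7, seed=0):
    """Build one IAFS per author, then train each base classifier on the
    union of m features drawn at random from every author's IAFS."""
    rng = random.Random(seed)
    iafs = {a: select_iafs(X, y, a, k) for a in sorted(set(y))}
    ensemble = []
    for _ in range(n_classifiers):
        feats = sorted({j for a in iafs for j in rng.sample(iafs[a], m)})
        ensemble.append((feats, train_centroids(X, y, feats)))
    return ensemble

def predict(ensemble, x):
    """Assumed fusion rule: majority vote over the base classifiers."""
    votes = Counter(predict_one(c, f, x) for f, c in ensemble)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): 6 writing-style features, 3 authors, 2 documents each.
X = [
    [5, 5, 0, 0, 0, 0], [6, 4, 0, 1, 0, 0],  # author A
    [0, 0, 5, 5, 0, 0], [0, 1, 6, 4, 0, 0],  # author B
    [0, 0, 0, 0, 5, 5], [1, 0, 0, 0, 4, 6],  # author C
]
y = ["A", "A", "B", "B", "C", "C"]

ensemble = fit_ensemble(X, y)
print(predict(ensemble, [5, 4, 0, 0, 1, 0]))  # prints "A" on this toy data
```

Drawing from every author's IAFS, rather than from the whole feature set, means each base classifier sees at least some features that are discriminative for each candidate author, while the random draws keep the base classifiers different from one another.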
