Authors: Zhi Liu, Sanya Liu, Lin Liu, Meng Wang, Jianwen Sun
DOI: 10.1080/1206212X.2016.1160643
Keywords:
Abstract: Automatic authorship recognition has become a novel technique for investigating cybercrimes. The challenge of this research is that a huge number of features exist in a moderate-sized corpus, which causes the curse of over-training. Besides, it is hard to distinguish between potential authors using only a single feature set. In this paper, we propose a random sampling style ensemble method with individual-author feature selection to exploit the high-dimensional feature space. The method randomly picks writing-style features from each individual-author feature set (IAFS) partitioned from the whole feature space; the IAFSs are heuristically selected for each training author. Then, multiple base classifiers (BCs) are formed on the sampled feature sets. Finally, all BCs are fused to get the final decision. Experimental results on real-life Chinese forum data verify the robustness of the method compared with conventional methods. We also analyze the diversity of the algorith...
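The pipeline outlined in the abstract (per-author feature sets, random sampling from each set, several base classifiers, fusion of their decisions) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the mean-difference scoring heuristic, the nearest-centroid base classifier, majority voting as the fusion rule, and all function and parameter names (`select_author_features`, `fit_ensemble`, `n_base`, `k`, `m`).

```python
import random
from collections import Counter


def select_author_features(X, y, author, k):
    """Hypothetical heuristic: keep the k features whose mean for this
    author differs most from the mean over all other authors."""
    own = [x for x, a in zip(X, y) if a == author]
    rest = [x for x, a in zip(X, y) if a != author]

    def mean(rows, j):
        return sum(r[j] for r in rows) / len(rows)

    n_feat = len(X[0])
    score = [abs(mean(own, j) - mean(rest, j)) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: -score[j])[:k]


def train_centroids(X, y, feats):
    """Stand-in base classifier: one centroid per author, restricted
    to the sampled feature indices `feats`."""
    centroids = {}
    for author in set(y):
        rows = [x for x, a in zip(X, y) if a == author]
        centroids[author] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return centroids


def predict_one(centroids, feats, x):
    """Return the author whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(feats, c))
    return min(sorted(centroids), key=lambda a: dist(centroids[a]))


def fit_ensemble(X, y, n_base=11, k=3, m=2, seed=0):
    """Build n_base classifiers, each trained on m features sampled at
    random from every author's k-feature set."""
    rng = random.Random(seed)
    iafs = {a: select_author_features(X, y, a, k) for a in sorted(set(y))}
    ensemble = []
    for _ in range(n_base):
        feats = sorted({j for a in iafs for j in rng.sample(iafs[a], m)})
        ensemble.append((feats, train_centroids(X, y, feats)))
    return ensemble


def predict(ensemble, x):
    """Fuse the base classifiers by majority vote (assumed fusion rule)."""
    votes = Counter(predict_one(c, f, x) for f, c in ensemble)
    return votes.most_common(1)[0][0]
```

The sketch expects `X` as a list of equal-length numeric feature vectors and `y` as the matching list of author labels, with at least two authors and `m <= k <= len(X[0])`. Calling `fit_ensemble(X, y)` followed by `predict(ensemble, x)` returns the voted author for a new vector `x`.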