Customer Activity Sequence Classification for Debt Prevention in Social Security

Authors: Huaifeng Zhang, Yanchang Zhao, Longbing Cao, Chengqi Zhang, Hans Bohlscheid

DOI: 10.1007/S11390-009-9288-2

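Abstract: From a data mining perspective, sequence classification is to build a classifier using frequent sequential patterns. However, mining for a complete set of sequential patterns on a large dataset can be extremely time-consuming, and the large number of patterns discovered also makes pattern selection and classifier building very time-consuming. The fact is that, in sequence classification, it is much more important to discover discriminative patterns than a complete pattern set. In this paper, we propose a novel hierarchical algorithm to build sequential classifiers using discriminative sequential patterns. Firstly, we mine for the sequential patterns that are most strongly correlated to each target class. In this step, an aggressive strategy is employed to select a small set of sequential patterns. Secondly, pattern pruning and a serial coverage test are performed on the mined patterns. The patterns that pass the serial coverage test are used to build the sub-classifier at the first level of the final classifier. Thirdly, the training samples that cannot be covered are fed back to the sequential pattern mining stage with updated parameters. This process continues until predefined interestingness measure thresholds are reached or all samples are covered. The patterns generated in each loop form the sub-classifier at the corresponding level of the final classifier. Within this framework, the search space is reduced dramatically while good classification performance is achieved. The proposed algorithm is tested in a real-world business application for debt prevention in the social security area. It shows effectiveness and efficiency in predicting debt occurrences based on customer activity data.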
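The abstract describes the hierarchical loop only in words, so the following Python sketch illustrates the general idea under several assumptions of our own: patterns are short ordered subsequences enumerated by brute force (not the authors' mining procedure), "strongly correlated" is approximated by rule confidence plus a minimum support count, the serial coverage test is modelled as a CBA-style database-coverage pass, and the "updated parameters" are a hypothetical schedule of decreasing confidence thresholds. All function names, parameters (`min_sup`, `top_k`, `conf_schedule`, `max_len`) and the toy activity codes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(pattern, seq):
    """True if the items of `pattern` occur in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def mine_class_patterns(data, min_sup, min_conf, top_k, max_len=2):
    """Hypothetical miner: enumerate ordered subsequences up to max_len and,
    for each class, keep only the top_k rules by confidence (aggressive selection)."""
    containing = Counter()  # pattern -> number of sequences containing it
    by_class = Counter()    # (pattern, label) -> number of those sequences with that label
    for seq, label in data:
        cands = {c for k in range(1, max_len + 1) for c in combinations(seq, k)}
        for p in cands:
            containing[p] += 1
            by_class[(p, label)] += 1
    rules = []
    for label in sorted({y for _, y in data}):
        scored = []
        for p, n in containing.items():
            hit = by_class[(p, label)]
            conf = hit / n
            if hit >= min_sup and conf >= min_conf:
                scored.append((conf, hit, p))
        scored.sort(reverse=True)  # highest confidence first, then highest support
        rules += [(p, label, c) for c, _, p in scored[:top_k]]
    # Stronger rules first; shorter patterns break ties.
    rules.sort(key=lambda r: (-r[2], len(r[0])))
    return rules


def coverage_test(rules, data):
    """Assumed CBA-style coverage pass: a rule is kept only if it correctly
    classifies at least one still-uncovered sample; the samples it matches are
    then removed. Returns (kept_rules, uncovered_samples)."""
    remaining, kept = list(data), []
    for p, label, conf in rules:
        matched_labels = [y for s, y in remaining if is_subsequence(p, s)]
        if label in matched_labels:
            kept.append((p, label, conf))
            remaining = [(s, y) for s, y in remaining if not is_subsequence(p, s)]
    return kept, remaining


def build_hierarchical_classifier(data, conf_schedule=(0.9, 0.8, 0.7),
                                  min_sup=2, top_k=2, max_len=2):
    """Each pass mines only the still-uncovered samples with a relaxed
    (hypothetical) confidence threshold; the rules it keeps form one level."""
    levels, remaining = [], list(data)
    for min_conf in conf_schedule:
        if not remaining:  # every training sample is covered
            break
        rules = mine_class_patterns(remaining, min_sup, min_conf, top_k, max_len)
        kept, remaining = coverage_test(rules, remaining)
        if kept:
            levels.append(kept)
    return levels


def classify(levels, seq, default):
    """Try level 1 first, then deeper levels; the first matching rule decides."""
    for rules in levels:
        for p, label, _ in rules:
            if is_subsequence(p, seq):
                return label
    return default


# Toy activity sequences; the codes a..f are made up for illustration.
toy_data = [
    (["a", "b", "c"], "debt"), (["a", "c", "d"], "debt"), (["b", "a", "c"], "debt"),
    (["d", "b", "e"], "clean"), (["b", "e", "d"], "clean"), (["e", "d"], "clean"),
    (["b", "d", "f"], "debt"), (["f", "b"], "debt"),
]
levels = build_hierarchical_classifier(toy_data)
print(classify(levels, ["x", "f"], default="clean"))  # prints: debt (matched at level 2)
```

On the toy data, the first pass keeps only two rules per class (the aggressive `top_k` selection), and the coverage test retains `e → clean` and `c → debt`. That leaves the two sequences containing `f` uncovered, so the second pass mines just those two and adds a level holding `f → debt`. Mining only the uncovered remainder at each pass is what keeps the search space small in this sketch.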
