Automatic categorization of web pages and user clustering with mixtures of hidden Markov models

Authors: Alexander Ypma, Tom Heskes

DOI: 10.1007/978-3-540-39663-5_3

Keywords:

Abstract: We propose mixtures of hidden Markov models for modelling clickstreams of web surfers. Hence, the page categorization is learned from the data without the need for a (possibly cumbersome) manual categorization. We provide an EM algorithm for training a mixture of HMMs and show that additional static user data can be incorporated easily to possibly enhance the labelling of users. Furthermore, we use prior knowledge to enhance generalization and to avoid numerical problems. We use parameter tying to decrease the danger of overfitting and to reduce computational overhead. We put a flat prior on the parameters to deal with the problem that certain transitions between page categories occur very seldom or not at all, in order to ensure that a nonzero transition probability between these categories nonetheless remains. In applications to artificial data and real-world web logs we demonstrate the usefulness of our approach. We train a mixture of HMMs on artificial navigation patterns and show that the correct model is being learned. Moreover, we show that the use of static 'satellite data' may enhance the labeling of shorter navigation patterns. When applying a mixture of HMMs to real-world web logs from a large Dutch commercial web site, we demonstrate that sensible page categorizations are being learned.
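
The role of the flat prior mentioned in the abstract can be illustrated with a minimal sketch. It assumes the prior acts as a uniform pseudo-count added to the transition counts between page categories before normalization, which is the posterior mean under a uniform Dirichlet prior. The function name `smoothed_transition_matrix`, the `pseudo_count` parameter, and the example counts are hypothetical and are not taken from the paper, whose exact update rule is not given in this record.

```python
import numpy as np

def smoothed_transition_matrix(counts, pseudo_count=1.0):
    """Estimate a row-stochastic transition matrix from transition counts.

    Assumption (not from the paper): the flat prior is modelled as the same
    pseudo-count added to every entry, so no transition probability is zero.
    """
    counts = np.asarray(counts, dtype=float)
    smoothed = counts + pseudo_count
    # Normalize each row so that it sums to one.
    return smoothed / smoothed.sum(axis=1, keepdims=True)

# Hypothetical counts between three page categories; the second category
# was never left in the data, and some transitions were never observed.
counts = np.array([[8, 2, 0],
                   [0, 0, 0],
                   [1, 0, 9]])
A = smoothed_transition_matrix(counts)
print(A)
```

With these counts, the row without any observed transitions becomes uniform, and transitions that were never observed keep a small nonzero probability instead of being ruled out.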

References (14)
Robert Walker Cooley, Jaideep Srivastava. Web usage mining: discovery and application of interesting patterns from web data. University of Minnesota (2000).
Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, Lawrence K. Saul. An introduction to variational methods for graphical models. Machine Learning, vol. 37, pp. 105-161 (1999). DOI: 10.1023/A:1007665907178
Robert Cooley, Pang-Ning Tan, Jaideep Srivastava. Discovery of Interesting Usage Patterns from Web Data. Web Usage Analysis and User Profiling, pp. 163-182 (2000). DOI: 10.1007/3-540-44934-5_10
Ramesh R. Sarukkai. Link prediction and path analysis using Markov chains. The Web Conference, vol. 33, pp. 377-386 (2000). DOI: 10.1016/S1389-1286(00)00044-X
Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Symposium on Discrete Algorithms, pp. 668-677 (1998). DOI: 10.5555/314613.315045
Igor V. Cadez, Scott Gaffney, Padhraic Smyth. A general probabilistic framework for clustering individuals and objects. Knowledge Discovery and Data Mining, pp. 140-149 (2000). DOI: 10.1145/347090.347119
Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan. Web usage mining. ACM SIGKDD Explorations Newsletter, vol. 1, pp. 12-23 (2000). DOI: 10.1145/846183.846188
Mark Levene, George Loizou. Computing the entropy of user navigation in the web. International Journal of Information Technology and Decision Making, vol. 2, pp. 459-476 (2003). DOI: 10.1142/S0219622003000768
Igor Cadez, David Heckerman, Christopher Meek, Padhraic Smyth, Steven White. Visualization of navigation patterns on a Web site using model-based clustering. Knowledge Discovery and Data Mining, pp. 280-284 (2000). DOI: 10.1145/347090.347151
Bernardo A. Huberman, Peter L. T. Pirolli, James E. Pitkow, Rajan M. Lukose. Strong Regularities in World Wide Web Surfing. Science, vol. 280, pp. 95-97 (1998). DOI: 10.1126/SCIENCE.280.5360.95