Text Mining: Applications and Theory

Authors: Jacob Kogan, Michael W. Berry

DOI:

Keywords:

Abstract: List of Contributors. Preface.
PART I TEXT EXTRACTION, CLASSIFICATION, AND CLUSTERING.
1 Automatic keyword extraction from individual documents. 1.1 Introduction. 1.2 Rapid automatic extraction. 1.3 Benchmark evaluation. 1.4 Stoplist generation. 1.5 Evaluation on news articles. 1.6 Summary. 1.7 Acknowledgements.
2 Algebraic techniques for multilingual document clustering. 2.1 2.2 Background. 2.3 Experimental setup. 2.4 Multilingual LSA. 2.5 Tucker1 method. 2.6 PARAFAC2 2.7 LSA with term alignments. 2.8 Latent morpho-semantic analysis (LMSA). 2.9 LMSA 2.10 Discussion results and techniques. 2.11
3 Content-based spam email classification using machine-learning algorithms. 3.1 3.2 Machine-learning 3.3 Data preprocessing. 3.4 classification. 3.5 Experiments. 3.6 Characteristics classifiers. 3.7 Concluding remarks. 3.8
4 Utilizing nonnegative matrix factorization email classification problems. 4.1 4.2 4.3 NMF initialization based feature ranking. 4.4 NMF-based methods. 4.5 Conclusions. 4.6
5 Constrained clustering k-means type algorithms. 5.1 5.2 Notations classical k-means. 5.3 Bregman divergences. 5.4 smoka type clustering. 5.5 spherical 5.6 Numerical experiments. 5.7 Conclusion.
PART II ANOMALY AND TREND DETECTION.
6 Survey text visualization 6.1 Visualization in analysis. 6.2 Tag clouds. 6.3 Authorship change tracking. 6.4 exploration the search novel patterns. 6.5 Sentiment 6.6 Visual analytics FutureLens. 6.7 Scenario discovery. 6.8 Earlier prototype. 6.9 Features 6.10 discovery example: bioterrorism. 6.11 drug trafficking. 6.12 Future work.
7 Adaptive threshold setting novelty mining. 7.1 7.2 7.3 study. 7.4
8 Text mining cybercrime. 8.1 8.2 Current research Internet predation and cyberbullying. 8.3 Commercial software monitoring chat. 8.4 Conclusions future directions. 8.5
PART III STREAMS.
9 Events trends streams. 9.1 9.2 9.3 Feature data reduction. 9.4 Event detection. 9.5 Trend 9.6 trend descriptions. 9.7 Discussion. 9.8 9.9
10 Embedding semantics LDA topic models. 10.1 10.2 10.3 Dirichlet allocation. 10.4 external Wikipedia. 10.5 Data-driven semantic embedding. 10.6 Related 10.7 Conclusion References. Index.

References (24)
Nello Cristianini, John Shawe-Taylor, Huma Lodhi. Latent Semantic Kernels. International Conference on Machine Learning, vol. 18, pp. 127-152 (2001). DOI: 10.1023/A:1013625426931
David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, vol. 3, pp. 993-1022 (2003). DOI: 10.5555/944919.944937
Gerard Salton, Michael J. McGill. Introduction to Modern Information Retrieval (1983).
Roger E. Story. An explanation of the effectiveness of latent semantic indexing by means of a Bayesian regression model. Information Processing and Management, vol. 32, pp. 329-344 (1996). DOI: 10.1016/0306-4573(95)00055-0
T. L. Griffiths, M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, vol. 101, pp. 5228-5235 (2004). DOI: 10.1073/PNAS.0307752101
Mark Pendergast, Marilyn Tremaine, Carla Simone, Kjeld Schmidt. Proceedings of the 2003 international ACM SIGGROUP conference on Supporting group work. International Conference on Supporting Group Work (2003).
Xing Wei, W. Bruce Croft. LDA-based document models for ad-hoc retrieval. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '06, pp. 178-185 (2006). DOI: 10.1145/1148170.1148204
Christos H. Papadimitriou, Hisao Tamaki, Prabhakar Raghavan, Santosh Vempala. Latent semantic indexing: a probabilistic analysis. Symposium on Principles of Database Systems, pp. 159-168 (1998). DOI: 10.1145/275487.275505
Thomas Minka, John Lafferty. Expectation-propagation for the generative aspect model. Uncertainty in Artificial Intelligence, pp. 352-359 (2002).