Abstract: List of Contributors. Preface.
PART I TEXT EXTRACTION, CLASSIFICATION, AND CLUSTERING.
1 Automatic keyword extraction from individual documents. 1.1 Introduction. 1.2 Rapid automatic extraction. 1.3 Benchmark evaluation. 1.4 Stoplist generation. 1.5 Evaluation on news articles. 1.6 Summary. 1.7 Acknowledgements.
2 Algebraic techniques for multilingual document clustering. 2.1 2.2 Background. 2.3 Experimental setup. 2.4 Multilingual LSA. 2.5 Tucker1 method. 2.6 PARAFAC2 2.7 LSA with term alignments. 2.8 Latent morpho-semantic analysis (LMSA). 2.9 LMSA 2.10 Discussion results and techniques. 2.11
3 Content-based spam email classification using machine-learning algorithms. 3.1 3.2 Machine-learning 3.3 Data preprocessing. 3.4 classification. 3.5 Experiments. 3.6 Characteristics classifiers. 3.7 Concluding remarks. 3.8
4 Utilizing nonnegative matrix factorization email classification problems. 4.1 4.2 4.3 NMF initialization based feature ranking. 4.4 NMF-based methods. 4.5 Conclusions. 4.6
5 Constrained clustering k-means type algorithms. 5.1 5.2 Notations classical k-means. 5.3 Bregman divergences. 5.4 smoka type clustering. 5.5 spherical 5.6 Numerical experiments. 5.7 Conclusion.
PART II ANOMALY AND TREND DETECTION.
6 Survey text visualization 6.1 Visualization in analysis. 6.2 Tag clouds. 6.3 Authorship change tracking. 6.4 exploration the search novel patterns. 6.5 Sentiment 6.6 Visual analytics FutureLens. 6.7 Scenario discovery. 6.8 Earlier prototype. 6.9 Features 6.10 discovery example: bioterrorism. 6.11 drug trafficking. 6.12 Future work.
7 Adaptive threshold setting novelty mining. 7.1 7.2 7.3 study. 7.4
8 Text mining cybercrime. 8.1 8.2 Current research Internet predation and cyberbullying. 8.3 Commercial software monitoring chat. 8.4 Conclusions future directions. 8.5
PART III STREAMS.
9 Events trends streams. 9.1 9.2 9.3 Feature data reduction. 9.4 Event detection. 9.5 Trend 9.6 trend descriptions. 9.7 Discussion. 9.8 9.9
10 Embedding semantics LDA topic models. 10.1 10.2 10.3 Dirichlet allocation. 10.4 external Wikipedia. 10.5 Data-driven semantic embedding. 10.6 Related 10.7 Conclusion References.
Index.