iVisClustering: An Interactive Visual Document Clustering via Topic Modeling

Authors: Hanseung Lee, Jaeyeon Kihm, Jaegul Choo, John Stasko, Haesun Park

DOI: 10.1111/J.1467-8659.2012.03108.X

Abstract: Clustering plays an important role in many large-scale data analyses, providing users with an overall understanding of their data. Nonetheless, clustering is not an easy task due to noisy features and outliers existing in the data, and thus the clustering results obtained from automatic algorithms often do not make clear sense. To remedy this problem, automatic clustering should be complemented with interactive visualization strategies. This paper proposes an interactive visual analytics system for document clustering, called iVisClustering, based on a widely-used topic modeling method, latent Dirichlet allocation (LDA). iVisClustering provides a summary of each cluster in terms of its most representative keywords and visualizes soft clustering results in parallel coordinates. The main view of the system provides a 2D plot that visualizes cluster similarities and the relation among data items with a graph-based representation. iVisClustering provides several other views, which contain useful interaction methods. With the help of these visualization modules, we can interactively refine the clustering results in various ways. Keywords can be adjusted so that they characterize each cluster better. In addition, our system can filter out noisy data and re-cluster the data accordingly. A cluster hierarchy can be constructed using a tree structure, and for this purpose the system supports cluster-level interactions such as sub-clustering, removing unimportant clusters, merging clusters that have similar meanings, and moving certain clusters to any other node in the tree structure. Furthermore, the system provides document-level interactions such as moving mis-clustered documents to another cluster and removing useless documents. Finally, we present how interactive clustering is performed via iVisClustering by using real-world document data sets. © 2012 Wiley Periodicals, Inc.
