HierarchicalTopics: Visually Exploring Large Text Collections Using Topic Hierarchies

Authors: Wenwen Dou, Li Yu, Xiaoyu Wang, Zhiqiang Ma, William Ribarsky

DOI: 10.1109/TVCG.2013.162

Keywords:

Abstract: Analyzing large textual collections has become increasingly challenging given the size of the data available and the rate at which more data is being generated. Topic-based text summarization methods coupled with interactive visualizations have presented promising approaches to address the challenge of analyzing large text corpora. As the text corpora and vocabulary grow larger, more topics need to be generated in order to capture the meaningful latent themes and nuances in the corpora. However, it is difficult for most current topic-based visualizations to represent a large number of topics without being cluttered or illegible. To facilitate the representation and navigation of a large number of topics, we propose a visual analytics system, HierarchicalTopic (HT). HT integrates a computational algorithm, Topic Rose Tree, with an interactive visual interface. The Topic Rose Tree constructs a topic hierarchy based on a list of topics. The interactive visual interface is designed to present the topic content as well as the temporal evolution of topics in a hierarchical fashion. User interactions are provided for users to make changes to the topic hierarchy based on their mental model of the topic space. To qualitatively evaluate HT, we present a case study that showcases how HierarchicalTopics aids expert users in making sense of a large number of topics and discovering interesting patterns of topic groups. We have also conducted a user study to quantitatively evaluate the effect of the hierarchical topic structure. The study results reveal that HT leads to faster identification of a large number of relevant topics. We have also solicited user feedback during the experiments and incorporated some suggestions into the current version of HierarchicalTopics.
