Comparing Apples to Apple: The Effects of Stemmers on Topic Models

Authors: Alexandra Schofield, David Mimno

DOI: 10.1162/TACL_A_00099

Abstract: Rule-based stemmers such as the Porter stemmer are frequently used to preprocess English corpora for topic modeling. In this work, we train and evaluate topic models on a variety of corpora using several different stemming algorithms. We examine several different quantitative measures of the resulting models, including likelihood, coherence, model stability, and entropy. Despite their frequent use in topic modeling, we find that stemmers produce no meaningful improvement in likelihood and coherence and in fact can degrade topic stability.
