Comparing Apples to Apple: The Effects of Stemmers on Topic Models

Abstract: Rule-based stemmers such as the Porter stemmer are frequently used to preprocess English corpora for topic modeling. In this work, we train and evaluate topic models on a variety of corpora using several different stemming algorithms. We examine several quantitative measures of the resulting models, including likelihood, coherence, model stability, and entropy. Despite their frequent use in topic modeling, we find that stemmers produce no meaningful improvement in likelihood and coherence and in fact can degrade topic stability.
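To make the preprocessing step concrete, the following is a minimal sketch of how stemming conflates word forms before a topic model sees them. The `toy_stem` rule, the `vocabulary` helper, and the three sample documents are all hypothetical illustrations; the rule only strips simple plurals and is far cruder than the Porter stemmer or any stemmer evaluated in the paper.

```python
from collections import Counter


def toy_stem(token: str) -> str:
    """Hypothetical plural-stripping rule; NOT the Porter stemmer."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"          # stories -> story
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]                # apples -> apple
    return token


def vocabulary(docs, stem=False):
    """Count word types over whitespace-tokenized, lowercased documents."""
    counts = Counter()
    for doc in docs:
        for token in doc.lower().split():
            counts[toy_stem(token) if stem else token] += 1
    return counts


# Made-up sample corpus, for illustration only.
docs = [
    "apples and apple pies",
    "the apple topic models",
    "a topic model of stories",
]

raw = vocabulary(docs)
stemmed = vocabulary(docs, stem=True)

print("word types without stemming:", len(raw))
print("word types with toy stemming:", len(stemmed))
print("count for 'apple' after stemming:", stemmed["apple"])
```

Stemming shrinks the vocabulary by merging "apples" with "apple" and "models" with "model". The abstract's finding is that this kind of merging does not meaningfully improve likelihood or coherence, and can degrade topic stability.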
