Authors: Alexandra Schofield, David Mimno
DOI: 10.1162/TACL_A_00099
Keywords:
Abstract: Rule-based stemmers such as the Porter stemmer are frequently used to preprocess English corpora for topic modeling. In this work, we train and evaluate topic models on a variety of corpora using several different stemming algorithms. We examine several different quantitative measures of the resulting models, including likelihood, coherence, model stability, and entropy. Despite their frequent use in topic modeling, we find that stemmers produce no meaningful improvement in likelihood and coherence and in fact can degrade topic stability.