Authors: Jimmy Lin, Rion Snow, William Morgan
Keywords:
Abstract: We are interested in the problem of tracking broad topics such as "baseball" and "fashion" in continuous streams of short texts, exemplified by tweets from the microblogging service Twitter. The task is conceived as a language modeling problem where per-topic models are trained using hashtags in the tweet stream, which serve as proxies for topic labels. Simple perplexity-based classifiers are then applied to filter the tweet stream for topics of interest. Within this framework, we evaluate, both intrinsically and extrinsically, smoothing techniques for integrating "foreground" models (to capture recency) and "background" models (to combat sparsity), as well as different techniques for retaining history. Experiments show that unigram language models smoothed using a normalized extension of stupid backoff and a simple queue for history retention perform well on the task.
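
The abstract names the pipeline's components but gives no formulas, so the following Python is only a minimal sketch of how they might fit together, not the authors' implementation. Everything specific in it is an assumption made for illustration: the class names `BackgroundModel` and `TopicModel`, the `matches_topic` helper, add-one smoothing for the background model, the back-off weight `alpha = 0.4`, the particular way the stupid backoff scores are renormalized to sum to one, and the use of a fixed perplexity threshold as the classifier.

```python
import math
from collections import Counter, deque

UNK = "<unk>"  # hypothetical placeholder type for out-of-vocabulary tokens


class BackgroundModel:
    """Add-one smoothed unigram model over a closed vocabulary plus UNK."""

    def __init__(self, tokens):
        self.counts = Counter(tokens)
        self.vocab = set(self.counts) | {UNK}
        self.total = sum(self.counts.values())

    def norm(self, tok):
        # Map tokens never seen in the background data onto the UNK type.
        return tok if tok in self.counts else UNK

    def prob(self, tok):
        tok = self.norm(tok)
        return (self.counts[tok] + 1) / (self.total + len(self.vocab))


class TopicModel:
    """Foreground unigram counts over a bounded queue of recent tokens,
    smoothed against a background model (assumed formulation)."""

    def __init__(self, background, capacity=1000, alpha=0.4):
        self.bg = background
        self.capacity = capacity
        self.alpha = alpha
        self.queue = deque()
        self.counts = Counter()

    def update(self, tokens):
        # Append new tokens; evict the oldest once the queue is full.
        for tok in map(self.bg.norm, tokens):
            self.queue.append(tok)
            self.counts[tok] += 1
            if len(self.queue) > self.capacity:
                old = self.queue.popleft()
                self.counts[old] -= 1
                if self.counts[old] == 0:
                    del self.counts[old]

    def prob(self, tok):
        tok = self.bg.norm(tok)
        n = len(self.queue)
        # Stupid backoff score: relative foreground frequency if the token
        # was seen recently, otherwise alpha times the background estimate.
        if self.counts[tok] > 0:
            score = self.counts[tok] / n
        else:
            score = self.alpha * self.bg.prob(tok)
        # Normalizer: the sum of the scores over the whole vocabulary.
        # Recomputed on every call for clarity, not efficiency.
        seen_bg_mass = sum(self.bg.prob(w) for w in self.counts)
        z = (1.0 if n else 0.0) + self.alpha * (1.0 - seen_bg_mass)
        return score / z

    def perplexity(self, tokens):
        if not tokens:
            return math.inf
        log_sum = sum(math.log(self.prob(t)) for t in tokens)
        return math.exp(-log_sum / len(tokens))


def matches_topic(model, tokens, threshold):
    """Accept a tweet when its perplexity under the topic model is low."""
    return model.perplexity(tokens) < threshold


background = BackgroundModel("the game was fun and the dress was red".split())
baseball = TopicModel(background, capacity=6)
baseball.update("the game was fun".split())
print(baseball.perplexity(["game", "fun"]), baseball.perplexity(["dress", "red"]))
```

In this sketch a tweet would be routed to a topic by calling `matches_topic` with that topic's model, and `update` would be called on the model whenever a tweet carrying the topic's hashtag arrives. The queue keeps the foreground model focused on recent usage, while the background term keeps unseen words from receiving zero probability.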