Abstract: Data mining social media has become a valuable resource for infectious disease surveillance. However, incorrectly predicting an epidemic carries considerable risks. The large volume of social media data, combined with the small amount of ground truth data and the general dynamics of infectious diseases, presents unique challenges when evaluating model performance. In this paper, we look at several methods that have been used to assess influenza prevalence using Twitter. We then validate them with tests designed to avoid and illustrate issues with the standard k-fold cross-validation method. We also find that modifications to the way the data are partitioned can have major effects on a model's reported performance.
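The sketch below is an illustration, not the paper's actual evaluation procedure. It shows one way partitioning can matter for time-ordered surveillance data. It assumes a hypothetical series of weekly influenza observations indexed in time order. It then compares standard k-fold splits, both contiguous and shuffled, with a forward-chaining split, and reports the fraction of folds whose training set contains a week later than some test week. In those folds, the model is "predicting" from the future. All function names are hypothetical.

```python
import random


def kfold_splits(n, k, shuffle=True, seed=0):
    """Standard k-fold: each index appears in exactly one test fold."""
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = sorted(idx[start:start + size])
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        splits.append((train, test))
        start += size
    return splits


def forward_chaining_splits(n, k):
    """Time-ordered splits: train on weeks [0, t), test on the next block."""
    block = n // (k + 1)
    splits = []
    for f in range(1, k + 1):
        train = list(range(0, f * block))
        test = list(range(f * block, min((f + 1) * block, n)))
        splits.append((train, test))
    return splits


def future_leakage(splits):
    """Fraction of folds whose training set includes a week after some test week."""
    leaky = sum(1 for train, test in splits if max(train) > min(test))
    return leaky / len(splits)


if __name__ == "__main__":
    n_weeks, k = 52, 5  # hypothetical: one year of weekly counts, 5 folds
    print(future_leakage(kfold_splits(n_weeks, k, shuffle=False)))  # 0.8
    print(future_leakage(kfold_splits(n_weeks, k, shuffle=True)))
    print(future_leakage(forward_chaining_splits(n_weeks, k)))  # 0.0
```

A fold avoids leakage only when its test set is a suffix of the timeline. Only one fold can contain the final week, so at most one k-fold split is leak-free, whether or not the data are shuffled. Forward chaining avoids leakage by construction, at the cost of smaller training sets in early folds.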