Authors: David A. Broniatowski, Michael J. Paul, Mark Dredze
DOI: 10.1371/JOURNAL.PONE.0083672
Keywords:
Abstract: Social media have been proposed as a data source for influenza surveillance because they have the potential to offer real-time access to millions of short, geographically localized messages containing information regarding personal well-being. However, the accuracy of social media surveillance systems declines with media attention, because media attention increases “chatter” (messages that are about influenza but do not pertain to an actual infection), masking signs of true influenza prevalence. This paper summarizes our recently developed influenza infection detection algorithm, which automatically distinguishes relevant tweets from other chatter, and we describe our current influenza surveillance system, which was actively deployed during the full 2012-2013 influenza season. Our objective was to analyze the performance of this system during the most recent 2012-2013 influenza season, and to do so at multiple levels of geographic granularity, unlike past studies that focused on national or regional surveillance. Our system’s influenza prevalence estimates were strongly correlated with surveillance data from the Centers for Disease Control and Prevention for the United States (r = 0.93, p < 0.001) as well as surveillance data from the Department of Health and Mental Hygiene of New York City (r = 0.88, p < 0.001). Our system detected the weekly change in direction (increasing or decreasing) of influenza prevalence with 85% accuracy, a nearly twofold increase over a simpler model, demonstrating the utility of explicitly distinguishing infection tweets from other chatter.