Deep learning for pollen allergy surveillance from Twitter in Australia

Authors: Jia Rong, Sandra Michalska, Sudha Subramani, Jiahua Du, Hua Wang

DOI: 10.1186/S12911-019-0921-X

Abstract: Background: This paper introduces a deep learning-based approach for real-time detection of, and insight generation about, pollen allergy, one of the most prevalent chronic conditions in Australia. A popular social media platform, Twitter, is used for data collection as a cost-effective and unobtrusive alternative for public health monitoring that complements traditional survey-based approaches. Methods: Data were extracted from Twitter using predefined keywords ('hayfever' OR 'hay fever') over a period of 6 months covering the high pollen season in Australia. Four deep learning architectures were used in the experiments: CNN, RNN, LSTM and GRU. Both default (GloVe) and domain-specific (HF) word embeddings were used to train the classifiers. Standard evaluation metrics (Accuracy, Precision and Recall) were calculated to validate the results. Finally, a visual correlation with weather variables was performed. Results: The neural network-based approach correctly identified implicit mentions of symptoms and treatments, including previously unseen ones (accuracy up to 87.9% for GRU with 300-dimensional GloVe embeddings). Conclusions: The system addresses the shortcomings of conventional machine learning techniques that rely on manual feature engineering, which prove limiting when exposed to a wide range of non-standard expressions relating to medical concepts. The case study demonstrates the application of a 'black-box' approach to a real-world problem and exposes its internal workings, moving towards more transparent, interpretable and reproducible decision-making in the health informatics domain.
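
The keyword-based collection step and the evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `matches_keywords` and `binary_metrics`, the case-insensitive matching, and the binary label encoding (1 = relevant, 0 = not relevant) are all assumptions made for illustration.

```python
import re
from typing import Sequence, Tuple

# Keyword query quoted in the abstract: 'hayfever' OR 'hay fever'.
# Case-insensitive matching is an assumption of this sketch.
KEYWORD_PATTERN = re.compile(r"\bhay\s?fever\b", re.IGNORECASE)


def matches_keywords(text: str) -> bool:
    """Return True if the text mentions 'hayfever' or 'hay fever'."""
    return KEYWORD_PATTERN.search(text) is not None


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Compute (accuracy, precision, recall) for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = len(pairs)
    # Fall back to 0.0 when a denominator is zero (a convention chosen here).
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

In such a setup, `matches_keywords` would be applied to candidate posts before labelling, and `binary_metrics` to the labels predicted by any of the classifiers (CNN, RNN, LSTM or GRU) on a held-out set.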
