An unsupervised machine learning model for discovering latent infectious diseases using social media data

Authors: Sunghoon Lim, Conrad S. Tucker, Soundar Kumara

DOI: 10.1016/J.JBI.2016.12.007

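Abstract:

Introduction: The authors of this work propose an unsupervised machine learning model that has the ability to identify real-world latent infectious diseases by mining social media data. In this study, a latent infectious disease is defined as a communicable disease that has not yet been formalized by national public health institutes and explicitly communicated to the general public. Most existing approaches to modeling infectious-disease-related knowledge discovery through social media networks are top-down approaches based on already known information, such as the names of diseases and their symptoms. In these top-down approaches, necessary but unknown information, such as disease names and symptoms, is mostly unidentified in social media data until national public health institutes have formalized that disease. Most of the formalizing processes for latent infectious diseases are time consuming. Therefore, this study presents a bottom-up approach for latent infectious disease discovery in a given location without prior information, such as disease names and related symptoms.

Methods: Social media messages with user and temporal information are extracted during the data preprocessing stage. An unsupervised sentiment analysis model is then presented. Users’ expressions about symptoms, body parts, and pain locations are also identified from social media data. Then, symptom weighting vectors for each individual and time period are created, based on their sentiment and social media expressions. Finally, latent-infectious-disease-related information is retrieved from individuals’ symptom weighting vectors.

Datasets and results: Twitter data from August 2012 to May 2013 are used to validate this study. Real electronic medical records for 104 individuals, who were diagnosed with influenza in the same period, serve as ground truth validation. The results are promising, with the highest precision, recall, and F1 score values of 0.773, 0.680, and 0.724, respectively.

Conclusion: This work uses individuals’ social media messages to identify latent infectious diseases, without prior information, quicker than when the disease(s) is formalized by national public health institutes. In particular, the unsupervised machine learning model using user, textual, and temporal information in social media data, along with sentiment analysis, identifies latent infectious diseases in a given location.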
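The reported scores can be related through the standard F1 definition, the harmonic mean of precision and recall. The sketch below is only an illustrative check: the helper name `f1_score` is hypothetical, and it assumes the three reported values come from the same experimental configuration, which the abstract does not state.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Highest precision and recall values quoted in the abstract.
# Assumption: both belong to the same configuration.
precision, recall = 0.773, 0.680

f1 = f1_score(precision, recall)
print(round(f1, 3))  # 0.724
```

Under that assumption, the computed value agrees with the reported F1 score of 0.724.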
