Authors: Sunghoon Lim, Conrad S. Tucker, Soundar Kumara
DOI: 10.1016/J.JBI.2016.12.007
Keywords:
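Abstract: Introduction: The authors of this work propose an unsupervised machine learning model that has the ability to identify real-world latent infectious diseases by mining social media data. In this study, a latent infectious disease is defined as a communicable disease that has not yet been formalized by national public health institutes and explicitly communicated to the general public. Most existing approaches to modeling infectious-disease-related knowledge discovery through social media networks are top-down approaches based on already known information, such as the names of diseases and their symptoms. In existing top-down approaches, necessary but unknown information, such as disease names and symptoms, is mostly unidentified in social media data until national public health institutes have formalized that disease. Most of the formalizing processes for latent infectious diseases are time consuming. Therefore, this study presents a bottom-up approach for latent infectious disease discovery in a given location without prior information, such as disease names and related symptoms. Methods: Social media messages with user and temporal information are extracted during the data preprocessing stage. An unsupervised sentiment analysis model is then presented. Users’ expressions about symptoms, body parts, and pain locations are also identified from social media data. Then, symptom weighting vectors for each individual and time period are created, based on their sentiment and social media expressions. Finally, latent-infectious-disease-related information is retrieved from individuals’ symptom weighting vectors. Datasets and results: Twitter data from August 2012 to May 2013 are used to validate this study. Real electronic medical records for 104 individuals, who were diagnosed with influenza in the same period, serve as ground truth validation. The results are promising, with the highest precision, recall, and F1 score values of 0.773, 0.680, and 0.724, respectively. Conclusion: This work uses individuals’ social media messages to identify latent infectious diseases, without prior information, quicker than when the disease(s) is formalized by national public health institutes. In particular, the unsupervised machine learning model using user, textual, and temporal information in social media data, along with sentiment analysis, identifies latent infectious diseases in a given location.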
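Note on the reported scores: the F1 score is the harmonic mean of precision and recall. As a quick consistency check, and assuming the three highest values quoted in the abstract come from the same experimental setting (the abstract does not say so explicitly), the standard formula reproduces the reported figure:

```latex
\[
F_1 = \frac{2 \cdot P \cdot R}{P + R}
    = \frac{2 \times 0.773 \times 0.680}{0.773 + 0.680}
    = \frac{1.05128}{1.453}
    \approx 0.724
\]
```

Here P and R denote precision and recall; these symbols are introduced for this check and do not appear in the record itself.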