A Scalable Framework to Detect Personal Health Mentions on Twitter

Authors: Zhijun Yin, Daniel Fabbri, S Trent Rosenbloom, Bradley Malin

DOI: 10.2196/JMIR.4305

Keywords: Infodemiology, Medical Expenditure Panel Survey, Classifier (UML), Self-disclosure, Data collection, Social media, Medicine, Internet privacy, Medical record, The Internet

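Abstract: Background: Biomedical research has traditionally been conducted via surveys and the analysis of medical records. However, these resources are limited in their content, such that non-traditional domains (eg, online forums and social media) have an opportunity to supplement the view of an individual's health. Objective: The objective of this study was to develop a scalable framework to detect personal health status mentions on Twitter and assess the extent to which such information is disclosed. Methods: We collected more than 250 million tweets via the Twitter streaming API over a 2-month period in 2014. The corpus was filtered down to approximately 250,000 tweets, stratified across 34 high-impact health issues, based on guidance from the Medical Expenditure Panel Survey. We created a labeled corpus of several thousand tweets via a survey, administered over Amazon Mechanical Turk, that documents when terms correspond to mentions of personal health issues or an alternative (eg, a metaphor). We engineered a scalable classifier for personal health mentions via feature selection and assessed its potential over the health issues. We further investigated the utility of the tweets by determining the extent to which Twitter users disclose personal health status. Results: Our investigation yielded several notable findings. First, we find that tweets from a small subset of the health issues can train a scalable classifier to detect health mentions. Specifically, training on 2000 tweets from four health issues (cancer, depression, hypertension, and leukemia) yielded a classifier with a precision of 0.77 on all 34 health issues. Second, Twitter users disclosed personal health status for all health issues. Notably, personal health status was disclosed over 50% of the time for 11 out of 34 (33%) health issues. Third, the disclosure rate was dependent on the health issue in a statistically significant manner (P<.001). For instance, more than 80% of the tweets about migraines (83/100) and allergies (85/100) communicated personal health status, while only around 10% of the tweets about obesity (13/100) and heart attack (12/100) did so. Fourth, the likelihood that people disclose their own versus other people's health status was dependent on the health issue as well. For example, 69% (69/100) of the insomnia tweets disclosed the author's status, while only 1% (1/100) disclosed another person's status. By contrast, 21% (21/100) of the Down syndrome tweets disclosed another person's status. Conclusions: It is possible to automatically detect personal health status mentions on Twitter in a scalable manner. These mentions correspond to the health issues of the Twitter users themselves, but also of other individuals. Though this study did not investigate the veracity of such statements, we anticipate such information may be useful in supplementing traditional health-related sources for research purposes. [J Med Internet Res 2015;17(6):e138]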
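
The third finding reports that the disclosure rate depends on the health issue (P<.001). The abstract does not say which test was used, so the following is only a minimal sketch: it assumes a Pearson chi-square test of independence and uses only the four per-issue counts quoted in the abstract (the study itself covered 34 issues). The function name `chi_square_statistic` and the table layout are illustrative choices, not the authors' code.

```python
# Sketch: Pearson chi-square test of independence on the four disclosure
# counts quoted in the abstract. Assumption: this is NOT necessarily the
# test used in the paper, and the paper analyzed 34 issues, not 4.

def chi_square_statistic(table):
    """Return (chi2, degrees_of_freedom) for a rows x cols count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if disclosure were independent of the issue.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Tweets (out of 100 sampled per issue) that disclosed personal health status.
disclosed = {
    "migraines": 83,
    "allergies": 85,
    "obesity": 13,
    "heart attack": 12,
}

# Each row is [disclosed, not disclosed] for one health issue.
table = [[d, 100 - d] for d in disclosed.values()]
chi2, dof = chi_square_statistic(table)

# Standard critical value of the chi-square distribution for
# 3 degrees of freedom at the 0.001 significance level.
CRITICAL_P001_DF3 = 16.266

print(f"chi2 = {chi2:.2f}, dof = {dof}")
print("significant at P<.001:", chi2 > CRITICAL_P001_DF3)
```

Even on this four-issue subset the statistic is far above the critical value, which is consistent with the reported dependence between health issue and disclosure rate.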
