Authors: Zhijun Yin, Daniel Fabbri, S Trent Rosenbloom, Bradley Malin
DOI: 10.2196/JMIR.4305
Keywords: Infodemiology, Medical Expenditure Panel Survey, Classifier (UML), Self-disclosure, Data collection, Social media, Medicine, Internet privacy, Medical record, The Internet
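Abstract: Background: Biomedical research has traditionally been conducted via surveys and the analysis of medical records. However, these resources are limited in their content, such that non-traditional domains (eg, online forums and social media) have an opportunity to supplement the view of an individual's health. Objective: The objective of this study was to develop a scalable framework to detect personal health status mentions on Twitter and to assess the extent to which such information is disclosed. Methods: We collected more than 250 million tweets via the Twitter streaming API over a 2-month period in 2014. The corpus was filtered down to approximately 250,000 tweets, stratified across 34 high-impact health issues, based on guidance from the Medical Expenditure Panel Survey. We created a labeled corpus of several thousand tweets via a survey, administered over Amazon Mechanical Turk, that documents when terms correspond to mentions of personal health issues or to an alternative (eg, a metaphor). We engineered a scalable classifier for personal health mentions via feature selection and assessed its potential over the health issues. We further investigated the utility of the tweets by determining the extent to which Twitter users disclose personal health status. Results: Our investigation yielded several notable findings. First, we find that tweets from a small subset of the health issues can train a scalable classifier to detect health mentions. Specifically, training on 2000 tweets from four health issues (cancer, depression, hypertension, and leukemia) yielded a classifier with a precision of 0.77 on all 34 health issues. Second, Twitter users disclosed personal health status for all health issues. Notably, personal health status was disclosed over 50% of the time for 11 out of 34 (33%) investigated health issues. Third, the disclosure rate was dependent on the health issue in a statistically significant manner (P<.001). For instance, more than 80% of the tweets about migraines (83/100) and allergies (85/100) communicated personal health status, while only around 10% of the tweets about obesity (13/100) and heart attack (12/100) did so. Fourth, the likelihood that people disclose their own versus other people's health status was dependent on the health issue in a statistically significant manner as well. For example, 69% (69/100) of the insomnia tweets disclosed the author's status, while only 1% (1/100) disclosed another person's status. By contrast, 21% (21/100) of the Down syndrome tweets disclosed another person's status. Conclusions: It is possible to automatically detect personal health status mentions on Twitter in a scalable manner. These mentions correspond to the health issues of the Twitter users themselves, but also to those of other individuals. Though this study did not investigate the veracity of such statements, we anticipate that such information may be useful in supplementing traditional health-related sources for research purposes. [J Med Internet Res 2015;17(6):e138]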
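The abstract reports that the disclosure rate depends on the health issue (P<.001), but this record does not name the test used. The sketch below assumes a Pearson chi-square test of independence and applies it only to the four issues whose counts are quoted (migraines, allergies, obesity, heart attack; 100 labeled tweets each). It is an illustration of the technique, not a reproduction of the authors' analysis over all 34 issues. The function names `chi_square_independence` and `chi2_sf_df3` are hypothetical helpers written for this example.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of issue and disclosure.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Counts quoted in the abstract: tweets (out of 100) that communicated personal health status.
disclosed = {"migraines": 83, "allergies": 85, "obesity": 13, "heart attack": 12}

# Rows: health issues; columns: [disclosed, not disclosed].
table = [[d, 100 - d] for d in disclosed.values()]
stat, dof = chi_square_independence(table)
p_value = chi2_sf_df3(stat)  # closed form is valid only because dof == 3 for this 4 x 2 table

print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.3g}")
```

With more issues (eg, all 34), the degrees of freedom grow and a general chi-square survival function (for instance `scipy.stats.chi2.sf`) would replace the 3-degree-of-freedom closed form used here.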