DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS.

Authors: BRITTANY M. HOLLISTER, NICOLE A. RESTREPO, ERIC FARBER-EGER, DANA C. CRAWFORD, MELINDA C. ALDRICH

DOI: 10.1142/9789813207813_0023

Keywords: Psychology, Medicaid, Social class, MEDLINE, Biobank, Data extraction, Social environment, Algorithm, Socioeconomic status, Biorepository

Abstract: Socioeconomic status (SES) is a fundamental contributor to health, and a key factor underlying racial disparities in disease. However, SES data are rarely included in genetic studies, due in part to the difficulty of collecting these data when studies were not originally designed for that purpose. The emergence of large clinic-based biobanks linked to electronic health records (EHRs) provides research access to large patient populations with longitudinal phenotype data captured in structured fields such as billing codes, procedure codes, and prescriptions. SES data, however, are often not explicitly recorded in structured fields, but rather in the free text of clinical notes and communications. The content and completeness of these data vary widely by practitioner. To enable gene-environment studies that consider SES as an exposure, we sought to extract SES variables from racial/ethnic minority adult patients (n=9,977) in BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified EHRs. We developed several measures of SES using information available within the de-identified EHR, including broad categories of occupation, education, insurance status, and homelessness. Two hundred patients were randomly selected for manual review to develop a set of seven algorithms for extracting SES information from de-identified EHRs. The algorithms consist of 15 categories of information, with 830 unique search terms. SES data extracted from manual review of 50 randomly selected records were compared to data produced by the algorithm, resulting in positive predictive values of 80.0% (education), 85.4% (occupation), 87.5% (unemployment), 63.6% (retirement), 23.1% (uninsured), 81.8% (Medicaid), and 33.3% (homelessness), suggesting that some categories of SES data are easier to extract from this EHR than others. The SES data extraction approach developed here will enable future EHR-based genetic studies to integrate SES information into statistical analyses. Ultimately, incorporation of measures of SES into genetic studies will help elucidate the impact of the social environment on disease risk and outcomes.
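The abstract describes keyword-based algorithms that are scored against manual review using positive predictive value (PPV). The following is a minimal Python sketch of that general idea, not the authors' implementation. The search terms, the example notes, and the function names (`compile_patterns`, `extract_categories`, `positive_predictive_value`) are all hypothetical; the 830 terms used in the study are not listed in the abstract.

```python
import re
from typing import Dict, List, Set

# Hypothetical search terms per SES category (illustrative only).
SEARCH_TERMS: Dict[str, List[str]] = {
    "homelessness": ["homeless", "shelter"],
    "medicaid": ["medicaid"],
    "retirement": ["retired"],
    "uninsured": ["uninsured", "no insurance"],
}


def compile_patterns(terms: Dict[str, List[str]]) -> Dict[str, "re.Pattern[str]"]:
    """Build one case-insensitive, whole-word regex per category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
            re.IGNORECASE,
        )
        for category, words in terms.items()
    }


def extract_categories(note: str, patterns: Dict[str, "re.Pattern[str]"]) -> Set[str]:
    """Return every category whose search terms appear in the note text."""
    return {cat for cat, pattern in patterns.items() if pattern.search(note)}


def positive_predictive_value(flagged: Set[str], confirmed: Set[str]) -> float:
    """PPV = true positives / all records flagged by the algorithm."""
    if not flagged:
        raise ValueError("PPV is undefined when no records are flagged")
    return len(flagged & confirmed) / len(flagged)


PATTERNS = compile_patterns(SEARCH_TERMS)

# Made-up notes keyed by a hypothetical de-identified record ID.
NOTES = {
    "p1": "Patient is homeless and has no insurance.",
    "p2": "Retired teacher, covered by Medicaid.",
    "p3": "Denies being homeless; lives with daughter.",
}

# Records the keyword search flags for homelessness.
flagged_homeless = {
    record_id
    for record_id, text in NOTES.items()
    if "homelessness" in extract_categories(text, PATTERNS)
}

# Suppose manual review confirms only p1; p3 is a negated mention.
manually_confirmed = {"p1"}
ppv = positive_predictive_value(flagged_homeless, manually_confirmed)
print(f"Homelessness PPV: {ppv:.1%}")
```

In this toy example the negated mention in record `p3` is flagged by plain keyword matching, which is one way a category can end up with a low PPV; whether that explains the reported figures is not stated in the abstract.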
