Authors: Sam Khinda, O. Uspenskaya-Cadoz, C. Rubel, Y. Nigmatullina, N. Kayal
Keywords:
Abstract: Recruiting patients for clinical trials of potential therapies for Alzheimer’s disease (AD) remains a major challenge, with demand for trial participants at an all-time high. The AD treatment R&D pipeline includes around 112 agents. In the United States alone, 150 clinical trials are seeking 70,000 participants. Most people with early cognitive impairment consult primary care providers, who may lack the time, diagnostic skills and awareness of local clinical trials. Machine learning and predictive analytics offer promise to boost enrollment by predicting which patients have prodromal AD and which will go on to develop AD. The authors set out to develop a machine learning predictive model that identifies patients with prodromal AD in the general population, to aid early AD detection by primary care physicians and timely referral to expert sites for biomarker confirmation of diagnosis and clinical trial enrollment. The authors use a classification machine learning algorithm to extract patterns within healthcare claims and prescription data from the three years prior to AD diagnosis or AD drug initiation. The study focused on subjects included in proprietary IQVIA US data assets (claims and prescription databases). Patient information was extracted from January 2010 to July 2018 for cohorts aged between 50 and 85 years. A total of 88,298,289 subjects aged between 50 and 85 years were identified. For the positive cohort, 667,288 subjects were identified who had 24 months of medical history and at least one record of AD or AD treatment. For the negative cohort, 3,670,254 patients were selected who had a similar length of medical history and were matched to the positive cohort based on the AD prevalence rate. The scoring cohort was selected based on the availability of recent medical data covering 2–5 years and included 72,670,283 subjects between the ages of 50 and 85 years. A list of clinically relevant and interpretable predictors was generated and extracted from the data sets for each subject, including pharmacological treatments (NDC/product), office/specialist visits (specialty), tests and procedures (HCPCS and CPT), and diagnoses (ICD). Prodromal AD was defined using a 3-year offset from the estimated AD diagnosis. Supervised ML techniques were used to develop algorithms that predict the occurrence of AD cases. The sample dataset was divided randomly into a training dataset and a test dataset. The models were trained and executed using the PySpark framework. Training and evaluation of LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, and GBTClassifier were done using PySpark’s mllib module. The area under the precision-recall curve (AUCPR) was used to compare the results of the various models.
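To make the comparison metric concrete, the following is a minimal sketch of a step-wise area under the precision-recall curve (often called average precision). It is not the authors' code and not PySpark's implementation, which may interpolate the curve differently. The function name `average_precision` and the toy labels and scores are hypothetical, and the sketch assumes binary labels (1 for an AD case) with no tied scores.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Assumes labels are 0/1 and scores have no ties (illustrative sketch only).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank subjects from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at this rank, weighted by the recall gained (1 / n_pos).
            area += (true_pos / rank) / n_pos
    return area


# Hypothetical toy data: 2 positives among 8 subjects (made up for illustration).
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_ap = average_precision(toy_labels, toy_scores)
print(f"toy AUCPR (average precision): {toy_ap:.3f}")
```

A precision-recall metric suits this setting because positives are rare relative to negatives, so a score near the positive-class prevalence corresponds to a model that ranks no better than chance.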
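AUCPRs were 0.426, 0.157, 0.436, and 0.440 for LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, and GBTClassifier, respectively, meaning that GBTClassifier (Gradient Boosted Tree) outperformed the other classifiers. The GBT model identified 222,721 subjects in the prodromal AD stage with 80% precision. Some 76% of the identified prodromal AD patients were in the primary care setting. When the developed predictive model was applied to 72,670,283 U.S. residents, 222,721 prodromal AD patients were identified, the majority of whom were in the primary care setting. This could drive advances in AD research by enabling more accurate and earlier prodromal AD diagnosis at the primary care physician level, which would facilitate timely referral to expert sites for in-depth assessment and potential enrolment in clinical trials.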
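The arithmetic behind these figures can be checked with a short sketch. The AUCPR values and counts are taken from the abstract. The expected number of true positives and the flagged share of the scoring cohort are derived here for illustration and are not reported by the authors. The derivation assumes the 80% precision applies to the 222,721 flagged subjects, and all variable names are hypothetical.

```python
# AUCPR values as reported in the abstract, keyed by PySpark class name.
reported_aucpr = {
    "LogisticRegression": 0.426,
    "DecisionTreeClassifier": 0.157,
    "RandomForestClassifier": 0.436,
    "GBTClassifier": 0.440,
}

# The model with the highest AUCPR is the one the abstract singles out.
best_model = max(reported_aucpr, key=reported_aucpr.get)

flagged = 222_721      # subjects flagged as prodromal AD by the GBT model
precision = 0.80       # reported precision
scored = 72_670_283    # size of the scoring cohort

# Derived quantities (assumption: precision holds for the flagged group).
expected_true_positives = flagged * precision
flagged_share = flagged / scored

print(best_model)
print(round(expected_true_positives))
print(f"{flagged_share:.3%}")
```

Under that assumption, roughly four in five flagged subjects would be genuine prodromal AD cases, and the flagged group is a small fraction of the scored population.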