Authors: Vu Nguyen, Svetha Venkatesh, Truyen Tran, Dinh Phung, Wei Luo
DOI:
Keywords:
Abstract: Although the randomized controlled trial is the gold standard in medical research, researchers are increasingly looking to alternative data sources for hypothesis generation and early-stage evidence collection. Coded clinical data are collected routinely in most hospitals. While they contain rich information directly related to the real clinical setting, they are both noisy and semantically diverse, making them difficult to analyze with conventional statistical tools. This paper presents a novel application of Bayesian nonparametric modeling to uncover latent information in coded clinical data. For a patient cohort, a Bayesian nonparametric model is used to reveal the common comorbidity groups shared by the patients and the proportion in which each comorbidity group is reflected in each individual patient. To demonstrate the method, we present a case study based on hospitalization coding from an Australian hospital. The model recovered 15 comorbidity groups among 1012 patients hospitalized during one month. When patients from two areas of unequal socio-economic status were compared, the model revealed a higher prevalence of diverticular disease in the region of lower socio-economic status. This paper builds a convincing case for using routine coded data to speed up hypothesis generation.