Authors: Keith Feldman, Nicholas Hazekamp, Nitesh V. Chawla
DOI: 10.1109/ICHI.2016.37
Keywords:
Abstract: Over the past decade, the application of data science techniques to clinical data has allowed practitioners and researchers to develop sundry analytical models. These models have traditionally relied on structured data drawn from Electronic Medical Records (EMR). Yet a large portion of EMR data remains unstructured, primarily held within clinical notes. While recent work has produced techniques for extracting features from unstructured text, this work generally operates under the untested assumption that all clinical text can be processed in a similar manner. This paper provides what we believe to be the first comprehensive evaluation of the differences between four major sources of clinical text, providing an analysis of the structural, linguistic, and topical differences among the notes of each category. Our conclusions support the premise that tools designed to extract features from clinical text must account for the categories of text they process.