Mining the Clinical Narrative: All Text are Not Equal

Authors: Keith Feldman, Nicholas Hazekamp, Nitesh V. Chawla

DOI: 10.1109/ICHI.2016.37

Abstract: Over the past decade, the application of data science techniques to clinical data has allowed practitioners and researchers to develop sundry analytical models. These models have traditionally relied on structured data drawn from Electronic Medical Records (EMR). Yet a large portion of EMR data remains unstructured, primarily held within clinical notes. While recent work has produced techniques for extracting features from unstructured text, this work generally operates under the untested assumption that all text can be processed in a similar manner. This paper provides what we believe to be the first comprehensive evaluation of the differences between four major sources of clinical text, examining the structural, linguistic, and topical differences among notes in each category. Our conclusions support the premise that tools designed to extract features from clinical text must account for the categories of text they process.
