Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline

Authors: Keith Feldman, Louis Faust, Xian Wu, Chao Huang, Nitesh V. Chawla

DOI: 10.1007/978-3-319-69775-8_9

Abstract: From medical charts to national census, healthcare has traditionally operated under a paper-based paradigm. However, the past decade has marked a long and arduous transformation bringing healthcare into the digital age. Ranging from electronic health records, to digitized imaging and laboratory reports, to public health datasets, healthcare today generates an incredible amount of digital information. Such a wealth of data presents an exciting opportunity for integrated machine learning solutions to address problems across multiple facets of healthcare practice and administration. Unfortunately, the ability to derive accurate and informative insights requires more than the ability to execute machine learning models. Rather, a deeper understanding of the data on which the models are run is imperative for their success. While significant effort has been undertaken to develop models able to process the volume of data obtained during the analysis of millions of digitalized patient records, it is important to remember that volume represents only one aspect of the data. In fact, drawing on data from an increasingly diverse set of sources, healthcare data presents an incredibly complex set of attributes that must be accounted for throughout the machine learning pipeline. This chapter focuses on highlighting such challenges, and is broken down into three distinct components, each representing a phase of the pipeline. We begin with attributes of the data accounted for during preprocessing, then move to considerations during model building, and end with challenges to the interpretation of model output. For each component, we present a discussion around data as it relates to the healthcare domain and offer insight into the challenges each may impose on the efficiency of machine learning techniques.
