Support Vector Feature Selection for Early Detection of Anastomosis Leakage From Bag-of-Words in Electronic Health Records

Authors: Cristina Soguero-Ruiz, Kristian Hindberg, Jose Luis Rojo-Alvarez, Stein Olav Skrovseth, Fred Godtliebsen

DOI: 10.1109/JBHI.2014.2361688

Keywords: Data mining, Support vector machine, Entropy (information theory), Predictive modelling, Elective surgery, Margin classifier, Bag-of-words model, Feature extraction, Feature selection, Medicine

Abstract: The free text in electronic health records (EHRs) conveys a huge amount of clinical information about health state and patient history. Despite a rapidly growing literature on the use of machine learning techniques for extracting this information, little effort has been invested toward feature selection and the features' corresponding medical interpretation. In this study, we focus on the task of early detection of anastomosis leakage (AL), a severe complication after elective surgery for colorectal cancer (CRC), using free text extracted from EHRs. We use a bag-of-words model to investigate the potential for feature selection strategies. The purpose is earlier detection of AL and prediction of AL with data generated in the EHR before the actual complication occurs. Due to the high dimensionality of the data, we derive feature selection strategies using the robust support vector machine linear maximum margin classifier, by investigating: 1) a simple statistical criterion (leave-one-out-based test); 2) an intensive-computation statistical criterion (Bootstrap resampling); and 3) an advanced statistical criterion (kernel entropy). Results reveal a discriminatory power for early detection of complications after CRC surgery (sensitivity 100%; specificity 72%). These results can be used to develop prediction models, based on EHR data, that can support surgeons and patients in the preoperative decision making phase.
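The bag-of-words representation and the bootstrap-resampling criterion mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy notes and labels are invented, the function names (`bag_of_words`, `mean_difference`, `bootstrap_select`) are hypothetical, and a per-feature difference of class means is swapped in for the linear SVM weight vector, so this is not the authors' method. It only shows the general idea of keeping a word when the bootstrap confidence interval of its relevance statistic excludes zero.

```python
import random
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count matrix) for a list of free-text notes."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def mean_difference(rows, labels):
    """Per-feature difference of class means.

    Assumption: a simple stand-in for the weights of a linear SVM.
    """
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    return [
        sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        for j in range(len(rows[0]))
    ]


def bootstrap_select(rows, labels, n_boot=500, alpha=0.05, seed=0):
    """Return indices of features whose bootstrap interval excludes zero."""
    rng = random.Random(seed)
    n = len(rows)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples that contain both classes
            stats.append(mean_difference([rows[i] for i in idx], ys))
    lo_k = int((alpha / 2) * n_boot)
    hi_k = int((1 - alpha / 2) * n_boot) - 1
    selected = []
    for j in range(len(rows[0])):
        values = sorted(s[j] for s in stats)
        if values[lo_k] > 0 or values[hi_k] < 0:
            selected.append(j)
    return selected


# Invented toy notes: label 1 = complication, label 0 = no complication.
docs = [
    "patient fever leak",
    "patient pain leak",
    "patient leak fever",
    "patient stable uneventful",
    "patient uneventful pain",
    "patient stable uneventful",
]
labels = [1, 1, 1, 0, 0, 0]

vocab, rows = bag_of_words(docs)
selected_words = [vocab[j] for j in bootstrap_select(rows, labels)]
print(selected_words)
```

In this toy data, "leak" occurs in every positive note and "uneventful" in every negative one, so both survive the test, while "patient" occurs once in every note and is discarded. With a real linear SVM, `mean_difference` would be replaced by the weight vector fitted on each resample.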
