Medical data mining: insights from winning two competitions

Authors: Saharon Rosset, Claudia Perlich, Grzegorz Świrszcz, Prem Melville, Yan Liu

DOI: 10.1007/S10618-009-0158-X

Abstract: Two major data mining competitions in 2008 presented challenges in medical domains: KDD Cup 2008, which concerned cancer detection from mammography data; and the INFORMS Data Mining Challenge, dealing with diagnosis of pneumonia based on patient information from hospital files. Our team won both of these competitions, and in this paper we share our lessons learned and insights. We emphasize the aspects that pertain to the general practice and methodology of medical data mining, rather than to the specifics of each modeling competition. We concentrate on three topics: leakage and its effect on competitions and proof-of-concept projects; consideration of real-life model performance measures in model construction and evaluation; and relational learning approaches to medical data mining tasks.
