Authors: Saharon Rosset, Claudia Perlich, Grzegorz Świrszcz, Prem Melville, Yan Liu
DOI: 10.1007/S10618-009-0158-X
Keywords:
Abstract: Two major data mining competitions in 2008 presented challenges in medical domains: KDD Cup 2008, which concerned cancer detection from mammography data; and the INFORMS Data Mining Challenge, dealing with diagnosis of pneumonia based on patient information from hospital files. Our team won both of these competitions, and in this paper we share our lessons learned and insights. We emphasize the aspects that pertain to the general practice and methodology of medical data mining, rather than to the specifics of each modeling competition. We concentrate on three topics: information leakage, and its effect on competitions and proof-of-concept projects; consideration of real-life model performance measures in model construction and evaluation; and relational learning approaches to medical data mining tasks.