Abstract: Our null hypothesis was that a computer algorithm will not predict breast cancer patients' 10-year survival with greater accuracy than the 64.3% baseline of the Surveillance, Epidemiology, and End Results (SEER) database [3]. The aims of this study were to (1) build an infrastructure to convert SEER data into a machine-readable format; (2) train Machine Learning (ML) algorithms to predict survival; and (3) measure the predictive accuracy of the ML algorithms. We downloaded 657,711 breast cancer patients' clinical and demographic characteristics from SEER and converted them into machine-readable feature vectors. An oncologist generated a list of potential variables for the ML algorithms. We trained the WEKA package's Logistic Regression (LR), Naive Bayes, and C4.5 Decision Tree algorithms on the data using ten-fold cross-validation. LR, Naive Bayes, and Decision Tree achieved accuracies of 76.29%, 59.71%, and 77.43%, respectively. We compared the results of LR to those of a well-known website, Adjuvant! Online. We rejected the null hypothesis for LR and Decision Tree, but failed to reject it for Naive Bayes. Of the algorithms tested, Decision Tree proved to be the most accurate predictor of patient survival in ten years. In addition, LR provided more accurate predictions without Adjuvant!'s limitations.
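The evaluation procedure described above (ten-fold cross-validation, then a comparison against the 64.3% SEER baseline) can be illustrated with a minimal Python sketch. It is not the study's WEKA pipeline: the helper names `k_fold_indices`, `cross_val_accuracy`, `train_majority`, and `beats_baseline` are hypothetical, the majority-class classifier is only a stand-in for LR, Naive Bayes, or C4.5, and the baseline check is a plain threshold comparison rather than whatever statistical test the study applied. The only figures taken from the abstract are the baseline and the three reported accuracies.

```python
import random

SEER_BASELINE = 0.643  # 10-year survival baseline quoted in the abstract

# Accuracies reported in the abstract (ten-fold cross-validation).
REPORTED_ACCURACY = {
    "Logistic Regression": 0.7629,
    "Naive Bayes": 0.5971,
    "C4.5 Decision Tree": 0.7743,
}


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(features, labels, train_fn, k=10, seed=0):
    """Mean held-out accuracy over k folds (assumes len(labels) >= k).

    train_fn(train_x, train_y) must return a predict(x) -> label callable.
    """
    folds = k_fold_indices(len(labels), k, seed)
    fold_accuracies = []
    for held_out in folds:
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held_out_set]
        predict = train_fn([features[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(features[i]) == labels[i] for i in held_out)
        fold_accuracies.append(correct / len(held_out))
    return sum(fold_accuracies) / k


def train_majority(train_x, train_y):
    """Hypothetical stand-in classifier: always predict the commonest label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority


def beats_baseline(accuracy, baseline=SEER_BASELINE):
    """Plain threshold comparison; not a statistical significance test."""
    return accuracy > baseline
```

Under this threshold check, `[name for name, acc in REPORTED_ACCURACY.items() if beats_baseline(acc)]` keeps Logistic Regression and C4.5 Decision Tree and drops Naive Bayes, which matches the reject and fail-to-reject outcomes stated in the abstract.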