An Evaluation of Discretization Methods for Learning Rules from Biomedical Datasets.

Authors: Shyam Visweswaran, Jonathan L. Lustgarten, Himanshu Grover, Vanathi Gopalakrishnan

DOI:

Keywords: Knowledge extraction, Machine learning, Bayesian probability, Discretization, Selection (linguistics), Artificial intelligence, Domain (software engineering), Minimum description length, Computer science, Standard technique, Rule sets

Abstract: Rule learning has the major advantage of understandability by human experts when performing knowledge discovery within the biomedical domain. Many rule learning algorithms require discrete data in order to learn IF-THEN rule sets. This requirement makes the selection of a discretization technique an important step in rule learning. We compare the performance of one standard technique, Fayyad and Irani's Minimum Description Length Principle Criterion, which is the de facto discretization method in many machine learning packages, to that of a new Efficient Bayesian Discretization (EBD) method, and show that EBD leads to significant gains in performance, especially as the complexity of the rule learner increases.
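As an illustration of the standard technique named in the abstract, the following is a minimal sketch of the Fayyad and Irani MDLP acceptance test for one binary cut on a single continuous attribute. The function names (`entropy`, `mdlp_accepts`, `mdlp_cut`) are hypothetical and chosen for this example; the sketch does not apply the test recursively to the resulting intervals, and it does not attempt to reproduce EBD, whose details the abstract does not give.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(left, right):
    """Entropy reduction obtained by splitting left + right into the two parts."""
    n = len(left) + len(right)
    return (entropy(left + right)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))


def mdlp_accepts(left, right):
    """MDLP test: accept the cut if gain > (log2(N - 1) + delta) / N."""
    whole = left + right
    n = len(whole)
    k, k1, k2 = len(set(whole)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(whole) - k1 * entropy(left) - k2 * entropy(right)
    )
    return info_gain(left, right) > (math.log2(n - 1) + delta) / n


def mdlp_cut(values, labels):
    """Return the best single cut point if MDLP accepts it, else None."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    best_gain, best_i = -1.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut is only possible between distinct values
        gain = info_gain([y for _, y in pairs[:i]], [y for _, y in pairs[i:]])
        if gain > best_gain:
            best_gain, best_i = gain, i
    if best_i is None:
        return None
    left = [y for _, y in pairs[:best_i]]
    right = [y for _, y in pairs[best_i:]]
    if not mdlp_accepts(left, right):
        return None
    # Place the cut midway between the two neighbouring attribute values.
    return (pairs[best_i - 1][0] + pairs[best_i][0]) / 2
```

A full discretizer would call `mdlp_cut` recursively on each side of an accepted cut until the test rejects every remaining candidate; attributes for which no cut is accepted end up as a single interval.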

References (17)
Johannes Fürnkranz, Gerhard Widmer, Incremental Reduced Error Pruning. Machine Learning Proceedings 1994, pp. 70-77 (1994), 10.1016/B978-1-55860-335-6.50017-9
J. Ross Quinlan, C4.5: Programs for Machine Learning (1992)
Huan Liu, Farhad Hussain, Chew Lim Tan, Manoranjan Dash, Discretization: An Enabling Technique. Data Mining and Knowledge Discovery, vol. 6, pp. 393-423 (2002), 10.1023/A:1016304305535
U.M. Fayyad, Data mining and knowledge discovery: making sense out of data. IEEE Intelligent Systems, vol. 11, pp. 20-25 (1996), 10.1109/64.539013
Scott H. Clearwater, Foster J. Provost, RL4: a tool for knowledge-based induction. Proceedings of the 2nd International IEEE Conference on Tools for Artificial Intelligence, pp. 24-30 (1990), 10.1109/TAI.1990.130305
John M. Aronis, Foster J. Provost, Increasing the efficiency of data mining algorithms with breadth-first marker propagation. Knowledge Discovery and Data Mining, pp. 119-122 (1997)
Ron Kohavi, Mehran Sahami, Error-based and entropy-based discretization of continuous features. Knowledge Discovery and Data Mining, pp. 114-119 (1996)
Vanathi Gopalakrishnan, Philip Ganchev, Srikanth Ranganathan, Robert Bowser, Rule learning for disease-specific biomarker discovery from clinical proteomic mass spectra. International Conference on Data Mining, pp. 93-105 (2006), 10.1007/11691730_10
Mark A. Hall, Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (1999)