Non-Disjoint Discretization for Naive-Bayes Classifiers

Authors: Geoffrey I. Webb, Ying Yang

DOI:

Keywords: Naive Bayes classifier, Bayes classifier, Bayes error rate, Artificial intelligence, Pattern recognition, Discretization, Computer science, Value (computer science), Interval (mathematics), Variance (accounting), Disjoint sets

Abstract: Previous discretization techniques have discretized numeric attributes into disjoint intervals. We argue that this is neither necessary nor appropriate for naive-Bayes classifiers. The analysis leads to a new method, Non-Disjoint Discretization (NDD). NDD forms overlapping intervals for a numeric attribute, always locating a value toward the middle of an interval to obtain more reliable probability estimation. It also adjusts the number and size of the intervals to the number of training instances, seeking an appropriate trade-off between the bias and variance of probability estimation. We justify NDD in theory and test it on a wide cross-section of datasets. Our experimental results suggest that, for naive-Bayes classifiers, NDD works better than alternative discretization approaches.
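The idea of overlapping intervals centred on each value can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the choices below are assumptions made for illustration. Specifically, the sketch assumes an interval size of roughly the square root of the number of training instances, equal-frequency "atomic" intervals one third that size, and an operational interval formed from the atomic interval containing the value plus its two neighbours. The function names `atomic_cut_points`, `overlapping_interval`, and `ndd_sketch` are hypothetical.

```python
import bisect
import math


def atomic_cut_points(values, atomic_size):
    """Split the sorted values into equal-frequency atomic intervals.

    Returns the cut points, i.e. the first value of every atomic
    interval except the first one.
    """
    ordered = sorted(values)
    return [ordered[i] for i in range(atomic_size, len(ordered), atomic_size)]


def overlapping_interval(value, cuts):
    """Return (lo, hi) atomic-interval indices of the interval used for value.

    The interval is the atomic interval containing value plus one
    neighbour on each side, so value sits in the middle whenever it is
    not at either end of the attribute's range.
    """
    n_atomic = len(cuts) + 1
    k = bisect.bisect_right(cuts, value)  # atomic interval containing value
    return max(k - 1, 0), min(k + 1, n_atomic - 1)


def ndd_sketch(values):
    """Choose atomic cut points from the number of training instances.

    Assumption: interval size ~ sqrt(n), so that both the number and the
    size of the intervals grow as more training data becomes available.
    """
    n = len(values)
    interval_size = max(1, round(math.sqrt(n)))
    atomic_size = max(1, interval_size // 3)
    return atomic_cut_points(values, atomic_size)


if __name__ == "__main__":
    training_values = list(range(81))       # toy attribute with 81 instances
    cuts = ndd_sketch(training_values)
    print(overlapping_interval(40, cuts))   # interval for the value 40
    print(overlapping_interval(43, cuts))   # a nearby value gets an overlapping interval
```

In this toy run, the values 40 and 43 are assigned different intervals that share atomic intervals, which is the sense in which the discretization is non-disjoint. Class-conditional probabilities for a value would then be estimated from the training instances that fall inside its own interval. Repeated attribute values can produce duplicate cut points, which this sketch does not handle.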
