Authors: Geoffrey I. Webb, Ying Yang
DOI:
Keywords: Naive Bayes classifier, Bayes classifier, Bayes error rate, Artificial intelligence, Pattern recognition, Discretization, Computer science, Value (computer science), Interval (mathematics), Variance (accounting), Disjoint sets
Abstract: Previous discretization techniques have discretized numeric attributes into disjoint intervals. We argue that this is neither necessary nor appropriate for naive-Bayes classifiers. The analysis leads to a new discretization method, Non-Disjoint Discretization (NDD). NDD forms overlapping intervals for a numeric attribute, always locating a value toward the middle of an interval to obtain a more reliable probability estimate. It also adjusts the number and size of intervals to the number of training instances, seeking a trade-off between the bias and variance of the probability estimates. We justify NDD in theory and test it on a wide cross-section of datasets. Our experimental results suggest that, for naive-Bayes classifiers, NDD works better than alternative discretization approaches.
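The abstract describes NDD only at a high level. Below is a minimal Python sketch of the core idea as stated there: overlapping intervals built from consecutive equal-frequency "atomic" intervals, with each value assigned to the interval in which it sits toward the middle. The function names, the sqrt(n) sizing heuristic, and the grouping of exactly three atomic intervals per interval are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def ndd_intervals(values, n_atomic=None):
    """Sketch of Non-Disjoint Discretization (NDD).

    Forms equal-frequency 'atomic' intervals, then groups each run of
    three consecutive atomic intervals into one overlapping interval.
    """
    values = np.sort(np.asarray(values, dtype=float))
    n = len(values)
    if n_atomic is None:
        # Heuristic: roughly sqrt(n) atomic intervals (an assumption for
        # illustration, not the sizing rule derived in the paper).
        n_atomic = max(3, int(np.sqrt(n)))
    # Atomic cut points from equal-frequency quantiles.
    cuts = np.quantile(values, np.linspace(0, 1, n_atomic + 1))
    # Overlapping intervals: atomic intervals k, k+1, k+2 form interval k.
    intervals = [(cuts[k], cuts[k + 3]) for k in range(n_atomic - 2)]
    return cuts, intervals

def ndd_assign(x, cuts, intervals):
    """Return the interval whose middle atomic interval contains x."""
    # Index of the atomic interval containing x.
    a = int(np.clip(np.searchsorted(cuts, x, side="right") - 1,
                    0, len(cuts) - 2))
    # The interval centred on atomic interval a is interval a - 1;
    # clamp at the two ends, where no centred interval exists.
    k = int(np.clip(a - 1, 0, len(intervals) - 1))
    return intervals[k]
```

In this sketch a value is discretized at classification time by calling ndd_assign, so the probability estimate for each numeric attribute comes from an interval roughly centred on the observed value rather than from a fixed disjoint bin.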