Authors: Justin C.W. Debuse, Victor J. Rayward-Smith
Keywords: Database, Simulated annealing, Computer science, Data mining algorithm, Overfitting, Data mining, Discretization
Abstract: An introduction to the approaches used to discretise continuous database features is given, together with a discussion of the potential benefits of such techniques. These benefits are investigated by applying discretisation algorithms to two large commercial databases; the discretisations yielded are then evaluated using a simulated annealing based data mining algorithm. The results produced suggest that dramatic reductions in problem size may be achieved, yielding improvements in speed. However, it is also demonstrated that, under certain circumstances, discretisation may give an increase in problem size or allow overfitting. Such cases, within which often only a small proportion of the database belongs to the class of interest, highlight the need both for caution when producing discretisations and for the development of more robust discretisation algorithms.