Authors: Smith Tsang, Ben Kao, Kevin Y. Yip, Wai-Shing Ho, Sau Dan Lee
Keywords: Machine learning, Data mining, Artificial intelligence, Decision tree, Tuple, Group method of data handling, Complete information, Incremental decision tree, Uncertain data, Computer science, Probability distribution, Decision tree learning
Abstract: Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as the mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted which show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
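The abstract describes building decision trees over tuples whose attribute values are given as probability distributions rather than single values. A common way to realize this is to route each tuple fractionally into both branches of a split according to its probability mass on each side, and to compute split quality (e.g., entropy) from those fractional weights. The sketch below is a minimal illustration of that idea under assumed representations: each uncertain value is a discretized pdf of sample points and probabilities, and `split_uncertain`, `weighted_entropy`, and `information_gain` are hypothetical helpers, not the authors' algorithm or their pruning techniques.

```python
import numpy as np

# Minimal sketch (assumed representation, not the paper's actual code):
# each training tuple carries a discretized pdf over one numeric attribute,
# a class label, and a fractional weight (probability mass at this node).

def split_uncertain(tuples, split_point):
    """Fractionally route each uncertain tuple to the left/right child.

    tuples: list of (values, probs, label, weight), where `values` and
    `probs` are NumPy arrays describing a discretized pdf (probs sums to 1).
    The probability mass with value <= split_point goes left, the rest right.
    """
    left, right = [], []
    for values, probs, label, weight in tuples:
        p_left = float(probs[values <= split_point].sum())
        if p_left > 0.0:
            left.append((values, probs, label, weight * p_left))
        if p_left < 1.0:
            right.append((values, probs, label, weight * (1.0 - p_left)))
    return left, right

def weighted_entropy(tuples):
    """Entropy of the class distribution, weighting tuples by their mass."""
    total = sum(t[3] for t in tuples)
    ent = 0.0
    for c in {t[2] for t in tuples}:
        p = sum(t[3] for t in tuples if t[2] == c) / total
        ent -= p * np.log2(p)
    return ent

def information_gain(tuples, split_point):
    """Entropy reduction achieved by splitting at split_point."""
    left, right = split_uncertain(tuples, split_point)
    total = sum(t[3] for t in tuples)
    child = sum(
        (sum(t[3] for t in part) / total) * weighted_entropy(part)
        for part in (left, right) if part
    )
    return weighted_entropy(tuples) - child

# Usage: two tuples whose pdfs straddle a candidate split point at 2.0.
data = [
    (np.array([1.0, 2.0, 3.0]), np.array([0.2, 0.5, 0.3]), "A", 1.0),
    (np.array([2.0, 3.0, 4.0]), np.array([0.1, 0.4, 0.5]), "B", 1.0),
]
print(information_gain(data, split_point=2.0))
```

Because every candidate split requires summing probability mass for many tuples, this is noticeably more expensive than splitting on single averaged values, which is the cost the abstract's pruning techniques aim to reduce.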