Decision Trees for Uncertain Data

Authors: Smith Tsang, Ben Kao, Kevin Y. Yip, Wai-Shing Ho, Sau Dan Lee

DOI: 10.1109/TKDE.2009.175

Keywords: Machine learning, Data mining, Artificial intelligence, Decision tree, Tuple, Group method of data handling, Complete information, Incremental decision tree, Uncertain data, Computer science, Probability distribution, Decision tree learning

Abstract: Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted which show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
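The contrast between averaging and using the full distribution can be illustrated with a small sketch. The abstract does not spell out the algorithm, so everything here is an assumption made for illustration: each uncertain value is modeled as a list of `(value, probability)` samples, a tuple contributes fractional weight to both sides of a split, and the names `entropy`, `split_entropy`, and `to_mean` are hypothetical. It is not the authors' implementation and omits their pruning techniques.

```python
import math
from collections import defaultdict


def entropy(class_weights):
    """Shannon entropy (bits) of a mapping from class label to weight."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    h = 0.0
    for w in class_weights.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def split_entropy(tuples, threshold):
    """Weighted entropy of the split `value <= threshold`.

    `tuples` is a list of (samples, label) pairs, where `samples` is a
    list of (value, probability) pairs whose probabilities sum to 1.
    Assumption: a tuple is divided fractionally between the two
    branches according to its probability mass on each side.
    """
    left = defaultdict(float)
    right = defaultdict(float)
    for samples, label in tuples:
        p_left = sum(p for v, p in samples if v <= threshold)
        left[label] += p_left
        right[label] += 1.0 - p_left
    n_left = sum(left.values())
    n_right = sum(right.values())
    n = n_left + n_right
    return (n_left / n) * entropy(left) + (n_right / n) * entropy(right)


def to_mean(samples):
    """Collapse a distribution to a single point at its expected value."""
    return [(sum(v * p for v, p in samples), 1.0)]


# Toy data (made up): one uncertain tuple of class "A", one certain of class "B".
data = [
    ([(1.0, 0.5), (3.0, 0.5)], "A"),
    ([(3.0, 1.0)], "B"),
]
averaged = [(to_mean(samples), label) for samples, label in data]

pdf_score = split_entropy(data, threshold=2.0)
avg_score = split_entropy(averaged, threshold=2.0)
```

On this toy input the averaged data makes the split at 2.0 look perfectly clean, while the distribution-based score stays above zero because half of the "A" tuple's probability mass falls on the same side as "B". That is the kind of information an average discards.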
