Decision Trees for Uncertain Data

Authors: Smith Tsang, Ben Kao, Kevin Y. Yip, Wai-Shing Ho, Sau Dan Lee

DOI: 10.1109/TKDE.2009.175

Keywords: Machine learning, Data mining, Artificial intelligence, Decision tree, Tuple, Group method of data handling, Complete information, Incremental decision tree, Uncertain data, Computer science, Probability distribution, Decision tree learning

Abstract: Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted which show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
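The abstract's central idea, using each tuple's whole pdf rather than a summary statistic when choosing a split, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: each uncertain value is assumed to be a discrete pdf given as (value, probability) sample points, a split is a binary test `x <= z`, and a tuple whose pdf straddles `z` is sent to both branches as fractional tuples weighted by its probability mass on each side. The names `split_mass`, `split_entropy`, `best_split`, and `averaged` are hypothetical, and none of the paper's pruning techniques are shown.

```python
import math
from collections import defaultdict

# Assumed representation (hypothetical, for illustration only): an uncertain tuple is
# (pdf, label), where pdf is a list of (value, probability) sample points summing to 1.

def split_mass(pdf, z):
    """Probability mass of the pdf at or below z, and above z."""
    left = sum(p for v, p in pdf if v <= z)
    right = sum(p for v, p in pdf if v > z)
    return left, right

def entropy(class_mass):
    """Shannon entropy (bits) of a dict mapping class label -> fractional count."""
    total = sum(class_mass.values())
    h = 0.0
    for w in class_mass.values():
        if w > 0:  # zero-mass classes contribute nothing (and avoid dividing by total == 0)
            q = w / total
            h -= q * math.log2(q)
    return h

def split_entropy(tuples, z):
    """Weighted entropy after sending each tuple's mass fractionally left/right of z."""
    left, right = defaultdict(float), defaultdict(float)
    for pdf, label in tuples:
        l, r = split_mass(pdf, z)
        left[label] += l
        right[label] += r
    n_left, n_right = sum(left.values()), sum(right.values())
    return (n_left * entropy(left) + n_right * entropy(right)) / (n_left + n_right)

def best_split(tuples):
    """Try every sample-point value as a threshold; return (z, entropy) with the lowest entropy."""
    candidates = sorted({v for pdf, _ in tuples for v, _ in pdf})
    h, z = min((split_entropy(tuples, z), z) for z in candidates)  # ties go to the smaller z
    return z, h

def averaged(tuples):
    """Baseline: collapse each pdf to a single point at its mean."""
    return [([(sum(v * p for v, p in pdf), 1.0)], label) for pdf, label in tuples]

# Toy data (made up for illustration): two tuples with the same mean (5.0) but different spreads.
DATA = [
    ([(1.0, 0.5), (9.0, 0.5)], "A"),
    ([(4.0, 0.5), (6.0, 0.5)], "B"),
]

if __name__ == "__main__":
    print(best_split(DATA))            # full pdfs: threshold 1.0, entropy about 0.689 bits
    print(best_split(averaged(DATA)))  # means only: (5.0, 1.0)
```

In this toy data both tuples average to 5.0, so the averaging baseline cannot separate them and the entropy stays at 1 bit, while the pdf-based search finds a threshold that lowers it. Evaluating every sample point of every pdf is also what makes the pdf-based search costlier than the averaged one, which is the kind of cost the abstract's pruning techniques are meant to reduce.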
