Learn++.MF: A random subspace approach for the missing feature problem

Authors: Robi Polikar, Joseph DePasquale, Hussein Syed Mohammed, Gavin Brown, Ludmilla I. Kuncheva

DOI: 10.1016/J.PATCOG.2010.05.028

Abstract: We introduce Learn++.MF, an ensemble-of-classifiers based algorithm that employs random subspace selection to address the missing feature problem in supervised classification. Unlike most established approaches, Learn++.MF does not replace missing values with estimated ones, and hence does not need specific assumptions on the underlying data distribution. Instead, it trains an ensemble of classifiers, each on a random subset of the available features. Instances with missing values are classified by the majority voting of those classifiers whose training data did not include the missing features. We show that Learn++.MF can accommodate a substantial amount of missing data, with only a gradual decline in performance as the amount of missing data increases. We also analyze the effect of the cardinality of the random feature subsets, and of the ensemble size, on algorithm performance. Finally, we discuss the conditions under which the proposed approach is most effective.
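The voting scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid base classifier, the uniform sampling of feature subsets, the use of `None` to mark a missing value, and all function names (`train_centroids`, `predict_centroid`, `train_ensemble`, `classify`) are assumptions made for illustration, and the training data is assumed to be complete.

```python
import random
from collections import Counter


def train_centroids(X, y, feats):
    """Fit a nearest-centroid classifier using only the feature indices in feats."""
    sums, counts = {}, Counter()
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
        counts[label] += 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_centroid(centroids, feats, x):
    """Return the class whose centroid is closest to x on the features in feats."""
    def dist(c):
        return sum((x[f] - m) ** 2 for f, m in zip(feats, centroids[c]))
    return min(centroids, key=dist)


def train_ensemble(X, y, n_classifiers=50, subset_size=2, seed=0):
    """Train each ensemble member on a random feature subset.

    Assumption: subsets are drawn uniformly; the paper's sampling may differ.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_classifiers):
        feats = sorted(rng.sample(range(n_features), subset_size))
        ensemble.append((feats, train_centroids(X, y, feats)))
    return ensemble


def classify(ensemble, x):
    """Majority vote among members whose feature subset avoids every missing value."""
    votes = Counter(
        predict_centroid(centroids, feats, x)
        for feats, centroids in ensemble
        if all(x[f] is not None for f in feats)
    )
    # No usable member means the instance cannot be classified.
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    X = [[0.0, 0.1, 0.2, 0.0], [0.2, 0.0, 0.1, 0.1],
         [1.0, 0.9, 1.1, 1.0], [0.9, 1.0, 1.0, 1.2]]
    y = ["a", "a", "b", "b"]
    ensemble = train_ensemble(X, y, n_classifiers=20, subset_size=2)
    print(classify(ensemble, [1.0, None, 0.9, 1.1]))  # feature 1 is missing
```

No value is imputed in this sketch: a member is simply skipped when any of its features is missing. Smaller subsets leave more members usable for a given instance, but each member then sees less information, which is the trade-off the abstract refers to when it mentions subset cardinality and ensemble size.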

References (48)
David H. Wolpert. Stacked generalization. Neural Networks, vol. 5, pp. 241-259 (1992). 10.1016/S0893-6080(05)80023-1
Amit Gupta, Monica Lam. The weight decay backpropagation for generalizations with missing values. Annals of Operations Research, vol. 78, pp. 165-187 (1998). 10.1023/A:1018945915940
Marina Skurichina, Robert P. W. Duin. Bagging and the Random Subspace Method for Redundant Feature Spaces. Multiple Classifier Systems, pp. 1-10 (2001). 10.1007/3-540-48219-9_1
Marco Ramoni, Paola Sebastiani. Robust Learning with Missing Data. Machine Learning, vol. 45, pp. 147-170 (2001). 10.1023/A:1010968702992
Song-Yee Yoon, Soo-Young Lee. Training Algorithm with Incomplete Data for Feed-Forward Neural Networks. Neural Processing Letters, vol. 10, pp. 171-179 (1999). 10.1023/A:1018772122605
Joseph DePasquale, Robi Polikar. Random feature subset selection for ensemble based classification of data with missing features. International Conference on Multiple Classifier Systems, pp. 251-260 (2007). 10.1007/978-3-540-72523-7_26
Niall Rooney, Alexey Tsymbal, Sarab S. Anand, David W. Patterson. Random subspacing for regression ensembles. The Florida AI Research Society Conference, pp. 532-537 (2004).
Yongsong Qin, Shichao Zhang. Empirical likelihood confidence intervals for differences between two datasets with missing data. Pattern Recognition Letters, vol. 29, pp. 803-812 (2008). 10.1016/J.PATREC.2007.12.010
P. K. Sharpe, R. J. Solly. Dealing with missing values in neural network-based diagnostic systems. Neural Computing and Applications, vol. 3, pp. 73-77 (1995). 10.1007/BF01421959
Yoav Freund, Robert E. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, vol. 55, pp. 119-139 (1997). 10.1006/JCSS.1997.1504