Performance comparison of linear and non-linear feature selection methods for the analysis of large survey datasets

Authors: Olga Krakovska, Gregory Christie, Andrew Sixsmith, Martin Ester, Sylvain Moreno

DOI: 10.1371/JOURNAL.PONE.0213584

Abstract: Large survey databases for aging-related analysis are often examined to discover key factors that affect a dependent variable of interest. Typically, this is performed with methods assuming linear dependencies between variables. Such assumptions, however, do not hold in many cases, wherein data are linked by way of non-linear dependencies. This in turn requires the application of analytic methods that are more accurate in identifying potentially non-linear dependencies. Here, we objectively compared the feature selection performance of several frequently-used linear methods and three non-linear methods in the context of large survey data. These methods were assessed using both synthetic and real-world datasets, wherein the relationships between the features and the dependent variables were known in advance. In contrast to the linear methods, we found that one non-linear method offered better overall feature selection performance than all other methods under all usage conditions. Moreover, this method was stable, being unaffected by the inclusion or exclusion of features from the datasets. These properties make it a preferable tool for both hypothesis-driven and exploratory analyses.
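The abstract's central point, that a linear measure can miss a feature linked to the target only through a non-linear dependency, can be illustrated with a small synthetic example where the true relationship is known in advance. The sketch below is not the authors' procedure: the abstract does not name the methods compared, so Pearson correlation stands in for a linear score and distance correlation is assumed here purely as an illustrative non-linear dependence measure. The feature names `x1` and `x2`, the sample size, and the quadratic target are all hypothetical choices.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation: a linear dependence score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def _centered_distances(vs):
    """Pairwise |a - b| matrix, double-centered by row, column and grand means."""
    n = len(vs)
    d = [[abs(a - b) for b in vs] for a in vs]
    row = [sum(r) / n for r in d]  # the matrix is symmetric, so row means equal column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(xs, ys):
    """Sample distance correlation (illustrative non-linear dependence score)."""
    n = len(xs)
    a, b = _centered_distances(xs), _centered_distances(ys)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


# Hypothetical synthetic data: the target depends on x1 only, and only non-linearly.
x1 = [(i - 100) / 100 for i in range(201)]      # relevant feature, symmetric about 0
rng = random.Random(0)
x2 = [rng.uniform(-1, 1) for _ in range(201)]   # irrelevant feature
y = [v * v for v in x1]                         # known ground truth: y = x1^2

scores = {
    name: (pearson(xs, y), distance_correlation(xs, y))
    for name, xs in (("x1", x1), ("x2", x2))
}
for name, (r, dcor) in scores.items():
    print(f"{name}: pearson={r:+.3f}  distance_correlation={dcor:.3f}")
```

Because `x1` is symmetric about zero and the target is quadratic, the linear score for the truly relevant feature is essentially zero, while the non-linear score ranks it above the irrelevant feature. A selection rule based only on the linear score would therefore discard the one feature that determines the target.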
