Authors: Luis Torgo, Paula Branco, Rita P. Ribeiro
DOI:
Keywords:
Abstract: Many real-world data mining applications involve obtaining predictive models from data sets with strongly imbalanced distributions of the target variable. Frequently, the least common values of this target variable are associated with events that are highly relevant for end users (e.g. fraud detection, unusual returns on stock markets, anticipation of catastrophes, etc.). Moreover, these events may have different costs and benefits, which, combined with the rarity of some of them in the available training data, create serious problems for predictive modelling techniques. This paper presents a survey of existing techniques for handling these important applications of predictive analytics. Although most existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey we discuss the main challenges raised by imbalanced distributions, describe the main approaches to these problems, propose a taxonomy of these methods, and refer to some related problems within predictive modelling.