Authors: Maria P. Tzamtzi, Fotis N. Koumboulis
Keywords:
Abstract: The performance of classification models can be negatively impacted if the data on which they are trained contain very rare events. While recent research has investigated the issue of class imbalance, few, if any, studies address the issues related to handling extreme imbalance (rare events), where the minority class accounts for as little as 0.1% of the training data. This work investigates the effect of dataset size and class distribution when examples from the minority class are rare. In addition, we compare the improvement achieved by acquiring additional training data with the improvement achieved by applying sampling. Our results demonstrate that sampling is effective at alleviating the problem.
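As a rough illustration of what "sampling" refers to here, the sketch below applies random undersampling and random oversampling to a synthetic label list in which the minority class makes up 0.1% of the data (10 of 10,000 examples). This is not the authors' procedure, which the abstract does not specify. The function names and the `ratio` and `target_count` parameters are hypothetical.

```python
import random

def random_undersample(labels, minority_label, ratio=1.0, seed=0):
    """Hypothetical helper: keep every minority example plus
    ratio * n_minority randomly chosen majority examples.
    Returns the sorted indices of the kept examples."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    n_keep = min(len(majority), int(ratio * len(minority)))
    return sorted(minority + rng.sample(majority, n_keep))

def random_oversample(labels, minority_label, target_count, seed=0):
    """Hypothetical helper: keep every example, then draw minority
    examples with replacement until the minority class has
    target_count members. Returns indices (duplicates allowed)."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    extra = max(0, target_count - len(minority))
    return list(range(len(labels))) + [rng.choice(minority) for _ in range(extra)]

# Synthetic rare-event data: 10 positives among 10,000 examples (0.1%).
labels = [1] * 10 + [0] * 9990

under = random_undersample(labels, minority_label=1, ratio=1.0)
over = random_oversample(labels, minority_label=1, target_count=9990)
print(len(under), sum(labels[i] for i in under))
print(len(over), sum(labels[i] for i in over))
```

Undersampling discards most of the majority class to reach a balanced 1:1 subset, while oversampling keeps all data and replicates minority examples. Which trade-off works better for rare events is the kind of question the abstract says the study examines empirically.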