Authors: M. Muksitul Haque, Michael K. Skinner, Lawrence B. Holder
Keywords:
Abstract: In machine learning, one of the important criteria for higher classification accuracy is a balanced dataset. A large ratio between the minority and majority classes hinders learning with any classifier. A difference of orders of magnitude in the number of instances per target concept results in an imbalanced class distribution. Such datasets range from biological data, sensor data, and medical diagnostics to any other domain where labeling can be time-consuming or costly, or where data may not be easily available. The current study investigates algorithms for solving the imbalanced class distribution present in epigenetic datasets. Epigenetic (DNA methylation) datasets inherently contain few differentially DNA methylated regions (DMR) and a far larger number of non-DMR sites. For this class imbalance problem, several algorithms are compared, including the TAN+AdaBoost algorithm. Experiments performed on four epigenetic datasets and several known datasets show that an imbalanced dataset can achieve accuracy similar to that of a regular learner on a balanced dataset.
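The imbalance problem can be made concrete with a small sketch. The Python code below computes a majority-to-minority class ratio. It then shows how a trivial classifier that always predicts the majority class scores high accuracy while never identifying a single minority instance. The class counts (20 DMR vs. 980 non-DMR sites) and the helper names `imbalance_ratio`, `majority_baseline`, `accuracy`, and `recall` are hypothetical illustrations. They are not data or code from the study.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the majority-class count to the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def majority_baseline(labels):
    """Hypothetical baseline: predict the most frequent class for every instance."""
    majority = Counter(labels).most_common(1)[0][0]
    return [majority] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` instances that were predicted as `positive`."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return tp / actual


# Hypothetical, illustrative counts only: 20 DMR sites vs. 980 non-DMR sites.
labels = ["DMR"] * 20 + ["non-DMR"] * 980
preds = majority_baseline(labels)

print(imbalance_ratio(labels))       # 49.0
print(accuracy(labels, preds))       # 0.98
print(recall(labels, preds, "DMR"))  # 0.0
```

A 98% accuracy here hides the fact that no DMR site is ever detected, which is why accuracy alone can mislead on imbalanced data. This sketch only motivates the problem. It does not reproduce TAN+AdaBoost or any of the algorithms compared in the study.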