Authors: Marcus Suresh, Ronnie Taib, Yanchang Zhao, Warren Jin
DOI: 10.1007/978-3-030-35288-2_18
Keywords:
Abstract: Incomplete data are common and can degrade statistical inference, often affecting evidence-based policymaking. A typical example is the Business Longitudinal Analysis Data Environment (BLADE), a national data asset of the Australian Government. In this paper, motivated by helping BLADE practitioners select and implement advanced imputation methods with a solid understanding of the impact that different methods will have on data accuracy and reliability, we examine the performance of imputation techniques based on 12 machine learning algorithms. They range from linear regression to neural networks. We compare the performance of these algorithms and assess the impact of various settings, including the number of input features and the length of time spans. To examine generalisability, we also impute two features with distinct characteristics. Experimental results show that three ensemble algorithms (extra trees regressor, bagging regressor and random forest) consistently maintain high imputation performance over the benchmark across the performance metrics. Among them, we would recommend the extra trees regressor for its computational efficiency.