Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA

Authors: Zhe Lin, Sharad Sinha, Wei Zhang

DOI: 10.1109/FCCM.2019.00032

Abstract: Decision trees are machine learning models commonly used in various application scenarios. In the era of big data, traditional decision tree induction algorithms are not suitable for large-scale datasets due to their stringent data storage requirement. Online decision tree learning algorithms have been devised to tackle this problem by concurrently training with incoming samples and providing inference results. However, even the most up-to-date online tree learning algorithms still suffer from either high memory usage or high computational intensity with dependency and long latency, making them challenging to implement in hardware. To overcome these difficulties, we introduce a new quantile-based algorithm to improve the induction of the Hoeffding tree, one of the state-of-the-art online learning models. The proposed algorithm is light-weight in terms of both memory and computational demand, while still maintaining high generalization ability. A series of optimization techniques dedicated to the proposed algorithm have been investigated from the hardware perspective, including coarse-grained and fine-grained parallelism, dynamic and memory-based resource sharing, and pipelining with data forwarding. We further present a high-performance, hardware-efficient and scalable online decision tree learning system on a field-programmable gate array (FPGA) with system-level optimization techniques. Experimental results show that our proposed algorithm outperforms the state-of-the-art Hoeffding tree learning method, leading to 0.05% to 12.3% improvement in inference accuracy. Real implementation of the complete learning system on the FPGA demonstrates a 384x to 1581x speedup in execution time over the state-of-the-art design.
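
For readers unfamiliar with the Hoeffding tree that the abstract builds on, the following is a minimal sketch of the standard Hoeffding-bound split test (as introduced by Domingos and Hulten), not the quantile-based algorithm proposed in this paper. The function names and the example parameter values are assumptions chosen for illustration only.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n)).

    value_range (R): range of the split metric (e.g. 1.0 for information
                     gain on a two-class problem).
    delta:           allowed probability that the chosen split is wrong.
    n:               number of samples observed at the leaf so far.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split a leaf once the best attribute beats the runner-up by more
    than the Hoeffding bound, so the choice is correct with probability
    at least 1 - delta."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical numbers: the same gain gap justifies a split only after
# enough samples have arrived to shrink the bound.
print(should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=100))
print(should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=200))
```

The bound shrinks as more samples reach a leaf, which is what lets an online learner commit to a split without storing the full dataset.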
