Authors: Włodzimierz Funika, Paweł Koperek
DOI: 10.1007/978-3-319-32149-3_46
Keywords: Symbolic regression, Bottleneck, Evolutionary programming, Distributed computing, Service (systems architecture), Resource (project management), Speedup, Implementation, Spark (mathematics), Computer science
Abstract: Organizations across the globe gather more and more data. Large datasets require new approaches to analysis and processing, which include methods based on machine learning. In particular, symbolic regression can provide many useful insights. Unfortunately, due to high resource requirements, the use of this method for large datasets might be unfeasible. In this paper we analyze a bottleneck in an open-source implementation of this method, which we call hubert. We identify that the evaluation of individuals is the most costly operation. As a solution to this problem, we propose a new evaluation service based on the Apache Spark framework, which attempts to speed up computations by distributing them on a cluster of machines. We compare the performance of the service by analyzing the execution time for a number of samples with both implementations. Then we discuss how the performance of computation improves with an increased amount of resources. Finally, we draw conclusions and outline plans for further research.