Authors: Mincheng Chen, Jingling Yuan, Lin Li, Dongling Liu, Yang He
DOI: 10.1007/S10115-018-1288-5
Keywords: Data classification, Energy management, Missing data, Heuristic (computer science), Data center, Data pre-processing, Algorithm, Computer science, Spark (mathematics), Energy consumption
Abstract: Energy data, which consist of energy consumption statistics and other related data in green data centers, grow dramatically. The energy data have great value, but many attributes within them are redundant and unnecessary, and they have a serious impact on the performance of the data center’s decision-making system. Thus, attribute reduction for energy data has been conceived as a critical step. However, existing attribute reduction algorithms are often computationally time-consuming. To address these issues, firstly, we extend the methodology of rough sets to construct a knowledge representation system for data center energy consumption. Energy data will contain some degree of exceptions caused by power failure, energy instability or other factors; hence, we design an integrated data preprocessing method using Spark, which mainly includes sampling analysis, data classification, missing data filling, outlier prediction and data discretization. By taking good advantage of in-memory computing, a fast heuristic attribute reduction algorithm using Spark (FHARA-S) is proposed. In this algorithm, we use an efficient algorithm for transforming the energy consumption decision table and a heuristic formula for measuring attribute significance to reduce the search space, and we introduce the correlation between condition attributes and the decision attribute to further improve computational efficiency. We also design an adaptive decision-making management architecture for the green data center based on FHARA-S, which can improve decision-making efficiency and strengthen energy management. The experimental results show that the speed of our algorithm gains up to a 2.2X improvement over a traditional attribute reduction algorithm using MapReduce and a 0.61X improvement over an algorithm using Spark. Besides, our algorithm also saves more resources.
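To make the idea of heuristic attribute reduction concrete, the following is a minimal sketch of a generic rough-set greedy reduction in plain Python. It is not the paper's FHARA-S algorithm and does not use Spark: it assumes the classical positive-region (dependency) measure as the attribute significance, and the function names, the toy decision table and its attribute names (`load`, `temp`, `rack`, `alert`) are all hypothetical.

```python
from collections import defaultdict


def positive_region_size(rows, attrs, decision):
    """Count rows whose equivalence class under `attrs` has one decision value."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in attrs)  # rows indiscernible on `attrs` share a key
        classes[key].append(row[decision])
    return sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)


def greedy_reduct(rows, conditions, decision):
    """Greedily add the attribute that enlarges the positive region the most.

    Assumption: significance = growth of the positive region, the textbook
    rough-set heuristic; the paper's own heuristic formula may differ.
    """
    target = positive_region_size(rows, conditions, decision)
    reduct = []
    current = positive_region_size(rows, reduct, decision)
    while current < target:
        best_attr, best_size = None, -1
        for a in conditions:
            if a in reduct:
                continue
            size = positive_region_size(rows, reduct + [a], decision)
            if size > best_size:  # ties keep the earliest attribute
                best_attr, best_size = a, size
        reduct.append(best_attr)
        current = best_size
    return reduct


# Hypothetical toy decision table: three condition attributes, one decision.
rows = [
    {"load": "high", "temp": "hot", "rack": "A", "alert": "yes"},
    {"load": "high", "temp": "cool", "rack": "B", "alert": "yes"},
    {"load": "low", "temp": "hot", "rack": "A", "alert": "no"},
    {"load": "low", "temp": "cool", "rack": "B", "alert": "no"},
]

print(greedy_reduct(rows, ["load", "temp", "rack"], "alert"))  # prints ['load']
```

In this toy table `load` alone determines `alert`, so the other two attributes are dropped. A distributed version would presumably compute the equivalence classes with a grouped aggregation over cached data, but that mapping onto Spark is an assumption here and is not described in the abstract.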