Authors: Sibghat Ullah Bazai, Julian Jang-Jaccard
DOI: 10.1007/978-3-030-36938-5_40
Keywords:
Abstract: Recent proposals in data anonymization have mostly focused on MapReduce, though the advantages of Spark are well documented. To address this gap, we propose a novel data anonymization technique for Apache Spark. SparkDA, our proposal, takes full advantage of innovative Spark features, such as better partition control, in-memory processing, and cache management for iterative operations, while providing high utility with privacy. These are achieved by proposing algorithms through Spark's Resilient Distributed Dataset (RDD). Our algorithms are implemented at two main RDD processing transformations, FlatMapRDD and ReduceByKeyRDD, respectively. Experimental results show that the proposed approach provides the required privacy levels together with the scalability and high performance essential to many large datasets.