SparkDA: RDD-Based High-Performance Data Anonymization Technique for Spark Platform.

Authors: Sibghat Ullah Bazai, Julian Jang-Jaccard

DOI: 10.1007/978-3-030-36938-5_40

Abstract: Recent proposals in data anonymization have mostly focused on MapReduce, even though the advantages of Spark are well documented. To address this gap, we propose a novel data anonymization technique for Apache Spark. SparkDA, our proposal, takes full advantage of innovative Spark features, such as better partition control, in-memory processing, and cache management for iterative operations, while providing high utility with privacy. These are achieved by proposing anonymization algorithms built on Spark's Resilient Distributed Dataset (RDD). Our algorithms are implemented at two main data processing RDD transformations, FlatMapRDD and ReduceByKeyRDD, respectively. Experimental results show that the proposed approach provides the required privacy levels together with the scalability and high performance essential to many large datasets.
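The two-stage pattern named in the abstract (a flatMap step followed by a reduceByKey step) can be illustrated with a minimal sketch. This is not the SparkDA algorithm, which the abstract does not spell out: it is plain Python that only mimics the semantics of the two transformations, and it assumes a k-anonymity-style goal. The generalization rule (age decades, 3-digit ZIP prefix), the record layout, and the helper names `generalize`, `flat_map`, `reduce_by_key`, and `anonymize` are all hypothetical.

```python
# Illustrative only: plain-Python stand-ins for the flatMap / reduceByKey
# pattern. Names, record layout, and generalization rule are assumptions.

def generalize(record):
    """Map one raw record to a list of (generalized quasi-identifier, values) pairs."""
    age, zipcode, diagnosis = record
    low = (age // 10) * 10
    key = (f"{low}-{low + 9}", zipcode[:3] + "**")  # hypothetical generalization
    return [(key, [diagnosis])]

def flat_map(func, records):
    """Apply func to every record and flatten the resulting lists."""
    return [pair for record in records for pair in func(record)]

def reduce_by_key(func, pairs):
    """Merge the values of pairs that share a key using func."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

def anonymize(records, k):
    """Group records by generalized key; suppress groups smaller than k."""
    pairs = flat_map(generalize, records)
    groups = reduce_by_key(lambda a, b: a + b, pairs)
    return {key: values for key, values in groups.items() if len(values) >= k}

records = [
    (34, "12345", "flu"),
    (36, "12377", "cold"),
    (31, "12399", "asthma"),
    (58, "54321", "flu"),
]
result = anonymize(records, k=2)
```

With `k=2`, the three records in the 30-39 age range share one generalized key and are kept as a group, while the single 50-59 record is suppressed. In an actual Spark job the two helpers would correspond to the RDD `flatMap` and `reduceByKey` transformations, with partitioning and caching handled by Spark.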
