SparkDA: RDD-Based High-Performance Data Anonymization Technique for Spark Platform.

Authors: Sibghat Ullah Bazai, Julian Jang-Jaccard


Abstract: Recent proposals in data anonymization have mostly focused on MapReduce, even though the advantages of Spark are well documented. To address this gap, we propose a novel data anonymization technique for Apache Spark. SparkDA, our proposal, takes full advantage of Spark's innovative features, such as better partition control, in-memory processing, and cache management for iterative operations, while providing high utility with privacy. These are achieved by proposing anonymization algorithms built on Spark's Resilient Distributed Dataset (RDD). Our algorithms are implemented at two main processing RDD transformations, FlatMapRDD and ReduceByKeyRDD, respectively. Our experimental results show that the proposed approach provides the required privacy levels along with the scalability and high performance essential to many large datasets.
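To make the flatMap and reduceByKey pattern named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' SparkDA algorithm. It assumes a hypothetical generalization rule (ages bucketed by decade, ZIP codes truncated to three digits) and uses small stand-in functions, `flat_map` and `reduce_by_key`, that only mimic the semantics of the corresponding RDD transformations so the example runs without a Spark cluster. All record fields and helper names are illustrative assumptions.

```python
# Sketch only: plain-Python stand-ins for RDD flatMap / reduceByKey.
# The generalization rule and all names below are hypothetical.

def generalize(record):
    """Map a record to its generalized quasi-identifier (assumed rule)."""
    age, zipcode, _sensitive = record
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", zipcode[:3] + "**")


def flat_map(func, items):
    """Stand-in for flatMap: func returns a list of outputs per item."""
    return [out for item in items for out in func(item)]


def reduce_by_key(func, pairs):
    """Stand-in for reduceByKey: merge values that share a key."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def equivalence_class_sizes(records):
    """Count how many records fall into each generalized group."""
    pairs = flat_map(lambda r: [(generalize(r), 1)], records)
    return reduce_by_key(lambda a, b: a + b, pairs)


def is_k_anonymous(records, k):
    """True if every generalized group holds at least k records."""
    return all(n >= k for n in equivalence_class_sizes(records).values())


records = [
    (34, "90210", "flu"),
    (37, "90211", "cold"),
    (52, "10001", "asthma"),
    (58, "10002", "flu"),
    (31, "90213", "cold"),
]
sizes = equivalence_class_sizes(records)
```

With these sample records, the generalized groups contain three and two records, so the data satisfies k-anonymity for k = 2 but not for k = 3. On a real cluster the same two steps would presumably map onto the RDD `flatMap` and `reduceByKey` transformations, with Spark handling partitioning and caching.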




Similar articles (2)

In-Memory Data Anonymization Using Scalable and High Performance RDD Design

Scalable, High-Performance, and Generalized Subtree Data Anonymization Approach for Apache Spark
