MR-RBAT: Anonymizing Large Transaction Datasets Using MapReduce

Authors: Neelam Memon, Jianhua Shao

DOI: 10.1007/978-3-319-20810-7_1

Abstract: Privacy is a concern when publishing transaction data for applications such as marketing research and biomedical studies. While methods for anonymizing transaction data exist, they are designed to run on a single machine and hence do not scale to large datasets. Recently, MapReduce has emerged as a highly scalable platform for data-intensive applications. In this paper, we consider how MapReduce may be used to provide scalability in large transaction data anonymization. More specifically, we consider how RBAT may be parallelized using MapReduce. RBAT is a sequential method that has some desirable features for transaction data anonymization, but its highly iterative nature makes its parallelization challenging. A direct implementation of RBAT on MapReduce using data partitioning alone can result in significant overhead, which can offset the gains from parallel processing. We propose MR-RBAT, which employs two parameters to control parallelization overhead. Our experimental results show that MR-RBAT can scale linearly to large datasets and can retain good data utility.
