Authors: Neelam Memon, Jianhua Shao
DOI: 10.1007/978-3-319-20810-7_1
Keywords:
Abstract: Privacy is a concern when publishing transaction data for applications such as marketing research and biomedical studies. While methods for anonymizing transaction data exist, they are designed to run on a single machine and hence are not scalable to large datasets. Recently, MapReduce has emerged as a highly scalable platform for data-intensive applications. In this paper, we consider how MapReduce may be used to provide scalability in transaction data anonymization. More specifically, we consider how RBAT may be parallelized using MapReduce. RBAT is a sequential method that has some desirable features for transaction data anonymization, but its iterative nature makes parallelization challenging. A direct implementation of RBAT on MapReduce using data partitioning alone can result in significant overhead, which can offset the gains from parallel processing. We propose MR-RBAT, which employs two parameters to control this overhead. Our experimental results show that MR-RBAT can scale linearly to large datasets and retain good data utility.