Authors: Dimitrios Karapiperis, Vassilios S. Verykios
Keywords:
Abstract: In this paper, we propose a novel method for distributing the distance computations of record pairs generated by a blocking mechanism to the reduce tasks of a Map/Reduce system. The solutions proposed in the literature analyze the blocks and then construct a profile, which contains the number of record pairs in each block. However, this deterministic process, including all its variants, might incur considerable overhead given massive data sets. In contrast, our method utilizes two Map/Reduce jobs, where the first job formulates the record pairs while the second job distributes these pairs to the reduce tasks, which perform the distance computations, using repetitive allocation rounds. In each such round, we utilize all the available reduce tasks on a random basis by generating permutations of their indexes. A series of experiments demonstrates an almost-equal distribution of the record pairs, or equivalently of the distance computations, to the reduce tasks, which makes our method a simple, yet efficient, solution for applying a blocking mechanism to massive data sets.
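The allocation rounds described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' Map/Reduce implementation: the function name `allocate_pairs`, its parameters, and the toy record pairs are hypothetical, and the sketch assumes that each round draws one fresh random permutation of the reduce task indexes and hands out one pair per index until the permutation is used up.

```python
import random
from collections import Counter


def allocate_pairs(pairs, num_reducers, seed=None):
    """Assign each record pair to a reduce task index in repeated rounds.

    Hypothetical helper: each round shuffles the indexes 0..num_reducers-1
    and consumes them one pair at a time, so no profile of block sizes is
    needed before the allocation starts.
    """
    rng = random.Random(seed)
    assignment = []
    order = []  # indexes still unused in the current round
    for pair in pairs:
        if not order:
            # Start a new round with a fresh random permutation.
            order = list(range(num_reducers))
            rng.shuffle(order)
        assignment.append((pair, order.pop()))
    return assignment


# Toy input: ten record pairs, four reduce tasks (illustrative values only).
pairs = [(f"a{i}", f"b{i}") for i in range(10)]
assignment = allocate_pairs(pairs, num_reducers=4, seed=7)

# Number of pairs, i.e. distance computations, per reduce task.
load = Counter(reducer for _, reducer in assignment)
print(sorted(load.items()))
```

Because every complete round gives each reduce task exactly one pair, the loads in this sketch can differ by at most one pair, which is the almost-equal distribution the abstract refers to. In a real Map/Reduce job, several map tasks would run such rounds independently, so the imbalance could be somewhat larger than in this single-process version.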