Authors: Menezes Guilherme, Reza Abdullah
DOI:
Keywords:
Abstract: Files in a deduplication system are clustered based on an estimate of the similarity between files in a file system. The estimate captures how much content the files share and where the shared segments are located. The offsets of shared segments are found from the files' bitmap vectors. These offsets are used to generate a cluster definition that approximates an optimal data structure for clustering files that share content. The approximated structure defines clusters that are arranged hierarchically by offset number.
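
The abstract does not say how the similarity estimate is computed, so the following is only a minimal sketch under stated assumptions. Each file's bitmap vector is modeled as a Python integer in which bit i is set when the file contains the segment at offset i, and similarity is taken to be the Jaccard ratio of shared to total segments. The function names `shared_offsets` and `similarity` are hypothetical and do not come from the source.

```python
from typing import List


def shared_offsets(bitmap_a: int, bitmap_b: int) -> List[int]:
    """Return the segment offsets (bit positions) set in both bitmaps."""
    common = bitmap_a & bitmap_b
    offsets = []
    position = 0
    while common:
        if common & 1:
            offsets.append(position)
        common >>= 1
        position += 1
    return offsets


def similarity(bitmap_a: int, bitmap_b: int) -> float:
    """Estimate similarity as shared segments over all segments (Jaccard).

    Assumption: the source does not name a metric; Jaccard is used here
    purely for illustration.
    """
    union = bitmap_a | bitmap_b
    if union == 0:
        return 0.0  # two empty files share nothing measurable
    return bin(bitmap_a & bitmap_b).count("1") / bin(union).count("1")


# Illustrative bitmaps: file_a holds segments at offsets 1, 2, 4;
# file_b holds segments at offsets 0, 1, 2.
file_a = 0b10110
file_b = 0b00111

print(shared_offsets(file_a, file_b))
print(similarity(file_a, file_b))
```

A clustering step could then group files whose similarity exceeds a chosen threshold and order the groups by their lowest shared offset, which is one possible reading of the hierarchical arrangement the abstract mentions.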