Using index partitioning and reconciliation for data deduplication

Authors: Sudipta Sengupta, James Robert Benton, Ronakkumar N. Desai, Paul Adrian Oltean, Ran Kalach

Keywords: Data mining, Information retrieval, Data type, Data deduplication, Computer science, Linear subspace, Cache, Index (publishing), File format, Hash function, Subspace topology

Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire index cached to save memory. A subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove duplicate chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be chosen based on similarity, including via similarity signatures that each compactly represents a subspace's hashes.
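
The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class name `SubspaceIndexService`, the LRU cache policy, the in-memory dictionaries standing in for on-disk indexes, and the "smallest k hashes" similarity signature are all assumptions chosen for illustration. It shows per-subspace lookup with only some subspace indexes cached, followed by an interruptible reconciliation pass that removes chunks duplicated across two subspaces.

```python
import hashlib
from collections import OrderedDict


def chunk_hash(data: bytes) -> str:
    """Content hash used as the index key (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class SubspaceIndexService:
    """Hypothetical hash index split into per-subspace indexes."""

    def __init__(self, max_cached: int = 2):
        self.stored = {}            # subspace key -> {hash: location}; stands in for disk
        self.cache = OrderedDict()  # LRU cache holding only some subspace indexes
        self.max_cached = max_cached
        self.chunks = {}            # location -> chunk bytes (the chunk store)
        self._next_loc = 0

    def _load(self, key):
        """Return the subspace index for `key`, caching it and evicting the oldest."""
        index = self.stored.setdefault(key, {})
        self.cache[key] = index
        self.cache.move_to_end(key)
        while len(self.cache) > self.max_cached:
            self.cache.popitem(last=False)
        return index

    def add_chunk(self, key, data: bytes):
        """Look up the chunk in one subspace only; store it if it is new there."""
        index = self._load(key)
        h = chunk_hash(data)
        if h in index:
            return index[h], False  # already indexed in this subspace
        loc = self._next_loc
        self._next_loc += 1
        self.chunks[loc] = data
        index[h] = loc
        return loc, True

    def signature(self, key, k: int = 4):
        """Compact signature: the k smallest hashes of the subspace (an assumption)."""
        return frozenset(sorted(self.stored.get(key, {}))[:k])

    def similarity(self, a, b) -> float:
        """Jaccard overlap of two signatures, used to pick subspaces to reconcile."""
        sa, sb = self.signature(a), self.signature(b)
        union = sa | sb
        return len(sa & sb) / len(union) if union else 0.0

    def reconcile(self, keep, other, should_stop=lambda: False):
        """Remove from `other` every entry duplicated in `keep`.

        Returns {removed location: surviving location} so callers can repoint
        references. `should_stop` lets the pass be interrupted between entries.
        """
        keep_idx = self.stored.get(keep, {})
        other_idx = self.stored.get(other, {})
        remap = {}
        for h in list(other_idx):
            if should_stop():
                break
            if h in keep_idx:
                old_loc = other_idx.pop(h)
                self.chunks.pop(old_loc, None)
                remap[old_loc] = keep_idx[h]
        return remap
```

In this sketch, a chunk added under two different subspace keys is stored twice at ingest time, because each lookup consults only one subspace index. A later `reconcile(keep, other)` call detects the duplicate, deletes the extra chunk, and returns the location mapping; `similarity(a, b)` is one way such a pass might choose which pairs of subspaces to compare first.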
