Authors: Lawrence L. You, Kristal T. Pollack, Darrell D. E. Long, K. Gopinath
Keywords:
Abstract: The ever-increasing volume of archival data that needs to be reliably retained for long periods of time and the decreasing costs of disk storage, memory, and processing have motivated the design of low-cost, high-efficiency disk-based storage systems. However, managed disk storage is still expensive. To further lower the cost, redundancy can be eliminated with the use of interfile and intrafile data compression. However, it is not clear what the optimal strategy for compressing data is, given the diverse collections of data. To create a scalable archival storage system that efficiently stores diverse data, we present PRESIDIO, a framework that selects from different space-reduction efficient storage methods (ESMs) to detect similarity and reduce or eliminate redundancy when storing objects. In addition, the framework uses a virtualized content addressable store (VCAS) that hides from the user the complexity of knowing which space-efficient techniques are used, including chunk-based deduplication or delta compression. Storing and retrieving objects are polymorphic operations independent of their content-based address. A new technique, harmonic super-fingerprinting, is also used for obtaining successively more accurate (but also more costly) measures of similarity to identify the existing objects in a very large data set that are most similar to an incoming new object. The PRESIDIO design, when reported earlier, had comprehensively introduced for the first time the notion of deduplication, which is now being offered as a service in storage systems by major vendors. As an aid to the design of such systems, we evaluate various parameters that affect the efficiency of a storage system using empirical data.
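
To make the idea of chunk-based deduplication behind a content-based address concrete, here is a minimal Python sketch. It is not the PRESIDIO or VCAS implementation: the class name `ChunkStore`, the fixed-size chunking, the 4096-byte default chunk size, and the choice of SHA-256 are all assumptions made for illustration. It covers only chunk deduplication, not delta compression or harmonic super-fingerprinting.

```python
import hashlib


class ChunkStore:
    """Toy content-addressable store with fixed-size chunk deduplication.

    Hypothetical illustration only; names and parameters are assumptions.
    """

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}   # chunk digest -> chunk bytes (each unique chunk kept once)
        self.objects = {}  # object address -> ordered list of chunk digests

    def put(self, data: bytes) -> str:
        """Store an object and return its content-based address."""
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk already present is not stored again.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        address = hashlib.sha256(data).hexdigest()
        self.objects[address] = digests
        return address

    def get(self, address: str) -> bytes:
        """Rebuild an object from its chunk list."""
        return b"".join(self.chunks[d] for d in self.objects[address])

    def stored_bytes(self) -> int:
        """Total bytes held in unique chunks."""
        return sum(len(c) for c in self.chunks.values())
```

The caller only sees `put` and `get` keyed by the content-based address; whether a chunk was stored or merely referenced stays hidden, which is loosely the kind of complexity the abstract says the VCAS hides from the user.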