Authors: Sudipta Sengupta, James Robert Benton, Ronakkumar N. Desai, Paul Adrian Oltean, Ran Kalach
DOI:
Keywords: Data mining, Information retrieval, Data type, Data deduplication, Computer science, Linear subspace, Cache, Index (publishing), File format, Hash function, Subspace topology
Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire index cached to save memory. A subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be selected based on similarity, including via similarity signatures that each compactly represents a subspace's hashes.
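
The following minimal Python sketch illustrates the ideas summarized in the abstract; it is not the patented implementation. All names (`SubspaceIndex`, `DedupStore`, `similarity`) are hypothetical. It assumes SHA-256 as the chunk hash, uses the k smallest hashes of a subspace as a stand-in for a similarity signature, and has reconciliation repoint a duplicate entry to the surviving chunk instead of deleting the entry; the source specifies none of these choices.

```python
import hashlib


def chunk_hash(chunk: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever hash the real system uses.
    return hashlib.sha256(chunk).hexdigest()


class SubspaceIndex:
    """Hypothetical per-subspace index mapping chunk hash -> chunk location."""

    def __init__(self, key):
        self.key = key
        self.entries = {}

    def signature(self, k=4):
        # Assumed signature scheme: the k smallest hashes in the subspace.
        return frozenset(sorted(self.entries)[:k])


def similarity(a, b, k=4):
    """Jaccard similarity of two subspace signatures (0.0 if either is empty)."""
    sa, sb = a.signature(k), b.signature(k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


class DedupStore:
    """Hypothetical chunk store whose hash index is split into subspaces."""

    def __init__(self):
        self.subspaces = {}  # subspace key (e.g. file type) -> SubspaceIndex
        self.chunks = {}     # location -> chunk bytes
        self.next_loc = 0

    def put(self, subspace_key, chunk):
        # Only the one relevant subspace index is consulted, so the others
        # would not need to be held in memory.
        index = self.subspaces.get(subspace_key)
        if index is None:
            index = self.subspaces[subspace_key] = SubspaceIndex(subspace_key)
        h = chunk_hash(chunk)
        if h in index.entries:
            return index.entries[h]  # chunk already stored in this subspace
        loc = self.next_loc
        self.next_loc += 1
        self.chunks[loc] = chunk
        index.entries[h] = loc
        return loc

    def reconcile(self, keep_key, other_key):
        """Remove chunks in `other_key` that duplicate chunks in `keep_key`."""
        keep = self.subspaces[keep_key]
        other = self.subspaces[other_key]
        removed = 0
        for h, loc in list(other.entries.items()):
            kept_loc = keep.entries.get(h)
            if kept_loc is not None and kept_loc != loc:
                other.entries[h] = kept_loc  # repoint to the surviving copy
                del self.chunks[loc]         # drop the duplicate chunk
                removed += 1
        return removed
```

In this sketch, storing the same chunk under two different subspace keys yields two stored copies, because each lookup sees only its own subspace; a later `reconcile` call collapses them into one. A scheduler could use `similarity` to choose which subspace pairs to reconcile first during off-peak hours, and because `reconcile` processes one entry at a time, stopping it partway would leave the store consistent.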