PQBF: I/O-Efficient Approximate Nearest Neighbor Search by Product Quantization

作者: Yingfan Liu , Hong Cheng , Jiangtao Cui

DOI: 10.1145/3132847.3132901

关键词:

摘要: Approximate nearest neighbor (ANN) search in high-dimensional space plays an essential role many multimedia applications. Recently, product quantization (PQ) based methods for ANN have attracted enormous attention the community of computer vision, due to its good balance between accuracy and requirement. PQ embed a vector into short binary code (called code), squared Euclidean distance is estimated by asymmetric quantizer (AQD) with pretty high precision. Thus, original can be converted similarity on AQD using approach. All existing are in-memory solutions, which may not handle massive data if they cannot fit entirely memory. In this paper, we propose I/O-efficient solution search. We design index called PQB+-forest support efficient AQD. first creates number partitions codes coarse then builds B+-tree, PQB+-tree, each partition. The process greatly expedited focusing few selected that closest query, as well pruning power PQB+-trees. According experiments conducted two large-scale sets containing up 1 billion vectors, our method outperforms competitors, including state-of-the-art LSH

参考文章(28)
Yingfan Liu, Jiangtao Cui, Zi Huang, Hui Li, Heng Tao Shen, SK-LSH Proceedings of the VLDB Endowment. ,vol. 7, pp. 745- 756 ,(2014) , 10.14778/2732939.2732947
Yifang Sun, Wei Wang, Jianbin Qin, Ying Zhang, Xuemin Lin, SRS Proceedings of the VLDB Endowment. ,vol. 8, pp. 1- 12 ,(2014) , 10.14778/2735461.2735462
Hans-Jörg Schek, Stephen Blott, Roger Weber, A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces very large data bases. pp. 194- 205 ,(1998)
Heng Tao Shen, Jingdong Wang, Jingkuan Song, Jianqiu Ji, Hashing for Similarity Search: A Survey arXiv: Data Structures and Algorithms. ,(2014)
Artem Babenko, Victor Lempitsky, Tree quantization for large-scale similarity search and classification computer vision and pattern recognition. pp. 4240- 4248 ,(2015) , 10.1109/CVPR.2015.7299052
Eduardo Valle, Matthieu Cord, Sylvie Philipp-Foliguet, High-dimensional descriptor indexing for large multimedia databases Proceeding of the 17th ACM conference on Information and knowledge mining - CIKM '08. pp. 739- 748 ,(2008) , 10.1145/1458082.1458181
Jinyang Gao, H.V. Jagadish, Beng Chin Ooi, Sheng Wang, Selective Hashing: Closing the Gap between Radius Search and k-NN Search knowledge discovery and data mining. pp. 349- 358 ,(2015) , 10.1145/2783258.2783284
A. Babenko, V. Lempitsky, The inverted multi-index computer vision and pattern recognition. pp. 3069- 3076 ,(2012) , 10.1109/CVPR.2012.6248038
Jianfeng Wang, Jingdong Wang, Jingkuan Song, Xin-Shun Xu, Heng Tao Shen, Shipeng Li, Optimized Cartesian K-Means IEEE Transactions on Knowledge and Data Engineering. ,vol. 27, pp. 180- 192 ,(2015) , 10.1109/TKDE.2014.2324592
Rina Panigrahy, Entropy based nearest neighbor search in high dimensions symposium on discrete algorithms. pp. 1186- 1195 ,(2006) , 10.5555/1109557.1109688