Neighbor-sensitive hashing

作者: Yongjoo Park , Michael Cafarella , Barzan Mozafari

DOI: 10.14778/2850583.2850589

关键词: Computer scienceData miningProcess (computing)Space (commercial competition)Machine learningJava hashCodeLocality-sensitive hashingHash functionArtificial intelligenceIdentification (information)

摘要: Approximate kNN (k-nearest neighbor) techniques using binary hash functions are among the most commonly used approaches for overcoming prohibitive cost of performing exact queries. However, success these largely depends on their functions' ability to distinguish items; that is, items retrieved based data items' hashcodes, should include as many true possible. A widely-adopted principle this process is ensure similar assigned same hashcode so with hashcodes a query's likely be neighbors.In work, we abandon heavily-utilized and pursue opposite direction generating more effective tasks. That aim increase distance between in space, instead reducing it. Our contribution begins by providing theoretical analysis why revolutionary seemingly counter-intuitive approach leads accurate identification items. followed proposal hashing algorithm embeds novel principle. empirical studies confirm idea significantly improves efficiency accuracy state-of-the-art techniques.

参考文章(57)
Sameer Agarwal, Barzan Mozafari, Aurojit Panda, Henry Milner, Samuel Madden, Ion Stoica, BlinkDB Proceedings of the 8th ACM European Conference on Computer Systems - EuroSys '13. pp. 29- 42 ,(2013) , 10.1145/2465351.2465355
David Arthur, Sergei Vassilvitskii, k-means++: the advantages of careful seeding symposium on discrete algorithms. pp. 1027- 1035 ,(2007) , 10.5555/1283383.1283494
Yunchao Gong, Svetlana Lazebnik, Iterative quantization: A procrustean approach to learning binary codes computer vision and pattern recognition. pp. 817- 824 ,(2011) , 10.1109/CVPR.2011.5995432
Bin Cui, Beng Chin Coi, Jianwen Su, K.-L. Tan, Indexing high-dimensional data for efficient in-memory similarity search IEEE Transactions on Knowledge and Data Engineering. ,vol. 17, pp. 339- 353 ,(2005) , 10.1109/TKDE.2005.46
Kai Zeng, Shi Gao, Barzan Mozafari, Carlo Zaniolo, The analytical bootstrap: a new method for fast error estimation in approximate query processing international conference on management of data. pp. 277- 288 ,(2014) , 10.1145/2588555.2588579
Antonin Guttman, R-trees Proceedings of the 1984 ACM SIGMOD international conference on Management of data - SIGMOD '84. ,vol. 14, pp. 47- 57 ,(1984) , 10.1145/602259.602266
M. Norouzi, A. Punjani, D. J. Fleet, Fast search in Hamming space with multi-index hashing computer vision and pattern recognition. pp. 3108- 3115 ,(2012) , 10.1109/CVPR.2012.6248043
Shrikant Kashyap, Panagiotis Karras, Scalable kNN search on vertically stored time series Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11. pp. 1334- 1342 ,(2011) , 10.1145/2020408.2020607
H Jégou, M Douze, C Schmid, Product Quantization for Nearest Neighbor Search IEEE Transactions on Pattern Analysis and Machine Intelligence. ,vol. 33, pp. 117- 128 ,(2011) , 10.1109/TPAMI.2010.57
Narayanan Sundaram, Aizana Turmukhametova, Nadathur Satish, Todd Mostak, Piotr Indyk, Samuel Madden, Pradeep Dubey, Streaming similarity search over one billion tweets using parallel locality-sensitive hashing Proceedings of the VLDB Endowment. ,vol. 6, pp. 1930- 1941 ,(2013) , 10.14778/2556549.2556574