Processing Incomplete k Nearest Neighbor Search

Authors: Xiaoye Miao, Yunjun Gao, Gang Chen, Baihua Zheng, Huiyong Cui

DOI: 10.1109/TFUZZ.2016.2516562

Abstract: Given a set S of multidimensional objects and a query object q, a k nearest neighbor (kNN) query finds from S the k closest objects to q. This query is a fundamental problem in database, data mining, and information retrieval research. It plays an important role in a wide spectrum of real applications such as image recognition and location-based services. However, due to the failure of data transmission devices, improper storage, and accidental loss, incomplete data exist widely in those applications, where some dimensional values of data items are missing. In this paper, we systematically study incomplete k nearest neighbor (IkNN) search, which aims at the kNN query for incomplete data. We formalize this problem and propose an efficient lattice partition algorithm using our newly developed $L\alpha B$ index to support exact IkNN retrieval, with the help of two pruning heuristics, i.e., $\alpha$ value pruning and partial distance pruning. Furthermore, we propose an approximate algorithm, namely histogram approximate, to support approximate IkNN retrieval with improved search efficiency and guaranteed error bound. Extensive experiments using both real and synthetic datasets demonstrate the effectiveness of the newly designed indexes and pruning heuristics, as well as the performance of our presented algorithms under a variety of experimental settings.
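To make the query in the abstract concrete, here is a minimal brute-force sketch of kNN search over incomplete data. It is not the paper's lattice partition or histogram algorithm, and it uses no index or pruning. The abstract does not define the distance function, so the sketch assumes a plain Euclidean distance over the dimensions observed in both the object and the query, with no rescaling for the number of shared dimensions. The names `partial_distance` and `brute_force_iknn` are hypothetical and chosen for illustration only.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

# A point is a sequence of coordinates; None marks a missing dimensional value.
Point = Sequence[Optional[float]]


def partial_distance(a: Point, q: Point) -> float:
    """Euclidean distance over dimensions observed in both a and q.

    Assumed definition for illustration; the abstract does not specify one.
    """
    shared = [(x, y) for x, y in zip(a, q) if x is not None and y is not None]
    if not shared:
        # No commonly observed dimension: treat the object as infinitely far.
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def brute_force_iknn(objects: List[Point], q: Point, k: int) -> List[Tuple[float, int]]:
    """Return the k (distance, index) pairs with the smallest distance to q."""
    scored = ((partial_distance(o, q), i) for i, o in enumerate(objects))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    objects = [
        (1.0, None, 3.0),
        (None, 2.0, 2.0),
        (5.0, 5.0, None),
        (1.0, 1.0, 1.0),
    ]
    q = (1.0, 2.0, 3.0)
    print(brute_force_iknn(objects, q, k=2))  # [(0.0, 0), (1.0, 1)]
```

This scan evaluates every object, which is the cost that an index and pruning heuristics such as those named in the abstract are meant to reduce.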
