A density-based indexing method for efficient execution of high-dimensional nearest-neighbor queries on large databases

Authors: Dan Geiger, Usama Fayyad, Kristin P. Bennett

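Abstract: A method and apparatus for efficiently performing nearest-neighbor queries on a database of records in which each record has a large number of attributes. The method works by automatically extracting a multidimensional index from the data. It first obtains a statistical model of the content of the data in the form of a probability density function. This density is then used to decide how the data should be reorganized on disk for efficient nearest-neighbor queries. At query time, the model decides the order in which the data is scanned. It also provides a means of evaluating the probability that the answer found so far is correct, given the partial scan of the data that the model has determined. In this invention, a clustering process is performed on the data to produce multiple clusters. Each cluster is characterized by its own model, and the set of clusters represents a probability density function in the form of a mixture model. A new database is then built with an augmented record format. Each record contains its original attributes plus an additional attribute holding the result of the clustering step for that record. The augmentation is accomplished by evaluating each record's probability with respect to each cluster. Once the clusters are built and the data is reorganized into the augmented database, analysis can be conducted very efficiently using an indexed look-up process. As the database is queried, the model determines the order in which clusters or pages are scanned. It also determines when scanning can stop because the nearest neighbor has been found with high probability.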
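
The abstract describes the pipeline only at a high level: fit a density model by clustering, augment each record with a cluster attribute, group records by cluster, then scan clusters in model-determined order and stop once a closer record is unlikely. The Python sketch below illustrates that flow under stated assumptions; it is not the patent's implementation. The clustering is a stand-in: k-means with farthest-point seeding followed by one diagonal Gaussian per cluster. An in-memory dict plays the role of disk pages. The stopping rule is also an assumption. It upper-bounds the probability that a closer record remains unscanned by using a bounding-box probability (the box contains the search ball) plus Markov's inequality. All function names (`fit_mixture`, `augment`, `reorganize`, `box_prob`, `nn_query`) and the threshold `eps` are hypothetical.

```python
import math
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fit_mixture(data, k, iters=10):
    """Assumed stand-in for the clustering step: k-means with farthest-point
    seeding, then one diagonal Gaussian (weight, mean, variance) per cluster."""
    centers = [list(data[0])]
    while len(centers) < k:
        far = max(data, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in data:
            groups[min(range(k), key=lambda j: sq_dist(p, centers[j]))].append(p)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    comps = []
    for g, mu in zip(groups, centers):
        if g:  # empty clusters are dropped
            var = [max(sum((x - m) ** 2 for x in col) / len(g), 1e-6)
                   for col, m in zip(zip(*g), mu)]
            comps.append((len(g) / len(data), mu, var))
    return comps


def log_density(comp, x):
    """log(weight * N(x | mean, diag(var))) for one mixture component."""
    w, mu, var = comp
    s = math.log(w)
    for xi, m, v in zip(x, mu, var):
        s -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
    return s


def most_probable(comps, rec):
    return max(range(len(comps)), key=lambda j: log_density(comps[j], rec))


def augment(data, comps):
    """Augmented format: original attributes plus the most probable cluster id."""
    return [tuple(rec) + (most_probable(comps, rec),) for rec in data]


def reorganize(augmented, k):
    """Group records by their cluster attribute (stand-in for disk pages)."""
    pages = {c: [] for c in range(k)}
    for *attrs, c in augmented:
        pages[c].append(tuple(attrs))
    return pages


def box_prob(comp, q, r):
    """P(|x_d - q_d| <= r in every dimension) for x from a diagonal Gaussian.
    The box contains the ball of radius r, so this upper-bounds P(||x-q|| <= r)."""
    _, mu, var = comp
    p = 1.0
    for qd, m, v in zip(q, mu, var):
        s = math.sqrt(2.0 * v)
        p *= 0.5 * (math.erf((qd + r - m) / s) - math.erf((qd - r - m) / s))
    return p


def nn_query(q, comps, pages, eps=0.01):
    """Scan clusters in order of model probability for q; stop when the model
    says a closer record is unlikely to remain. Returns (best, dist, scanned, miss)."""
    order = sorted(pages, key=lambda c: log_density(comps[c], q), reverse=True)
    best, best_d, miss = None, math.inf, math.inf
    for i, c in enumerate(order, start=1):
        for rec in pages[c]:
            d = math.dist(rec, q)
            if d < best_d:
                best, best_d = rec, d
        # Upper bound on the expected number of unscanned records closer than
        # best_d under the model; by Markov's inequality it also bounds the
        # probability that any closer record is still unscanned.
        miss = sum(len(pages[j]) * box_prob(comps[j], q, best_d) for j in order[i:])
        if miss < eps:
            return best, best_d, i, miss
    return best, best_d, len(order), miss


if __name__ == "__main__":
    rng = random.Random(0)
    data = [tuple(rng.gauss(c, 1.0) for _ in range(4))
            for c in (0.0, 10.0, 20.0) for _ in range(200)]
    comps = fit_mixture(data, k=3)
    pages = reorganize(augment(data, comps), len(comps))
    best, d, scanned, miss = nn_query((0.1, 0.1, 0.1, 0.1), comps, pages)
    print(f"scanned {scanned} of {len(pages)} clusters, miss bound {miss:.3g}")
```

With well-separated clusters, the query in this sketch typically stops after the first cluster. When clusters overlap, the bound stays large and more pages are scanned. The bound is only as trustworthy as the fitted mixture: records assigned to a cluster need not actually follow that cluster's Gaussian.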