Authors: Dan Geiger, Usama Fayyad, Kristin P. Bennett
DOI:
Keywords:
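Abstract: A method and apparatus for efficiently performing nearest neighbor queries on a database of records, wherein each record has a large number of attributes, by automatically extracting a multidimensional index from the data. The method is based on first obtaining a statistical model of the content of the data in the form of a probability density function. This density is then used to decide how the data should be reorganized on disk for efficient nearest neighbor queries. At query time, the model decides the order in which the data should be scanned. It also provides a means for evaluating the probability of correctness of the answer found so far in the partial scan of the data determined by the model. In this invention, a clustering process is performed on the database to produce multiple data clusters. Each cluster is characterized by a cluster model, and the set of clusters represents a probability density function in the form of a mixture model. A new database of records is built having an augmented record format that contains the original record attributes and an additional record attribute containing a cluster number for each record, based on the clustering step. The cluster model uses a probability density function for each cluster, so that the process of augmenting the attributes of each record is accomplished by evaluating each record's probability with respect to each cluster. Once the augmented records are used to build a database, the augmented attribute is used as an index into the database, so that nearest neighbor query analysis can be conducted very efficiently using an indexed look-up process. As the database is queried, the probability density function is used to determine the order in which clusters or database pages are scanned. The probability density function is also used to determine when scanning can stop because the nearest neighbor has been found with high probability.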
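The scan-ordering idea in the abstract can be illustrated with a small Python sketch. It is not the patented method. It makes two simplifying assumptions: plain k-means stands in for the mixture-model clustering, and an exact triangle-inequality bound stands in for the probabilistic stopping test. All function names (`kmeans`, `build_index`, `nearest`) and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(records, k, iters=10):
    """Plain Lloyd's k-means; an assumed stand-in for the mixture-model step.

    Requires len(records) >= k.
    """
    step = max(1, len(records) // k)
    centers = [records[i * step] for i in range(k)]  # deterministic seeding
    for _ in range(iters):
        groups = defaultdict(list)
        for r in records:
            j = min(range(k), key=lambda c: dist(r, centers[c]))
            groups[j].append(r)
        # Recompute each center as the mean of its members; keep the old
        # center when a cluster ends up empty.
        centers = [
            tuple(sum(col) / len(groups[j]) for col in zip(*groups[j]))
            if groups[j] else centers[j]
            for j in range(k)
        ]
    return centers


def build_index(records, k):
    """Label each record with a cluster number and group records by label."""
    centers = kmeans(records, k)
    pages = defaultdict(list)  # cluster number -> records stored together
    for r in records:
        label = min(range(k), key=lambda c: dist(r, centers[c]))
        pages[label].append(r)  # the label plays the role of the added attribute
    # Radius = distance of the farthest member from its center.
    radii = {j: max(dist(r, centers[j]) for r in pages[j]) for j in pages}
    return centers, pages, radii


def nearest(query, centers, pages, radii):
    """Scan clusters in order of promise; stop once no unscanned one can win."""
    # Lower bound on the distance from the query to any record in cluster j.
    bounds = sorted(
        (max(0.0, dist(query, centers[j]) - radii[j]), j) for j in pages
    )
    best, best_d, scanned = None, math.inf, 0
    for bound, j in bounds:
        if best_d <= bound:
            break  # all remaining clusters are provably no closer
        scanned += 1
        for r in pages[j]:
            d = dist(query, r)
            if d < best_d:
                best, best_d = r, d
    return best, best_d, scanned


# Illustrative data: three well-separated groups of 2-D records.
records = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
           (10.0, 0.0), (10.1, 0.2), (10.2, 0.1)]
centers, pages, radii = build_index(records, k=3)
best, best_d, scanned = nearest((5.05, 5.0), centers, pages, radii)
print(best, scanned)
```

On this sample, the query is answered after scanning one of the three clusters. The bound used here guarantees the exact nearest neighbor. The approach described in the abstract instead stops when the density model indicates the answer is correct with high probability, which can end the scan earlier at the cost of a small chance of error.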