Author: Stephen I. Gallant
DOI:
Keywords: Context (language use), Feature (machine learning), Computer science, Word stem, Word (computer architecture), Document retrieval, Centroid, k-nearest neighbors algorithm, Base (topology), Artificial intelligence, Pattern recognition
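Abstract: A method for storing and searching documents, also useful in disambiguating word senses, and a method for generating a dictionary of context vectors. The dictionary of context vectors provides a context vector for each word stem in the dictionary. A context vector is a fixed-length list of component values corresponding to a list of word-based features, each component value being an approximate measure of the conceptual relationship between the word stem and the word-based feature. Documents are stored by combining the context vectors of the words remaining in the document after uninteresting words are removed. The summary vector obtained by adding all of the context vectors of the remaining words is normalized, and the normalized summary vector is stored for each document. The data base of normalized summary vectors is searched using a query vector and identifying the document whose vector is closest to that query vector. The normalized summary vectors can be stored using cluster trees according to a centroid consistent algorithm to accelerate the searching process. Said searching process also gives an efficient way of finding nearest neighbor vectors in high-dimensional spaces.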
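
The storage and search steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the three-feature dictionary `CONTEXT_VECTORS`, the `STOPWORDS` set, the sample documents, and the function names are all hypothetical, words are assumed to be already reduced to stems, and "closest" is taken to mean the largest dot product between unit vectors. The search is a plain linear scan; the cluster-tree acceleration mentioned in the abstract is not shown.

```python
import math

DIM = 3  # assumed number of word-based features (e.g. finance, nature, sport)

# Hypothetical toy dictionary: one fixed-length context vector per word stem.
# The component values are invented purely for illustration.
CONTEXT_VECTORS = {
    "bank":  [0.9, 0.3, 0.0],
    "loan":  [1.0, 0.0, 0.0],
    "river": [0.0, 1.0, 0.1],
    "fish":  [0.0, 0.9, 0.3],
    "goal":  [0.1, 0.0, 1.0],
}

# Assumed list of "uninteresting" words that are removed before summing.
STOPWORDS = {"the", "a", "of", "in", "and", "at"}


def normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def summary_vector(text):
    """Add the context vectors of the remaining words, then normalize."""
    total = [0.0] * DIM
    for word in text.lower().split():
        if word in STOPWORDS or word not in CONTEXT_VECTORS:
            continue  # skip uninteresting or unknown words
        total = [t + c for t, c in zip(total, CONTEXT_VECTORS[word])]
    return normalize(total)


def search(query, index):
    """Return the id of the document whose summary vector is closest to the query."""
    q = summary_vector(query)
    return max(index, key=lambda doc_id: sum(a * b for a, b in zip(q, index[doc_id])))


DOCS = {
    "d1": "the bank and the loan",
    "d2": "fish in the river",
    "d3": "a goal",
}

# The stored "data base": one normalized summary vector per document.
INDEX = {doc_id: summary_vector(text) for doc_id, text in DOCS.items()}

if __name__ == "__main__":
    print(search("river bank fish", INDEX))
    print(search("loan", INDEX))
```

Because both the query and the stored vectors are normalized, ranking by dot product is equivalent to ranking by cosine similarity, so document length does not bias the match.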