Author: Leah S. Larkey
DOI:
Keywords:
Abstract: The classification of US patents poses some special problems due to the enormous size of the corpus, the size and complex hierarchical structure of the classification system, and the size and structure of patent documents. The representation of documents has not been a standard area of research in text categorization, but we have found it to be an important factor in our previous work on classifying patient medical records (Larkey & Croft, 1996) and in our current work on patents. Our approach is to combine the results of k-nearest-neighbor classifiers with those of Bayesian classifiers. The k-nearest-neighbor classifier allows us to represent the document structure using the query operators in the Inquery information retrieval system. The Bayesian classifiers can use the hierarchical relations among patent subclasses to select closely related negative examples to train more discriminating classifiers.