Author: Leah S. Larkey
DOI:
Keywords: Information retrieval, Structure (mathematical logic), Factor (programming language), Classifier (linguistics), Automation, Computer science, Document Structure Description, Bayesian probability, Text categorization, Representation (mathematics)
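Abstract: The classification of U.S. patents poses some special problems due to the enormous size of the corpus, the size and complex hierarchical structure of the classification system, and the size and structure of patent documents. The representation of documents has not received a great deal of previous attention, but we have found it to be an important factor in our work. We are exploring ways to use this structure and the relations among subclasses to facilitate the classification of patents. Our approach is to derive a vector of terms and phrases from the most important parts of the patent to represent each document. We use both k-nearest-neighbor classifiers and Bayesian classifiers. The k-nearest-neighbor classifier allows us to represent the document structure using the query operators in the Inquery information retrieval system. The Bayesian classifiers can use the hierarchical relations among subclasses to select closely related negative examples in order to train more discriminating classifiers.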
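The abstract gives no algorithmic detail, so the following is only a generic illustration of k-nearest-neighbor text categorization, not the paper's Inquery-based implementation. Each document is a term-frequency vector. A new document is compared with labeled training documents by cosine similarity, and each subclass is scored by summing the similarities of the k nearest neighbors assigned to it. The whitespace tokenizer, raw term-frequency weighting, scoring rule, function names, and toy data and labels are all assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict

def term_vector(text):
    # Assumption: lowercased whitespace tokens with raw term-frequency weights.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_scores(query, training, k=3):
    # training: list of (text, labels); a document may carry several subclasses.
    qv = term_vector(query)
    neighbors = sorted(
        ((cosine(qv, term_vector(text)), labels) for text, labels in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Each subclass accumulates the similarity of every neighbor labeled with it.
    scores = defaultdict(float)
    for sim, labels in neighbors:
        for label in labels:
            scores[label] += sim
    return dict(scores)

# Hypothetical toy corpus; the labels are placeholders, not real patent subclasses.
training = [
    ("rotary engine piston seal", ["engines"]),
    ("piston ring seal for engine cylinder", ["engines"]),
    ("lithium battery electrode coating", ["batteries"]),
    ("battery cell electrode separator", ["batteries"]),
]
scores = knn_scores("engine piston ring", training, k=2)
```

Summing neighbor similarities per label, rather than taking a simple majority vote, yields a ranked score for every candidate subclass. This suits a setting where a patent can belong to more than one subclass.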