Document similarity detection and classification system

Authors: Jeffrey Glass, Elizabeth Derr

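Abstract: A document similarity detection and classification system is presented. The system employs a case-based method of classifying electronically distributed documents. The content chunks of an unclassified document are compared to the sets of content chunks that make up each previously classified sample document, in order to determine the highest level of resemblance between the unclassified document and any of the sample documents. The sample documents have been manually reviewed and annotated to distinguish document classifications and to distinguish significant content chunks from insignificant ones. These annotations are used in the comparison process. If a resemblance exceeding a predetermined threshold is detected, the classification of the most significantly resembling sample document is assigned to the unclassified document. Sample documents may be acquired to build and maintain a repository of samples by detecting documents that are similar to other documents and subjecting at least some of these similar documents to manual review. In a preferred embodiment, the invention is used to classify email messages in support of a message filtering or classification objective.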
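The abstract does not say how content chunks are formed or how resemblance is scored, so the following is only a minimal Python sketch under stated assumptions. It assumes that a chunk is a normalized non-blank line. It scores resemblance as the fraction of a sample's *significant* chunks (those not annotated as insignificant) that appear in the new document, and it uses a hypothetical default threshold of 0.5. It also sketches repository building by flagging pairs of similar unclassified documents (Jaccard overlap of chunk sets) for manual review. All names (`chunk`, `SampleDocument`, `resemblance`, `classify`, `review_candidates`) are illustrative and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Optional


def chunk(text: str) -> set[str]:
    """Split text into content chunks.

    Assumption: a chunk is a non-blank line, lowercased with runs of
    whitespace collapsed. The patent does not define chunking this way.
    """
    chunks = set()
    for line in text.splitlines():
        normalized = " ".join(line.lower().split())
        if normalized:
            chunks.add(normalized)
    return chunks


@dataclass
class SampleDocument:
    """A manually reviewed sample: its classification label, its chunks,
    and the subset of chunks a reviewer annotated as insignificant."""
    label: str
    chunks: set[str]
    insignificant: set[str] = field(default_factory=set)

    @property
    def significant(self) -> set[str]:
        return self.chunks - self.insignificant


def resemblance(doc_chunks: set[str], sample: SampleDocument) -> float:
    """Hypothetical score: fraction of the sample's significant chunks
    that also occur in the document being classified."""
    sig = sample.significant
    if not sig:
        return 0.0
    return len(doc_chunks & sig) / len(sig)


def classify(text: str, samples: list[SampleDocument],
             threshold: float = 0.5) -> Optional[str]:
    """Return the label of the most resembling sample if its score strictly
    exceeds the (assumed) threshold; otherwise return None (unclassified)."""
    doc_chunks = chunk(text)
    best_label, best_score = None, 0.0
    for sample in samples:
        score = resemblance(doc_chunks, sample)
        if score > best_score:
            best_label, best_score = sample.label, score
    return best_label if best_score > threshold else None


def review_candidates(texts: list[str],
                      threshold: float = 0.5) -> list[tuple[int, int]]:
    """Index pairs of unclassified documents whose chunk sets have a Jaccard
    overlap above the threshold; such pairs could be queued for manual
    review and, once labeled, added to the sample repository."""
    chunk_sets = [chunk(t) for t in texts]
    pairs = []
    for i in range(len(chunk_sets)):
        for j in range(i + 1, len(chunk_sets)):
            a, b = chunk_sets[i], chunk_sets[j]
            union = a | b
            if union and len(a & b) / len(union) > threshold:
                pairs.append((i, j))
    return pairs
```

Excluding the annotated insignificant chunks from the score keeps boilerplate such as greetings or signatures from inflating resemblance. For example, a `spam` sample whose greeting line is marked insignificant still fully matches a message that differs from it only in greeting, letter case, and spacing.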