Authors: Masaki Murata, Toshiyuki Kanamaru, Tamotsu Shirado, Hitoshi Isahara
DOI: 10.5715/JNLP.14.163
Keywords:
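Abstract: Patent processing is important in various fields such as industry, business, and law. We used F-terms (Schellner 2002) to classify patent documents using the k-nearest neighborhood method. Because F-term categories are fine-grained, they are useful when we classify patent documents. We clarified the following three points using experiments: i) which variations of the k-nearest neighborhood method are the best for patent classification, ii) which methods of calculating similarity are the best for patent classification, and iii) from which regions of a patent terms should be extracted. In our experiments, we used the patent data in the F-term categorization task of the NTCIR-5 Patent Workshop (NTCIR committee 2005; Iwayama, Fujii, and Kando 2005). We found that the method of adding the scores of the k extracted documents was the most effective among the variations used in this study. We also found that SMART (Singhal, Buckley, and Mitra 1996; Singhal, Choi, Hindle, and Pereira 1997), which is known to be effective in information retrieval, was the most effective method of calculating similarity. Finally, when extracting terms, we found that using the abstract and claim regions together was the best among all the combinations of the abstract, claim, and description regions. The results were confirmed using a statistical test. Moreover, we experimented with changing the amount of training data and found that we obtained better performance when we used more data, within the limits of the data provided in the NTCIR-5 Patent Workshop.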
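The score-adding variation of the k-nearest neighborhood method mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy documents, and the category labels are all hypothetical, and plain cosine similarity over raw term counts stands in for the SMART weighting that the abstract reports as most effective.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters.

    Assumption: a simple stand-in for the SMART similarity in the paper.
    """
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_score_sum(query_terms, training, k=3):
    """Rank categories by adding the similarity scores of the k nearest documents.

    training is a list of (terms, categories) pairs (hypothetical layout).
    Returns (category, summed_score) pairs, best first.
    """
    query = Counter(query_terms)
    scored = [(cosine(query, Counter(terms)), cats) for terms, cats in training]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

    totals = defaultdict(float)
    for similarity, cats in neighbours:
        for cat in cats:
            totals[cat] += similarity  # each neighbour votes with its score
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Toy data; "A01", "B02", "C03" are made-up labels, not real F-term codes.
training = [
    (["laser", "lens", "beam"], ["A01"]),
    (["laser", "beam", "mirror"], ["A01", "B02"]),
    (["engine", "piston", "valve"], ["C03"]),
]
ranking = knn_score_sum(["laser", "beam", "lens"], training, k=2)
```

Because a patent can carry several F-terms, the sketch returns a ranked list of categories instead of a single label; a category shared by several close neighbours accumulates a higher total than one supported by a single neighbour.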