Automatic F-term Classification of Japanese Patent Documents Using the k-Nearest Neighborhood Method and the SMART Weighting

Authors: Masaki Murata, Toshiyuki Kanamaru, Tamotsu Shirado, Hitoshi Isahara

DOI: 10.5715/JNLP.14.163

Abstract: Patent processing is important in various fields such as industry, business, and law. We used F-terms (Schellner 2002) to classify patent documents using the k-nearest neighborhood method. Because F-term categories are fine-grained, they are useful when we classify patent documents. Our experiments clarified the following three points: i) which variations of the k-nearest neighborhood method are best for patent classification, ii) which methods of calculating similarity are best, and iii) from which regions of a patent document terms should be extracted. In our experiments, we used the patent data from the categorization task of the NTCIR-5 Workshop (NTCIR committee 2005; Iwayama, Fujii, and Kando 2005). We found that adding the scores of the k extracted documents was the most effective of the k-nearest neighborhood variations examined in this study. We also found that SMART (Singhal, Buckley, and Mitra 1996; Singhal, Choi, Hindle, and Pereira 1997), which is known to be effective in information retrieval, was the most effective method of calculating similarity. Finally, for extracting terms, using the abstract and claim regions together was the most effective of all combinations of the abstract, claim, and description regions. These results were confirmed by a statistical test. Moreover, we experimented with changing the amount of training data and obtained better performance with more data, although the amount was limited to the data provided by the NTCIR-5 Workshop.
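
The abstract reports two components as most effective: SMART similarity, and k-nearest neighborhood classification that adds up the scores of the k extracted documents. The sketch below shows how these two components fit together. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the commonly cited pivoted-normalization form of SMART weighting, (1 + ln(1 + ln tf)) / ((1 - s) + s * dl / avdl) * qtf * ln((N + 1) / df), with a hypothetical slope s = 0.2. It also reads "adding the scores" as summing each retrieved document's similarity into every F-term that document carries. The function names smart_score and knn_fterms and the F-term labels AA01, BB02 and CC03 are hypothetical.

```python
import math
from collections import Counter, defaultdict


def smart_score(query_tf, doc_tf, doc_len, avg_doc_len, df, n_docs, s=0.2):
    """SMART-style (pivoted normalization) similarity of one document to a query.

    Assumed form, summed over terms shared by query and document:
        (1 + ln(1 + ln tf)) / ((1 - s) + s * dl / avdl) * qtf * ln((N + 1) / df)
    The slope s = 0.2 is an illustrative assumption, not a value from the paper.
    """
    norm = (1 - s) + s * doc_len / avg_doc_len  # pivoted length normalization
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # only shared terms contribute
        tf_part = 1 + math.log(1 + math.log(tf))  # damped term frequency
        idf = math.log((n_docs + 1) / df[term])
        score += tf_part / norm * qtf * idf
    return score


def knn_fterms(query_terms, train_docs, k=3, s=0.2):
    """Rank F-terms by adding the similarity scores of the k nearest training documents.

    train_docs: list of (terms, fterms) pairs, where terms is the list of terms
    extracted from a patent and fterms is its set of F-term labels (multi-label).
    """
    n_docs = len(train_docs)
    doc_tfs = [Counter(terms) for terms, _ in train_docs]
    df = Counter()
    for tf in doc_tfs:
        df.update(tf.keys())  # document frequency of each term
    avg_len = sum(len(terms) for terms, _ in train_docs) / n_docs
    query_tf = Counter(query_terms)

    sims = [
        (smart_score(query_tf, tf, len(terms), avg_len, df, n_docs, s), fterms)
        for tf, (terms, fterms) in zip(doc_tfs, train_docs)
    ]
    sims.sort(key=lambda pair: pair[0], reverse=True)

    # Score-adding variant: each neighbour adds its similarity to all of its F-terms.
    totals = defaultdict(float)
    for sim, fterms in sims[:k]:
        for fterm in fterms:
            totals[fterm] += sim
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data with hypothetical F-term labels.
    train = [
        (["ink", "nozzle", "printer", "ink"], {"AA01", "BB02"}),
        (["nozzle", "printer", "head"], {"AA01"}),
        (["battery", "electrode", "cell"], {"CC03"}),
    ]
    # AA01 ranks above BB02; CC03 is not among the k=2 neighbours.
    print(knn_fterms(["ink", "printer", "nozzle"], train, k=2))
```

In this sketch, each training document's term list stands in for terms extracted from the abstract and claim regions concatenated together. The abstract does not say how the Japanese text is tokenized or which k was chosen, so k and s are left as parameters.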

References (14)
Makoto Iwayama, Atsushi Fujii, Noriko Kando. Overview of Classification Subtask at NTCIR-5 Patent Retrieval Task. NTCIR-5 Workshop, 2005.
Seishi Okamoto, Ken Satoh. An Average-Case Analysis of k-Nearest Neighbor Classifier. International Conference on Case-Based Reasoning, pp. 253-264, 1995. DOI: 10.1007/3-540-60598-3_23
Gongde Guo, Hui Wang, David Bell, Yaxin Bi, Kieran Greer. An kNN Model-based Approach and its Application in Text Categorization. International Conference on Intelligent Text Processing and Computational Linguistics, pp. 559-570, 2004. DOI: 10.1007/978-3-540-24630-5_69
Amit Singhal, Chris Buckley, Mandar Mitra. Pivoted document length normalization. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 51, pp. 21-29, 1996. DOI: 10.1145/3130348.3130365
Yiming Yang, Xin Liu. A re-examination of text categorization methods. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 42-49, 1999. DOI: 10.1145/312624.312647
C. J. Fall, A. Törcsvári, K. Benzineb, G. Karetka. Automated categorization in the international patent classification. ACM SIGIR Forum, vol. 37, pp. 10-25, 2003. DOI: 10.1145/945546.945547
Makoto Iwayama, Atsushi Fujii, Noriko Kando, Yozo Marukawa. Evaluating patent retrieval in the third NTCIR workshop. Information Processing & Management, vol. 42, pp. 207-221, 2006. DOI: 10.1016/J.IPM.2004.08.012
Leah S. Larkey. A patent search and classification system. ACM International Conference on Digital Libraries, pp. 179-187, 1999. DOI: 10.1145/313238.313304
Irene Schellner. Japanese File Index classification and F-terms. World Patent Information, vol. 24, pp. 197-201, 2002. DOI: 10.1016/S0172-2190(02)00019-4