Keywords, k-NN and Neural Networks: a Support for Hierarchical Categorization of Texts in Brazilian Portuguese

Authors: Vera Lúcia Strube de Lima, Silvia Moraes, Susana Azeredo

Abstract: A frequent problem in automatic categorization applications involving the Portuguese language is the absence of large corpora of previously classified documents that would allow validation experiments to be carried out. Generally, such corpora are not available or, when they are, contain a very small number of documents. The general goal of this study is to contribute to the development of applications aimed at text categorization for Brazilian Portuguese. Specifically, we point out that keyword selection combined with neural networks can improve categorization results for Brazilian Portuguese texts. The corpus is composed of 30 thousand texts from the Folha de Sao Paulo newspaper, organized into 29 sections. In the categorization process, the k-Nearest Neighbor (k-NN) algorithm and a Multilayer Perceptron trained with backpropagation are used. Our tests also include identifying keywords with the log-likelihood statistical measure and using them as features in the categorization process. The results clearly show that precision is better when using neural networks than when using k-NN.
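The abstract combines two steps: keywords identified with the log-likelihood measure, then used as features for k-NN categorization. A minimal sketch of both follows. It rests on several assumptions that do not come from the paper. Documents are taken to be pre-tokenized lists of words. Log-likelihood is computed in the common two-corpus form (G2), comparing one section against all other sections. The k-NN uses cosine similarity with similarity-weighted votes. The function names (`section_keywords`, `knn_classify`, etc.) and the toy data are hypothetical. The Multilayer Perceptron and the hierarchical aspect of the categorization are not shown.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood (G2) for one word (assumed formulation).

    a: word frequency in the target corpus (c tokens in total)
    b: word frequency in the reference corpus (d tokens in total)
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:  # the term 0 * ln(0) is taken as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll

def section_keywords(docs_by_section, section, top_n=10):
    """Hypothetical helper: rank words over-represented in one section vs. the rest."""
    target, rest = Counter(), Counter()
    for sec, docs in docs_by_section.items():
        bucket = target if sec == section else rest
        for tokens in docs:
            bucket.update(tokens)
    c, d = sum(target.values()), sum(rest.values())
    scored = []
    for word, a in target.items():
        b = rest[word]  # Counter returns 0 for unseen words
        if a * d > b * c:  # keep words relatively more frequent in the section
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(key=lambda item: (-item[0], item[1]))  # highest G2 first, ties by word
    return [word for _, word in scored[:top_n]]

def keyword_vector(tokens, vocabulary):
    """Term-frequency vector restricted to the selected keywords."""
    return Counter(t for t in tokens if t in vocabulary)

def cosine(u, v):
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def knn_classify(tokens, training, vocabulary, k=3):
    """training: list of (tokens, label); similarity-weighted vote of the k nearest."""
    query = keyword_vector(tokens, vocabulary)
    neighbours = sorted(
        ((cosine(query, keyword_vector(doc, vocabulary)), label) for doc, label in training),
        key=lambda item: item[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, label in neighbours:
        votes[label] += sim
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy, invented data: two "sections" with two tokenized texts each.
    docs = {
        "esporte": [["futebol", "gol", "time"], ["futebol", "time", "jogo"]],
        "economia": [["mercado", "juros", "dólar"], ["mercado", "inflação", "juros"]],
    }
    vocab = set()
    for sec in docs:
        vocab.update(section_keywords(docs, sec, top_n=3))
    training = [(tokens, sec) for sec, texts in docs.items() for tokens in texts]
    print(knn_classify(["gol", "futebol", "estádio"], training, vocab, k=3))  # prints esporte
```

Restricting the feature vectors to the selected keywords is what links the two steps. The same keyword vectors could, under the same assumptions, serve as the input layer of a neural network classifier.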
