Authors: Vera Lúcia Strube de Lima, Silvia Moraes, Susana Azeredo
DOI:
Keywords:
Abstract: A frequent problem in automatic categorization applications involving the Portuguese language is the absence of large corpora of previously classified documents, which would permit validation of the experiments carried out. Generally, available corpora are not classified or, when they are, they contain a very reduced number of documents. The general goal of this study is to contribute to the development of applications that aim at text categorization for Brazilian Portuguese. Specifically, we point out that keyword selection associated with neural networks can improve results in the categorization of Brazilian Portuguese texts. The corpus is composed of 30 thousand texts from the Folha de Sao Paulo newspaper, organized in 29 sections. In the process of text categorization, the k-Nearest Neighbor (k-NN) algorithm and Multilayer Perceptron neural networks trained with the backpropagation algorithm are used. It is also part of our study to test the identification of keywords starting from the log-likelihood statistical measure and to use them as features in the categorization process. The results clearly show that precision is better using neural networks than using k-NN.
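Illustrative sketch (not from the paper): the abstract does not say which formulation of the log-likelihood measure was used. The Python below assumes the common two-corpus form, G2 = 2 * (a * ln(a / E1) + b * ln(b / E2)), where a and b are a word's counts in the target section and in a reference corpus, and E1 and E2 are its expected counts if it were equally frequent in both. The names `log_likelihood` and `top_keywords` and the toy tokens are hypothetical, chosen only for this example.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 score for a word seen a times in c target tokens
    and b times in d reference tokens (assumed formulation)."""
    e1 = c * (a + b) / (c + d)  # expected count in the target corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def top_keywords(target_tokens, reference_tokens, n=10):
    """Return the n words most characteristic of the target tokens.
    Both token lists are assumed to be non-empty."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c = sum(target.values())
    d = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words that are relatively more frequent in the target.
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:n]]


# Toy data standing in for one newspaper section versus the remaining ones.
sports = ["gol", "gol", "time", "de"]
others = ["de", "de", "juros", "banco"]
keywords = top_keywords(sports, others, n=2)
```

Words selected this way could then serve as the feature vocabulary for a k-NN or Multilayer Perceptron classifier; how the study itself built its feature vectors is not stated in the abstract.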