Criação de um ambiente para o processamento de córpus de Português Histórico (Creation of an environment for processing Historical Portuguese corpora)

Author: Arnaldo Candido Junior

DOI: 10.11606/D.55.2008.TDE-21052008-103237

Abstract: Corpora have been increasingly used within the areas of Linguistics and Natural Language Processing. As a result, new and larger corpora have been compiled, and processing systems as well as standards for encoding and interchanging electronic texts have been developed. However, when it comes to the compilation of historical corpora, the methodology differs from the one used to compile corpora of contemporary language. Another drawback is that most corpus processing systems provide few resources for the treatment of historical corpora, although there are numerous corpora of this type. Similarly, dictionary creation systems do not satisfactorily meet the needs of historical dictionaries. The present study is part of the Historical Dictionary of Brazilian Portuguese (HDBP) project, which aims to compile a dictionary on the basis of a corpus of texts from the sixteenth through the eighteenth centuries (including some from the early nineteenth century). Here, we present the challenges the HDBP faces in establishing criteria for creating dictionary entries. This study has developed a computational environment for processing the corpus and for building glossaries as well as the HDBP itself. The system can be easily adapted to the scope of other projects.
