EχATOLP – a tool for domain relevant terms extraction

Authors: Renata Vieira, Daniel Martins, Lucelene Lopes, Guilherme Fedrizzi, Paulo Fernandes

Abstract: This paper presents a software tool to extract relevant terms from Portuguese texts. EχATOLP extracts the most frequent noun phrases in an annotated corpus. The annotation is provided by the PALAVRAS parser. EχATOLP offers different options to improve the quality of the extraction, ranging from post-treatment of the parser output to the application of linguistic and statistical criteria. It also offers some additional features: comparing the extracted terms with reference lists, computing numerical efficiency indexes, and searching for terms in the corpus.

Term extraction from corpora is usually the basis for many Natural Language Processing (NLP) tasks, such as automatic glossary construction [7], text categorization [4] and even ontology learning [3]. Term extraction, like other NLP applications, can benefit from both linguistic and statistical approaches, and combining the two often gives better results than either one alone. EχATOLP [6] thus uses both approaches to select domain-significant terms. From the linguistic point of view, extraction is based on the syntactic annotation performed by the PALAVRAS parser [2]. The candidate terms are processed according to an extra set of discard and transformation rules. Those terms are then subjected to frequency analysis in order to select the most relevant ones.

Figure 1(a) graphically presents the EχATOLP architecture. The basic input is the set of .xml files containing the annotated texts to process. The process can apply discard and transformation rules to, respectively, discard terms that may be unwanted (e.g., numerals) or adapt terms to the purpose of the extraction (e.g., remove articles). The user chooses which sets of rules are applied. The upper screenshot in Figure 1(b) shows the interface where the user can choose all options. Once the terms are extracted, their frequencies in the corpus are computed. Then, according to the user's choice, the selected criteria are applied, keeping only 10% …
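The pipeline described above has five stages: discard rules, transformation rules, frequency counting, a 10% cut-off, and comparison against a reference list. The Python sketch below illustrates them under stated assumptions and is not EχATOLP's actual implementation. It assumes the following:

- Candidate noun phrases have already been pulled out of the PALAVRAS .xml output as (word, tag) pairs.
- The tag names `ART` and `NUM` are hypothetical.
- The two rules shown (drop numerals, strip leading articles) are just the examples the text mentions.
- The cut-off keeps the most frequent 10% of terms.
- Precision, recall and F-measure stand in for the unspecified "efficiency indexes".

```python
from collections import Counter

# Hypothetical tag names for this sketch; they are NOT taken from PALAVRAS.
ARTICLE = "ART"
NUMERAL = "NUM"

# Assumed input format: a candidate noun phrase as a list of (word, tag) pairs.
Candidate = list[tuple[str, str]]


def discard(candidate: Candidate) -> bool:
    """Example discard rule: reject any candidate that contains a numeral."""
    return any(tag == NUMERAL for _, tag in candidate)


def transform(candidate: Candidate) -> Candidate:
    """Example transformation rule: strip leading articles."""
    i = 0
    while i < len(candidate) and candidate[i][1] == ARTICLE:
        i += 1
    return candidate[i:]


def extract_terms(candidates: list[Candidate], top_percent: int = 10) -> list[tuple[str, int]]:
    """Apply the rules, count term frequencies, keep the top `top_percent`% by frequency."""
    counts = Counter()
    for cand in candidates:
        if discard(cand):
            continue
        cand = transform(cand)
        if cand:  # skip candidates emptied by the transformation
            counts[" ".join(word.lower() for word, _ in cand)] += 1
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if not ranked:
        return []
    # Integer ceiling of len * top_percent / 100, keeping at least one term.
    keep = max(1, (len(ranked) * top_percent + 99) // 100)
    return ranked[:keep]


def evaluate(extracted: list[str], reference: list[str]) -> tuple[float, float, float]:
    """Precision, recall and F-measure of extracted terms against a reference list."""
    found, gold = set(extracted), set(reference)
    hits = len(found & gold)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(gold) if gold else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    corpus = [
        [("a", "ART"), ("rede", "N"), ("neural", "ADJ")],
        [("rede", "N"), ("neural", "ADJ")],
        [("10", "NUM"), ("amostras", "N")],
        [("o", "ART"), ("classificador", "N")],
    ]
    terms = extract_terms(corpus)
    print(terms)  # [('rede neural', 2)]
    print(evaluate([t for t, _ in terms], ["rede neural", "classificador"]))
```

Ranking by raw frequency and cutting at a fixed percentage follows the description in the text. The tool's other statistical criteria are not specified in the truncated passage, so they are not modeled here.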

References (23)
Silvia Bernardini, Marco Baroni. BootCaT: Bootstrapping corpora and terms from the web. Language Resources and Evaluation, pp. 1313-1316 (2004).
Renata Vieira, Paulo Fernandes, Lucelene Lopes, Guilherme Fedrizzi. EχATOLP – An Automatic Tool for Term Extraction from Portuguese Language Corpora (2009).
Roberto Navigli, Paola Velardi. GlossExtractor: A Web Application to Automatically Create a Domain Glossary. Congress of the Italian Association for Artificial Intelligence, pp. 339-349 (2007). doi:10.1007/978-3-540-74782-6_30
Christopher Wirbelauer, Christian Scholz, Hans Hoerauf, Duy Thoai Pham, Horst Laqua, Reginald Birngruber. Noncontact corneal pachymetry with slit lamp-adapted optical coherence tomography. American Journal of Ophthalmology, vol. 133, pp. 444-450 (2002). doi:10.1016/S0002-9394(01)01425-8
Christopher Wirbelauer, Henning Aurich, Jan Jaroszewski, Christian Hartmann, Duy Thoai Pham. Experimental evaluation of online optical coherence pachymetry for corneal refractive surgery. Graefe's Archive for Clinical and Experimental Ophthalmology, vol. 242, pp. 24-30 (2004). doi:10.1007/S00417-003-0700-2
Jay W. McLaren, Cherie B. Nau, Jay C. Erie, William M. Bourne. Corneal thickness measurement by confocal microscopy, ultrasound, and scanning slit methods. American Journal of Ophthalmology, vol. 137, pp. 1011-1020 (2004). doi:10.1016/J.AJO.2004.01.049
Alberto Lavelli, Fabrizio Sebastiani, Roberto Zanoli. Distributional term representations: an experimental comparison. Conference on Information and Knowledge Management, pp. 615-624 (2004). doi:10.1145/1031171.1031284
Lijuan Cai, Thomas Hofmann. Hierarchical document categorization with support vector machines. Conference on Information and Knowledge Management, pp. 78-87 (2004). doi:10.1145/1031171.1031186
J. Martin Bland, Douglas G. Altman. Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, vol. 327, pp. 307-310 (1986). doi:10.1016/S0140-6736(86)90837-8