Statistical Parsing of Spanish and Data Driven Lemmatization

Authors: Djamé Seddah, Benoit Sagot, Joseph Le Roux

Keywords: Parsing, Natural language processing, Treebank, Bottom-up parsing, Data-driven, Top-down parsing, Morphology (linguistics), Statistical parsing, Speech recognition, Lemmatisation, Computer science, S-attributed grammar, Grammar inference, Artificial intelligence

Abstract: Although parsing performance has greatly improved in recent years, grammar inference from treebanks for morphologically rich languages, especially from small treebanks, is still a challenging task. In this paper we investigate how state-of-the-art performance can be achieved on Spanish, a language with rich verbal morphology, with a non-lexicalized parser trained on a treebank containing only around 2,800 trees. We rely on accurate part-of-speech tagging and data-driven lemmatization in order to cope with lexical data sparseness. Providing state-of-the-art results on Spanish, our methodology is applicable to other languages.

References (15)
Slav Petrov, Dan Klein, Improved Inference for Unlexicalized Parsing. North American Chapter of the Association for Computational Linguistics, pp. 404-411 (2007)
Josef van Genabith, Georgiana Dinu, Grzegorz Chrupala, Learning Morphology with Morfette. Language Resources and Evaluation (2008)
Brooke Cowan, Michael Collins, Morphology and reranking for the statistical parsing of Spanish. Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing - HLT '05, pp. 795-802 (2005), 10.3115/1220575.1220675
Montserrat Civit, Ma Antònia Martí, Building Cast3LB: A Spanish Treebank. Research on Language and Computation, vol. 2, pp. 549-574 (2004), 10.1007/S11168-004-7429-X
Djamé Seddah, Marie Candito, Benoît Crabbé, Cross parser evaluation and tagset variation. Proceedings of the 11th International Conference on Parsing Technologies - IWPT '09, pp. 150-161 (2009), 10.3115/1697236.1697266
Michael Collins, Terry Koo, Discriminative Reranking for Natural Language Parsing. Computational Linguistics, vol. 31, pp. 25-70 (2005), 10.1162/0891201053630273
Slav Petrov, Dan Klein, Parsing German with Latent Variable Grammars. Proceedings of the Workshop on Parsing German, pp. 33-39 (2008), 10.3115/1621401.1621406
Josef van Genabith, Jennifer Foster, Mohammed Attia, Deirdre Hogan, Joseph Le Roux, Lamia Tounsi, Handling Unknown Words in Statistical Latent-Variable Parsing Models for Arabic, English and French. North American Chapter of the Association for Computational Linguistics, pp. 67-75 (2010)