Authors: Djamé Seddah, Benoit Sagot, Joseph Le Roux
DOI:
Keywords: Parsing, Natural language processing, Treebank, Bottom-up parsing, Data-driven, Top-down parsing, Morphology (linguistics), Statistical parsing, Speech recognition, Lemmatisation, Computer science, S-attributed grammar, Grammar inference, Artificial intelligence
Abstract: Although parsing performance has greatly improved in recent years, grammar inference from treebanks for morphologically rich languages, especially from small treebanks, is still a challenging task. In this paper we investigate how state-of-the-art performance can be achieved on Spanish, a language with rich verbal morphology, with a non-lexicalized parser trained on a treebank containing only around 2,800 trees. We rely on accurate part-of-speech tagging and data-driven lemmatization in order to cope with lexical data sparseness. Providing state-of-the-art results, our methodology is applicable to other languages.