A Wide-Coverage Context-Free Grammar for Icelandic and an Accompanying Parsing System

Authors: Vilhjálmur Þorsteinsson, Hulda Óladóttir, Hrafn Loftsson

DOI: 10.26615/978-954-452-056-4_160

Abstract: We present an open-source, wide-coverage context-free grammar (CFG) for Icelandic, and an accompanying parsing system. The grammar has over 5,600 nonterminals, 4,600 terminals and 19,000 productions in fully expanded form, with feature agreement constraints for case, gender, number and person. The parsing system consists of an enhanced Earley-based parser and a mechanism to select best-scoring parse trees from shared packed parse forests. Our parsing system is able to parse about 90% of all sentences in articles published on the main Icelandic news websites. Preliminary evaluation with evalb shows an F-measure of 70.72% on parsed sentences. Our system demonstrates that parsing a morphologically rich language using a wide-coverage CFG can be practical.
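
As a rough illustration of what "fully expanded form" can mean, the sketch below expands one feature-annotated rule into plain CFG productions by enumerating every combination of agreement features. The rule `NP -> Det Noun`, the feature tags, the symbol-naming scheme and the function name `expand_rule` are all assumptions made for this example; they are not taken from the grammar described in the abstract.

```python
from itertools import product

# Hypothetical feature inventories (illustrative tags; not the paper's notation).
FEATURES = {
    "case": ["nom", "acc", "dat", "gen"],
    "number": ["sg", "pl"],
    "gender": ["masc", "fem", "neut"],
}


def expand_rule(lhs, rhs, agree):
    """Expand one feature-annotated rule into plain CFG productions.

    Every symbol in the rule receives the same feature suffix, so
    agreement is enforced purely by the symbol names.
    """
    productions = []
    for values in product(*(FEATURES[name] for name in agree)):
        suffix = "_" + "_".join(values)
        productions.append((lhs + suffix, tuple(sym + suffix for sym in rhs)))
    return productions


# Hypothetical rule: a noun phrase whose determiner and noun agree
# in case, number and gender.
np_rules = expand_rule("NP", ["Det", "Noun"], ["case", "number", "gender"])
print(len(np_rules))  # 4 * 2 * 3 = 24 plain productions
print(np_rules[0])
```

Under these assumptions a single annotated rule yields 24 plain productions, which suggests how a compact feature-annotated grammar could grow to many thousands of productions once expanded.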

References (19)
Ted Briscoe, Claire Grover, John Carroll, Bran Boguraev. A formalism and environment for the development of a large grammar of English. International Joint Conference on Artificial Intelligence, pp. 703-708 (1987).
Aravind Joshi, Martha Palmer, Fei Xia. Automatic grammar generation from two different perspectives (2001).
Mitch Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, vol. 19, pp. 313-330 (1993). DOI: 10.21236/ADA273556
Daniel H. Younger. Recognition and parsing of context-free languages in time n^3. Information and Control, vol. 10, pp. 189-208 (1967). DOI: 10.1016/S0019-9958(67)80007-X
Qaiser Abbas. Morphologically rich Urdu grammar parsing using Earley algorithm. Natural Language Engineering, vol. 22, pp. 775-810 (2016). DOI: 10.1017/S1351324915000133
Reut Tsarfaty, Djamé Seddah, Sandra Kübler, Joakim Nivre. Parsing morphologically rich languages: Introduction to the special issue. Computational Linguistics, vol. 39, pp. 15-22 (2013). DOI: 10.1162/COLI_A_00133
Aoife Cahill. Treebank-Based Probabilistic Phrase Structure Parsing. Language and Linguistics Compass, vol. 2, pp. 36-58 (2008). DOI: 10.1111/J.1749-818X.2007.00046.X
Robert Gaizauskas, Mark Hepple, Horacio Saggion, Mark A. Greenwood, Kevin Humphreys. SUPPLE. Proceedings of the Ninth International Workshop on Parsing Technology (Parsing '05), pp. 200-201 (2005). DOI: 10.3115/1654494.1654521
Sabine Brants, Stefanie Dipper, Peter Eisenberg, Silvia Hansen-Schirra, Esther König, Wolfgang Lezius, Christian Rohrer, George Smith, Hans Uszkoreit. TIGER: Linguistic Interpretation of a German Corpus. Research on Language and Computation, vol. 2, pp. 597-620 (2004). DOI: 10.1007/S11168-004-7431-3