Parsing the Arabic Treebank: Analysis and Improvements

Authors: Seth Kulick, Ryan Gabbard, Mitchell Marcus

Abstract: Previous work has demonstrated that the performance of current parsers on Arabic is far below their performance on English or even Chinese, which in turn harms NLP tasks that use parsing as an input. This paper is an exploration of some of the issues involved in this difference. We focus on the Collins model [3] as implemented in the Bikel parser [1]. The corpus used for the experiments is the Arabic Treebank [6] (ATB). We cluster these issues in three ways. First, it is important when comparing to other languages that the comparison be a fair one; therefore we first discuss issues around evaluation and show that the performance is not quite as bad as previously thought. Second, we present some parser modifications that provide modest increases in performance. Finally, we explore some deeper differences between the ATB and the Penn Treebank and advance some speculations on why parsers have difficulty with Arabic.

References (3)
Mark Johnson, PCFG models of linguistic tree representations. Computational Linguistics, vol. 24, pp. 613-632 (1998)
Michael Collins, Head-Driven Statistical Models for Natural Language Parsing. Computational Linguistics, vol. 29, pp. 589-637 (2003), 10.1162/089120103322753356
Ryan Gabbard, Mitchell Marcus, Seth Kulick, Fully Parsing the Penn Treebank. language and technology conference, pp. 184-191 (2006), 10.3115/1220835.1220859