Irish dependency treebanking and parsing

Author: Teresa Lynn

Abstract: Despite enjoying the status of an official EU language, Irish is considered a minority language. As with most minority languages, it is a 'low-density' language, which means it lacks important linguistic and Natural Language Processing (NLP) resources. Relative to better-resourced languages such as English or French, for example, little research has been carried out on computational analysis or processing of Irish. Parsing is the method of analysing the linguistic structure of text, and it is an invaluable processing step that is required for many different types of language technology applications. As a verb-initial language, Irish has several features that are uncharacteristic of many languages previously studied in parsing research. Our work broadens the application of NLP methods to less-studied language structures and provides a basis on which future work in Irish NLP is possible. We report on the development of a dependency treebank that serves as training data for the first full Irish dependency parser. We discuss the linguistic structures of Irish and the motivation behind the design of our annotation scheme. Our work also examines various methods of employing semi-automated approaches to treebank development. We overcome the relatively small pool of linguistic and technological resources available for the Irish language with these approaches, and show that even in early stages of development, parsing results for Irish are promising. What counts as a sufficient number of trees for training a parser varies according to languages. Through empirical methods, we explore the impact our treebank's size and content have on parsing accuracy for Irish. We also discuss our work in cross-lingual studies through converting our treebank to a universal annotation scheme. Finally, we extend our Irish NLP work to the unstructured user-generated text of Irish tweets. We report on the creation of a POS-tagged corpus of Irish tweets and the training of statistical POS-tagging models. We show how existing Irish NLP resources can be leveraged for this domain-adapted resource development.
