Author: Teresa Lynn
DOI:
Keywords:
Abstract: Despite enjoying the status of an official EU language, Irish is considered a minority language. As with most minority languages, it is a "low-density" language, which means it lacks important linguistic and Natural Language Processing (NLP) resources. Relative to better-resourced languages such as English or French, for example, little research has been carried out on the computational analysis or processing of Irish. Parsing is the method of analysing the linguistic structure of text, and it is an invaluable processing step that is required for many different types of language technology applications. As a verb-initial language, Irish has several features that are uncharacteristic of many languages previously studied in parsing research. Our work broadens the application of NLP methods to less-studied language structures and provides a basis on which future work in Irish NLP is possible. We report on the development of a dependency treebank that serves as training data for the first full Irish dependency parser. We discuss the linguistic structures of Irish and the motivation behind the design of our annotation scheme. Our work also examines various methods of employing semi-automated approaches to treebank development. With a relatively small pool of linguistic and technological resources available for the Irish language, we explore and evaluate these approaches, and show that even in the early stages of development, parsing results for Irish are promising. What counts as a sufficient number of trees for training a parser varies according to languages. Through empirical methods, we explore the impact that our treebank's size and content have on parsing accuracy for Irish. We also discuss our work in crosslingual studies through converting our treebank to a universal annotation scheme. Finally, we extend our Irish NLP work to the unstructured user-generated text of Irish tweets. We report on the creation of a POS-tagged corpus of Irish tweets and the training of statistical POS-tagging models. We show how existing resources can be leveraged for this domain-adapted resource development.