Dependency Treebank of Urdu and its Evaluation

Authors: Riyaz Ahmad Bhat, Dipti Misra Sharma

Abstract: In this paper we describe a treebanking effort, currently underway, for Urdu, a South Asian language. The treebank is built from a newspaper corpus and uses a Karaka-based grammatical framework inspired by Paninian theory. Thus far, 3366 sentences (0.1M words) have been annotated with linguistic information at the morpho-syntactic (morphological, part-of-speech, and chunk information) and syntactico-semantic (dependency) levels. This work also aims to evaluate the correctness or reliability of the manually annotated dependency treebank. Evaluation is done by measuring inter-annotator agreement on a manually annotated data set of 196 sentences (5600 words) annotated by two annotators. We present a qualitative analysis of the agreement statistics and identify possible reasons for the disagreement between the annotators. We also show the syntactic annotation of some constructions specific to Urdu, like Ezafe, and discuss the problem of word segmentation (tokenization).
