Building Computational Resources : The URDU.KON-TB Treebank and the Urdu Parser

Author: Qaiser Abbas

DOI:

Keywords: Phrase structure rules, Natural language processing, Treebank, Parsing, Artificial intelligence, Computer science, Part of speech, Context-free grammar, Grammar, Language identification, Dependency grammar

Abstract: This work presents the development of the URDU.KON-TB treebank, its annotation evaluation and guidelines, and the construction of the Urdu parser for Urdu, a South Asian language. Urdu is comparatively an under-resourced language, and a reliable treebank will have a significant impact on the state of the art in automatic Urdu processing. The work includes a raw corpus containing 1400 sentences collected from Wikipedia and the Jang newspaper. The corpus contains text from local and international news, social stories, sports, culture, finance, religion, traveling, etc. The hierarchical annotation scheme adopted combines phrase structure with hyper dependency structure. A semi-semantic part-of-speech tag set, a syntactic tag set, and a functional tag set are proposed, which were further revised during the annotation of the corpus. The annotation was performed manually. Due to the addition of morphological, part-of-speech, syntactic, semantic, clausal, grammatical, and miscellaneous features, the annotation is linguistically rich. The annotation resulted in a treebank for Urdu, called URDU.KON-TB, which is presented in Chapter 3. For the evaluation of the annotation scheme, Krippendorff's α coefficient was selected. It is a statistical measure used to evaluate inter-annotator agreement. 100 randomly selected sentences were given to five trained annotators for annotation. The annotated sentences were then evaluated using the α coefficient. The α values of agreement obtained for part-of-speech, syntactic, and functional annotation are 0.964, 0.817, and 0.806, respectively. This evaluation is presented in Chapter 4. All three values lie in the range of perfect agreement. The annotation guidelines were devised after this evaluation. [...] updated version [...] 2. For the parser, the treebank was divided into 80% training data and 20% test data. A context-free grammar was extracted from the training data for parser development. A 10% portion was held out, comprising 140 sentences with an average length of 13.73 words per sentence, and was used with the parser. The parser is an extension of the dynamic programming algorithm known as the Earley parsing algorithm. The extensions made are discussed in Chapter 5, along with the issues faced. Items that can occur in normal text are considered, e.g., punctuation, null elements, diacritics, headings, regard titles, Hadees (the statements of prophets), anaphora within a sentence, and others.
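The agreement figures in the abstract rest on Krippendorff's α, which compares observed disagreement with the disagreement expected by chance: α = 1 − D_o / D_e. The following is a minimal sketch for nominal labels only, not the thesis's own evaluation code. The function name `krippendorff_alpha_nominal`, the list-of-lists input layout, and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list with one entry per annotated item; each entry is the
    list of labels the annotators gave that item (None marks a missing label).
    This layout is an assumption of the sketch.
    """
    coincidences = Counter()
    for labels in units:
        values = [v for v in labels if v is not None]
        m = len(values)
        if m < 2:
            continue  # an item with fewer than two labels cannot be paired
        # Every ordered pair of labels from different annotators counts 1/(m-1).
        for i, j in permutations(range(m), 2):
            coincidences[(values[i], values[j])] += 1.0 / (m - 1)

    # Marginal totals n_c per label and the overall number of pairable values n.
    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable labels")

    # D_o and D_e share the factor 1/n, so it cancels in the ratio.
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one label ever used: no disagreement is possible
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Toy data: two annotators, four items, hypothetical labels "a" and "b".
    toy = [["a", "a"], ["a", "b"], ["b", "b"], ["b", "b"]]
    print(round(krippendorff_alpha_nominal(toy), 4))
```

With five annotators and 100 sentences, each tagged token would become one entry in `units` holding five labels. How the thesis aligned tokens and constituents across annotators is not stated in the abstract.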
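PARSEVAL measures were used to evaluate the results. By applying a sufficiently rich model, the parser gives an 87% f-score and outperforms the multi-path-shift-reduce parser, the two-stage Hindi parser, and the simple Hindi parser, with increases of 4.8%, 12.48%, and 22% in recall, respectively. The work is a contribution to the overall computational resources for Urdu. By-products include the tag sets, the annotation guidelines, and a morphologically tagged corpus with sufficient encoded information, which can be used for taggers. These resources can be used for enhanced natural language processing tasks such as probabilistic parsing, POS taggers, disambiguation of spoken sentences, grammar development, language identification, sources of linguistic inquiry and psychological modeling, or pattern matching.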
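PARSEVAL scores a predicted parse by comparing its labeled brackets with those of the gold tree, and the f-score quoted above is the harmonic mean of the resulting precision and recall. The sketch below illustrates that computation under stated assumptions. Trees are nested tuples of the form `(label, child, ...)` with plain strings as words, every labeled node including preterminals is counted (common scoring tools often exclude them), and the labels and words are hypothetical rather than taken from URDU.KON-TB.

```python
from collections import Counter


def brackets(tree, start=0):
    """Collect (label, start, end) spans from a nested-tuple tree.

    Returns the list of spans and the position just past the last word.
    """
    if isinstance(tree, str):
        return [], start + 1  # a word occupies one position and has no bracket
    label, *children = tree
    spans = []
    pos = start
    for child in children:
        child_spans, pos = brackets(child, pos)
        spans.extend(child_spans)
    spans.append((label, start, pos))
    return spans, pos


def parseval(gold, predicted):
    """Labeled bracket precision, recall, and F1 for one sentence."""
    gold_spans = Counter(brackets(gold)[0])
    pred_spans = Counter(brackets(predicted)[0])
    # Multiset intersection: a bracket matches at most as often as it occurs in both.
    matched = sum((gold_spans & pred_spans).values())
    n_pred = sum(pred_spans.values())
    n_gold = sum(gold_spans.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ("S", ("NP", "w1", "w2"), ("VP", "w3"))
    pred = ("S", ("NP", "w1", "w2"), ("VP", ("V", "w3")))
    print(parseval(gold, pred))
```

Corpus-level figures are usually obtained by summing matched, predicted, and gold bracket counts over all test sentences before dividing, rather than by averaging per-sentence scores. Which convention the thesis follows is not stated in the abstract.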
