Using Cross-Lingual Projections to Generate Semantic Role Labeled Annotated Corpus for Urdu - A Resource Poor Language

Authors: Smruthi Mukund, Rohini Srihari, Debanjan Ghosh

DOI:

Keywords: Natural language processing, Syntax, Word (computer architecture), Cross lingual, Artificial intelligence, PropBank, Resource poor, Annotation, Urdu, Scale (map), Computer science

Abstract: In this paper we explore the possibility of using cross lingual projections that help to automatically induce role-semantic annotations in the PropBank paradigm for Urdu, a resource poor language. This technique provides annotation projections based on word alignments. It is relatively inexpensive and has the potential to reduce the human effort involved in creating semantic role resources. The projection model exploits lexical as well as syntactic information on an English-Urdu parallel corpus. We show that our method generates reasonably good annotations with an accuracy of 92% on short structured sentences. Using the automatically generated annotated corpus, we conduct preliminary experiments to create a semantic role labeler for Urdu. The results of the labeler, though modest, are promising and indicate the potential of our technique to generate large scale annotations for Urdu.
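The abstract describes projecting role annotations across word alignments. A minimal sketch of that general idea follows. It is not the authors' projection model, which also uses lexical and syntactic information. The function name `project_roles`, the label scheme (`ARG0`, `ARG1`, `REL`, `O`), the romanized example sentence, and the hand-written alignment are all assumptions made for illustration.

```python
from typing import List, Tuple


def project_roles(
    source_roles: List[str],
    alignments: List[Tuple[int, int]],
    target_len: int,
) -> List[str]:
    """Copy per-token role labels from source tokens to aligned target tokens.

    source_roles: one label per source token ("O" means no role).
    alignments:   (source_index, target_index) pairs from a word aligner.
    target_len:   number of tokens in the target sentence.
    """
    target_roles = ["O"] * target_len
    for src_idx, tgt_idx in alignments:
        label = source_roles[src_idx]
        # Keep the first label projected onto a target token; skip unlabeled sources.
        if label != "O" and target_roles[tgt_idx] == "O":
            target_roles[tgt_idx] = label
    return target_roles


# Hypothetical toy example (not taken from the paper's corpus).
english = ["John", "ate", "an", "apple"]
english_roles = ["ARG0", "REL", "ARG1", "ARG1"]
urdu_romanized = ["John", "ne", "seb", "khaya"]  # illustrative romanization
toy_alignments = [(0, 0), (1, 3), (3, 2)]  # "an" and "ne" are left unaligned

projected = project_roles(english_roles, toy_alignments, len(urdu_romanized))
for token, role in zip(urdu_romanized, projected):
    print(token, role)
```

Target tokens with no aligned source token keep the label `O` in this sketch. A realistic system would need some way to handle such gaps and noisy alignments.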


References (26)

Philipp Koehn, Europarl: A Parallel Corpus for Statistical Machine Translation (2005)

Mariona Taulé, Maria Antònia Martí, Marta Recasens, AnCora: Multilevel Annotated Corpora for Catalan and Spanish. Language Resources and Evaluation (2008)

Lawrence Philips, The double metaphone search algorithm. The C Users Journal archive, vol. 18, pp. 38-43 (2000)

Anette Frank, Aljoscha Burchardt, Approaching Textual Entailment with LFG and FrameNet Frames (2007)

Mitch Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz, Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, vol. 19, pp. 313-330 (1993), 10.21236/ADA273556

Alessandro Moschitti, Kernel methods, syntax and semantics for relational text categorization. Proceeding of the 17th ACM conference on Information and knowledge mining - CIKM '08, pp. 253-262 (2008), 10.1145/1458082.1458118

Amitabha Mukerjee, Ankit Soni, Achla M. Raina, Detecting complex predicates in Hindi using POS projection across parallel corpora. Proceedings of the Workshop on Multiword Expressions Identifying and Exploiting Underlying Properties - MWE '06, pp. 28-35 (2006), 10.3115/1613692.1613699

Chenhai Xi, Rebecca Hwa, A backoff model for bootstrapping resources for non-English languages. Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing - HLT '05, pp. 851-858 (2005), 10.3115/1220575.1220682

David Yarowsky, Grace Ngai, Richard Wicentowski, Inducing multilingual text analysis tools via robust projection across aligned corpora. Proceedings of the first international conference on Human language technology research - HLT '01, pp. 1-8 (2001), 10.3115/1072133.1072187


Smruthi Mukund, Rohini Srihari, Erik Peterson, An Information-Extraction System for Urdu - A Resource-Poor Language. ACM Transactions on Asian Language Information Processing, vol. 9, pp. 15- (2010), 10.1145/1838751.1838754
