Identifying Urdu Complex Predication via Bigram Extraction

Authors: Annette Hautli, Sebastian Sulger, Tina Bögel, Tafseer Ahmed, Miriam Butt


Keywords: Urdu, Noun, Process (engineering), Computer science, Artificial intelligence, Hindi, Bigram, Range (mathematics), Natural language processing, Component (UML)

Abstract: A problem that crops up repeatedly in shallow and deep syntactic parsing approaches to South Asian languages like Urdu/Hindi is the proper treatment of complex predications. Problems for the NLP of complex predications are posed by their productiveness and the ill-understood nature of the range of their combinatorial possibilities. This paper presents an investigation into whether fine-grained information about the distributional properties of nouns in N+V CPs can be identified by the comparatively simple process of extracting bigrams from a large “raw” corpus of Urdu. In gathering the relevant properties, we were aided by visual analytics, in that we coupled our computational data analysis with interactive components for the data sets. The visualization component proved to be an essential part of the analysis, in particular for the easy identification of outliers and false positives. Another essential component turned out to be language-particular knowledge and access to existing resources. Overall, we were indeed able to identify high-frequency N-V combinations as well as pick out combinations we had not been aware of before. However, a manual inspection of the results also pointed to data sparsity, despite the use of a large corpus.
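The bigram extraction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the light-verb list, the function name `extract_nv_bigrams`, and the romanized toy token sequences are all assumptions made for illustration, and a real pipeline would run over a large Urdu corpus in its native script.

```python
from collections import Counter

# Hypothetical set of romanized light-verb stems, assumed for illustration only.
LIGHT_VERBS = {"kar", "ho", "de"}


def extract_nv_bigrams(sentences, light_verbs=LIGHT_VERBS):
    """Count adjacent token pairs whose second token is a listed light verb.

    The first token is NOT checked for being a noun, so the counts can
    contain false positives that would need manual or visual inspection.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        # Slide a window of width two over the whitespace-separated tokens.
        for first, second in zip(tokens, tokens[1:]):
            if second in light_verbs:
                counts[(first, second)] += 1
    return counts


# Made-up token sequences standing in for a raw corpus.
toy_corpus = [
    "us ne yaad kar",
    "mujhe yaad ho",
    "us ne kaam kar",
    "us ne yaad kar",
]

counts = extract_nv_bigrams(toy_corpus)
for (first, verb), frequency in counts.most_common():
    print(first, verb, frequency)
```

Because the sketch filters only on the second token, any word that happens to precede a light verb is counted, which is one way false positives of the kind the abstract mentions can arise.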


