Statistical Morph Analyzer (SMA++) for Indian Languages

Authors: Saikrishna Srirampur, Ravi Chandibhamar, Radhika Mamidi

DOI: 10.3115/V1/W14-5312

Keywords:

Abstract: Statistical morph analyzers have proved to be highly accurate while being comparatively easier to maintain than rule-based approaches. Our morph analyzer (SMA++) is an improvement over the statistical morph analyzer (SMA) described in Malladi and Mannem (2013). SMA++ predicts the gender, number, person, case (GNPC) and lemma (L) of a given token. We modified the SMA of Malladi and Mannem (2013) by adding some rich machine learning features. The feature set was chosen specifically to suit the characteristics of Indian languages. In this paper we apply SMA++ to four languages, viz. Hindi, Urdu, Telugu and Tamil. Hindi and Urdu belong to the Indic language family. Telugu and Tamil belong to the Dravidian language family.

References (9)
Dipti Misra Sharma, Abhilash Inumella, Nikhil Kanuparthi. Hindi Derivational Morphological Analyzer. Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology, pp. 10-16 (2012).
Riyaz Ahmad Bhat, Dipti Misra Sharma. Dependency Treebank of Urdu and its Evaluation. Linguistic Annotation Workshop, pp. 157-165 (2012).
Annette Hautli, Sebastian Sulger, Tina Bögel, Miriam Butt. Developing a Finite-State Morphological Analyzer for Urdu and Hindi. Finite State Methods and Natural Language Processing, pp. 86-96 (2007).
K.V. Ramakrishnamacharyulu, Akshar Bharati, Vineet Chaitanya, Rajeev Sangal. Natural Language Processing: A Paninian Perspective. Prentice-Hall of India (1996).
Josef van Genabith, Georgiana Dinu, Grzegorz Chrupala. Learning Morphology with Morfette. Language Resources and Evaluation (2008).
Vishal Goyal, Gurpreet Singh Lehal. Hindi Morphological Analyzer and Generator. International Conference on Emerging Trends in Engineering and Technology, pp. 1156-1159 (2008). DOI: 10.1109/ICETET.2008.11
K.V.N. Sunitha, N. Kalyani. A Novel Approach to Improve Rule Based Telugu Morphological Analyzer. Nature and Biologically Inspired Computing, pp. 1649-1652 (2009). DOI: 10.1109/NABIC.2009.5393637
Prashanth Mannem, Bharat Ambati, Samar Husain, Phani Gadde. The ICON-2010 Tools Contest on Indian Language Dependency Parsing (2010).
Prashanth Mannem, Deepak Kumar Malladi. Context Based Statistical Morphological Analyzer and its Effect on Hindi Dependency Parsing. Empirical Methods in Natural Language Processing, pp. 119-128 (2013).