Authors: Saikrishna Srirampur, Ravi Chandibhamar, Radhika Mamidi
DOI: 10.3115/V1/W14-5312
Keywords:
Abstract: Statistical morph analyzers have proved to be highly accurate while being comparatively easier to maintain than rule-based approaches. Our morph analyzer (SMA++) is an improvement over the statistical morph analyzer (SMA) described in Malladi and Mannem (2013). SMA++ predicts the gender, number, person, case (GNPC) and the lemma (L) of a given token. We modified the SMA of Malladi and Mannem (2013) by adding some rich machine learning features. The feature set was chosen specifically to suit the characteristics of Indian languages. In this paper we apply SMA++ to four Indian languages, viz. Hindi, Urdu, Telugu and Tamil. Hindi and Urdu belong to the Indic language family. Telugu and Tamil belong to the Dravidian language family.