Development of Nepali Character Database for Character Recognition based on Clustering

Author: Aadesh Neupane

DOI: 10.5120/18799-0315

Keywords: Computer science, Discrete cosine transform, Artificial intelligence, Consonant, Wavelet transform, Natural language processing, Vowel, Cluster analysis, Database, Nepali, Character (mathematics)

Abstract: Character recognition requires a dataset to apply recognition algorithms and generate efficient models from them. In the case of the Nepali language, no such character dataset exists for research, at least in the public domain. The language has 36 consonant characters and 12 vowels, and each vowel can modify the consonant characters. In this regard, there can be a total of 446 characters, including numeric characters. Manually creating such a dataset therefore requires considerable effort, cost, and time. In this paper, an elegant way of creating the dataset using a semi-supervised clustering approach is described, which minimizes that effort. Also, an optimization is made to an existing segmentation algorithm [1] so that it can segment both handwritten and scanned text. Complex features are extracted from these segmented characters by applying the Discrete Cosine Transform and the Wavelet transform. The extracted features are then used to create the database using pHash and k-means clustering. Presently, the database contains 38,493 characters distributed among 52 different clusters.
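The abstract names a DCT-based perceptual hash (pHash) and k-means clustering but gives no implementation details. The following is a minimal sketch of that combination, not the author's code. It assumes each segmented character has already been resized to a small square grayscale grid (a list of lists of numbers). The function names `dct_2d`, `phash_bits`, and `kmeans`, the 8x8 low-frequency window, the median threshold, and the farthest-point initialisation are all illustrative assumptions. The wavelet features mentioned in the abstract are not covered.

```python
import math

def dct_1d(v):
    """Unnormalised DCT-II of a sequence of numbers."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def dct_2d(img):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in img]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def phash_bits(img, keep=8):
    """Hash an image to keep*keep bits (assumed scheme, not taken from the paper).

    Each low-frequency DCT coefficient is compared with the median of the
    low-frequency block; the DC term is left out of the median.
    """
    coeffs = dct_2d(img)
    low = [coeffs[u][v] for u in range(keep) for v in range(keep)]
    rest = sorted(low[1:])
    median = rest[len(rest) // 2]
    return [1 if c > median else 0 for c in low]

def sq_dist(a, b):
    """Squared Euclidean distance; on 0/1 vectors this equals Hamming distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Under these assumptions, a call such as `kmeans([phash_bits(g) for g in glyphs], k)` would group visually similar glyphs, where `glyphs` is a hypothetical list of resized character images. A person would then only need to name each cluster rather than label every character, which is one plausible reading of the semi-supervised step the abstract describes.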

References (12)
Mitrakshi B. Patil, Vaibhav Narawade, Recognition of Handwritten Devnagari Characters through Segmentation and Artificial neural networks. International Journal of Engineering Research and Technology, vol. 1 (2012).
Mudit Agrawal, Huanfeng Ma, David Doermann, Generalization of Hindi OCR Using Adaptive Segmentation and Font Files. Advances in Pattern Recognition, pp. 181-207 (2009). 10.1007/978-1-84800-330-9_10
Eugene Borovikov, A survey of modern optical character recognition techniques. arXiv: Computer Vision and Pattern Recognition (2014).
R.G. Casey, E. Lecolinet, A survey of methods and strategies in character segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, pp. 690-706 (1996). 10.1109/34.506792
Veena Bansal, R.M.K. Sinha, Segmentation of touching and fused Devanagari characters. Pattern Recognition, vol. 35, pp. 875-893 (2002). 10.1016/S0031-3203(01)00081-4
A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum Likelihood from Incomplete Data Via the EM Algorithm. Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, pp. 1-22 (1977). 10.1111/J.2517-6161.1977.TB01600.X
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten, The WEKA data mining software. ACM SIGKDD Explorations Newsletter, vol. 11, pp. 10-18 (2009). 10.1145/1656274.1656278
Vijay Kumar, Pankaj K Sengar, Segmentation of Printed Text in Devanagari Script and Gurmukhi Script. International Journal of Computer Applications, vol. 3, pp. 24-29 (2010). 10.5120/749-1058