Memory-based context-sensitive spelling correction at web scale

Authors: Dennis J. Drown, Taghi M. Khoshgoftaar, Ramaswamy Narayanan

DOI: 10.1109/ICMLA.2007.73

Abstract: We study the problem of correcting spelling mistakes in text using memory-based learning techniques and a very large database of token n-gram occurrences in web text as training data. Our approach uses the context in which an error appears to select the most likely candidate from the words that might have been intended in its place. Using a novel correction algorithm and this massive training database, we demonstrate higher accuracy on correcting real-word errors than previous work, and high accuracy on a new task: ranking the corrections to non-word errors suggested by a standard spelling correction package.
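The abstract does not spell out the authors' novel correction algorithm, so the following is only a minimal sketch of the general memory-based idea it describes: substitute each candidate word into the error position and score it by how often the surrounding n-grams occur in the training data. It assumes a small in-memory dictionary (the hypothetical `counts`) stands in for the web-scale n-gram database and that the candidates come from a given confusion set. Summing log counts over every window that covers the error is a common baseline, not necessarily the paper's method.

```python
from math import log


def candidate_score(tokens, i, candidate, counts, n=3):
    """Score `candidate` at position i by summing log(1 + count) over every
    n-gram window of length n that covers position i after substitution.
    `counts` is a hypothetical dict mapping n-gram tuples to occurrence counts."""
    sentence = tokens[:i] + [candidate] + tokens[i + 1:]
    score = 0.0
    # Valid window starts s satisfy s <= i <= s + n - 1 and 0 <= s <= len - n.
    for start in range(max(0, i - n + 1), min(i, len(sentence) - n) + 1):
        gram = tuple(sentence[start:start + n])
        score += log(1 + counts.get(gram, 0))
    return score


def correct(tokens, i, confusion_set, counts, n=3):
    """Return the candidate from `confusion_set` whose context is most frequent."""
    return max(confusion_set, key=lambda c: candidate_score(tokens, i, c, counts, n))


if __name__ == "__main__":
    # Toy counts standing in for the web n-gram database (illustrative numbers only).
    counts = {
        ("a", "piece", "of"): 800,
        ("piece", "of", "cake"): 500,
        ("a", "peace", "of"): 5,
        ("peace", "of", "cake"): 1,
    }
    print(correct(["a", "peace", "of", "cake"], 1, ["peace", "piece"], counts))  # prints piece
```

Taking log(1 + count) keeps a single very frequent window from dominating the sum and lets unseen n-grams contribute zero instead of failing.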

References (12)
Vinci Liu, James R. Curran. Web Text Corpus for Natural Language Processing. Conference of the European Chapter of the Association for Computational Linguistics, 2006.
Miroslav Kubat, Robert C. Holte, Stan Matwin. Machine Learning for the Detection of Oil Spills in Satellite Radar Images. Machine Learning, vol. 30, pp. 195-215, 1998. DOI: 10.1023/A:1007452223027
Andrew R. Golding, Dan Roth. Applying Winnow to Context-Sensitive Spelling Correction. International Conference on Machine Learning, pp. 182-190, 1996.
Gary M. Weiss. Mining with Rarity. ACM SIGKDD Explorations Newsletter, vol. 6, pp. 7-19, 2004. DOI: 10.1145/1007730.1007734
Mirella Lapata, Frank Keller. Web-Based Models for Natural Language Processing. ACM Transactions on Speech and Language Processing, vol. 2, pp. 3-, 2005. DOI: 10.1145/1075389.1075392
Kenneth W. Church, William A. Gale. Probability Scoring for Spelling Correction. Statistics and Computing, vol. 1, pp. 93-103, 1991. DOI: 10.1007/BF01889984
Kevin S. Woods, Christopher C. Doss, Kevin W. Bowyer, Jeffrey L. Solka, Carey E. Priebe, W. Philip Kegelmeyer. Comparative Evaluation of Pattern Recognition Techniques for Detection of Microcalcifications in Mammography. International Journal of Pattern Recognition and Artificial Intelligence, vol. 7, pp. 1417-1436, 1993. DOI: 10.1142/S0218001493000698
Foster Provost, Tom Fawcett. Robust Classification for Imprecise Environments. Machine Learning, vol. 42, pp. 203-231, 2001. DOI: 10.1023/A:1007601015854
Jason Van Hulse, Taghi M. Khoshgoftaar, Amri Napolitano. Experimental Perspectives on Learning from Imbalanced Data. International Conference on Machine Learning, pp. 935-942, 2007. DOI: 10.1145/1273496.1273614
Ricardo Barandela, Rosa M. Valdovinos, J. Salvador Sánchez, Francesc J. Ferri. The Imbalanced Training Sample Problem: Under or over Sampling? Lecture Notes in Computer Science, pp. 806-814, 2004. DOI: 10.1007/978-3-540-27868-9_88