Improvement of Korean Proofreading System Using Corpus and Collocation Rules

Author: Young-Soog Chae

Abstract: This paper presents techniques for correcting spelling, orthographical, and grammatical errors in computer-based text. It also addresses an extension that goes beyond the usual checking of isolated single words by taking multi-word units, as well as the whole sentence, into account. Candidate words are created by applying function rules and a correction rule table that contains heuristic information and collocations. To prevent excessive creation of candidates and to improve accuracy, we use a high-frequency dictionary of 300,000 entries derived from a corpus. With partial parsing rules based on constituent grammar, collocations between words can be found. We experimented with these techniques on several corpora: the final results of SERI's research, texts, newspaper materials, and public materials. The system achieves a 98% accuracy rate when the 8.5% of errors caused by unregistered words are excluded. The average number of prospective candidates suggested is 1.12.
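The candidate pipeline the abstract describes (apply correction rules, prune the over-generated candidates with a frequency dictionary, then use collocation to choose among them) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's system: the rule table, dictionary contents, and collocation counts are hypothetical toy data, English strings stand in for Korean, and collocation is simplified to counts with the immediately preceding word rather than the constituent-grammar partial parsing the paper uses.

```python
from collections import Counter

# Hypothetical correction-rule table: (erroneous substring, replacement).
# Toy English patterns stand in for the paper's Korean rules.
CORRECTION_RULES = [("ie", "ei"), ("ei", "ie"), ("mr", "rm"), ("omr", "rom")]

# Hypothetical high-frequency dictionary: word -> corpus frequency.
FREQ_DICT = {"receive": 120, "believe": 95, "from": 900, "form": 300}

# Hypothetical collocation counts: (preceding word, word) -> co-occurrences.
COLLOCATIONS = Counter({("i", "believe"): 40, ("the", "form"): 30})


def generate_candidates(word):
    """Apply every rule at every position where its pattern occurs."""
    candidates = set()
    for wrong, right in CORRECTION_RULES:
        start = word.find(wrong)
        while start != -1:
            candidates.add(word[:start] + right + word[start + len(wrong):])
            start = word.find(wrong, start + 1)
    return candidates


def suggest(word, prev_word=None):
    """Return dictionary-filtered candidates, best first."""
    if word in FREQ_DICT:
        return []  # registered word: no correction needed
    # Pruning by the frequency dictionary limits over-generation.
    valid = [c for c in generate_candidates(word) if c in FREQ_DICT]

    def score(candidate):
        # Rank by collocation with the preceding word, then by frequency.
        colloc = COLLOCATIONS[(prev_word, candidate)] if prev_word else 0
        return (colloc, FREQ_DICT[candidate])

    return sorted(valid, key=score, reverse=True)


print(suggest("recieve"))       # ['receive']
print(suggest("fomr"))          # ['from', 'form']
print(suggest("fomr", "the"))   # ['form', 'from']
```

Without context, `suggest("fomr")` ranks `from` ahead of `form` on frequency alone; with the preceding word `the`, the hypothetical collocation count for ("the", "form") outweighs raw frequency and `form` comes first. Scoring with a tuple keeps collocation as the primary criterion and frequency as the tie-breaker.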
