System and method for identifying the language of written text having a plurality of different length n-gram profiles

Author: Miguel Cardoso de Campos


Abstract: A window of letters is identified within a text sample input. If the window contains matches to reference letter sequences (RLS) contained in multiple sets of n-gram language profiles (profiles), then the longest match is kept and scored for each language. Scoring is based on frequency parameters of the matched RLS. The window is incrementally shifted through the text sample input, with matching and scoring done for each window. At the end of the text sample input, the language having the highest cumulative score is identified as the text sample's language. The scoring may be improved by restricting the longer profiles to full words, using two passes where the second pass disregards languages that are not near the highest score during the first pass, favoring complete words in scoring, and increasing the score of an RLS that does not frequently appear in many languages. The profiles may be enhanced by removing some RLS if they do not meet a predefined threshold or a variable threshold.
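The core loop described in the abstract (shift a window through the text, keep the longest matching RLS per language, accumulate a frequency-based score) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the toy `PROFILES` table, its weights, the function name `identify_language`, the `max_n` parameter, and the choice to treat a "match" as a prefix of the window are all assumptions made for the example. The refinements mentioned in the abstract (two passes, whole-word restrictions, profile pruning) are left out.

```python
# Hypothetical toy profiles: language -> {reference letter sequence: weight}.
# Real profiles would be derived from corpus frequencies; these are made up.
PROFILES = {
    "english": {"th": 0.3, "the": 0.5, "ing": 0.4, "an": 0.2, "and": 0.4},
    "german": {"ch": 0.3, "sch": 0.5, "der": 0.4, "ei": 0.2, "ein": 0.4, "und": 0.5},
}


def identify_language(text, profiles, max_n=3):
    """Return (best_language, scores) for a text sample.

    Assumption: a window of up to max_n letters starts at each position,
    and only sequences beginning at the window start count as matches.
    """
    scores = {lang: 0.0 for lang in profiles}
    text = text.lower()

    # Shift the window one letter at a time through the whole sample.
    for start in range(len(text)):
        window = text[start:start + max_n]
        for lang, profile in profiles.items():
            # Try the longest candidate first and keep only that match.
            for n in range(len(window), 0, -1):
                weight = profile.get(window[:n])
                if weight is not None:
                    scores[lang] += weight
                    break

    # If nothing matched in any language, report no decision.
    if all(score == 0.0 for score in scores.values()):
        return None, scores

    # The language with the highest cumulative score wins.
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    language, totals = identify_language("the thing and the king", PROFILES)
    print(language, totals)
```

Trying the longest candidate first means that a trigram such as "the" is scored instead of the shorter "th" at the same position, which mirrors the "longest match kept" rule in the abstract.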


