Text language identification

Author: Johannes Heinecke

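Abstract: After prestoring first character strings that occur frequently in words of predetermined languages and second character strings that are atypical therein, a device for automatically identifying the language of a text from a plurality of languages extracts words from the text and constructs all character strings contained in each extracted word. Each character string of an extracted word is compared to the first and second character strings of a particular language. If the word contains a first character string, the score of that language is increased by a coefficient depending on the position of the string in the word; if it contains a second character string, the score is decreased by a coefficient associated with that string. The highest of the scores corresponding to the predetermined languages identifies the language of the text.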
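The scoring scheme outlined in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the string tables, the penalty values, the substring length limit, and the rule that strings at a word edge count double are all assumptions chosen for illustration, and the names FREQUENT, ATYPICAL, position_coefficient, and identify_language are hypothetical.

```python
# Hypothetical per-language tables; the values are illustrative, not from the source.
# FREQUENT: "first" strings that occur often in words of the language.
FREQUENT = {
    "en": {"th", "the", "ing", "ed"},
    "de": {"sch", "ein", "ung", "cht"},
    "fr": {"eau", "ez", "que", "ou"},
}
# ATYPICAL: "second" strings that are unusual in the language,
# each mapped to an assumed penalty coefficient.
ATYPICAL = {
    "en": {"sch": 1.0, "eau": 1.0},
    "de": {"th": 0.5, "eau": 1.0},
    "fr": {"sch": 1.0, "th": 1.0, "ing": 0.5},
}


def position_coefficient(word, start, length):
    """Assumed weighting: a string at the start or end of the word counts double."""
    at_edge = start == 0 or start + length == len(word)
    return 2.0 if at_edge else 1.0


def substrings(word, max_len=4):
    """Yield (start, string) for every substring of length 2..max_len."""
    for start in range(len(word)):
        for length in range(2, max_len + 1):
            if start + length <= len(word):
                yield start, word[start:start + length]


def identify_language(text):
    """Return (best_language, scores) for the given text."""
    scores = {lang: 0.0 for lang in FREQUENT}
    for raw in text.lower().split():
        # Keep letters only, so punctuation does not end up inside strings.
        word = "".join(ch for ch in raw if ch.isalpha())
        for start, s in substrings(word):
            for lang in scores:
                if s in FREQUENT[lang]:
                    # Frequent string: raise the score by a position-dependent coefficient.
                    scores[lang] += position_coefficient(word, start, len(s))
                if s in ATYPICAL[lang]:
                    # Atypical string: lower the score by the coefficient tied to that string.
                    scores[lang] -= ATYPICAL[lang][s]
    # The language with the highest score identifies the text.
    best = max(scores, key=scores.get)
    return best, scores
```

Calling identify_language("The thing is walking there") returns a pair whose first element is the winning language code and whose second is the full score table, which makes it easy to inspect how close the runner-up languages were. A real system would derive both tables from corpora rather than hand-picking them.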

References (32)

Erik Sparre, Alberto Jimenez Feltström, Text language detection (2001)

Shamim A Alpha, Methods and systems for determining a language of a document (2001)

Richard Allen Shaner, Method of identifying data type and locating in a file (1998)

David van den Akker, System and method for identifying language using morphologically-based techniques (1997)

Gerald John Balm, Method and apparatus for context-aided recognition (1974)

John C. Schmitt, Trigram-based method of language identification (1990)

Markku Mettälä, Juha Häkkinen, Determining language for character sequence (2002)

Bruno M. Schulze, Automatic language identification using both N-Gram and word information (1999)

Gregory T. Grefenstette, Xiang Tong, David A. Evans, Method of identifying the language of a textual passage using short word and/or n-gram comparisons (2004)

Robert David Powell, Identifying language and character set of data representing text (1998)