Method of identifying the language of a textual passage using short word and/or n-gram comparisons

Authors: Gregory T. Grefenstette, Xiang Tong, David A. Evans

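Abstract: A method and system of identifying the language of a textual passage is disclosed. The method and system includes parsing the textual passage into n-grams, assigning an initial weight to each n-gram, and adjusting the weight initially assigned to each word or n-gram parsed from the textual passage. The weight initially assigned to each parsed word or n-gram is adjusted in a manner proportionate to the inverse of the number of languages within which such words or n-grams appear. Reducing the weight of such words and n-grams diminishes, without completely eliminating, their importance in comparison to other words and n-grams parsed from the same textual passage when determining the language of the textual passage. The present invention appropriately weighs short words or n-grams that are common in multiple languages without affecting short words or n-grams that are uncommon in several languages.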
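
The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `PROFILES` word sets, the function names, the initial weight of 1.0, and the choice to sum adjusted weights per language are all assumptions made for illustration. The sketch uses whole short words as tokens; character n-grams could be substituted in the same way.

```python
# Hypothetical toy profiles: short words assumed to be common in each language.
# A real system would derive these from large corpora.
PROFILES = {
    "english": {"the", "of", "and", "in", "a", "to"},
    "french": {"le", "la", "de", "et", "a", "en"},
    "spanish": {"el", "la", "de", "y", "a", "en"},
}


def adjusted_weight(token, profiles, initial_weight=1.0):
    """Scale the initial weight by the inverse of the number of
    languages whose profile contains the token (assumed scheme)."""
    n_langs = sum(token in vocab for vocab in profiles.values())
    if n_langs == 0:
        return 0.0  # token is in no profile, so it carries no evidence
    return initial_weight / n_langs


def score_languages(text, profiles=PROFILES):
    """Sum the adjusted weights of matching tokens for each language."""
    scores = {}
    # Naive whitespace tokenization; punctuation is not stripped.
    for token in text.lower().split():
        weight = adjusted_weight(token, profiles)
        for lang, vocab in profiles.items():
            if token in vocab:
                scores[lang] = scores.get(lang, 0.0) + weight
    return scores


def identify_language(text, profiles=PROFILES):
    """Return the highest-scoring language, or None if nothing matched."""
    scores = score_languages(text, profiles)
    if not scores:
        return None
    return max(scores, key=scores.get)
```

In this sketch, a word such as "a", which appears in all three toy profiles, contributes only a third of its initial weight to each language, while "the", which appears in one profile, keeps its full weight. Shared words are thus down-weighted rather than discarded, which matches the abstract's point that their importance is diminished without being eliminated.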
