Authors: Gregory T. Grefenstette, Xiang Tong, David A. Evans
DOI:
Keywords:
Abstract: A method and system for identifying the language of a textual passage is disclosed. The method includes parsing the passage into words or n-grams, assigning an initial weight to each word or n-gram, and adjusting the initially assigned weight of each word or n-gram parsed from the passage. The weight is adjusted in a manner proportionate to the inverse of the number of languages within which such words or n-grams appear. Reducing the weight of such words diminishes, without completely eliminating, their importance in comparison with other words in the same passage when determining the passage's language. The present invention thereby appropriately weighs short words that are common to multiple languages without affecting words that are uncommon across several languages.
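The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. It rests on several assumptions: the tiny per-language word lists in `LEXICONS` are hypothetical stand-ins for real language models, tokens are whitespace-separated lowercase words (the abstract also allows n-grams), the initial weight of a token is its occurrence count, and each language's score is the sum of the adjusted weights of the passage tokens it contains. The adjusted weight is the initial weight divided by the number of languages in which the token appears, so a short word such as "a" that occurs in several lexicons still counts, only less.

```python
from collections import Counter

# Hypothetical toy lexicons; a real system would derive these from corpora.
LEXICONS = {
    "english": {"the", "of", "and", "is", "a", "in", "language"},
    "french": {"le", "la", "de", "et", "est", "un", "a", "en", "langue"},
    "spanish": {"el", "la", "de", "y", "es", "un", "a", "en", "lengua"},
}

PUNCT = ".,;:!?\"'()"


def tokenize(passage):
    """Split a passage into lowercase word tokens (an assumed, simple tokenizer)."""
    words = (w.strip(PUNCT).lower() for w in passage.split())
    return [w for w in words if w]


def language_scores(passage, lexicons=LEXICONS):
    """Score each language using inverse-language-count weighting."""
    initial = Counter(tokenize(passage))  # initial weight = occurrence count
    scores = {lang: 0.0 for lang in lexicons}
    for token, weight in initial.items():
        langs = [lang for lang, words in lexicons.items() if token in words]
        if not langs:
            continue  # unknown tokens contribute nothing
        # Divide by the number of languages containing the token: shared words
        # are down-weighted but never reduced to zero.
        adjusted = weight / len(langs)
        for lang in langs:
            scores[lang] += adjusted
    return scores


def identify_language(passage, lexicons=LEXICONS):
    """Return the highest-scoring language (ties go to the first in dict order)."""
    scores = language_scores(passage, lexicons)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    text = "La langue de la maison est un test"
    print(language_scores(text))
    print(identify_language(text))
```

For the French sample, "la", "de" and "un" are split evenly between French and Spanish, while "langue" and "est" count fully toward French, so French wins with a score of 4.0 against Spanish's 2.0. Swapping in character n-grams would only require changing `tokenize` and the lexicon contents.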