Text language identification

Author: Johannes Heinecke

DOI:

Keywords:

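Abstract: After prestoring, for each of a plurality of languages, first character strings that occur frequently in words of the language and second character strings that are atypical therein, a device for automatically identifying the language of a text extracts words from the text and constructs all the character strings contained in each extracted word. Each character string in an extracted word is compared to the first and second character strings of a particular language. If the word contains a first character string, a score is increased by a coefficient depending on the position of the string in the word; if it contains a second character string, the score is decreased by a coefficient associated with that string. The highest of the scores corresponding to the predetermined languages identifies the language of the text.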
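
The scoring scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the string tables `FREQUENT` and `ATYPICAL`, the substring length bounds, and the position weighting in `position_coefficient` (strings at a word boundary count double) are all hypothetical values chosen for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical toy tables; a real system would derive them from large corpora.
# FREQUENT: strings that occur often in words of a language.
FREQUENT = {
    "en": {"th", "the", "ing", "and"},
    "fr": {"es", "ent", "eau", "que"},
}
# ATYPICAL: strings that are unusual in a language, each with an assumed penalty.
ATYPICAL = {
    "en": {"eau": 1.0, "aux": 1.0},
    "fr": {"th": 1.0, "ing": 1.0},
}


def substrings(word, min_len=2, max_len=4):
    """Yield (start, string) for every substring within the length bounds."""
    for start in range(len(word)):
        for length in range(min_len, max_len + 1):
            if start + length <= len(word):
                yield start, word[start:start + length]


def position_coefficient(start, length, word_len):
    """Assumed weighting: a string at the start or end of a word counts double."""
    if start == 0 or start + length == word_len:
        return 2.0
    return 1.0


def score_text(text):
    """Return one score per language for the given text."""
    scores = {lang: 0.0 for lang in FREQUENT}
    # Extract words as runs of letters (digits and underscores excluded).
    words = re.findall(r"[^\W\d_]+", text.lower())
    for word in words:
        for start, s in substrings(word):
            for lang in scores:
                if s in FREQUENT[lang]:
                    scores[lang] += position_coefficient(start, len(s), len(word))
                if s in ATYPICAL[lang]:
                    scores[lang] -= ATYPICAL[lang][s]
    return scores


def identify_language(text):
    """Pick the language with the highest score."""
    scores = score_text(text)
    return max(scores, key=scores.get)
```

With these toy tables, `identify_language("the thing and the king")` selects `"en"` and `identify_language("les enfants mangent que le gateau")` selects `"fr"`. The result depends entirely on the assumed tables and weights, so the sketch shows only the shape of the computation.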