Author: Miguel Cardoso de Campos
DOI:
Keywords:
Abstract: A window of letters is identified within a text sample input. If the window contains matches to reference letter sequences (RLS) contained in multiple sets of n-gram language profiles (profiles), then the longest match is kept and scored for each language. Scoring is based on frequency parameters of the matched RLS. The window is incrementally shifted through the text sample, with matching and scoring done for each window. At the end of the text input, the language having the highest cumulative score is identified as the sample's language. Scoring may be improved by restricting longer matches to full words, using two passes where the second pass disregards languages that are not near the highest score during the first pass, favoring complete words in scoring, and increasing the score of an RLS that does not frequently appear in many languages. Scoring may be enhanced by removing some languages if they do not meet a predefined threshold or a variable threshold.
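
The core loop described in the abstract (shift a window, keep the longest matching RLS per language, accumulate scores, pick the highest total) can be sketched as follows. This is a minimal illustration, not the author's implementation: the `PROFILES` table, its frequency values, the `window_size` default, and the function name `identify_language` are all hypothetical, and the sketch assumes a match must begin at the first letter of the window, which the abstract does not specify. The refinements mentioned in the abstract (full-word restriction, two passes, thresholds) are left out.

```python
from collections import defaultdict

# Hypothetical toy profiles: language -> {reference letter sequence: frequency parameter}.
# Real n-gram profiles would be far larger; these values are invented for illustration.
PROFILES = {
    "en": {"th": 0.9, "the": 1.5, "he": 0.6, "ca": 0.3, "cat": 0.8},
    "pt": {"ga": 0.7, "gat": 0.9, "gato": 1.6, "o": 0.4, "to": 0.5},
}


def identify_language(text, profiles, window_size=4):
    """Return (best_language, scores) for a text sample.

    Assumption: a match must start at the first letter of the window.
    """
    scores = defaultdict(float)
    text = text.lower()

    # Shift the window through the sample one letter at a time.
    for start in range(len(text)):
        window = text[start:start + window_size]

        for lang, profile in profiles.items():
            # Try the longest prefix first so only the longest match is scored.
            for length in range(len(window), 0, -1):
                freq = profile.get(window[:length])
                if freq is not None:
                    scores[lang] += freq
                    break

    if not scores:
        return None, {}

    # The language with the highest cumulative score wins.
    best = max(scores, key=scores.get)
    return best, dict(scores)


if __name__ == "__main__":
    print(identify_language("o gato", PROFILES))
```

With these toy profiles, `identify_language("o gato", PROFILES)` selects `"pt"` and `identify_language("the cat", PROFILES)` selects `"en"`. Trying prefixes from longest to shortest is one simple way to guarantee that a shorter RLS such as `"ga"` is not scored when `"gato"` also matches in the same window.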