Authors: Kuniaki Takizawa, Brendan P. Murray
DOI:
Keywords:
Abstract: An evaluator system accepts input textual messages in unknown languages and assesses which character sets, corresponding to languages, match each message. Text whose individual characters are encoded in 16-bit Unicode or another universal format is parsed, the extent to which each candidate character set can express each character is evaluated, and the accumulated correspondence is logged. When the character sets tested against the message provide only partial matches, the invention determines which set offers the best fit, including by means of a weighting function. The evaluation technology can be applied to multipart documents and search engine indices.
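The scoring idea in the abstract (test each candidate character set against every character, accumulate the matches, then choose the best partial fit with a weighting function) can be illustrated with a minimal sketch. It assumes that Python codec names such as `cp1252` and `cp1251` stand in for the "character sets" and that the weights are hypothetical per-set priors. The helper names `coverage` and `best_fit` are illustrative only and are not taken from the source. Note also that Python strings hold Unicode code points rather than 16-bit code units.

```python
from typing import Dict, Mapping, Tuple


def coverage(text: str, codec: str) -> float:
    """Return the fraction of characters in `text` that `codec` can encode.

    Hypothetical helper: a character "matches" a character set if
    encoding it with the corresponding Python codec does not fail.
    """
    if not text:
        return 0.0
    encodable = 0
    for ch in text:
        try:
            ch.encode(codec)
            encodable += 1  # accumulate correspondence for this set
        except UnicodeEncodeError:
            pass  # character not expressible in this set
    return encodable / len(text)


def best_fit(text: str, weights: Mapping[str, float]) -> Tuple[str, Dict[str, float]]:
    """Score each candidate codec by weighted coverage; return the best one.

    `weights` maps a codec name to an assumed prior weight. The weighting
    here is a simple multiplicative factor, chosen for illustration.
    """
    if not weights:
        raise ValueError("at least one candidate character set is required")
    scores = {codec: w * coverage(text, codec) for codec, w in weights.items()}
    best = max(scores, key=lambda codec: scores[codec])
    return best, scores


if __name__ == "__main__":
    candidates = {"cp1252": 1.0, "cp1251": 1.0}
    # Mixed message: only partial matches are possible for either set.
    name, scores = best_fit("café Привет", candidates)
    print(name, scores)
```

For the mixed message, neither set covers every character (`cp1252` lacks Cyrillic, `cp1251` lacks `é`), so the choice falls to the weighted coverage score. Raising the weight of one set shifts ties and near-ties toward it, which is one simple way a weighting function could encode prior expectations about likely languages.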