Authors: Eva Pettersson, Gerold Schneider, Michael Percillier
DOI: 10.5167/UZH-137462
Keywords: Preprocessor, Drag and drop, Spelling, Natural language processing, Machine translation, Computational linguistics, Computer science, Parsing, Register (sociolinguistics), Rule-based system, Artificial intelligence
Abstract: To be able to use existing natural language processing tools for analysing historical text, an important preprocessing step is spelling normalisation, converting the original spelling to present-day spelling, before applying tools such as taggers and parsers. In this paper, we compare a probabilistic, language-independent approach to spelling normalisation based on statistical machine translation (SMT) techniques to a rule-based system combining dictionary lookup with rules and non-probabilistic weights. The rule-based system reaches the best accuracy, up to 94% precision at 74% recall, while the SMT system yields improvements for each tested period.