作者: Gerhard Jäger , Johann-Mattis List , Pavel Sofroniev
DOI: 10.18653/V1/E17-1113
关键词: Cognate 、 Identification (information) 、 Phonetic transcription 、 Scope (computer science) 、 Artificial intelligence 、 Natural language processing 、 Word (computer architecture) 、 Support vector machine 、 State (computer science) 、 Machine learning 、 Computer science
摘要: Most current approaches in phylogenetic linguistics require as input multilingual word lists partitioned into sets of etymologically related words (cognates). Cognate identification is so far done manually by experts, which time consuming and yet only available for a small number well-studied language families. Automatizing this step will greatly expand the empirical scope methods linguistics, raw wordlists (in phonetic transcription) are much easier to obtain than cognate have been fully identified annotated, even under-studied languages. A couple different proposed past, but they either disappointing regarding their performance or not applicable larger datasets. Here we present new approach that uses support vector machines unify state-of-the-art alignment detection within single framework. Training evaluating these method on typologically broad collection gold-standard data shows it be superior existing state art.