Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multi-lingual wordlists

Authors: Gerhard Jäger, Johann-Mattis List, Pavel Sofroniev

Keywords: Cognate, Identification (information), Phonetic transcription, Scope (computer science), Artificial intelligence, Natural language processing, Word (computer architecture), Support vector machine, State (computer science), Machine learning, Computer science

Abstract: Most current approaches in phylogenetic linguistics require as input multilingual word lists partitioned into sets of etymologically related words (cognates). Cognate identification is so far done manually by experts, which is time-consuming and as yet only available for a small number of well-studied language families. Automatizing this step will greatly expand the empirical scope of phylogenetic methods in linguistics, as raw wordlists (in phonetic transcription) are much easier to obtain than wordlists in which cognate words have been fully identified and annotated, even for under-studied languages. A couple of different methods have been proposed in the past, but they are either disappointing regarding their performance or not applicable to larger datasets. Here we present a new approach that uses support vector machines to unify different state-of-the-art methods for phonetic alignment and cognate detection within a single framework. Training and evaluating this method on a typologically broad collection of gold-standard data shows it to be superior to the existing state of the art.
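The general idea of classifying word pairs as cognate or non-cognate with a support vector machine can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single stand-in feature (normalized edit distance over plain characters rather than aligned phonetic segments), a tiny hand-made training set, and a small linear SVM trained by subgradient descent on the hinge loss. All function names and the toy word pairs are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def features(w1, w2):
    """Feature vector for a word pair: normalized edit distance plus a bias term.

    Assumption: one stand-in feature; a real system would combine several
    alignment-based similarity scores here.
    """
    d = edit_distance(w1, w2) / max(len(w1), len(w2))
    return [d, 1.0]


def train_linear_svm(X, y, lam=0.01, epochs=500, lr=0.1):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            for j in range(len(w)):
                # The hinge term contributes only when the margin is violated.
                hinge_grad = -yi * xi[j] if margin < 1 else 0.0
                w[j] -= lr * (lam * w[j] + hinge_grad)
    return w


def predict(w, x):
    """Return +1 (cognate) or -1 (not cognate) for a feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


# Hypothetical toy training pairs: (word1, word2, label), +1 = cognate.
pairs = [
    ("hand", "hant", 1), ("nacht", "naxt", 1), ("vater", "fader", 1),
    ("hund", "perro", -1), ("baum", "arbre", -1), ("hand", "main", -1),
]
X = [features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
w = train_linear_svm(X, y)

print(predict(w, features("hand", "hand")))
print(predict(w, features("hund", "perro")))
```

Pairwise decisions like these would still need to be turned into cognate sets, for example by clustering the words for each concept based on the classifier scores; that step is omitted in this sketch.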


