Similarity Dependent Chinese Restaurant Process for Cognate Identification in Multilingual Wordlists

Keywords: Task (project management), Word list, Computer science, Artificial intelligence, Cognate, Chinese restaurant process, Similarity (network science), Identification (information), Natural language processing, Cluster analysis, Language family

Abstract: We present and evaluate two similarity dependent Chinese Restaurant Process (sd-CRP) algorithms at the task of automated cognate detection. The sd-CRP clustering algorithms do not require any predefined threshold for detecting cognate sets in a multilingual word list. An evaluation on six language families (more than 750 languages) finds that both sd-CRP variants perform as well as InfoMap and better than UPGMA at inferring cognate clusters. The algorithms presented in this paper are family agnostic and can be applied to linguistically under-studied language families.
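The abstract does not spell out how an sd-CRP avoids a threshold, so the following is only a minimal sketch under stated assumptions. It follows the distance dependent CRP of Blei and Frazier (2011): each word draws one link, either to itself (weight `alpha`) or to another word (weight given by their similarity), and clusters are the connected components of the resulting link graph. The number of clusters therefore falls out of the sampled links rather than a cut-off. The similarity function (exponential decay of normalized edit distance), its `scale`, `alpha`, and all function names are hypothetical choices. The code samples from the prior only; it is not the authors' inference procedure, and a full sampler would additionally need a likelihood and repeated resampling of links.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance with unit costs (a stand-in, not the paper's word similarity)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def similarity(a: str, b: str, scale: float = 0.2) -> float:
    """Hypothetical similarity: exponential decay of normalized edit distance."""
    d = levenshtein(a, b) / max(len(a), len(b), 1)
    return math.exp(-d / scale)


def sample_links(words: Sequence[str],
                 sim: Callable[[str, str], float],
                 alpha: float,
                 rng: random.Random) -> List[int]:
    """Draw one link per word: to itself with weight alpha, to word j with weight sim(i, j)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = len(words)
    links = []
    for i in range(n):
        weights = [alpha if j == i else sim(words[i], words[j]) for j in range(n)]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])
    return links


def links_to_clusters(links: Sequence[int]) -> List[List[int]]:
    """Clusters are the connected components of the undirected link graph (union-find)."""
    parent = list(range(len(links)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    groups: Dict[int, List[int]] = {}
    for i in range(len(links)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    words = ["hand", "hant", "mano", "main", "ruka"]  # toy forms, not data from the paper
    rng = random.Random(0)
    links = sample_links(words, similarity, alpha=0.5, rng=rng)
    print([[words[i] for i in c] for c in links_to_clusters(links)])
```

Because links are sampled, repeated draws give different partitions. Larger `alpha` favours self-links and hence more, smaller clusters, which plays the role a distance threshold plays in UPGMA without fixing a hard cut-off.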




Similar articles

An automated framework for fast cognate detection and Bayesian phylogenetic inference in computational historical linguistics

A test of Generalized Bayesian dating: A new linguistic dating method.
