CLDFBench: Give your cross-linguistic data a lift

Authors: Robert Forkel, Johann Mattis List

DOI: 10.17613/8T0E-W639

Abstract: While the amount of cross-linguistic data is constantly increasing, most datasets produced today and in the past cannot be considered FAIR (findable, accessible, interoperable, and reproducible). To remedy this and to increase the comparability of cross-linguistic resources, it is not enough to set up standards and best practices for data to be collected in the future. We also need consistent workflows for the “retro-standardization” of data that has been published during the past decades and centuries. With the Cross-Linguistic Data Formats (CLDF) initiative, first standards for cross-linguistic data have been presented and successfully tested. So far, however, CLDF creation was hampered by the fact that it required a considerable degree of computational proficiency. With cldfbench, we introduce a framework for the retro-standardization of legacy data and the curation of new datasets that drastically simplifies the creation of CLDF by providing a consistent, reproducible workflow that rigorously supports version control and long-term archiving of research data and code. The framework is distributed in the form of a Python package along with usage information and examples of best practice. This study introduces the new framework and illustrates how it can be applied by showing how a resource containing structural and lexical data for Sinitic languages can be efficiently retro-standardized and analyzed.

References (23)
Robert Östling. Bayesian Word Alignment for Massively Parallel Texts. Conference of the European Chapter of the Association for Computational Linguistics, pp. 123-127 (2014). DOI: 10.3115/V1/E14-4024
Matthew S. Dryer, Martin Haspelmath. The World Atlas of Language Structures Online. Max Planck Digital Library (2013).
Bernard Comrie. The Intercontinental Dictionary Series. Max Planck Institute for Evolutionary Anthropology, Leipzig (2011).
D. H. Huson. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics, vol. 14, pp. 68-73 (1998). DOI: 10.1093/BIOINFORMATICS/14.1.68
David R. Maddison, David L. Swofford, Wayne P. Maddison. Nexus: An extensible file format for systematic information. Systematic Biology, vol. 46, pp. 590-621 (1997). DOI: 10.1093/SYSBIO/46.4.590
W. D. Lewis, F. Xia. Developing ODIN: A Multilingual Repository of Annotated Language Data for Hundreds of the World's Languages. Literary and Linguistic Computing, vol. 25, pp. 303-319 (2010). DOI: 10.1093/LLC/FQQ006
Simon J. Greenhill, Robert Blust, Russell D. Gray. The Austronesian Basic Vocabulary Database: From Bioinformatics to Lexomics. Evolutionary Bioinformatics, vol. 4, pp. 271-283 (2008). DOI: 10.4137/EBO.S893
Erik M van Mulligen, Philippe Rocca-Serra, Maryann E Martone, Mercè Crosas, Erik A Schultes, Olivier Dumon, Morris A Swertz, Ingrid Dillo, Barend Mons, Susanna-Assunta Sansone, Jeffrey S. Grethe, Jan Velterop, Scott Edmunds, Bengt L. Persson, Andra Waagmeester, Johan van der Lei, Anthony J. Brookes, Michel Dumontier, Thierry Sengstag, Jildau Bouwman, Abel L Packer, Alejandra González-Beltrán, Peter A.C. ’t Hoen, Luiz Bonino da Silva Santos, Chris T. Evelo, Jaap Heringa, IJsbrand Jan Aalbersberg, Niklas Blomberg, Carole Goble, Alasdair J.G. Gray, Joost N. Kok, Richard Finkers, Philip E. Bourne, Mark Thompson, Rob W.W. Hooft, Tobias Kuhn, Jan-Willem Boiten, Paul Groth, Jun Zhao, Albert Mons, Ruben Kok, Arie Baak, Tim Clark, Gabrielle Appleton, Rene van Schaik, Mark D Wilkinson, Scott J Lusher, George Strawn, Marco Roos, Peter Wittenburg, Ted Slater, Katherine Wolstencroft, Myles Axton. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, vol. 3, article 160018 (2016). DOI: 10.1038/SDATA.2016.18
Sébastien Flavier, Christophe Coupé, Egidio Marsico, François Pellegrino, Ian Maddieson. LAPSyD: Lyon-Albuquerque Phonological Systems Database. Conference of the International Speech Communication Association, pp. 3022-3026 (2013).