Authors: Robert Forkel, Johann Mattis List
DOI: 10.17613/8T0E-W639
Keywords:
Abstract: While the amount of cross-linguistic data is constantly increasing, most datasets produced today and in the past cannot be considered FAIR (findable, accessible, interoperable, reproducible). To remedy this and to increase the comparability of cross-linguistic resources, it is not enough to set up standards and best practices for data to be collected in the future. We also need consistent workflows for the “retro-standardization” of data that has been published during the past decades and centuries. With the Cross-Linguistic Data Formats (CLDF) initiative, first standards for cross-linguistic data have been presented and successfully tested. So far, however, CLDF creation was hampered by the fact that it required a considerable degree of computational proficiency. With cldfbench, we introduce a framework for the retro-standardization of legacy data and the curation of new datasets that drastically simplifies the creation of CLDF by providing a consistent, reproducible workflow that rigorously supports version control and long-term archiving of research data and code. The framework is distributed in the form of a Python package along with usage information and examples of best practice. This study introduces the new framework and illustrates how it can be applied by showing how a resource containing structural and lexical data for Sinitic languages can be efficiently retro-standardized and analyzed.