Authors: Thomas Eckart, Uwe Quasthoff
DOI: 10.1007/978-3-642-20128-8_8
Keywords:
Abstract: With the wide availability of textual data in various languages, domains and registers, it is easy to create text corpora for a variety of applications. These include, among many others, the field of Natural Language Processing. The Leipzig Corpora Collection has created and used such corpora for more than fifteen years. However, the work on preprocessing distributed resources to ensure homogeneity, and thus comparability, is a steady process. As a result, corpora are created in identical formats, which allows the use of different statistical methods to generate data for manual or automatic analysis. These are the basis for applications in intra- and inter-language comparison and in quality assurance of the corpus stocks.