Statistical Corpus and Language Comparison on Comparable Corpora

Authors: Thomas Eckart, Uwe Quasthoff

DOI: 10.1007/978-3-642-20128-8_8

Keywords:

Abstract: With the wide availability of textual data in various languages, domains and registers, it is easy to create text corpora for a variety of applications. These include, among many others, applications in the field of Natural Language Processing. The Leipzig Corpora Collection has created and used such corpora for more than fifteen years. However, the work on preprocessing the distributed resources to ensure homogeneity, and thus comparability, is a steady process. As a result, corpora created in identical formats allow the use of different statistical methods to generate data for manual or automatic analysis. These are the basis for applications in intra- and inter-language comparison and in quality assurance of the corpus stocks.

References (8)

Peter Grzybek. History and Methodology of Word Length Studies. Springer, Dordrecht, pp. 15-90 (2007). DOI: 10.1007/978-1-4020-4068-9_2

Adam Kilgarriff. Using Word Frequency Lists to Measure Corpus Homogeneity and Similarity between Corpora. Journal of Visual Languages and Computing (1997).

Duncan J. Watts, Steven H. Strogatz. Collective dynamics of small-world networks. Nature, vol. 393, pp. 440-442 (1998). DOI: 10.1038/30918

Ted Dunning. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, vol. 19, pp. 61-74 (1993).

Y. Li, D. McLean, Z.A. Bandar, J.D. O'Shea, K. Crockett. Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering, vol. 18, pp. 1138-1150 (2006). DOI: 10.1109/TKDE.2006.130

Yuen Ren Chao, George Kingsley Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Language, vol. 26, pp. 394- (1950). DOI: 10.2307/409735

W. Detmar Meurers, Markus Dickinson. Detecting annotation errors in spoken language corpora. Copenhagen Studies in Language, pp. 53-66 (2006).

Christian Biemann, Uwe Quasthoff, Matthias Richter. Corpus Portal for Search in Monolingual Corpora. Language Resources and Evaluation, pp. 1799-1802 (2006).
