Large-Scale Analysis of Zipf’s Law in English Texts

Authors: Isabel Moreno-Sánchez, Francesc Font-Clos, Álvaro Corral

Keywords: Computer science, Artificial intelligence, Zipf's law, Point (typography), Natural language processing, Cumulative distribution function, Statistical significance, Probability distribution, Probability density function, Random variable, Monte Carlo method

Abstract: Despite being a paradigm of quantitative linguistics, Zipf’s law for words suffers from three main problems: its formulation is ambiguous, its validity has not been tested rigorously from a statistical point of view, and it has not been confronted to a representatively large number of texts. So, we can summarize the current support of Zipf’s law in texts as anecdotic. We try to solve these issues by studying three different versions of Zipf’s law and fitting them to all available English texts in the Project Gutenberg database (consisting of more than 30 000 texts). To do so we use state-of-the-art tools in fitting and goodness-of-fit tests, carefully tailored to the peculiarities of text statistics. Remarkably, one of the three versions of Zipf’s law, consisting of a pure power-law form in the complementary cumulative distribution function of word frequencies, is able to fit more than 40% of the texts in the database (at the 0.05 significance level), for the whole domain of frequencies (from 1 to the maximum value), and with only one free parameter (the exponent).
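The fit described in the abstract can be illustrated with a rough sketch. It is not the authors' code. It assumes the power-law version takes the form P(N >= n) = n^(-beta) for word frequencies n = 1, 2, ..., so that the probability mass is p(n) = n^(-beta) - (n + 1)^(-beta). The crude tokenizer, the golden-section search for the maximum-likelihood exponent, the Kolmogorov-Smirnov distance, the refit-per-simulation Monte Carlo p-value and all function names are illustrative assumptions.

```python
import math
import random
import re
from collections import Counter


def word_frequency_counts(text):
    """Return one absolute frequency per distinct word (crude tokenizer, an assumption)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted(Counter(words).values())


def log_likelihood(beta, counts):
    """Log-likelihood assuming P(N >= n) = n**(-beta) for n = 1, 2, ...

    The implied mass p(n) = n**(-beta) - (n + 1)**(-beta) is evaluated
    in a numerically stable form via log1p and expm1.
    """
    total = 0.0
    for n, multiplicity in Counter(counts).items():
        log_p = -beta * math.log(n) + math.log(-math.expm1(-beta * math.log1p(1.0 / n)))
        total += multiplicity * log_p
    return total


def fit_beta(counts, lo=0.1, hi=5.0, tol=1e-6):
    """Maximize the (concave) log-likelihood over beta by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if log_likelihood(c, counts) > log_likelihood(d, counts):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


def ks_distance(counts, beta):
    """Largest gap between the empirical and the model complementary CDF."""
    total = len(counts)
    freq = Counter(counts)
    remaining = total  # number of observations >= current value
    dist = 0.0
    for v in sorted(freq):
        dist = max(dist, abs(remaining / total - v ** -beta))
        remaining -= freq[v]  # now the number of observations >= v + 1
        dist = max(dist, abs(remaining / total - (v + 1) ** -beta))
    return dist


def sample_counts(beta, size, rng):
    """Inverse-transform sampling: floor(U**(-1/beta)) with U in (0, 1]."""
    return [int((1.0 - rng.random()) ** (-1.0 / beta)) for _ in range(size)]


def monte_carlo_p_value(counts, n_sim=100, seed=0):
    """Return (beta_hat, p): the share of simulated samples that fit worse."""
    rng = random.Random(seed)
    beta_hat = fit_beta(counts)
    observed = ks_distance(counts, beta_hat)
    exceed = 0
    for _ in range(n_sim):
        synthetic = sample_counts(beta_hat, len(counts), rng)
        # Refit each synthetic sample so the test accounts for the fitted parameter.
        if ks_distance(synthetic, fit_beta(synthetic)) >= observed:
            exceed += 1
    return beta_hat, exceed / n_sim


if __name__ == "__main__":
    demo = sample_counts(1.0, 2000, random.Random(42))  # synthetic "text"
    beta_hat, p = monte_carlo_p_value(demo, n_sim=50)
    print(f"beta = {beta_hat:.3f}, p-value = {p:.2f}")
```

Under this sketch, a text would count as fitted when the p-value exceeds the chosen significance level (0.05 in the abstract). Note that a single parameter, the exponent, is estimated over the whole range of frequencies.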


