Long-range correlations and burstiness in written texts: Universal and language-specific aspects

Authors: Vassilios Constantoudis, Maria Kalimeri, Fotis Diakonos, Konstantinos Karamanos, Constantinos Papadimitriou

Abstract: Recently, methods from the statistical physics of complex systems have been applied successfully to identify universal features in the long-range correlations (LRCs) of written texts. However, in real texts, these universal features are intermingled with language-specific influences. This paper aims at the characterization and further understanding of the interplay between universal and language-specific effects on the LRCs in texts. To this end, we apply the language-sensitive mapping of written texts to word-length series (wls) and analyse large parallel (of the same content) corpora from 10 languages classified into four families (Romanic, Germanic, Greek and Uralic). The autocorrelation functions of the wls reveal tiny but persistent LRCs decaying at large scales following a power law with a language-independent exponent ∼0.60–0.65. The impact of language is displayed in the amplitude of the correlations, where a relative standard deviation >40% among the analyzed languages is observed. The classification into language families seems to play a significant role, since the Finnish and Germanic languages exhibit more correlations than the Greek and Romanic families. To reveal the origins of the LRCs, we focus on the long words and perform burst and correlation analysis of their positions along the corpora. We find that the universal features are linked more to the correlations of the inter-long-word distances, while the language-specific aspects are related more to their distributions.
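The mapping and measurements named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The tokenization rule, the `threshold` that defines a "long" word, and the burstiness coefficient B = (σ − μ)/(σ + μ) are all assumptions made here for illustration; the abstract does not specify any of them.

```python
import re
from statistics import mean, pstdev


def word_length_series(text):
    """Map a text to its word-length series (one integer per word).

    Assumption: a word is a maximal run of Unicode letters.
    """
    return [len(w) for w in re.findall(r"[^\W\d_]+", text)]


def autocorrelation(series, lag):
    """Normalized sample autocorrelation of `series` at a given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mu = mean(series)
    var = sum((x - mu) ** 2 for x in series) / n
    if var == 0:
        raise ValueError("series has zero variance")
    cov = sum(
        (series[i] - mu) * (series[i + lag] - mu) for i in range(n - lag)
    ) / (n - lag)
    return cov / var


def inter_long_word_distances(lengths, threshold):
    """Distances (in words) between consecutive long words.

    `threshold` is a hypothetical cut-off: words with at least this many
    letters count as long.
    """
    positions = [i for i, n in enumerate(lengths) if n >= threshold]
    return [b - a for a, b in zip(positions, positions[1:])]


def burstiness(distances):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu).

    Assumed measure: B is -1 for perfectly regular spacing, near 0 for
    Poisson-like spacing, and approaches 1 for strongly bursty spacing.
    """
    mu, sigma = mean(distances), pstdev(distances)
    return (sigma - mu) / (sigma + mu)
```

On a real corpus, one would evaluate `autocorrelation` over a range of lags and fit the decay on a log-log scale to estimate the power-law exponent, then compare both the distribution and the correlations of the inter-long-word distances across languages.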
