Long-range correlations and burstiness in written texts: Universal and language-specific aspects

Authors: Vassilios Constantoudis, Maria Kalimeri, Fotis Diakonos, Konstantinos Karamanos, Constantinos Papadimitriou

DOI: 10.1142/S0217979215410052

Abstract: Recently, methods from the statistical physics of complex systems have been applied successfully to identify universal features in the long-range correlations (LRCs) of written texts. However, in real texts, these universal features are intermingled with language-specific influences. This paper aims at the characterization and further understanding of the interplay between universal and language-specific effects on LRCs in texts. To this end, we apply the language-sensitive mapping of written texts to word-length series (wls) and analyse large parallel (of same content) corpora from 10 languages classified into four families (Romanic, Germanic, Greek and Uralic). The autocorrelation functions of the wls reveal tiny but persistent LRCs decaying at large scales following a power law with a language-independent exponent ∼0.60–0.65. The impact of language is displayed in the amplitude of the correlations, where a relative standard deviation >40% among the analyzed languages is observed. The classification into language families seems to play a significant role since the Finnish and Germanic languages exhibit more correlations than the Greek and Romanic families. To reveal the origins of the LRCs, we focus on long words and perform burst and correlation analysis of their positions along the corpora. We find that the universal features are linked more to the correlations of the inter-long word distances, while the language-specific aspects are related more to their distributions.
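The pipeline the abstract describes (text to word-length series, autocorrelation of that series, distances between long words) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the letter-run tokenisation, the autocorrelation normalisation and the `threshold` used to call a word "long" are all assumptions made for illustration.

```python
import re


def word_length_series(text):
    """Map a text to its word-length series: one integer per word.

    Assumption: a "word" is a maximal run of Unicode letters, so
    apostrophes and hyphens split words. The paper may tokenise differently.
    """
    return [len(word) for word in re.findall(r"[^\W\d_]+", text)]


def autocorrelation(series, lag):
    """Normalised sample autocorrelation of `series` at a positive `lag`."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / n
    if variance == 0:
        return 0.0  # a constant series carries no correlation signal
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    ) / (n - lag)
    return covariance / variance


def long_word_gaps(series, threshold):
    """Distances between successive words of length >= `threshold`.

    `threshold` is a hypothetical parameter; the abstract does not state
    how "long" words are defined.
    """
    positions = [i for i, length in enumerate(series) if length >= threshold]
    return [b - a for a, b in zip(positions, positions[1:])]
```

On a real corpus one would evaluate `autocorrelation` over a range of lags and fit the decay on log-log axes to estimate the power-law exponent. The distribution and serial correlation of the values returned by `long_word_gaps` are the quantities a burst and correlation analysis of long-word positions would start from.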

References (22)
Marcelo A. Montemurro, Pedro A. Pury, Long-range fractal correlations in literary corpora, Fractals, vol. 10, pp. 451-461 (2002), 10.1142/S0218348X02001257
Marcelo Montemurro, Damián Zanette, The statistics of meaning: Darwin, Gibbon and Moby Dick, Significance, vol. 6, pp. 165-169 (2009), 10.1111/J.1740-9713.2009.00390.X
J. P. Herrera, P. A. Pury, Statistical keyword detection in literary corpora, European Physical Journal B, vol. 63, pp. 135-146 (2008), 10.1140/EPJB/E2008-00206-X
S. S. Melnyk, O. V. Usatenko, V. A. Yampol’skii, V. A. Golick, Competition between two kinds of correlations in literary texts, Physical Review E, vol. 72, 026140 (2005), 10.1103/PHYSREVE.72.026140
M. Ortuño, P. Carpena, P. Bernaola-Galván, E. Muñoz, A. M. Somoza, Keyword detection in natural languages and DNA, EPL, vol. 57, pp. 759-764 (2002), 10.1209/EPL/I2002-00528-3
Eduardo G. Altmann, Janet B. Pierrehumbert, Adilson E. Motter, Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words, PLoS ONE, vol. 4, e7678 (2009), 10.1371/JOURNAL.PONE.0007678
E. Rodriguez, M. Aguilar-Cornejo, R. Femat, J. Alvarez-Ramirez, Scale and time dependence of serial correlations in word-length time series of written texts, Physica A: Statistical Mechanics and its Applications, vol. 414, pp. 378-386 (2014), 10.1016/J.PHYSA.2014.07.063
Werner Ebeling, Alexander Neiman, Long-range correlations between letters and sentences in texts, Physica A: Statistical Mechanics and its Applications, vol. 215, pp. 233-241 (1995), 10.1016/0378-4371(95)00025-3
Luigi Palatella, Paolo Allegrini, Paolo Grigolini, Intermittency and scale-free networks: a dynamical model for human language complexity, Chaos, Solitons & Fractals, vol. 20, pp. 95-105 (2004), 10.1016/S0960-0779(03)00432-6