Authors: Vassilios Constantoudis, Maria Kalimeri, Fotis Diakonos, Konstantinos Karamanos, Constantinos Papadimitriou
DOI: 10.1142/S0217979215410052
Keywords:
Abstract: Recently, methods from the statistical physics of complex systems have been applied successfully to identify universal features in the long-range correlations (LRCs) of written texts. However, in real texts, these universal features are intermingled with language-specific influences. This paper aims at the characterization and further understanding of the interplay between universal and language-specific effects on the LRCs in texts. To this end, we apply the language-sensitive mapping of written texts to word-length series (wls) and analyse large parallel (of the same content) corpora from 10 languages classified into four families (Romanic, Germanic, Greek and Uralic). The autocorrelation functions of the wls reveal tiny but persistent LRCs decaying at large scales following a power law with a language-independent exponent ∼0.60–0.65. The impact of language is displayed in the amplitude of the correlations, where a relative standard deviation >40% among the analyzed languages is observed. The classification into language families seems to play a significant role, since the Finnish and Germanic languages exhibit more correlations than the Greek and Romanic families. To reveal the origins of the LRCs, we focus on the long words and perform burst and correlation analysis of their positions along the corpora. We find that the universal features are linked more to the correlations of the inter-long-word distances, while the language-specific aspects are related more to their distributions.
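The mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regex tokenizer, the function names, and the `threshold` that defines a "long" word are all assumptions made here for illustration. It turns a text into a word-length series, estimates the normalised autocorrelation at a given lag, and extracts the distances between consecutive long words.

```python
import re


def word_length_series(text):
    """Map a text to its word-length series: one integer per word, in reading order.

    Assumption: a "word" is any maximal run of \\w characters (letters of any
    alphabet, digits and underscore); the paper's tokenisation may differ.
    """
    return [len(word) for word in re.findall(r"\w+", text)]


def autocorrelation(series, lag):
    """Normalised sample autocorrelation of `series` at a positive `lag`."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / n
    if variance == 0:
        return 0.0  # a constant series carries no correlation information
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    ) / (n - lag)
    return covariance / variance


def long_word_gaps(series, threshold):
    """Distances (in words) between consecutive words of length >= threshold.

    Assumption: `threshold` is a hypothetical cut-off for "long" words.
    """
    positions = [i for i, length in enumerate(series) if length >= threshold]
    return [b - a for a, b in zip(positions, positions[1:])]


series = word_length_series("The cat sat on the mat")  # [3, 3, 3, 2, 3, 3]
```

On a real corpus, one would evaluate `autocorrelation` over a range of lags and inspect its decay on log-log axes to look for power-law behaviour. The list returned by `long_word_gaps` is the kind of input a burst or distribution analysis would start from.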