On variation of word frequencies in Russian literary texts

Author: Vladislav Kargin

DOI: 10.1016/J.PHYSA.2015.11.014

Abstract: We study the variation of word frequencies in Russian literary texts. Our findings indicate that the standard deviation of a word’s frequency across texts depends on its average frequency according to a power law with exponent 1/2 < α < 1, which shows that rarer words have a relatively larger degree of frequency volatility (that is, higher “burstiness”). A latent factor model has been estimated to investigate the structure of the word frequency distribution. The results suggest that this dependence can be explained by the asymmetry of the distribution of latent factors.
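The power-law relation between a word's average frequency and the standard deviation of that frequency across texts can be illustrated with a log-log regression. The sketch below is not the paper's procedure; it is a minimal illustration under stated assumptions. The function name `fit_power_law_exponent`, the synthetic gamma-distributed frequencies, and the chosen constants (`true_alpha = 0.7`, scale `0.05`) are all hypothetical and are not taken from the paper or its corpus.

```python
import numpy as np


def fit_power_law_exponent(freqs):
    """Fit sd = c * mean**alpha by least squares on log-log axes.

    freqs: array of shape (n_texts, n_words); entry [t, w] is the relative
    frequency of word w in text t. Returns (alpha, c).
    """
    freqs = np.asarray(freqs, dtype=float)
    mean = freqs.mean(axis=0)
    std = freqs.std(axis=0, ddof=1)
    # Logarithms need strictly positive values, so drop degenerate words.
    keep = (mean > 0) & (std > 0)
    slope, intercept = np.polyfit(np.log(mean[keep]), np.log(std[keep]), 1)
    return slope, np.exp(intercept)


# Synthetic data (an assumption made only for illustration): each word has a
# gamma-distributed frequency with mean m and standard deviation 0.05 * m**alpha.
rng = np.random.default_rng(0)
n_texts, n_words = 200, 500
true_alpha = 0.7
word_means = np.exp(rng.uniform(np.log(1e-4), np.log(1e-1), n_words))
word_sds = 0.05 * word_means ** true_alpha
shape = (word_means / word_sds) ** 2   # gamma shape k = (mean / sd)^2
scale = word_sds ** 2 / word_means     # gamma scale = sd^2 / mean
freqs = rng.gamma(shape, scale, size=(n_texts, n_words))

alpha_hat, c_hat = fit_power_law_exponent(freqs)
print(f"estimated exponent: {alpha_hat:.3f}")
```

An exponent below 1 means that the ratio of standard deviation to mean falls as the mean grows, which is the sense in which rarer words are relatively more volatile. For comparison, independent Poisson-like sampling of words would give an exponent close to 1/2.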
