Quantitative analysis of large amounts of journalistic texts using topic modelling

Authors: Carina Jacobi, Wouter van Atteveldt, Kasper Welbers

DOI: 10.1080/21670811.2015.1093271

Keywords: Nuclear technology; Content analysis; Journalism; Quantitative analysis (finance); Data science; Warrant; Computer science; Face (sociological concept); Latent Dirichlet allocation; Topic model

Abstract: The huge collections of news content which have become available through digital technologies both enable and warrant scientific inquiry, challenging journalism scholars to analyse unprecedented amounts of texts. We propose Latent Dirichlet Allocation (LDA) topic modelling as a tool to face this challenge. LDA is a cutting edge technique for content analysis, designed to automatically organize large archives of documents based on latent topics, measured as patterns of word (co-)occurrence. We explain how this technique works, how different choices by the researcher affect the results and how the results can be meaningfully interpreted. To demonstrate its usefulness for journalism research, we conducted a case study of the New York Times coverage of nuclear technology from 1945 to the present, partially replicating a study by Gamson and Modigliani. This shows that LDA is a useful tool for analysing trends and patterns in news content in large digital news archives relatively quickly.
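
As a rough illustration of what "latent topics, measured as patterns of word (co-)occurrence" can mean in practice, the sketch below fits LDA to a toy corpus with a collapsed Gibbs sampler. It is not the authors' implementation: the function name `fit_lda`, the four example documents, and the values chosen for `n_topics`, `alpha`, `beta` and `n_iter` are all assumptions made for the example.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Weight each topic by how common it is in this document
                # and how strongly it is associated with this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]

                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Most frequent words currently assigned to each topic.
    top_words = [
        [w for w, c in tw.most_common(3) if c > 0] for tw in topic_word
    ]
    return theta, top_words


# Hypothetical four-document corpus, invented for this example.
docs = [
    "reactor plant energy power reactor".split(),
    "power plant energy electricity reactor".split(),
    "bomb weapon missile war bomb".split(),
    "weapon missile treaty war bomb".split(),
]
theta, top_words = fit_lda(docs, n_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
for d, row in enumerate(theta):
    print(f"doc {d}: {[round(p, 2) for p in row]}")
```

Each row of `theta` gives one document's estimated share of each topic, and `top_words` lists the words most often assigned to each topic. On a real news archive a researcher would still have to choose the number of topics and interpret the resulting word lists, which are the researcher choices the abstract refers to.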

References (18)
Bill MacCartney, Marie-Catherine de Marneffe, Christopher D. Manning. Generating Typed Dependency Parses from Phrase Structure Parses. language resources and evaluation, pp. 449-454 (2006).
Björn Burscher, Daan Odijk, Rens Vliegenthart, Maarten de Rijke, Claes H. de Vreese. Teaching the Computer to Code Frames in News: Comparing Two Supervised Machine Learning Approaches to Frame Analysis. Communication Methods and Measures, vol. 8, pp. 190-206 (2014). 10.1080/19312458.2014.937527
Robert M. Entman. Framing: Toward Clarification of a Fractured Paradigm. Journal of Communication, vol. 43, pp. 51-58 (1993). 10.1111/J.1460-2466.1993.TB01304.X
Jörg Matthes, Matthias Kohring. The Content Analysis of Media Frames: Toward Improving Reliability and Validity. Journal of Communication, vol. 58, pp. 258-279 (2008). 10.1111/J.1460-2466.2008.00384.X
Eduardo H. Ramirez, Ramon Brena, Davide Magatti, Fabio Stella. Topic model validation. Neurocomputing, vol. 76, pp. 125-133 (2012). 10.1016/J.NEUCOM.2011.04.032
William A. Gamson, Andre Modigliani. Media Discourse and Public Opinion on Nuclear Power: A Constructionist Approach. American Journal of Sociology, vol. 95, pp. 1-37 (1989). 10.1086/229213
David M. Blei, John D. Lafferty. Dynamic topic models. Proceedings of the 23rd international conference on Machine learning - ICML '06, pp. 113-120 (2006). 10.1145/1143844.1143859
Justin Grimmer, Brandon M. Stewart. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis, vol. 21, pp. 267-297 (2013). 10.1093/PAN/MPS028
Margaret E. Roberts, Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, David G. Rand. Structural Topic Models for Open-Ended Survey Responses. American Journal of Political Science, vol. 58, pp. 1064-1082 (2014). 10.1111/AJPS.12103
Chenghua Lin, Yulan He. Joint sentiment/topic model for sentiment analysis. conference on information and knowledge management, pp. 375-384 (2009). 10.1145/1645953.1646003