Identifying Personal Narratives in Chinese Weblog Posts

Authors: Wenji Mao, Andrew S. Gordon, Kenji Sagae, Wen Chen, Luwen Huangfu

Abstract: Automated text classification technologies have enabled researchers to amass enormous collections of personal narratives posted to English-language weblogs. In this paper, we explore analogous approaches to identify personal narratives in Chinese weblog posts as a precursor to the future empirical studies of cross-cultural differences in narrative structure. We describe the collection of over half a million posts from a popular Chinese weblog hosting service, and the manual annotation of story and nonstory content in sampled posts. Using supervised machine learning methods, we developed an automated text classifier for personal narratives in Chinese weblog posts, achieving classification accuracy comparable to previous work in English. Using this classifier, we automatically identify over sixty-four thousand personal narratives for use in future cross-cultural analyses and Chinese-language applications of narrative corpora.