Authors: Wenji Mao, Andrew S. Gordon, Kenji Sagae, Wen Chen, Luwen Huangfu
DOI:
Keywords:
Abstract: Automated text classification technologies have enabled researchers to amass enormous collections of personal narratives posted to English-language weblogs. In this paper, we explore analogous approaches to identify personal narratives in Chinese weblog posts as a precursor to future empirical studies of cross-cultural differences in narrative structure. We describe the collection of over half a million posts from a popular Chinese weblog hosting service, and the manual annotation of story and nonstory content in sampled posts. Using supervised machine learning methods, we developed an automated text classifier for personal narratives in Chinese weblog posts, achieving classification accuracy comparable to previous work in English. Using this classifier, we automatically identify over sixty-four thousand personal narratives for use in future cross-cultural analyses and Chinese-language applications of narrative corpora.