Exploiting the Human Computational Effort Dedicated to Message Reply Formatting for Training Discursive Email Segmenters

Authors: Nicolas Hernandez, Soufian Salim

DOI: 10.3115/V1/W14-4917

Abstract: In the context of multi-domain and multimodal online asynchronous discussion analysis, we propose an innovative strategy for manual annotation of dialog act (DA) segments. The process aims at supporting the analysis of messages in terms of DA. Our objective is to train a sequence labelling system to detect the segment boundaries. The originality of the proposed approach is to avoid manually annotating the training data and instead exploit the human computational efforts dedicated to message reply formatting when the writer replies to a message by inserting his response just after the quoted text appropriate to his intervention. We describe the approach, propose a new electronic mail corpus, and report the evaluation of the segmentation models we built.
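The idea of reading segment boundaries off reply formatting can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `boundaries_from_reply`, the `>` quote prefix, and the exact line-matching rule are all assumptions made for illustration. The sketch marks a boundary in the original message wherever the replier interrupted the quoted text to insert a response.

```python
def boundaries_from_reply(original, reply, quote_prefix=">"):
    """Return indices i such that a segment boundary follows original[i].

    Hypothetical helper: a boundary is inferred after a quoted line when
    the replier inserted a response before quoting later original lines.
    Simplification: duplicate original lines map to their last occurrence.
    """
    position = {line.strip(): i for i, line in enumerate(original) if line.strip()}
    boundaries = []
    last_quoted = None      # index in `original` of the most recent quoted line
    response_seen = False   # True once reply text follows that quoted line
    for line in reply:
        text = line.strip()
        if not text:
            continue
        if text.startswith(quote_prefix):
            # Drop the quote markers (and spaces) to recover the quoted text.
            quoted = text.lstrip(quote_prefix + " ").strip()
            idx = position.get(quoted)
            if idx is None:
                continue  # quoted text not found in the original; skip it
            if response_seen and last_quoted is not None and idx > last_quoted:
                boundaries.append(last_quoted)
            last_quoted = idx
            response_seen = False
        elif last_quoted is not None:
            response_seen = True
    return boundaries


original = [
    "Can you send the report?",
    "Also, are you free on Friday?",
    "Thanks, Ann",
]
reply = [
    "> Can you send the report?",
    "Attached.",
    "> Also, are you free on Friday?",
    "> Thanks, Ann",
    "Friday works.",
]
print(boundaries_from_reply(original, reply))  # [0]
```

Here the response "Attached." splits the original message after its first line, so that position becomes a boundary label that a sequence labelling system could be trained on without any manual annotation.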
