Experiments with Sentence Classification

Authors: Anthony Khoo, Yuval Marom, David Albrecht

Abstract: We present a set of experiments involving sentence classification, addressing issues of representation and feature selection, and we compare our findings with similar results from work on the more general text classification task. The domain of our investigation is an email-based help-desk corpus. Our investigations compare the use of various popular classification algorithms with various popular feature selection methods. The results highlight similarities between sentence and text classification, such as the superiority of Support Vector Machines, as well as differences, such as a lesser usefulness of feature selection for sentence classification and a detrimental effect of common preprocessing techniques (stop-word removal and lemmatization).
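
The abstract reports that stop-word removal can hurt sentence classification. One plausible intuition (an assumption here, not an explanation taken from the abstract) is that sentences are so short that function words carry much of the signal separating, say, a question from a statement. The sketch below illustrates this with a bag-of-words representation; the stop-word list, the function name `bag_of_words`, and the two example sentences are all hypothetical and are not drawn from the authors' help-desk corpus.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (real lists are much longer).
STOP_WORDS = {"i", "you", "me", "can", "do", "the", "a", "to", "is", "it"}


def bag_of_words(sentence, remove_stop_words=False):
    """Lowercase the sentence, keep alphabetic tokens, and count them."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return Counter(tokens)


# Illustrative sentences only: a request and an offer.
question = "Can you help me?"
statement = "I can help you."

# With all words kept, the two sentences have different representations.
print(bag_of_words(question) == bag_of_words(statement))  # False

# After stop-word removal both collapse to {"help": 1}, so no classifier
# working on this representation could tell them apart.
print(bag_of_words(question, True) == bag_of_words(statement, True))  # True
```

In a full document, enough content words usually survive this kind of filtering to keep classes separable; in a single sentence there may be only one or two left, which is consistent with the difference the abstract describes between sentence and text classification.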
