Evaluating unsupervised learning for natural language processing tasks

Author: Andreas Vlachos

Abstract: The development of unsupervised learning methods for natural language processing tasks has become an important and popular area of research. The primary advantage of these methods is that they do not require annotated data to learn a model. However, this makes them difficult to evaluate against a manually labeled gold standard. Using part-of-speech tagging as our case study, we discuss the reasons that render this evaluation paradigm unsuitable for unsupervised learning methods. Instead, we argue that the rarely used in-context evaluation is more appropriate and more informative, as it takes into account the way these methods are likely to be applied. Finally, bearing this issue in mind, we propose directions for future work in unsupervised natural language processing.

References (35)
Chris Biemann, A. Gliozzo, C. Giuliano. Unsupervised Part of Speech Tagging Supporting Supervised Methods (2007).
Benjamin Recht, Xiaojin Zhu, Mark Craven, David Andrzejewski. A framework for incorporating general domain knowledge into latent Dirichlet allocation using first-order logic. International Joint Conference on Artificial Intelligence, pp. 1171-1177 (2011). DOI: 10.5591/978-1-57735-516-8/IJCAI11-200
Sajib Dasgupta, Vincent Ng. Mining Clustering Dimensions. International Conference on Machine Learning, pp. 263-270 (2010).
Javier Artiles, Andrew Borthwick, Satoshi Sekine, Julio Gonzalo, Enrique Amigó. WePS-3 Evaluation Campaign: Overview of the Web People Search Clustering and Attribute Extraction Tasks. CLEF (Notebook Papers/LABs/Workshops) (2010).
David M Blei, Andrew Y Ng, Michael I Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, vol. 3, pp. 993-1022 (2003). DOI: 10.5555/944919.944937
Daniel Ramage, David Hall, Ramesh Nallapati, Christopher D. Manning. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. Empirical Methods in Natural Language Processing, pp. 248-256 (2009). DOI: 10.3115/1699510.1699543
Rion Snow, Brendan O'Connor, Daniel Jurafsky, Andrew Y. Ng. Cheap and fast---but is it good? Proceedings of the Conference on Empirical Methods in Natural Language Processing - EMNLP '08, pp. 254-263 (2008). DOI: 10.3115/1613715.1613751
Roi Reichart, Ari Rappoport. The NVI Clustering Evaluation Measure. Conference on Computational Natural Language Learning, pp. 165-173 (2009). DOI: 10.3115/1596374.1596401
Asa Ben-Hur, Andre Elisseeff, Isabelle Guyon. A stability based method for discovering structure in clustered data. Pacific Symposium on Biocomputing, pp. 6-17 (2001). DOI: 10.1142/9789812799623_0002
Alexander Clark. Combining distributional and morphological information for part of speech induction. Conference of the European Chapter of the Association for Computational Linguistics, pp. 59-66 (2003). DOI: 10.3115/1067807.1067817