Named Entity Recognition in Tweets: An Experimental Study

Authors: Alan Ritter, Sam Clark, Mausam, Oren Etzioni

DOI:

Keywords: Computer science, Pipeline (software), Named-entity recognition, F1 score, Artificial intelligence, Natural language processing, Chunking (computing), Chunking (psychology)

Abstract: People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline, beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-ner system doubles F1 score compared with the Stanford NER system. T-ner leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter_nlp
