TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text

Authors: Kalina Bontcheva, Niraj Aswani, Leon Derczynski, Adam Funk, Diana Maynard

Abstract: Twitter is the largest source of microblog text, responsible for gigabytes of human discourse every day. Processing microblog text is difficult: the genre is noisy, documents have little context, and utterances are very short. As such, conventional NLP tools fail when faced with tweets and other microblog text. We present TwitIE, an open-source NLP pipeline customised to microblog text at every stage. Additionally, it includes Twitter-specific data import and metadata handling. This paper introduces each stage of the TwitIE pipeline, which is a modification of the GATE ANNIE open-source pipeline for news text. An evaluation against some state-of-the-art systems is also presented.
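The abstract mentions Twitter-specific data import and metadata handling. The sketch below illustrates that step in minimal Python. It assumes each tweet arrives as a Twitter API v1.1 JSON object with the fields `text`, `id_str`, `created_at`, `lang`, `user.screen_name` and `entities`. The code keeps the tweet body as the document text, stores the remaining fields as metadata, and turns entity offsets into annotations. `TweetDocument` and `import_tweet` are hypothetical names chosen for illustration. They are not part of the TwitIE or GATE API, and TwitIE itself does not necessarily work this way.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TweetDocument:
    """Hypothetical container: the tweet text plus metadata kept alongside it."""
    text: str
    features: dict = field(default_factory=dict)
    # (start, end, type) character offsets into `text`, end exclusive
    annotations: list = field(default_factory=list)


def import_tweet(raw_json: str) -> TweetDocument:
    """Parse one tweet (assumed v1.1 JSON) into text, metadata and entity annotations."""
    tweet = json.loads(raw_json)
    doc = TweetDocument(text=tweet["text"])

    # Metadata is kept separate from the text that downstream NLP stages process.
    user = tweet.get("user", {})
    doc.features = {
        "id": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        "lang": tweet.get("lang"),
        "screen_name": user.get("screen_name"),
    }

    # Entity offsets are code-point indices, which match Python str indexing.
    entities = tweet.get("entities", {})
    for kind, key in (("Hashtag", "hashtags"),
                      ("UserMention", "user_mentions"),
                      ("URL", "urls")):
        for ent in entities.get(key, []):
            start, end = ent["indices"]
            doc.annotations.append((start, end, kind))

    doc.annotations.sort()  # order annotations by position in the text
    return doc


if __name__ == "__main__":
    sample = json.dumps({
        "id_str": "1",
        "text": "Loving #GATE with @someone",
        "lang": "en",
        "user": {"screen_name": "example"},
        "entities": {
            "hashtags": [{"text": "GATE", "indices": [7, 12]}],
            "user_mentions": [{"screen_name": "someone", "indices": [18, 26]}],
            "urls": [],
        },
    })
    doc = import_tweet(sample)
    for start, end, kind in doc.annotations:
        print(kind, doc.text[start:end])
```

Keeping hashtags, mentions and URLs as pre-computed spans means later stages do not need to rediscover them from noisy text. A tokeniser, for example, could treat each span as a single token.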
