Authors: Kalina Bontcheva, Niraj Aswani, Leon Derczynski, Adam Funk, Diana Maynard
DOI: 10.6084/M9.FIGSHARE.1003767.V2
Keywords:
Abstract: Twitter is the largest source of microblog text, responsible for gigabytes of human discourse every day. Processing microblog text is difficult: the genre is noisy, documents have little context, and utterances are very short. As such, conventional NLP tools fail when faced with tweets and other microblog text. We present TwitIE, an open-source NLP pipeline customised to microblog text at every stage. Additionally, it includes Twitter-specific data import and metadata handling. This paper introduces each stage of the TwitIE pipeline, which is a modification of the GATE ANNIE open-source pipeline for news text. An evaluation against some state-of-the-art systems is also presented.