Improving recall of regular expressions for information extraction

Authors: Karin Murthy, Deepak P., Prasad M. Deshpande


Abstract: Learning or writing regular expressions to identify instances of a specific concept within text documents with high precision and recall is challenging. It is relatively easy to improve the precision of an initial regular expression by identifying the false positives it covers and tweaking the expression to avoid them. However, modifying the expression to improve recall is difficult, since false negatives can only be identified by manually analyzing all documents, in the absence of any tools to identify the missing instances. We focus on partially automating the discovery of missing instances by soliciting minimal user feedback. We present a technique to identify good generalizations of a regular expression that have improved recall while retaining high precision. We empirically demonstrate the effectiveness of the proposed technique as compared to existing methods and show results for a variety of tasks, such as identification of dates, phone numbers, product names, and course numbers, on real world datasets.
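To make the precision/recall trade-off in the abstract concrete, here is a minimal Python sketch. It is not the authors' technique: the documents, the gold annotations, both patterns, and the helper `precision_recall` are hypothetical, chosen only to show how a generalized expression can recover missing instances without introducing false positives.

```python
import re


def precision_recall(pattern, documents, gold):
    """Score a regex against gold instances.

    `gold` is a set of (document_index, matched_text) pairs; this toy
    representation is an assumption made for illustration.
    """
    found = {
        (i, m.group(0))
        for i, doc in enumerate(documents)
        for m in re.finditer(pattern, doc)
    }
    true_pos = len(found & gold)
    precision = true_pos / len(found) if found else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy corpus with hand-labeled date instances.
documents = [
    "Meeting on 12/05/2011 at noon.",
    "Invoice dated 3/7/2011, ref 44/2.",
    "Due 01-02-2012.",
]
gold = {(0, "12/05/2011"), (1, "3/7/2011"), (2, "01-02-2012")}

# Initial expression: precise, but misses one-digit fields and '-' separators.
initial = r"\b\d{2}/\d{2}/\d{4}\b"
# One possible generalization: wider digit counts and an extra separator.
generalized = r"\b\d{1,2}[/-]\d{1,2}[/-]\d{4}\b"

for name, pattern in [("initial", initial), ("generalized", generalized)]:
    p, r = precision_recall(pattern, documents, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

On this toy data the initial pattern finds one of the three gold dates (recall 1/3), while the generalized pattern finds all three and still skips the non-date `44/2`, so precision stays at 1. In practice the gold set is not available up front, which is why the abstract proposes soliciting minimal user feedback instead.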


