Authors: Karin Murthy, Deepak P., Prasad M. Deshpande
DOI: 10.1007/978-3-642-35063-4_33
Keywords:
Abstract: Learning or writing regular expressions to identify instances of a specific concept within text documents with high precision and recall is challenging. It is relatively easy to improve the precision of an initial regular expression by identifying the false positives it covers and tweaking the expression to avoid them. However, modifying the expression to improve recall is difficult, since false negatives can only be identified by manually analyzing all documents, in the absence of any tools to identify the missing instances. We focus on partially automating the discovery of missing instances by soliciting minimal user feedback. We present a technique to identify good generalizations of a regular expression that have improved recall while retaining high precision. We empirically demonstrate the effectiveness of the proposed technique as compared to existing methods and show results for a variety of tasks, such as identification of dates, phone numbers, product names, and course numbers, on real-world datasets.