The use of web-based statistics to validate information extraction

Authors: Oren Etzioni, Daniel S. Weld, Tal Shaked, Stephen Soderland

Abstract: The World Wide Web is a powerful and readily available text corpus that can be used effectively to validate the output of an information extraction system. We present experiments that explore how pointwise mutual information (PMI) from search engine hit counts can be used in an Assessor module that assigns a probability that an extracted fact or relationship is correct, thus boosting precision. We find that thresholding on PMI scores is more effective in creating features for the Assessor than using density models. Bootstrapping can be effective in finding both positive and negative seeds to train the Assessor, performing better than hand-tagging a sample of actual extractions.
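
The thresholding idea in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' implementation: the PMI-style score is taken to be the ratio of hit counts for an instance together with a discriminator phrase to hit counts for the instance alone, the hit counts are hard-coded hypothetical numbers rather than results from a real search engine, and the combination step uses a naive Bayes rule as a stand-in, since the abstract does not say which classifier the Assessor uses. The function names `pmi_score`, `threshold_features`, and `naive_bayes_probability` are hypothetical.

```python
from typing import List, Tuple


def pmi_score(hits_both: int, hits_instance: int) -> float:
    """PMI-style score (assumed form): co-occurrence hits / instance hits."""
    if hits_instance == 0:
        return 0.0  # no evidence for the instance at all
    return hits_both / hits_instance


def threshold_features(scores: List[float], thresholds: List[float]) -> List[bool]:
    """Turn each raw score into a Boolean feature by comparing it to a threshold."""
    return [score >= cutoff for score, cutoff in zip(scores, thresholds)]


def naive_bayes_probability(
    features: List[bool],
    likelihoods: List[Tuple[float, float]],
    prior: float = 0.5,
) -> float:
    """Combine Boolean features into P(correct | features).

    likelihoods[i] = (P(feature i is true | correct),
                      P(feature i is true | incorrect)).
    These values are assumed to come from positive and negative seeds.
    """
    p_correct = prior
    p_incorrect = 1.0 - prior
    for is_true, (p_if_correct, p_if_incorrect) in zip(features, likelihoods):
        p_correct *= p_if_correct if is_true else 1.0 - p_if_correct
        p_incorrect *= p_if_incorrect if is_true else 1.0 - p_if_incorrect
    total = p_correct + p_incorrect
    return p_correct / total if total > 0 else prior


# Hypothetical hit counts for one extracted instance and two discriminator phrases.
scores = [pmi_score(50, 1000), pmi_score(1, 1000)]
features = threshold_features(scores, thresholds=[0.01, 0.01])
probability = naive_bayes_probability(features, [(0.9, 0.1), (0.8, 0.2)])
```

In this sketch the thresholds and likelihoods would be estimated from seed examples; the abstract reports that bootstrapped positive and negative seeds worked better for this purpose than a hand-tagged sample of extractions.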
