The use of web-based statistics to validate information extraction

Authors: Oren Etzioni, Daniel S. Weld, Tal Shaked, Stephen Soderland

Abstract: The World Wide Web is a powerful and readily available text corpus that can be used effectively to validate the output of an information extraction system. We present experiments that explore how pointwise mutual information (PMI) from search engine hit counts can be used in an Assessor module that assigns a probability that an extracted fact or relationship is correct, thus boosting precision. We find that thresholding on PMI scores is more effective in creating features for the Assessor than using probability density models. Bootstrapping can be effective in finding both positive and negative seeds to train the Assessor, performing better than hand-tagging a sample of actual extractions.
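
The idea of turning hit counts into thresholded features for an Assessor can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define the PMI formula, so the ratio of co-occurrence hits to instance hits is an assumption, as are the naive Bayes combination, all function names, and all numbers. Hit counts are passed in directly rather than fetched from a search engine.

```python
from typing import Sequence


def pmi_score(hits_with_discriminator: int, hits_instance: int) -> float:
    """Assumed PMI estimate: hits for instance plus discriminator phrase,
    divided by hits for the instance alone."""
    if hits_instance == 0:
        return 0.0
    return hits_with_discriminator / hits_instance


def threshold_features(scores: Sequence[float],
                       thresholds: Sequence[float]) -> list[bool]:
    """Turn each PMI score into a boolean feature: is it above its threshold?"""
    return [score > threshold for score, threshold in zip(scores, thresholds)]


def assess(features: Sequence[bool],
           p_true_if_correct: Sequence[float],
           p_true_if_incorrect: Sequence[float],
           prior_correct: float = 0.5) -> float:
    """Hypothetical Assessor: combine boolean features with naive Bayes and
    return the posterior probability that the extraction is correct."""
    like_correct = prior_correct
    like_incorrect = 1.0 - prior_correct
    for fired, pc, pi in zip(features, p_true_if_correct, p_true_if_incorrect):
        like_correct *= pc if fired else 1.0 - pc
        like_incorrect *= pi if fired else 1.0 - pi
    return like_correct / (like_correct + like_incorrect)


# Made-up hit counts for one extraction and two discriminator phrases.
scores = [pmi_score(50, 1000), pmi_score(1, 1000)]
features = threshold_features(scores, thresholds=[0.01, 0.01])
probability = assess(features,
                     p_true_if_correct=[0.8, 0.7],
                     p_true_if_incorrect=[0.2, 0.3])
```

In a setup like this, the thresholds and the conditional probabilities would be estimated from positive and negative seed instances, which is where the bootstrapped seeds mentioned in the abstract would come in.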

References (12)
Ellen Riloff, Rosie Jones. Learning dictionaries for information extraction by multi-level bootstrapping. National Conference on Artificial Intelligence, pp. 474-479 (1999).
Dave Beckett. World Wide Web Conference 2004. Ariadne (2004).
Sergey Brin. Extracting Patterns and Relations from the World Wide Web. Lecture Notes in Computer Science, pp. 172-183 (1999). DOI: 10.1007/10704656_11
Michael Cafarella, Oren Etzioni, Daniel S. Weld, Tal Shaked, Stephen Soderland, Alexander Yates, Doug Downey, Ana-Maria Popescu. Methods for domain-independent information extraction from the web: an experimental comparison. National Conference on Artificial Intelligence, pp. 391-398 (2004).
Oren Etzioni, Daniel S. Weld, Stephen Soderland, Doug Downey. Learning text patterns for web information extraction and assessment. National Conference on Artificial Intelligence, pp. 50-55 (2004).
Peter D. Turney. Mining the web for synonyms: PMI-IR versus LSA on TOEFL. European Conference on Machine Learning, pp. 491-502 (2001). DOI: 10.1007/3-540-44795-4_42
Marti A. Hearst. Automatic acquisition of hyponyms from large text corpora. Proceedings of the 14th Conference on Computational Linguistics, pp. 539-545 (1992). DOI: 10.3115/992133.992154
Eugene Agichtein, Luis Gravano. Snowball: extracting relations from large plain-text collections. ACM International Conference on Digital Libraries, pp. 85-94 (2000). DOI: 10.1145/336597.336644
Oren Etzioni, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates. Web-scale information extraction in KnowItAll. Proceedings of the 13th Conference on World Wide Web (WWW '04), pp. 100-110 (2004). DOI: 10.1145/988672.988687
Bernardo Magnini, Matteo Negri, Roberto Prevete, Hristo Tanev. Is it the right answer? Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL '02), pp. 425-432 (2001). DOI: 10.3115/1073083.1073154