Authors: Oren Etzioni, Daniel S. Weld, Tal Shaked, Stephen Soderland
DOI:
Keywords:
Abstract: The World Wide Web is a powerful and readily available text corpus that can be used effectively to validate the output of an information extraction system. We present experiments that explore how pointwise mutual information (PMI) from search engine hit counts can be used in an Assessor module that assigns a probability that an extracted fact or relationship is correct, thus boosting precision. We find that thresholding on PMI scores is more effective in creating features for the Assessor than using density models. Bootstrapping is effective in finding both positive and negative seeds to train the Assessor, performing better than hand-tagging a sample of actual extractions.
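
To make the idea of thresholded PMI features concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give a formula, so the score is assumed to be the ratio of hits for an extraction together with a discriminator phrase to hits for the extraction alone. The function names, hit counts, and threshold value are all hypothetical.

```python
def pmi_score(hits_instance: int, hits_with_discriminator: int) -> float:
    """Assumed PMI-style score: co-occurrence hits divided by instance hits.

    Returns 0.0 when the instance has no hits, to avoid division by zero.
    """
    if hits_instance <= 0:
        return 0.0
    return hits_with_discriminator / hits_instance


def threshold_feature(score: float, threshold: float) -> bool:
    """Turn a raw score into a boolean feature by comparing it to a threshold."""
    return score >= threshold


# Hypothetical hit counts for two candidate extractions of the class "city",
# queried with a made-up discriminator phrase such as "city of <X>".
candidates = {
    "Seattle": {"hits_instance": 1000, "hits_with_discriminator": 50},
    "Garth Brooks": {"hits_instance": 2000, "hits_with_discriminator": 1},
}

THRESHOLD = 0.01  # hypothetical; in practice this would be learned from seeds

features = {
    name: threshold_feature(
        pmi_score(c["hits_instance"], c["hits_with_discriminator"]), THRESHOLD
    )
    for name, c in candidates.items()
}
```

In a setup like the one the abstract describes, boolean features of this kind (one per discriminator) would be passed to a classifier that outputs the probability that an extraction is correct; how the thresholds are chosen and how the features are combined is not specified here.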