Authors: Pallika Kanani, Andrew McCallum, Shaohan Hu
DOI: 10.1007/978-3-642-13657-3_45
Keywords: Resource (project management), Missing data, Feature (computer vision), Information extraction, Task (project management), Specific-information, Probabilistic logic, Information retrieval, Computer science, Data mining
Abstract: We present a general framework for the task of extracting specific information "on demand" from a large corpus such as the Web under resource constraints. Given a database with missing or uncertain information, the proposed system automatically formulates queries, issues them to a search interface, selects a subset of the documents, extracts the required information from them, and fills in the missing values in the original database. We also exploit the inherent dependency within the data to obtain useful information with fewer computational resources. We build such a system in the citation database domain that extracts missing publication years using limited resources from the Web. We discuss a probabilistic approach for this task and present first results. The main contribution of this paper is to propose a general, comprehensive architecture for designing a system adaptable to different domains.
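
The loop described in the abstract (formulate queries, issue them, select a subset of documents, extract, fill in the database) can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the names `formulate_queries`, `extract_years`, and `fill_missing_years`, the record fields, and the `search` callable are all hypothetical. A simple majority vote over extracted years stands in for the probabilistic approach mentioned in the abstract, and a cap on the number of search calls stands in for the resource constraint. The sketch does not model the dependency within the data that the paper exploits.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

# Four-digit years from 1900 to 2099 (assumed range for publication years).
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def formulate_queries(record: Dict) -> List[str]:
    """Build candidate search queries from the fields a record already has."""
    title = record["title"]
    queries = [f'"{title}"']
    if record.get("author"):
        queries.append(f'"{title}" {record["author"]}')
    return queries


def extract_years(text: str) -> List[str]:
    """Return every year-like token found in a document."""
    return YEAR_RE.findall(text)


def fill_missing_years(
    records: List[Dict],
    search: Callable[[str], List[str]],  # hypothetical search interface
    budget: int,                         # maximum number of search calls
    docs_per_query: int = 2,             # size of the selected document subset
) -> int:
    """Fill missing 'year' values in place; return the number of queries issued."""
    used = 0
    for record in records:
        if record.get("year") is not None:
            continue  # nothing to acquire for this record
        votes: Counter = Counter()
        for query in formulate_queries(record):
            if used >= budget:
                break  # resource bound reached
            used += 1
            # Keep only the top-ranked documents returned for this query.
            for doc in search(query)[:docs_per_query]:
                votes.update(extract_years(doc))
        if votes:
            # Majority vote: an assumption of this sketch, not the paper's model.
            record["year"] = int(votes.most_common(1)[0][0])
    return used
```

Passing the search interface as a plain callable keeps the sketch independent of any particular search API, and the budget makes the trade-off explicit: a record whose queries are not issued before the budget runs out keeps its missing value.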