Scaling up Pattern Induction for Web Relation Extraction through Frequent Itemset Mining

Authors: Philipp Cimiano, Sebastian Blohm

Abstract: In this paper, we address the problem of extracting relational information from the Web at a large scale. In particular, we present a bootstrapping approach to relation extraction which starts with a few seed tuples of the target relation and induces patterns which can be used to extract further tuples. Our contribution in this paper lies in the formulation of the pattern induction task as a well-known machine learning problem, i.e. the one of determining frequent itemsets on the basis of a set of transactions representing patterns. The formulation of the extraction problem as the task of mining frequent itemsets is not only elegant, but also speeds up the pattern induction step considerably with respect to previous implementations of the pattern induction procedure. We evaluate our approach in terms of standard measures with respect to seven datasets of varying size and complexity. In particular, by analyzing the extraction rate (extracted tuples per time), we show that our approach reduces the pattern induction complexity from quadratic to linear (in the size of the occurrences to be generalized), while maintaining extraction quality at similar (or even marginally better) levels.
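To illustrate the idea of casting pattern induction as frequent itemset mining, here is a minimal sketch in Python. It is not the authors' implementation: the encoding of each pattern occurrence as a set of (position, token) items, the naive Apriori-style search, the `*` wildcard notation, and all function names and example sentences are assumptions made for illustration. It also makes no attempt to reproduce the complexity behaviour reported in the abstract.

```python
from collections import Counter
from itertools import combinations


def occurrence_to_transaction(tokens):
    """Encode one pattern occurrence as a set of (position, token) items."""
    return frozenset(enumerate(tokens))


def frequent_itemsets(transactions, min_support):
    """Naive level-wise (Apriori-style) search.

    Returns a dict mapping each itemset contained in at least
    `min_support` transactions to its support count.
    """
    counts = Counter(item for t in transactions for item in t)
    candidates = {frozenset([i]) for i, c in counts.items() if c >= min_support}
    result = {}
    k = 1
    while candidates:
        # Count the support of every candidate of size k.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                level[cand] = support
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return result


def itemset_to_pattern(itemset, length):
    """Render an itemset as a pattern; positions not in the itemset become '*'."""
    slots = ["*"] * length
    for pos, tok in itemset:
        slots[pos] = tok
    return " ".join(slots)


# Hypothetical occurrences of a capital-of relation, with the seed
# arguments already replaced by the placeholders ARG1 and ARG2.
occurrences = [
    "ARG1 is the capital of ARG2".split(),
    "ARG1 , the capital of ARG2".split(),
    "ARG1 was the capital of ARG2".split(),
]
transactions = [occurrence_to_transaction(o) for o in occurrences]
itemsets = frequent_itemsets(transactions, min_support=3)

# The largest frequent itemset corresponds to the most specific pattern
# that still covers all three occurrences.
best = max(itemsets, key=len)
pattern = itemset_to_pattern(best, length=6)
print(pattern)
```

In this toy encoding, each frequent itemset corresponds to a generalized pattern, and the support threshold controls how many occurrences a pattern must cover before it is kept. The item at position 1 differs in every occurrence, so it is never frequent and ends up as a wildcard.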
