A Machine Learning Framework for Automatically Annotating Web Pages with Simple HTML Ontology Extension (SHOE)

Authors: Stephen D. Scott, Sharad C. Seth, QingFeng Lin

Abstract: With enormous amounts of information injected into the Internet every second, manually maintaining a knowledge base on the Internet is a hopeless task. A reasonable remedy for this problem is to create a “machine understandable” Internet. To achieve this, Heflin et al. proposed an HTML-based representation language called Simple HTML Ontology Extension (SHOE). SHOE can be used in many application domains, but it requires users to annotate web pages manually. To overcome this shortcoming of SHOE, we created AutoSHOE, a machine learning framework for automatically annotating web pages with SHOE annotations. With this framework, users can easily collect SHOE-annotated web pages as training data, experiment with different feature selection methods and learning algorithms to find the best approach for a particular ontology, and annotate new web pages with the trained classifiers or rule sets. In addition, the framework allows new feature selectors and learners to be plugged into the system, and it can be run anywhere through the web. We present the architecture of AutoSHOE and then discuss experimental results from our proof-of-concept design.
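The abstract describes the pluggable design only at a high level. As a rough illustration of that idea, the Python sketch below assumes a hypothetical interface in which any feature selector and any learner can be combined into one pipeline that maps a page's tokens to an ontology category. Every class name here (`FeatureSelector`, `Learner`, `TopKDocFrequency`, `BernoulliNaiveBayes`, `AnnotationPipeline`) and the toy categories `Faculty` and `Student` are illustrative assumptions, not AutoSHOE's actual API. Bernoulli naive Bayes is used only as one example learner, chosen because simple Bayesian classifiers appear among the cited work; the abstract does not say which learners AutoSHOE ships with.

```python
# Hypothetical sketch of a pluggable page-annotation pipeline; all names are
# illustrative and are NOT AutoSHOE's real API.
import math
from collections import Counter, defaultdict
from typing import Protocol


class FeatureSelector(Protocol):
    def select(self, docs: list[list[str]], labels: list[str]) -> list[str]: ...


class Learner(Protocol):
    def fit(self, X: list[set[str]], y: list[str], vocab: list[str]) -> None: ...
    def predict(self, x: set[str]) -> str: ...


class TopKDocFrequency:
    """Example selector: keep the k tokens that appear in the most pages."""

    def __init__(self, k: int) -> None:
        self.k = k

    def select(self, docs: list[list[str]], labels: list[str]) -> list[str]:
        df = Counter(tok for doc in docs for tok in set(doc))
        return [tok for tok, _ in df.most_common(self.k)]


class BernoulliNaiveBayes:
    """Example learner: naive Bayes over present/absent features, add-one smoothed."""

    def fit(self, X: list[set[str]], y: list[str], vocab: list[str]) -> None:
        self.vocab = vocab
        self.classes = sorted(set(y))
        counts = Counter(y)
        self.log_prior = {c: math.log(counts[c] / len(y)) for c in self.classes}
        present = defaultdict(Counter)  # class -> token -> number of pages containing it
        for feats, label in zip(X, y):
            present[label].update(feats)
        # P(token present | class) with Laplace (add-one) smoothing, so 0 < p < 1
        self.p = {
            c: {f: (present[c][f] + 1) / (counts[c] + 2) for f in vocab}
            for c in self.classes
        }

    def predict(self, x: set[str]) -> str:
        def log_score(c: str) -> float:
            s = self.log_prior[c]
            for f in self.vocab:
                p = self.p[c][f]
                s += math.log(p if f in x else 1.0 - p)
            return s

        return max(self.classes, key=log_score)


class AnnotationPipeline:
    """Combines any FeatureSelector with any Learner."""

    def __init__(self, selector: FeatureSelector, learner: Learner) -> None:
        self.selector = selector
        self.learner = learner

    def train(self, docs: list[list[str]], labels: list[str]) -> None:
        self.vocab = self.selector.select(docs, labels)
        keep = set(self.vocab)
        self.learner.fit([set(doc) & keep for doc in docs], labels, self.vocab)

    def annotate(self, doc: list[str]) -> str:
        return self.learner.predict(set(doc) & set(self.vocab))


if __name__ == "__main__":
    # Toy, made-up training pages labeled with hypothetical ontology categories.
    pages = [
        "professor teaches course research".split(),
        "professor research publications advisor".split(),
        "student course homework advisor".split(),
        "student homework exam course".split(),
    ]
    labels = ["Faculty", "Faculty", "Student", "Student"]
    pipe = AnnotationPipeline(TopKDocFrequency(k=6), BernoulliNaiveBayes())
    pipe.train(pages, labels)
    print(pipe.annotate("professor research grant".split()))  # → Faculty
```

Swapping in a different selector or learner only requires another class with the same `select` or `fit`/`predict` methods; the pipeline itself does not change.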

References (12)
David Rager, Sean Luke, Lee Spector. Ontology-Based Knowledge Discovery on the World-Wide Web. 1996.
Michael J. Pazzani, Pedro M. Domingos. Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier. International Conference on Machine Learning, pp. 105-112, 1996.
J. R. Quinlan, R. M. Cameron-Jones. FOIL: A Midterm Report. European Conference on Machine Learning, pp. 3-20, 1993. DOI: 10.1007/3-540-56602-3_124
Sean Luke, Jeff Heflin, James Hendler. SHOE: A Knowledge Representation Language for Internet Applications. 1999.
Jeff Heflin, James Hendler, Sean Luke. Applying Ontology to the Web: A Case Study. International Work-Conference on Artificial and Natural Neural Networks, pp. 715-724, 1999. DOI: 10.1007/BFB0100539
Robert E. Schapire, Yoram Singer. Improved Boosting Algorithms Using Confidence-Rated Predictions. Conference on Learning Theory, vol. 37, pp. 80-91, 1998. DOI: 10.1145/279943.279960
N. Littlestone, M. K. Warmuth. The Weighted Majority Algorithm. Information & Computation, vol. 108, pp. 212-261, 1994. DOI: 10.1006/INCO.1994.1009
J. R. Quinlan. Induction of Decision Trees. Machine Learning, vol. 1, pp. 81-106, 1986. DOI: 10.1023/A:1022643204877
Kristie Seymore, Jason Rennie, Kamal Nigam, Andrew McCallum. Building Domain-Specific Search Engines with Machine Learning Techniques. 1999.
Seán Slattery, Kamal Nigam, Andrew McCallum, Mark Craven, Dayne Freitag, Tom Mitchell, Dan DiPasquo. Learning to Extract Symbolic Knowledge from the World Wide Web. National Conference on Artificial Intelligence, pp. 509-516, 1998.