A machine learning approach to building domain-specific search engines

Authors: Kristie Seymore, Jason Rennie, Kamal Nigam, Andrew McCallum

Abstract: Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are also difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new research in reinforcement learning, text classification, and information extraction that enables efficient spidering, populates topic hierarchies, and identifies informative text segments. Using these techniques, we have built a demonstration system: a search engine for computer science papers, available at www.cora.justresearch.com.
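The abstract names reinforcement learning as the basis for efficient spidering but does not say how it is applied. The following is a minimal sketch under one assumed formulation, not the system's actual design. Each page's value is its immediate reward plus the discounted value of the best page reachable from it, and the crawler always expands the highest-valued page on its frontier. The toy graph, the reward of 1 for "paper" pages, the discount `gamma=0.5`, and the functions `link_values` and `crawl` are all hypothetical. A real spider cannot look up values for pages it has not yet fetched. It would have to predict them from features such as anchor text, which this toy omits.

```python
import heapq

# Hypothetical toy setup (assumption, not from the paper): the web graph is
# known in advance, pages that are papers give reward 1, all others give 0.


def link_values(graph, reward, gamma=0.5, iterations=50):
    """Value iteration: V(p) = reward(p) + gamma * max over outlinks V(q).

    The max means each page is scored by the best single path leaving it.
    With gamma < 1 the values stay bounded even if the graph has cycles.
    """
    pages = set(graph) | {q for outs in graph.values() for q in outs}
    value = {p: 0.0 for p in pages}
    for _ in range(iterations):
        value = {
            p: reward.get(p, 0.0)
            + gamma * max((value[q] for q in graph.get(p, [])), default=0.0)
            for p in pages
        }
    return value


def crawl(graph, start, value, budget):
    """Best-first crawl: always fetch the unvisited page with the highest value."""
    visited = []
    seen = {start}
    frontier = [(-value[start], start)]  # heapq is a min-heap, so negate values
    while frontier and len(visited) < budget:
        _, page = heapq.heappop(frontier)
        visited.append(page)
        for nxt in graph.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-value[nxt], nxt))
    return visited


graph = {
    "home": ["dept", "news"],
    "dept": ["faculty", "events"],
    "faculty": ["paper1", "paper2"],
    "news": ["sports"],
}
reward = {"paper1": 1.0, "paper2": 1.0}
values = link_values(graph, reward)
order = crawl(graph, "home", values, budget=5)
print(order)
```

On this toy graph the value-guided crawler reaches both papers within five fetches. A breadth-first crawl from `home` would spend those five fetches on `home`, `dept`, `news`, `faculty`, and `events` without reaching any paper.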