Authors: Kristie Seymore, Jason Rennie, Kamal Nigam, Andrew McCallum
DOI:
Keywords:
Abstract: Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are also difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new research in reinforcement learning, text classification, and information extraction that enables efficient spidering, populates topic hierarchies, and identifies informative text segments. Using these techniques, we have built a demonstration system: a search engine for computer science research papers, available at www.cora.justresearch.com.