Authors: Stephen Dill, John A. Tomlin, Jason Y. Zien, Nadav Eiron, David Gibson
Keywords:
Abstract: This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest-scale semantic tagging effort to date. We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.