Authors: Wojciech Gryc, Prem Melville, Richard D. Lawrence
DOI: 10.1109/DEST.2010.5610621
Keywords:
Abstract: For many marketing and business applications, it is necessary to know the home page of a company specified only by its name. If we require this for a small number of big companies, the task is readily accomplished via the use of Internet search engines or access to domain registration lists. However, if the entities of interest are small companies, these approaches can lead to mismatches, particularly when a company lacks a home page. We address this problem using a supervised machine-learning approach in which we train a binary classification model. We classify potential website matches for each company name based on a set of explanatory features extracted from the content of each candidate website. Our approach is related to web-based intelligence in two ways: (1) we build the training data for our learning algorithms through crowdsourcing tools and illustrate their potential for web-based research, and (2) the success of the model allows one to easily use corporate home pages as data inputs into other research projects. Through the successful use of crowdsourcing, we are able to identify the correct website, or recognize that a valid website does not exist, with an accuracy 57% better than simply taking the highest-ranked search engine result as the correct match.