Method and apparatus for improved web scraping

Authors: Douglas M. Boulware, John J. Salerno

Abstract: Method and apparatus to enable the parser component of a web search engine to adapt in response to frequent page format changes at sites. The parser "learns" from a set of defined HTTP links how to find and parse the pages returned by a query. The invention intelligently locates the various tokens/strings that will correctly extract the attributes associated with an item. The present invention may operate in either an automatic or a user-assisted fashion.
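
As an illustration only, the idea of locating the token/strings that bracket an attribute can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented method: the function names `learn_delimiters` and `extract`, the fixed context `width`, and the sample HTML snippets are all hypothetical. It assumes a sample page whose attribute value is already known, records the text immediately before and after that value, and reuses those strings to pull the attribute from other pages in the same format. When the site changes its format, the old strings no longer match and the delimiters are learned again from a fresh sample.

```python
from typing import Optional, Tuple

Delimiters = Tuple[str, str]


def learn_delimiters(page: str, known_value: str, width: int = 12) -> Optional[Delimiters]:
    """Record up to `width` characters on each side of a known attribute value.

    Hypothetical helper: returns None if the value does not occur in the page.
    """
    start = page.find(known_value)
    if start == -1:
        return None
    end = start + len(known_value)
    prefix = page[max(0, start - width):start]
    suffix = page[end:end + width]
    return prefix, suffix


def extract(page: str, delimiters: Delimiters) -> Optional[str]:
    """Return the text between the learned prefix and suffix, or None on a miss."""
    prefix, suffix = delimiters
    start = page.find(prefix)
    if start == -1:
        return None  # the page format has probably changed
    start += len(prefix)
    end = page.find(suffix, start)
    if end == -1:
        return None
    return page[start:end]


# Learn from a sample page in the (assumed) original format.
old_sample = '<tr><td class="price">$19.99</td></tr>'
rules = learn_delimiters(old_sample, "$19.99")
print(extract('<tr><td class="price">$24.50</td></tr>', rules))

# The site changes its markup: the old delimiters miss, so learn them again.
new_sample = '<div><span id="cost">$19.99</span></div>'
if extract(new_sample, rules) is None:
    rules = learn_delimiters(new_sample, "$19.99")
print(extract('<div><span id="cost">$31.00</span></div>', rules))
```

A fixed-width context is the simplest possible choice and is fragile when the surrounding text varies from item to item. In a user-assisted mode, a person could supply the known value or confirm the learned delimiters instead.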