Author: Stephen Soderland
DOI:
Keywords:
Abstract: There is a wealth of information to be mined from narrative text on the World Wide Web. Unfortunately, standard natural language processing (NLP) extraction techniques expect full, grammatical sentences, and perform poorly on the choppy sentence fragments that are often found on web pages. This paper introduces Webfoot, a preprocessor that parses web pages into logically coherent segments based on page layout cues. Output from Webfoot is then passed on to CRYSTAL, an NLP system that learns text extraction rules from example. Webfoot and CRYSTAL transform the text into a formal representation that is equivalent to relational database entries. This is a necessary first step for knowledge discovery and other automated analysis of free text.