Author: John C. Handley
DOI:
Keywords:
Abstract: The present invention handles fully lined, semi-lined, and line-less tables by identifying cells and cell separators during page recomposition, as part of optical character recognition. It does so iteratively: word boxes are merged into cells, separators are found, word boxes bounded by the same separators are merged, and these steps are repeated until the correct cell structure is found. With this method, rows are estimated, close words are merged into columns, words within columns are merged, rows and columns are re-estimated, and words in the same row and column are merged into bigger cells according to the detected table style. The method handles large, complex tables with multiple lines of symbols per cell, and applies to line-less, semi-lined, and lined tables.
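
The merge-and-repeat idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: it covers only the step of merging horizontally close word boxes that share a row, and repeats that step until nothing changes. Separator detection and column re-estimation are omitted. The box layout `(left, top, right, bottom)`, the `max_gap` threshold, and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def same_row(a: Box, b: Box) -> bool:
    """True when two boxes overlap vertically, so they plausibly share a row."""
    return min(a[3], b[3]) > max(a[1], b[1])


def union(a: Box, b: Box) -> Box:
    """Smallest box covering both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_pass(boxes: List[Box], max_gap: int) -> List[Box]:
    """One pass: merge each box into the first same-row box within max_gap."""
    merged: List[Box] = []
    for box in sorted(boxes):
        for i, other in enumerate(merged):
            # Horizontal gap between the two boxes (negative if they overlap).
            gap = max(box[0], other[0]) - min(box[2], other[2])
            if same_row(box, other) and gap <= max_gap:
                merged[i] = union(box, other)
                break
        else:
            merged.append(box)
    return merged


def find_cells(words: List[Box], max_gap: int = 10) -> List[Box]:
    """Repeat the merge pass until the set of boxes stops changing."""
    cells = sorted(words)
    while True:
        next_cells = sorted(merge_pass(cells, max_gap))
        if next_cells == cells:
            return cells
        cells = next_cells


# Two text rows; words 5 px apart merge, words 60 px or more apart do not.
words = [
    (0, 0, 30, 10), (35, 0, 60, 10), (120, 0, 150, 10),
    (0, 20, 25, 30), (120, 20, 140, 30), (145, 20, 170, 30),
]
cells = find_cells(words)
```

A fixed pixel threshold is the simplest possible choice here; the abstract instead describes re-estimating rows and columns between passes, which this sketch does not attempt.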