Removal of extraneous text from electronic documents

Author: Xiaofan Lin

Keywords: Weight value, Line number, Artificial intelligence, Information retrieval, Line (text file), Natural language processing, Computer science

Abstract: Method and apparatus for removing lines of extraneous text from a document. Similarities are identified between lines of text on each page and corresponding lines of text on a selected subset of pages. Different weight values are associated with different line numbers on a page, with each weight value indicating a degree of likelihood that the line contains extraneous text. One or more lines of text are selectively removed as a function of the similarities and the weight values.
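
The abstract does not give formulas, so the following is only a minimal sketch of the general idea, not the patented method. It assumes that the "selected subset" is the pages within a small window around the current page, that "corresponding" lines sit at the same offset from the top or bottom of the page, that the weight falls off linearly with distance from the page edge, and that a line is removed when weight times similarity reaches a threshold. All function names, parameters, and default values (`line_weight`, `corresponding_lines`, `remove_extraneous_lines`, `window`, `threshold`, `margin`) are hypothetical, and the similarity measure is Python's `difflib.SequenceMatcher`, chosen here for convenience.

```python
from difflib import SequenceMatcher


def line_weight(index, page_length, margin=3):
    """Hypothetical weight in [0, 1]: highest at the top and bottom of a page."""
    distance = min(index, page_length - 1 - index)
    return max(0.0, 1.0 - distance / margin)


def corresponding_lines(index, page_length, other):
    """Lines of `other` at the same offset from the top or from the bottom."""
    candidates = []
    if index < len(other):
        candidates.append(other[index])
    from_bottom = page_length - index  # 1 for the last line of the page
    if from_bottom <= len(other):
        candidates.append(other[-from_bottom])
    return candidates


def remove_extraneous_lines(pages, window=2, threshold=0.8, margin=3):
    """Return a copy of `pages` (lists of lines) without likely headers/footers."""
    cleaned = []
    for p, page in enumerate(pages):
        # Assumed "selected subset": neighbouring pages within `window`.
        lo, hi = max(0, p - window), min(len(pages), p + window + 1)
        neighbours = [pages[q] for q in range(lo, hi) if q != p]
        kept = []
        for i, line in enumerate(page):
            # Best similarity to any corresponding line on a neighbouring page.
            best = 0.0
            for other in neighbours:
                for candidate in corresponding_lines(i, len(page), other):
                    ratio = SequenceMatcher(None, line, candidate).ratio()
                    best = max(best, ratio)
            # Remove the line only when weight and similarity are both high.
            if line_weight(i, len(page), margin) * best < threshold:
                kept.append(line)
        cleaned.append(kept)
    return cleaned


if __name__ == "__main__":
    pages = [
        ["Annual Report 2004", "Revenue grew in the first quarter.", "Costs were flat.", "Page 1"],
        ["Annual Report 2004", "The second quarter saw new hires.", "Margins improved.", "Page 2"],
        ["Annual Report 2004", "Outlook remains stable.", "See the appendix for details.", "Page 3"],
    ]
    for page in remove_extraneous_lines(pages):
        print(page)
```

In this sketch the weight keeps mid-page lines safe even when they repeat across pages, while near-identical first and last lines (a running title, or page numbers that differ by one character) score high enough to be dropped.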

References (16)
William J. van Melle, James V. Mahoney, Thomas P. Moran, Patrick Chiu, Automatic extraction of text regions and region borders for an electronic work surface (1998)
Thomas M. Breuel, William C. Janssen, Ashok C. Popat, Daniel S. Bloomberg, Henry S. Baird, Method and system for document image layout deconstruction and redisplay (2003)
Charlton E. Lui, Dan Altman, Leroy B. Keely, Susanne Alysia Clark Cazzanti, Classifying, anchoring, and transforming ink (2000)
Barbara Claire Brown, Joann Molaro Rotter, A system for processing and consolidating records (2002)