Method and apparatus for identifying words described in a portable electronic document

Authors: Robert M. Ayers, Mohammad Daryoush Paknad

Keywords: Digital computation, Code segment, Computer science, Information retrieval, Word (computer architecture), Full text search, Word list, Linked list, Object (computer science), Electronic document

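Abstract: A method and apparatus for identifying words stored in a portable electronic document. A digital computation apparatus stores a page of a document including characters in text segments that have not been identified as words. A word identifying mechanism analyzes the text segments as text objects in a linked list. The mechanism identifies words from the linked list by analyzing breaks and gaps between the text segments using position data associated with the text segments. The words are sorted if necessary. The method of the present invention receives a text segment having multiple characters and associated data, including x and y coordinates for each text segment. A text object is created for each text segment, and the text object is entered into a linked list. Words are then identified from the text objects using the associated data. The words are added to a word list, and the above steps are repeated until the end of the page is reached. The word list can be used for searching ...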
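
The steps in the abstract (text segments with coordinates, text objects in a linked list, word breaks found from gaps) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `TextObject`, `build_linked_list`, and `identify_words`, the `width` field, and the thresholds `gap_threshold` and `line_tolerance` are all assumptions introduced for illustration. Sorting of the word list is omitted.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class TextObject:
    """One text segment plus its position data (hypothetical structure)."""
    text: str
    x: float          # left edge of the segment
    y: float          # baseline of the segment
    width: float      # horizontal extent (assumed to be available)
    next: Optional["TextObject"] = None


def build_linked_list(
    segments: Iterable[Tuple[str, float, float, float]]
) -> Optional[TextObject]:
    """Create one TextObject per segment and chain them in input order."""
    head = tail = None
    for text, x, y, width in segments:
        node = TextObject(text, x, y, width)
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
    return head


def identify_words(
    head: Optional[TextObject],
    gap_threshold: float = 2.0,   # assumed: larger horizontal gaps split words
    line_tolerance: float = 1.0,  # assumed: larger baseline shifts mean a new line
) -> List[str]:
    """Walk the linked list and split words at spaces, gaps, and line changes."""
    words: List[str] = []
    current = ""
    prev = None
    node = head
    while node is not None:
        if prev is not None:
            gap = node.x - (prev.x + prev.width)
            new_line = abs(node.y - prev.y) > line_tolerance
            if (new_line or gap > gap_threshold) and current:
                words.append(current)
                current = ""
        for ch in node.text:
            if ch.isspace():
                if current:
                    words.append(current)
                    current = ""
            else:
                current += ch
        prev = node
        node = node.next
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    # Segments as (text, x, y, width); a word may span several segments.
    page = [
        ("Port", 0, 100, 20),
        ("able", 20.5, 100, 18),
        ("doc", 45, 100, 15),
        ("ument format", 60.2, 100, 50),
        ("spec", 0, 88, 20),
    ]
    print(identify_words(build_linked_list(page)))
```

In this sketch a word can span several segments ("Port" + "able") because adjacent segments are joined whenever the gap between them stays under the threshold. A real page would need thresholds scaled to the font size.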
