Analysis of Documents Born Digital.

Authors: Jianying Hu, Ying Liu

Abstract: While traditional document analysis has focused on printed media, an increasingly large portion of the documents today are generated in digital form from the start. Such “documents born digital” range from plain text such as emails to more sophisticated forms such as PDF and Web documents. On the one hand, the existence of digital encoding eliminates the need for scanning, image processing, and character recognition in most situations (a notable exception being the prevalent use of text embedded in images in Web documents, as elaborated upon in the section “Analysis of Text in Web Images”). On the other hand, many higher-level processing tasks remain, due to the fact that the design purpose of almost all existing digital document encoding systems (i.e., HTML, PDF) is display or printing for human consumption, not machine-level information exchange and extraction. As such, a significant amount of processing is still required for automatic information extraction, indexing, and content repurposing, and many challenges exist in this process. This chapter describes in detail the key technologies for processing documents born digital, with a focus on PDF and Web document processing.
