Building a test collection for complex document information processing

Authors: D. Lewis, G. Agam, S. Argamon, O. Frieder, D. Grossman

Abstract: Research and development of information access technology for scanned paper documents has been hampered by the lack of public test collections of realistic scope and complexity. As part of a project to create a prototype system for search and mining of masses of document images, we are assembling a 1.5 terabyte dataset to support evaluation of both end-to-end complex document information processing (CDIP) tasks (e.g., text retrieval and data mining) and component technologies such as optical character recognition (OCR), document structure analysis, signature matching, and authorship attribution.

