Systems and methods for retrieving tabular data from textual sources

Authors: Pallavi Pyreddy, W. Bruce Croft

Keywords: Column (database); Information retrieval; Heuristic; Key (cryptography); Table (information); Data element; Representation (mathematics); Information needs; Computer science; Component (UML)

Abstract: Tables form an important kind of data element in text retrieval. Often, the gist of an entire news article or other exposition can be concisely captured in tabular form. Information other than the key words in a digital document can be exploited to provide users with more flexible and powerful query capabilities. More specifically, the structural information in a document is exploited to identify tables and their component fields and to let users query based on these fields. Component fields include table lines, caption lines, row headings, and column components. Empirical results have demonstrated that heuristic-method-based table extraction and component tagging can be performed effectively and efficiently. Moreover, experiments in table retrieval using the system of the present invention strongly indicate that such structural decomposition can facilitate better representation of the user's information needs and hence more effective retrieval of tables.
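
The kind of heuristic extraction and component tagging the abstract describes can be illustrated with a minimal sketch. This is not the patented method: the rule used here (a table line has at least three cells separated by runs of two or more spaces), the tag names, and the functions `split_cells`, `tag_lines`, and `row_heading` are all assumptions made for illustration.

```python
import re

# Assumed heuristic: cells are separated by a tab or by two or more
# whitespace characters; single spaces stay inside a cell.
GAP = re.compile(r"\s{2,}|\t")

def split_cells(line):
    """Split a line into non-empty cells at wide gaps."""
    return [c for c in GAP.split(line.strip()) if c]

def tag_lines(lines):
    """Tag each line as TEXT, CAPTION, COLHEAD, or ROW (hypothetical tags)."""
    is_table = [len(split_cells(l)) >= 3 for l in lines]
    tagged = []
    for i, line in enumerate(lines):
        if is_table[i]:
            first = i == 0 or not is_table[i - 1]
            has_digit = any(ch.isdigit() for ch in line)
            # The first line of a table block with no digits is treated
            # as the column-heading line; the rest are data rows.
            tag = "COLHEAD" if first and not has_digit else "ROW"
        elif line.strip() and i + 1 < len(lines) and is_table[i + 1]:
            # A non-blank text line directly above a table block.
            tag = "CAPTION"
        else:
            tag = "TEXT"
        tagged.append((tag, line))
    return tagged

def row_heading(line):
    """Take the first cell of a table row as its row heading."""
    return split_cells(line)[0]

sample = [
    "Stock prices fell sharply on Monday.",
    "Closing prices of selected stocks",
    "Company      Close    Change",
    "Acme Corp    12.50    -0.25",
    "Widget Inc   30.00    +1.00",
    "",
    "Analysts expect more losses.",
]
tags = [tag for tag, _ in tag_lines(sample)]
```

Once lines carry tags like these, a retrieval system could index captions, column headings, and row headings as separate fields, which is the sort of field-based querying the abstract refers to.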

References (17)
James P. Callan, W. Bruce Croft, Stephen M. Harding. The INQUERY Retrieval System. Database and Expert Systems Applications, pp. 78-83 (1992). DOI: 10.1007/978-3-7091-7557-6_14
Michael S. Krupit, Thomas Barr, Marvin I. Weinberger, Howard Morgan, Lawrence A. Husick. Method and apparatus for identifying textual documents and multi-media files corresponding to a search topic (1995).
Frederick S. M. Herz, Jason M. Eisner, Lyle H. Ungar, Mitchell P. Marcus. System for generation of user profiles for a system for customized electronic identification of desirable objects (1995).
Mitsuo Ooyama, Noriyuki Kaneoka, Hiromichi Fujisawa, Masaharu Murakami, Hisamitsu Kawaguchi, Hidefumi Masuzaki, Atsushi Hatakeyama, Masaaki Fujinawa, Kanji Kato, Mitsuru Akizawa. Hierarchical presearch type text search method and apparatus and magnetic disk unit used in the apparatus (1990).
Pallavi Pyreddy, W. Bruce Croft. TINTIN: a system for retrieval in text tables. ACM International Conference on Digital Libraries, pp. 193-200 (1997). DOI: 10.1145/263690.263816
Howard R. Turtle, Gerald J. Morton, Larntz F. Kinley. System of document representation retrieval by successive iterated probability sampling. Laboratory Automation & Information Management, vol. 33, pp. 65- (1993). DOI: 10.1016/S1381-141X(97)80056-1
Dacheng Wang, Sargur N. Srihari. Classification of newspaper image blocks using texture analysis. Computer Vision, Graphics, and Image Processing, vol. 47, pp. 327-352 (1989). DOI: 10.1016/0734-189X(89)90116-3
Daniela Rus, Devika Subramanian. Customizing information capture and access. ACM Transactions on Information Systems, vol. 15, pp. 67-101 (1997). DOI: 10.1145/239041.239048