Using Natural Language Processing for Identifying and Interpreting Tables in Plain Text

Authors: Shona Douglas, Matthew Hurst, David Quinn



Abstract:

[Figure 2: A generic table with terminology (`*' represents optionality)]

3.2 Variations on Table Layout

Different layout arrangements can be thought of as expressing functional aspects of the relation, using some simple heuristics about the way grouping and ordering may be expressed in two dimensions. Because we conventionally read tables from the left and top, we distinguish the left margin and top margin of tables as areas of the table in which high-precedence domains are placed (see Figure 2). A given layout supports a certain reading order. This reading order reflects the way in which domains are organised, specifically the groups and orders which are easily used as keys in choosing [...] reading/constructing a tuple from the table; it is thus the embodiment of the decision structure identified [...] part of the table. Thus, while a single canonical form may have many layouts, a canonical form plus functional information will have a much reduced range of felicitous layouts. These typical constraints on layout will be used later in our processing heuristics. First, we present a repertoire of transformations of simple tables, in terms of which we can analyse the variations that occur.

Rotation

This transformation is best described by example:

Standard  slump  Value
ST1       75mm   v1
          125mm  v2
ST2       75mm   v3
          125mm  v4

→

Standard  75mm   125mm
          Value
ST1       v1     v2
ST2       v3     v4
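
The rotation in the example above can be illustrated with a short Python sketch. It is not the authors' implementation: the function name `rotate` and its parameters `row_key`, `col_key` and `value_key` are hypothetical, and the input is assumed to be the tuples read off the first table (Standard, slump, Value). The sketch moves the distinct slump values from the left margin into the top margin, so that each Standard occupies a single row.

```python
def rotate(rows, row_key, col_key, value_key):
    """Pivot long-form tuples so that the values of col_key become column headers.

    Hypothetical helper for illustration only; assumes each
    (row_key, col_key) pair occurs at most once in rows.
    """
    col_values = []   # distinct col_key values, in first-seen order
    table = {}        # row_key value -> {col_key value: cell value}
    for row in rows:
        r, c, v = row[row_key], row[col_key], row[value_key]
        if c not in col_values:
            col_values.append(c)
        table.setdefault(r, {})[c] = v

    header = [row_key] + col_values
    # Missing cells are filled with None rather than guessed.
    body = [[r] + [cells.get(c) for c in col_values]
            for r, cells in table.items()]
    return header, body


# Tuples corresponding to the first table in the rotation example.
rows = [
    {"Standard": "ST1", "slump": "75mm", "Value": "v1"},
    {"Standard": "ST1", "slump": "125mm", "Value": "v2"},
    {"Standard": "ST2", "slump": "75mm", "Value": "v3"},
    {"Standard": "ST2", "slump": "125mm", "Value": "v4"},
]

header, body = rotate(rows, "Standard", "slump", "Value")
print(header)
for line in body:
    print(line)
```

The sketch drops the "Value" label that the rotated table shows beneath its header row. Both layouts express the same set of tuples, so the same function could be called with `row_key` and `col_key` swapped to obtain the other orientation.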
