One-shot Text Field labeling using Attention and Belief Propagation for Structure Information Extraction

Authors: Mengli Cheng, Minghui Qiu, Xing Shi, Jun Huang, Wei Lin

DOI: 10.1145/3394171.3413511

Keywords: One shot, Text detection, Field (computer science), Computer science, Information retrieval, Task (project management), Conditional random field, Information extraction, Belief propagation, Structure (mathematical logic)

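Abstract: Structured information extraction from document images usually consists of three steps: text detection, text recognition, and text field labeling. While text detection and text recognition have been heavily studied and improved a lot in the literature, text field labeling is less explored and still faces many challenges. Existing learning-based methods for the text field labeling task usually require a large amount of labeled examples to train a specific model for each type of document. However, collecting large amounts of document images and labeling them is difficult and sometimes impossible due to privacy issues. Deploying separate models for each type of document also consumes a lot of resources. Facing these challenges, we explore one-shot learning for the text field labeling task. Existing one-shot learning methods for the task are mostly rule-based and have difficulty labeling fields in crowded regions with few landmarks and fields consisting of multiple separate text regions. To alleviate these problems, we proposed a novel deep end-to-end trainable approach for one-shot text field labeling, which makes use of an attention mechanism to transfer the layout information between document images. We further applied a conditional random field on the transferred layout information for the refinement of field labeling. We collected and annotated a real-world one-shot field labeling dataset with a large variety of document types and conducted extensive experiments to examine the effectiveness of the proposed model. To stimulate research in this direction, the collected dataset and the one-shot model will be released (https://github.com/AlibabaPAI/one_shot_text_labeling).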
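
The idea of transferring field labels from one labeled document to an unlabeled one with attention can be illustrated with a minimal sketch. This is not the authors' model: it assumes each text region is already encoded as a feature vector, uses plain scaled dot-product attention, and omits the conditional random field refinement entirely. The function names (`softmax`, `transfer_labels`) and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transfer_labels(query_feats, support_feats, support_labels, num_fields):
    """Soft-assign a field label to every query region.

    Each query region attends over all support regions (scaled dot-product
    attention); the attention weights are then accumulated per field label,
    giving one probability distribution over fields per query region.
    """
    dim = len(support_feats[0])
    distributions = []
    for q in query_feats:
        scores = [
            sum(a * b for a, b in zip(q, s)) / math.sqrt(dim)
            for s in support_feats
        ]
        weights = softmax(scores)
        dist = [0.0] * num_fields
        for w, label in zip(weights, support_labels):
            dist[label] += w
        distributions.append(dist)
    return distributions


# Toy data (hypothetical): two labeled support regions, two query regions.
support_feats = [[4.0, 0.0], [0.0, 4.0]]
support_labels = [0, 1]  # field ids of the support regions
query_feats = [[3.0, 0.5], [0.2, 3.5]]

field_probs = transfer_labels(query_feats, support_feats, support_labels, 2)
predicted = [max(range(2), key=dist.__getitem__) for dist in field_probs]
print(predicted)
```

In this sketch each query region simply takes the label of the support regions it attends to most strongly. The abstract describes a further refinement step with a conditional random field over the transferred information, which is not shown here.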
