Authors: David Morris, Peichen Tang, Ralph Ewerth
Keywords: Computer science, Information retrieval, Line (text file), Container (abstract data type), Convolutional neural network, Artificial neural network, Pipeline (software)
Abstract: In recent years, the problem of scene text extraction from images has received extensive attention and made significant progress. However, text extraction from scholarly figures such as plots and charts remains an open problem, in part due to the difficulty of locating irregularly placed text lines. To the best of our knowledge, the literature has not described the implementation of a text extraction system for scholarly figures that adapts the deep convolutional neural networks used for scene text detection. In this paper, we propose an approach that forgoes preprocessing in favor of using a deep convolutional neural network for text line localization. Our system uses a publicly available scene text detection network whose architecture is well suited to scholarly figures. Training data are derived from figures in arXiv papers, which are extracted with the Allen Institute's pdffigures tool. Since this tool analyzes the PDF container format to extract text locations through the mechanisms that render the text, we were able to gather a large set of labeled training samples. We show improvement over methods in the literature and discuss structural changes to the text extraction pipeline.
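Because pdffigures reads text positions from the PDF's own rendering instructions, each figure comes with word locations for free; turning those into line-level boxes on the rendered image is what yields labeled training samples. The sketch below illustrates one way that conversion could work. It is not pdffigures' output format or the authors' code: the `Box` type, the helpers `to_pixels` and `group_into_lines`, the assumption that word and figure-region boxes are given in PDF points (1/72 inch) with a top-left origin, and the `min_overlap` and `max_gap` thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Axis-aligned box; (x1, y1) is the top-left corner, (x2, y2) the bottom-right.
    x1: float
    y1: float
    x2: float
    y2: float


def to_pixels(word: Box, region: Box, dpi: float) -> Box:
    """Map a word box in PDF points (ASSUMED top-left origin) to pixel
    coordinates inside an image of `region` rendered at `dpi`."""
    s = dpi / 72.0  # one PDF point is 1/72 inch
    return Box((word.x1 - region.x1) * s, (word.y1 - region.y1) * s,
               (word.x2 - region.x1) * s, (word.y2 - region.y1) * s)


def group_into_lines(words: list[Box], min_overlap: float = 0.5,
                     max_gap: float = 10.0) -> list[Box]:
    """Greedily merge word boxes into text-line boxes (hypothetical heuristic).

    A word joins an existing line when their vertical overlap is at least
    `min_overlap` of the shorter box's height AND the horizontal gap between
    them is at most `max_gap` pixels; otherwise it starts a new line.
    """
    lines: list[Box] = []
    # Visit words top-to-bottom, then left-to-right.
    for w in sorted(words, key=lambda b: (b.y1, b.x1)):
        for i, ln in enumerate(lines):
            overlap = min(w.y2, ln.y2) - max(w.y1, ln.y1)
            shorter = min(w.y2 - w.y1, ln.y2 - ln.y1)
            gap = max(w.x1 - ln.x2, ln.x1 - w.x2)  # negative if they overlap horizontally
            if shorter > 0 and overlap / shorter >= min_overlap and gap <= max_gap:
                # Grow the line box to cover the new word.
                lines[i] = Box(min(ln.x1, w.x1), min(ln.y1, w.y1),
                               max(ln.x2, w.x2), max(ln.y2, w.y2))
                break
        else:
            lines.append(w)
    return lines
```

The horizontal gap limit matters for figures: labels that share a row but sit far apart, such as axis tick labels, stay as separate boxes instead of being merged into one long line.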