Authors: Falk Böschen, Tilman Beck, Ansgar Scherp
DOI: 10.1007/S11042-018-6162-7
Keywords:
Abstract: Different approaches have been proposed in the past to address the challenge of extracting text from scholarly figures. However, until recently, no comparative evaluation of the different approaches had been conducted. Thus, we performed an extensive study of the related work and evaluated 32 approaches in total. In this work, we perform a more detailed comparison of the 7 most relevant approaches described in the literature and extend the comparison to 37 systematic linear combinations of methods for extracting text from scholarly figures. Our generic pipeline, consisting of six steps, allows us to freely combine the different possible methods for a fair comparison. Overall, we evaluated 44 pipeline configurations and systematically compared the different methods. We then derived two non-linear configurations and a two-pass approach. We evaluate all pipeline configurations over four datasets of scholarly figures of different origin and characteristics. The quality of the extraction results is assessed using F-measure and Levenshtein distance, and we measure the runtime performance. Our experiments showed that there is a configuration that overall shows the best text extraction quality on all datasets. Further experiments showed that the best configuration can be improved by extending it to a two-pass approach. Regarding runtime, we observed huge differences, from very fast approaches to those running for several weeks. Our experiments found the best working configuration for text extraction from our method set. However, they also showed that further improvements regarding region classification are needed.
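The abstract names F-measure and Levenshtein distance as the quality measures but does not say how they are computed. As a minimal sketch of their textbook definitions only, the two could be written as follows. The function names `levenshtein` and `f_measure` and the count-based interface are assumptions made for illustration; how the authors match extracted text to ground truth is not described in this record.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def f_measure(true_positives: int, false_positives: int,
              false_negatives: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.
    Returns 0.0 when there are no true positives (illustrative convention)."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

For example, `levenshtein("kitten", "sitting")` is 3, and counts of 8 true positives, 2 false positives, and 2 false negatives give a precision and recall of 0.8 each, hence an F-measure of about 0.8.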