Authors: Jungkap Park, Gus R Rosania, Kerby A Shedden, Mandee Nguyen, Naesung Lyu
Keywords:
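Abstract: To search for chemical structures in research articles, diagrams or text representing molecules need to be translated into a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate the analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed, but their algorithmic performance and utility in cheminformatic research have not been investigated. This paper aims to provide critical reviews of these systems and also to report our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams from research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing the lines and letters that represent bonds and atoms in chemical structure diagrams can be run independently, in sequence, from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms the other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy of extracting molecular substructure patterns. The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating chemical databases with links to scientific research articles.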