Automated extraction of chemical structure information from digital raster images

Authors: Jungkap Park, Gus R Rosania, Kerby A Shedden, Mandee Nguyen, Naesung Lyu

DOI: 10.1186/1752-153X-3-4

Keywords:

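Abstract: To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed. But their algorithmic performance and utility in cheminformatic research have not been investigated. This paper aims to provide critical reviews of these systems and also report our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams in research articles and converting them into standard, searchable chemical file formats. Basic algorithms for recognizing lines and letters representing bonds and atoms in chemical structure diagrams can be independently run in sequence from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy on extracting molecular substructure patterns. The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links to scientific research articles.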

References (19)
D.H. Ballard. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, vol. 13, pp. 714-725 (1987). doi:10.1016/0031-3203(81)90009-1
Karl Tombre, Salvatore Tabbone, Loïc Pélissier, Bart Lamiroy, Philippe Dosch. Text/Graphics Separation Revisited. Document Analysis Systems, vol. 2423, pp. 200-211 (2002). doi:10.1007/3-540-45869-7_24
Karl S. Zilles, Richard G. Casey, Stephen K. Boyer, Alex M. Miller, Bernadette Oudot. Apparatus and method for optical recognition of chemical graphics (1991).
David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, vol. 28, pp. 31-36 (1988). doi:10.1021/CI00057A005
Gus R. Rosania, Gordon Crippen, Peter Woolf, David States, Kerby Shedden. A cheminformatic toolkit for mining biomedical knowledge. Pharmaceutical Research, vol. 24, pp. 1791-1802 (2007). doi:10.1007/S11095-007-9285-5
Chin-Shyurng Fahn, Jhing-Fa Wang, Jau-Yien Lee. A topology-based component extractor for understanding electronic circuit diagrams. Computer Vision, Graphics, and Image Processing, vol. 44, pp. 119-138 (1988). doi:10.1016/S0734-189X(88)80001-X
Joe R. McDaniel, Jason R. Balmuth. Kekule: OCR-optical chemical (structure) recognition. Journal of Chemical Information and Computer Sciences, vol. 32, pp. 373-378 (1992). doi:10.1021/CI00008A018
Arthur Dalby, James G. Nourse, W. Douglas Hounshell, Ann K. I. Gushurst, David L. Grier, Burton A. Leland, John Laufer. Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited. Journal of Chemical Information and Computer Sciences, vol. 32, pp. 244-255 (1992). doi:10.1021/CI00007A012
Georgios V. Gkoutos, Henry Rzepa, Richard M. Clark, Osei Adjei, Harpal Johal. Chemical machine vision: automated extraction of chemical metadata from raster images. Journal of Chemical Information and Computer Sciences, vol. 43, pp. 1342-1355 (2003). doi:10.1021/CI034017N
Eduard Sojka. A new algorithm for detecting corners in digital images. Spring Conference on Computer Graphics, pp. 55-62 (2002). doi:10.1145/584458.584469