Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Authors: Julian Ibarz, Ian Goodfellow, Sacha Arnoud, Vinay Shet, Yaroslav Bulatov

Keywords: Artificial neural network, Segmentation, Convolutional neural network, Image (mathematics), Artificial intelligence, Computer science, Domain (software engineering), Pattern recognition

Abstract: Recognizing arbitrary multi-character text in unconstrained natural photographs is a hard problem. In this paper, we address an equally hard sub-problem in this domain, viz. recognizing arbitrary multi-digit numbers from Street View imagery. Traditional approaches to solve this problem typically separate out the localization, segmentation, and recognition steps. In this paper we propose a unified approach that integrates these three steps via the use of a deep convolutional neural network that operates directly on the image pixels. We employ the DistBelief implementation of deep neural networks in order to train large, distributed neural networks on high quality images. We find that the performance of this approach increases with the depth of the convolutional network, with the best performance occurring in the deepest architecture we trained, with eleven hidden layers. We evaluate this approach on the publicly available SVHN dataset and achieve over $96\%$ accuracy in recognizing complete street numbers. We show that on a per-digit recognition task, we improve upon the state-of-the-art, achieving $97.84\%$ accuracy. We also evaluate this approach on an even more challenging dataset generated from Street View imagery containing several tens of millions of street number annotations and achieve over $90\%$ accuracy. To further explore the applicability of the proposed system to broader text recognition tasks, we apply it to synthetic distorted text from reCAPTCHA. reCAPTCHA is one of the most secure reverse Turing tests that uses distorted text to distinguish humans from bots. We report a $99.8\%$ accuracy on the hardest category of reCAPTCHA. Our evaluations on both tasks indicate that at specific operating thresholds, the performance of the proposed system is comparable to, and in some cases exceeds, that of human operators.
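The abstract reports three kinds of figures: accuracy on complete street numbers, accuracy on a per-digit task, and performance at specific operating thresholds. The sketch below is only an illustration of how such quantities can differ on the same predictions. It is not the paper's evaluation code. The function names, the toy data, the position-wise per-digit comparison, and the reading of "operating threshold" as a confidence cutoff below which predictions are rejected are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def sequence_accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of numbers whose full transcription matches the label."""
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


def per_digit_accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of digit positions that match (illustrative definition).

    Digits are compared position by position; when lengths differ, the
    unmatched positions count as errors.
    """
    correct = 0
    total = 0
    for p, t in zip(preds, labels):
        total += max(len(p), len(t))
        correct += sum(a == b for a, b in zip(p, t))
    return correct / total


def coverage_at_threshold(
    preds: Sequence[str],
    labels: Sequence[str],
    confidences: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, accuracy) after rejecting low-confidence predictions.

    Coverage is the fraction of inputs kept; accuracy is whole-number
    accuracy measured on the kept inputs only.
    """
    kept = [(p, t) for p, t, c in zip(preds, labels, confidences) if c >= threshold]
    coverage = len(kept) / len(labels)
    accuracy = sum(p == t for p, t in kept) / len(kept) if kept else 0.0
    return coverage, accuracy


# Hypothetical toy data: four street numbers, one with a single wrong digit.
preds = ["123", "45", "6789", "10"]
labels = ["123", "46", "6789", "10"]
confidences = [0.99, 0.55, 0.97, 0.90]

print(sequence_accuracy(preds, labels))
print(per_digit_accuracy(preds, labels))
print(coverage_at_threshold(preds, labels, confidences, 0.8))
```

In this toy data a single wrong digit makes the whole number wrong, so whole-number accuracy is lower than the position-wise digit accuracy. Raising the confidence cutoff trades coverage for accuracy on the predictions that are kept.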
