Authors: Julian Ibarz, Ian Goodfellow, Sacha Arnoud, Vinay Shet, Yaroslav Bulatov
DOI:
Keywords: Artificial neural network, Segmentation, Convolutional neural network, Image (mathematics), Artificial intelligence, Computer science, Domain (software engineering), Pattern recognition
Abstract: Recognizing arbitrary multi-character text in unconstrained natural photographs is a hard problem. In this paper, we address an equally hard sub-problem in this domain viz. recognizing arbitrary multi-digit numbers from Street View imagery. Traditional approaches to solve this problem typically separate out the localization, segmentation, and recognition steps. In this paper we propose a unified approach that integrates these three steps via the use of a deep convolutional neural network that operates directly on the image pixels. We employ the DistBelief implementation of deep neural networks in order to train large, distributed neural networks on high quality images. We find that the performance of this approach increases with the depth of the convolutional network, with the best performance occurring in the deepest architecture we trained, with eleven hidden layers. We evaluate this approach on the publicly available SVHN dataset and achieve over $96\%$ accuracy in recognizing complete street numbers. We show that on a per-digit recognition task, we improve upon the state-of-the-art, achieving $97.84\%$ accuracy. We also evaluate this approach on an even more challenging dataset generated from Street View imagery containing several tens of millions of street number annotations and achieve over $90\%$ accuracy. To further explore the applicability of the proposed system to broader text recognition tasks, we apply it to synthetic distorted text from reCAPTCHA. reCAPTCHA is one of the most secure reverse Turing tests that uses distorted text to distinguish humans from bots. We report a $99.8\%$ accuracy on the hardest category of reCAPTCHA. Our evaluations on both tasks indicate that at specific operating thresholds, the performance of the proposed system is comparable to, and in some cases exceeds, that of human operators.