Authors: Max Jaderberg, Andrea Vedaldi, Andrew Zisserman
DOI: 10.1007/978-3-319-10593-2_34
Keywords:
Abstract: The goal of this work is text spotting in natural images. This is divided into two sequential tasks: detecting word regions in the image, and recognizing the words within these regions. We make the following contributions: first, we develop a Convolutional Neural Network (CNN) classifier that can be used for both tasks. The CNN has a novel architecture that enables efficient feature sharing (by using a number of layers in common) for text detection, character case-sensitive and insensitive classification, and bigram classification. It exceeds the state-of-the-art performance for all of these. Second, we make a number of technical changes over the traditional CNN architectures, including no downsampling for a per-pixel sliding window, and multi-mode learning with a mixture of linear models (maxout). Third, we have a method of automated data mining of Flickr that generates word and character level annotations. Finally, these components are used together to form an end-to-end, state-of-the-art text spotting system. We evaluate the text-spotting system on two standard benchmarks, the ICDAR Robust Reading data set and the Street View Text data set, and demonstrate improvements over the state-of-the-art on multiple measures.
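
The "mixture of linear models (maxout)" mentioned in the abstract can be illustrated with a minimal sketch of a single maxout unit, which outputs the largest of several linear responses to the same input. This is not the authors' implementation: the function name `maxout`, its parameters, and the example weights are assumptions chosen for illustration, and the abstract does not specify how the units are arranged inside the CNN.

```python
from typing import Sequence


def maxout(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> float:
    """Return max over k of (w_k . x + b_k) for one hypothetical maxout unit.

    Each row of `weights` together with the matching entry of `biases`
    defines one linear model; the unit keeps only the largest response.
    """
    if len(weights) != len(biases):
        raise ValueError("need exactly one bias per linear piece")
    if any(len(w) != len(x) for w in weights):
        raise ValueError("every weight vector must match the input length")

    # Evaluate every linear piece on the same input.
    responses = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    # The maxout activation is the maximum of the linear responses.
    return max(responses)


# Illustrative weights (assumed): the pieces +x and -x make the unit compute |x|.
abs_like = maxout([-3.0], [[1.0], [-1.0]], [0.0, 0.0])
```

With the pieces `x` and `0` the same unit behaves like a rectified linear unit, which shows how taking a maximum over linear models can represent several activation shapes.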