Authors: Purnendu Banerjee, Ujjwal Bhattacharya, Bidyut B. Chaudhuri
Keywords:
Abstract: Automatic recognition of handwritten text in video lectures has important applications. In such lectures, the presenter usually writes on a white or colored board, and the camera often captures the writing board along with other objects, possibly including the presenter. Recognizing text from such a frame requires first detecting the text region in the frame. In this article, we present our recent study of text localization in lecture video frames. We compute Scale Invariant Feature Transform (SIFT) descriptors densely over the entire frame. Following usual practice, the descriptors are located on a regular grid with a spacing of 5 pixels, and each uses a uniform patch of size 60 × 60 as its support, chosen on the basis of an empirical study. The SIFT descriptor at each location (grid point) is fed as a 128-dimensional input feature vector to a Multilayer Perceptron (MLP) network, which labels the point as either text or non-text. Depending on the aggregate response at each pixel, we localize the text regions in the frame. Next, we employ K-means clustering to detect text components in the localized regions. Finally, two simple rules are applied to identify possible noise among the detected components. We obtained encouraging simulation results with this approach on a variety of lecture videos.
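The dense sampling and per-pixel aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The 5-pixel grid step and the 60-pixel patch side come from the abstract (the patch is assumed to be square and measured in pixels); the function names, the `classify` callable standing in for the SIFT descriptor plus MLP, the voting scheme (fraction of covering patches labelled text), and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

STEP = 5    # grid spacing in pixels (from the abstract)
PATCH = 60  # side of the square support patch (from the abstract; assumed pixels)


def grid_points(width: int, height: int,
                step: int = STEP, patch: int = PATCH) -> List[Tuple[int, int]]:
    """Return patch centres (x, y) on a regular grid, keeping only
    patches that fit entirely inside a width x height frame."""
    half = patch // 2
    xs = range(half, width - half + 1, step)
    ys = range(half, height - half + 1, step)
    return [(x, y) for y in ys for x in xs]


def aggregate_votes(width: int, height: int,
                    classify: Callable[[int, int], bool],
                    step: int = STEP, patch: int = PATCH) -> List[List[float]]:
    """For every pixel, return the fraction of covering patches whose
    grid point was labelled text. `classify` is a hypothetical stand-in
    for 'SIFT descriptor at (x, y) -> MLP -> text / non-text'."""
    half = patch // 2
    text = [[0] * width for _ in range(height)]
    total = [[0] * width for _ in range(height)]
    for x, y in grid_points(width, height, step, patch):
        is_text = classify(x, y)
        # The patch centred at (x, y) covers columns [x-half, x+half)
        # and rows [y-half, y+half), clamped to the frame.
        for row in range(max(0, y - half), min(height, y + half)):
            for col in range(max(0, x - half), min(width, x + half)):
                total[row][col] += 1
                if is_text:
                    text[row][col] += 1
    return [[text[r][c] / total[r][c] if total[r][c] else 0.0
             for c in range(width)] for r in range(height)]


def localize(response: List[List[float]],
             threshold: float = 0.5) -> List[List[bool]]:
    """Binary text mask: pixels whose aggregate response reaches the
    (assumed) threshold are treated as belonging to a text region."""
    return [[value >= threshold for value in row] for row in response]


# Toy usage: pretend every grid point in the left part of a small
# frame is classified as text.
response = aggregate_votes(100, 80, lambda x, y: x <= 45)
mask = localize(response)
```

In a real pipeline the `classify` callable would compute a 128-dimensional SIFT descriptor at the grid point and pass it to a trained MLP; the resulting mask would then be handed to the K-means step that separates text components from the board background.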