Authors: Alan L. Yuille, Liang-Chieh Chen, Fangting Xia, Peng Wang
DOI:
Keywords: Process (computing), Machine learning, Zoom, Parsing, Segmentation, Artificial intelligence, Task (computing), Pattern recognition, Convolutional neural network, Scale (map), Structure (mathematical logic), Computer science
Abstract: Parsing human regions into semantic parts (e.g., body, head, and arms) from an arbitrary natural image is challenging yet fundamental for computer vision and widely applicable in industry. One major difficulty in handling this problem is the high flexibility in the scale and location of a human instance and its corresponding parts, which makes the parsing task either lack boundary details or suffer from local confusions. To tackle these problems, in this work we propose the "Auto-Zoom Net" (AZN) for human part parsing, which is a unified fully convolutional neural network structure that: (1) parses each human instance into detailed parts; (2) predicts the locations and scales of human instances and their corresponding parts. In our unified network, the two tasks are mutually beneficial. The score maps obtained for parsing help estimate the locations and scales of human instances and their parts. With the predicted locations and scales, our model "zooms" the region into the right scale to further refine the parsing. In practice, we perform the two tasks iteratively so that detailed human parts are gradually recovered. We conduct extensive experiments on PASCAL-Person-Part segmentation and show that our approach significantly outperforms state-of-the-art parsing techniques, especially for instances and parts at small scale. In addition, we perform experiments on horse and cow segmentation and also obtain results considerably better than state-of-the-art methods (by 5%), an improvement contributed by the proposed iterative zooming process.
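
The iterative "parse, locate, zoom, refine" loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' network: `parse_fn` and `locate_fn` are hypothetical stand-ins for the parsing and location/scale prediction branches, the nearest-neighbour resize replaces whatever interpolation the real model uses, and the toy threshold "parser" exists only so the example runs without a trained model.

```python
import numpy as np


def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = arr.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return arr[rows][:, cols]


def zoom_and_refine(image, parse_fn, box, target_size=64):
    """Crop `box`, zoom it to a fixed size, parse it, and map labels back."""
    top, left, bottom, right = box
    crop = image[top:bottom, left:right]
    zoomed = resize_nearest(crop, target_size, target_size)
    labels_zoomed = parse_fn(zoomed)  # label map at the zoomed scale
    return resize_nearest(labels_zoomed, bottom - top, right - left)


def auto_zoom_parse(image, parse_fn, locate_fn, num_iters=2, target_size=64):
    """Alternate between locating regions and re-parsing them when zoomed.

    parse_fn  (hypothetical): image -> integer label map of the same H x W.
    locate_fn (hypothetical): label map -> list of (top, left, bottom, right).
    """
    labels = parse_fn(image)  # coarse parse at the original scale
    for _ in range(num_iters):
        for box in locate_fn(labels):
            top, left, bottom, right = box
            labels[top:bottom, left:right] = zoom_and_refine(
                image, parse_fn, box, target_size
            )
    return labels


# --- Toy stand-ins (assumptions for illustration only) ---
def toy_parse(img):
    """Label bright pixels as part 1, everything else as background 0."""
    return (img > 0.5).astype(np.int64)


def toy_locate(labels, pad=2):
    """Return one padded bounding box around all foreground pixels."""
    ys, xs = np.nonzero(labels)
    if ys.size == 0:
        return []
    h, w = labels.shape
    return [(max(ys.min() - pad, 0), max(xs.min() - pad, 0),
             min(ys.max() + 1 + pad, h), min(xs.max() + 1 + pad, w))]


image = np.zeros((32, 32))
image[10:14, 12:16] = 1.0  # a small "instance" in a larger image
labels = auto_zoom_parse(image, toy_parse, toy_locate)
```

In this sketch the benefit of zooming is not visible, because the toy parser is scale-invariant; with a learned parser, the point of the loop is that small instances are re-parsed at a scale the model handles well.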