Authors: Xiaodan Liang, Xiaohui Shen, Donglai Xiang, Jiashi Feng, Liang Lin
Keywords:
Abstract: Semantic object parsing is a fundamental task for understanding objects in detail in the computer vision community, where incorporating multi-level contextual information is critical for achieving such fine-grained pixel-level recognition. Prior methods often leverage the contextual information through post-processing of the predicted confidence maps. In this work, we propose a novel deep Local-Global Long Short-Term Memory (LG-LSTM) architecture to seamlessly incorporate short-distance and long-distance spatial dependencies into the feature learning over all pixel positions. In each LG-LSTM layer, local guidance from neighboring positions and global guidance from the whole image are imposed on each position to better exploit complex local and global contextual information. Individual LSTMs for distinct spatial dimensions are also utilized to intrinsically capture various spatial layouts of semantic parts in the images, yielding distinct hidden and memory cells for each position in each dimension. In our approach, several LG-LSTM layers are stacked and appended to the intermediate convolutional layers to directly enhance visual features, allowing the network parameters to be learned in an end-to-end way. The long chains of sequential computation by stacked LG-LSTM layers also enable each position to sense a much larger region for inference, benefiting from the memorization of previous dependencies in all positions along all dimensions. Comprehensive evaluations on three public datasets well demonstrate the significant superiority of our LG-LSTM over other state-of-the-art methods.
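To make the abstract's core idea concrete, here is a minimal sketch (not the authors' released code) of a single simplified LG-LSTM-style layer in PyTorch: at each pixel, LSTM gates are driven by (i) the visual feature at that position, (ii) the hidden states of spatial neighbors ("local guidance", approximated here with one 3x3 convolution), and (iii) a whole-image summary of hidden states ("global guidance"). The class name `LGLSTMLayer` and the single fused cell are assumptions for illustration; the paper instead uses individual LSTMs per spatial dimension with distinct hidden and memory cells.

```python
import torch
import torch.nn as nn


class LGLSTMLayer(nn.Module):
    """Simplified local-global LSTM update over a feature map of shape (B, C, H, W)."""

    def __init__(self, in_channels: int, hidden_channels: int):
        super().__init__()
        # Local guidance: a 3x3 convolution aggregates the hidden states of the
        # eight neighbors (plus the position itself) in a single step.
        self.local = nn.Conv2d(hidden_channels, 4 * hidden_channels, 3, padding=1)
        # Global guidance: a linear map of the spatially pooled hidden state.
        self.global_ = nn.Linear(hidden_channels, 4 * hidden_channels)
        # Input transformation of the visual feature at each position.
        self.input_ = nn.Conv2d(in_channels, 4 * hidden_channels, 1)

    def forward(self, x, h, c):
        # Whole-image summary of the current hidden states: (B, hidden_channels).
        g = h.mean(dim=(2, 3))
        # Sum the three guidance terms; the global term is broadcast over H and W.
        gates = self.input_(x) + self.local(h) + self.global_(g)[:, :, None, None]
        i, f, o, u = gates.chunk(4, dim=1)  # standard LSTM gate split
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(u)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


# Usage: repeatedly applying the layer mimics stacking, so each position can
# sense an increasingly large region of the image, as the abstract describes.
layer = LGLSTMLayer(in_channels=64, hidden_channels=32)
x = torch.randn(2, 64, 40, 40)
h = torch.zeros(2, 32, 40, 40)
c = torch.zeros(2, 32, 40, 40)
for _ in range(3):
    h, c = layer(x, h, c)
```

Folding the neighbor hidden states into one convolution is a deliberate simplification: it keeps the sketch short while preserving the key property that every update mixes short-distance (neighboring) and long-distance (image-level) dependencies into the per-pixel features.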