Semantic Object Parsing with Local-Global Long Short-Term Memory

Authors: Xiaodan Liang, Xiaohui Shen, Donglai Xiang, Jiashi Feng, Liang Lin

DOI: 10.1109/CVPR.2016.347

Abstract: Semantic object parsing is a fundamental task for understanding objects in detail in the computer vision community, where incorporating multi-level contextual information is critical for achieving such fine-grained pixel-level recognition. Prior methods often leverage the contextual information through post-processing predicted confidence maps. In this work, we propose a novel deep Local-Global Long Short-Term Memory (LG-LSTM) architecture to seamlessly incorporate short-distance and long-distance spatial dependencies into the feature learning over all pixel positions. In each LG-LSTM layer, local guidance from neighboring positions and global guidance from the whole image are imposed on each position to better exploit complex local and global contextual information. Individual LSTMs for distinct spatial dimensions are also utilized to intrinsically capture various spatial layouts of semantic parts in the images, yielding distinct hidden and memory cells of each position for each dimension. In our parsing approach, several LG-LSTM layers are stacked and appended to the intermediate convolutional layers to directly enhance visual features, allowing network parameters to be learned in an end-to-end way. The long chains of sequential computation by stacked LG-LSTM layers also enable each pixel to sense a much larger region for inference, benefiting from the memorization of previous dependencies in all positions along all dimensions. Comprehensive evaluations on three public datasets demonstrate the significant superiority of LG-LSTM over other state-of-the-art methods.
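
The abstract describes the LG-LSTM layer only at a high level, so the following pure-Python sketch illustrates one possible reading of it, not the authors' implementation. Several details are assumptions made purely for illustration: the eight neighbour offsets stand in for the "distinct spatial dimensions", a position's per-direction hidden vectors are averaged before being passed to neighbours, and global guidance is a plain mean over all positions. All names (`lstm_step`, `lg_lstm_layer`, `run`, etc.) are hypothetical, and the real network operates on convolutional feature maps with learned weights rather than the random toy weights used here.

```python
import math
import random

# Hypothetical choice: the eight neighbour offsets act as the "distinct spatial dimensions".
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def make_lstm(in_dim, hid, rng):
    """Random toy weights for one LSTM cell: 4 gates, each reading [x; h_prev]."""
    return {g: ([[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid)] for _ in range(hid)],
                [0.0] * hid)
            for g in ("i", "f", "o", "g")}


def lstm_step(params, x, h_prev, c_prev):
    """Standard LSTM update; returns the new (hidden, memory) pair."""
    z = x + h_prev  # list concatenation [x; h_prev]

    def gate(name, act):
        m, b = params[name]
        return [act(a + bk) for a, bk in zip(matvec(m, z), b)]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def summary(cell, hid):
    """Average of a position's per-direction hidden vectors (assumed pooling)."""
    return [sum(h[k] for h, _ in cell) / len(cell) for k in range(hid)]


def lg_lstm_layer(feats, state, lstms, hid):
    """One hypothetical LG-LSTM layer: each position reads its own feature,
    the summaries of its 8 neighbours (local guidance) and the image-wide
    mean summary (global guidance); one LSTM per direction updates its state."""
    n_rows, n_cols = len(feats), len(feats[0])
    summ = [[summary(state[r][c], hid) for c in range(n_cols)] for r in range(n_rows)]
    glob = [sum(summ[r][c][k] for r in range(n_rows) for c in range(n_cols)) / (n_rows * n_cols)
            for k in range(hid)]
    new_state = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            local = []
            for dr, dc in DIRECTIONS:
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < n_rows and 0 <= nc < n_cols
                local += summ[nr][nc] if inside else [0.0] * hid  # zero-pad at borders
            x = feats[r][c] + local + glob
            row.append([lstm_step(lstms[d], x, *state[r][c][d]) for d in range(len(DIRECTIONS))])
        new_state.append(row)
    return new_state


def run(feats, layers, hid):
    """Stack LG-LSTM layers, starting from all-zero hidden and memory cells."""
    zero = ([0.0] * hid, [0.0] * hid)
    state = [[[zero] * len(DIRECTIONS) for _ in row] for row in feats]
    for lstms in layers:
        state = lg_lstm_layer(feats, state, lstms, hid)
    return state


rng = random.Random(0)
F, HID = 2, 2
IN_DIM = F + len(DIRECTIONS) * HID + HID  # own feature + 8 neighbour summaries + global summary
features = [[[rng.uniform(-1, 1) for _ in range(F)] for _ in range(3)] for _ in range(3)]
layers = [[make_lstm(IN_DIM, HID, rng) for _ in DIRECTIONS] for _ in range(2)]  # 2 stacked layers

out = run(features, layers, HID)
perturbed = [[list(v) for v in row] for row in features]
perturbed[2][2] = [5.0, -5.0]  # change only the far corner
out_p = run(perturbed, layers, HID)
print(out[0][0][0][0], out_p[0][0][0][0])
```

In this toy 3x3 grid, position (0, 0) is not a neighbour of (2, 2), so with a single layer its output ignores the perturbation; after two stacked layers it changes, and the only path is the global-guidance term. This mirrors, in miniature, the abstract's point that stacked layers let each pixel sense a much larger region.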

References (38)
Yang Wang, Duan Tran, Zicheng Liao, David Forsyth. Discriminative hierarchical part-based models for human parsing and action recognition. Journal of Machine Learning Research, vol. 13, pp. 3075-3102, 2012. DOI: 10.1007/978-3-319-57021-1_9
Russell Stewart, Mykhaylo Andriluka, Andrew Y. Ng. End-to-End People Detection in Crowded Scenes. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2325-2333, 2016. DOI: 10.1109/CVPR.2016.255
Edgar Simo-Serra, Sanja Fidler, Francesc Moreno-Noguer, Raquel Urtasun. A High Performance CRF Model for Clothes Parsing. Asian Conference on Computer Vision (ACCV), vol. 9005, pp. 64-81, 2014. DOI: 10.1007/978-3-319-16811-1_5
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel, Yoshua Bengio. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. International Conference on Machine Learning (ICML), vol. 3, pp. 2048-2057, 2015.
Si Liu, Xiaodan Liang, Luoqi Liu, Ke Lu, Liang Lin, Xiaochun Cao, Shuicheng Yan. Fashion Parsing With Video Context. IEEE Transactions on Multimedia, vol. 17, pp. 1347-1358, 2015. DOI: 10.1109/TMM.2015.2443559
Nal Kalchbrenner, Ivo Danihelka, Alex Graves. Grid Long Short-Term Memory. arXiv (Neural and Evolutionary Computing), 2015.
Xinlei Chen, C. Lawrence Zitnick. Mind's eye: A recurrent visual representation for image caption generation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2422-2431, 2015. DOI: 10.1109/CVPR.2015.7298856
Jonathan Long, Evan Shelhamer, Trevor Darrell. Fully convolutional networks for semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431-3440, 2015. DOI: 10.1109/CVPR.2015.7298965
Jianyu Wang, Alan Yuille. Semantic part segmentation using compositional model combining shape and appearance. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1788-1797, 2015. DOI: 10.1109/CVPR.2015.7298788
Wonmin Byeon, Thomas M. Breuel, Federico Raue, Marcus Liwicki. Scene labeling with LSTM recurrent neural networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3547-3555, 2015. DOI: 10.1109/CVPR.2015.7298977