Authors: Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler
DOI: 10.1007/s11263-018-1140-0
Keywords:
Abstract: Semantic understanding of visual scenes is one of the holy grails of computer vision. Despite efforts of the community in data collection, there are still few image datasets covering a wide range of scenes and object categories with pixel-wise annotations for scene understanding. In this work, we present a densely annotated dataset, ADE20K, which spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. In total there are 25k images of complex everyday scenes containing a variety of objects in their natural spatial context, with on average 19.5 instances and 10.5 object classes per image. Based on ADE20K, we construct benchmarks for scene parsing and instance segmentation. We provide baseline performances on both benchmarks and re-implement state-of-the-art models as open source. We further evaluate the effect of synchronized batch normalization and find that a reasonably large batch size is crucial for semantic segmentation performance. We show that networks trained on ADE20K are able to segment a wide variety of scenes and objects.
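The abstract notes that synchronized batch normalization, which computes batch statistics across all GPUs rather than per device, matters for segmentation training where per-GPU batches are small. Below is a minimal sketch (not the authors' released code) of how such synchronization can be enabled in PyTorch; the model choice and the 150-class setting are illustrative assumptions rather than the paper's exact configuration.

```python
import torch.nn as nn
import torchvision.models.segmentation as segmentation

# Hypothetical architecture for illustration; the ADE20K baselines in the paper
# use other encoder/decoder combinations (e.g., dilated ResNet with PSPNet/UPerNet).
model = segmentation.fcn_resnet50(num_classes=150)  # ADE20K scene-parsing benchmark uses 150 classes

# Replace every BatchNorm layer with its synchronized counterpart so that
# mean/variance are computed over the global batch across GPUs, not just the
# small per-device slice. This only takes effect when the model is later run
# under torch.distributed with DistributedDataParallel, e.g.:
#   model = nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[local_rank])
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```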