Self-Supervised Multi-View Synchronization Learning for 3D Pose Estimation

Authors: Paolo Favaro, Simon Jenni

Keywords: Artificial intelligence, Pose, Artificial neural network, Rigid transformation, Computer vision, 3D pose estimation, Feature learning, Task (project management), Synchronization (computer science), Data set, Computer science

Abstract: Current state-of-the-art methods cast monocular 3D human pose estimation as a learning problem by training neural networks on large data sets of images and corresponding skeleton poses. In contrast, we propose an approach that can exploit small annotated data sets by fine-tuning networks pre-trained via self-supervised learning on (large) unlabeled data sets. To drive such networks towards supporting 3D pose estimation during the pre-training step, we introduce a novel self-supervised feature learning task designed to focus on the 3D structure in an image. We exploit images extracted from videos captured with a multi-view camera system. The task is to classify whether two images depict two views of the same scene up to a rigid transformation. In a multi-view data set, where objects deform in a non-rigid manner, a rigid transformation occurs only between two views taken at the exact same time, i.e., when they are synchronized. We demonstrate the effectiveness of the synchronization task on the Human3.6M data set and achieve state-of-the-art results in 3D human pose estimation.
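The synchronization task described in the abstract can be illustrated with a minimal sketch of how labeled image pairs might be drawn from a multi-view recording. This is not the authors' implementation: the function name `sample_sync_pair`, the 50/50 balance between positive and negative pairs, and the `min_offset`/`max_offset` range for the time shift are all assumptions made for illustration. Each view is represented only by a `(camera, frame)` index pair; loading the images and training the classifier are left out.

```python
import random


def sample_sync_pair(num_cameras, num_frames, rng, min_offset=1, max_offset=10):
    """Draw one training pair for a two-view synchronization classifier.

    Returns ((cam_a, frame_a), (cam_b, frame_b), label), where label is 1 if
    both views show the same time instant (related by a rigid transformation)
    and 0 if the second view is shifted in time (hypothetical sampling scheme).
    """
    if num_cameras < 2:
        raise ValueError("need at least two cameras")
    if num_frames < 2 * min_offset:
        raise ValueError("video too short for the requested minimum offset")

    # Two distinct cameras, so the pair always shows two different viewpoints.
    cam_a, cam_b = rng.sample(range(num_cameras), 2)
    frame_a = rng.randrange(num_frames)

    # Assumed 50/50 split between synchronized and unsynchronized pairs.
    if rng.random() < 0.5:
        return (cam_a, frame_a), (cam_b, frame_a), 1

    # Unsynchronized pair: shift the second view by a nonzero number of
    # frames, keeping the shifted index inside the video.
    candidates = [
        frame_a + sign * delta
        for sign in (-1, 1)
        for delta in range(min_offset, max_offset + 1)
        if 0 <= frame_a + sign * delta < num_frames
    ]
    frame_b = rng.choice(candidates)
    return (cam_a, frame_a), (cam_b, frame_b), 0


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_sync_pair(num_cameras=4, num_frames=100, rng=rng))
```

In this sketch the label depends only on whether the two frame indices coincide, which mirrors the observation in the abstract that, for a non-rigidly deforming subject, two views are related by a rigid transformation only when they are captured at the same time.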
