Authors: Paolo Favaro, Simon Jenni
DOI:
Keywords: Artificial intelligence, Pose, Artificial neural network, Rigid transformation, Computer vision, 3D pose estimation, Feature learning, Task (project management), Synchronization (computer science), Data set, Computer science
Abstract: Current state-of-the-art methods cast monocular 3D human pose estimation as a learning problem by training neural networks on large data sets of images and corresponding skeleton poses. In contrast, we propose an approach that can exploit small annotated data sets by fine-tuning networks pre-trained via self-supervised learning on (large) unlabeled data sets. To drive such networks towards supporting 3D pose estimation during the pre-training step, we introduce a novel self-supervised feature learning task designed to focus on the 3D structure in an image. We exploit images extracted from videos captured with a multi-view camera system. The task is to classify whether two images depict two views of the same scene up to a rigid transformation. In a multi-view data set, where objects deform in a non-rigid manner, a rigid transformation occurs only between two views taken at the exact same time, i.e., when they are synchronized. We demonstrate the effectiveness of the synchronization task on the Human3.6M data set and achieve state-of-the-art results in 3D human pose estimation.
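The synchronization task described in the abstract can be illustrated by how training pairs might be drawn from multi-view video. The sketch below is only an assumption about one plausible sampling scheme, not the authors' implementation: the function name `sample_sync_pair`, the `p_sync` balance between positive and negative pairs, and the `min_offset` frame gap are all hypothetical. A pair is labeled 1 when both views share the same time index (related by a rigid transformation) and 0 otherwise.

```python
import random


def sample_sync_pair(num_views, num_frames, rng, p_sync=0.5, min_offset=1):
    """Draw one (view, frame) pair for a synchronization classifier.

    Returns ((view_a, t_a), (view_b, t_b), label), where label is 1 if
    both frames share the same time index and 0 otherwise.
    All parameters are illustrative assumptions, not values from the paper.
    """
    if num_views < 2:
        raise ValueError("need at least two camera views")
    if min_offset < 1 or num_frames < 2 * min_offset:
        raise ValueError("num_frames too small for the requested min_offset")

    # Always compare two different cameras.
    view_a, view_b = rng.sample(range(num_views), 2)
    t_a = rng.randrange(num_frames)

    if rng.random() < p_sync:
        # Positive pair: same instant, so the views differ by a rigid motion.
        t_b = t_a
    else:
        # Negative pair: a different instant, at least min_offset frames away.
        candidates = [t for t in range(num_frames) if abs(t - t_a) >= min_offset]
        t_b = rng.choice(candidates)

    return (view_a, t_a), (view_b, t_b), int(t_a == t_b)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_sync_pair(num_views=4, num_frames=100, rng=rng))
```

In a real pipeline the returned indices would be used to load the two images, and a network would be trained to predict the label; how frames are loaded and which architecture is used are left open here.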