Deep Depth Pose (DDP) model: 3D Pose Estimation from Depth Maps using a Deep combination of Poses
Manuel J. Marín-Jiménez, Francisco J. Romero, Rafael Muñoz-Salinas, Rafael Medina-Carnicer
Overview
This work addresses the problem of 3D human pose estimation from depth maps using a Deep Learning approach. We propose a model, named Deep Depth Pose (DDP), which receives a depth map containing a person and a set of predefined 3D prototype poses, and returns the 3D position of the person's body joints. In particular, DDP is defined as a ConvNet that computes the specific weights needed to linearly combine the prototypes for the given input. We have thoroughly evaluated DDP on the challenging 'ITOP' and 'UBC3V' datasets, which contain realistic and synthetic samples, respectively, setting a new state of the art on both.
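The final step described above, a weighted linear combination of prototype poses, can be illustrated with a minimal sketch. It is not the released implementation: the function name `combine_prototypes`, the joint layout, and the toy values are assumptions made for illustration, and the weights are hard-coded where DDP would use the output of the trained ConvNet.

```python
from typing import List, Sequence

# A pose is a list of joints, each joint an [x, y, z] position (assumed layout).
Pose = List[List[float]]


def combine_prototypes(weights: Sequence[float], prototypes: Sequence[Pose]) -> Pose:
    """Return the joint-wise sum over k of weights[k] * prototypes[k]."""
    if not prototypes:
        raise ValueError("at least one prototype pose is required")
    if len(weights) != len(prototypes):
        raise ValueError("need exactly one weight per prototype")
    num_joints = len(prototypes[0])
    pose = [[0.0, 0.0, 0.0] for _ in range(num_joints)]
    for weight, prototype in zip(weights, prototypes):
        for j, joint in enumerate(prototype):
            for axis in range(3):
                pose[j][axis] += weight * joint[axis]
    return pose


# Two toy prototypes with two joints each (hypothetical values).
prototypes = [
    [[0.0, 0.0, 2.0], [0.0, 1.0, 2.0]],
    [[1.0, 0.0, 2.0], [1.0, 1.0, 4.0]],
]
# Stand-in for the weights a trained ConvNet would predict from a depth map.
weights = [0.5, 0.5]

pose = combine_prototypes(weights, prototypes)
print(pose)  # [[0.5, 0.0, 2.0], [0.5, 1.0, 3.0]]
```

With equal weights the result is the joint-wise average of the two prototypes; in the model, the weights change with each input depth map while the prototype set stays fixed.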
The following figure summarizes the main steps of our approach:
Results
Quantitative results
Results on ITOP (frontal and top views):
Results on UBC3V Hard-Pose (vs Shafaei'2016):
Qualitative results
The following video shows actual results on the test partition of the ITOP dataset. Each image has been processed independently.
Poses estimated on ITOP test samples:
Download.
Downloads
Filename | Description | Size |
---|---|---|
Demo code at GitHub | Demo code for ITOP (contains sample data) | -- MB |
itopresults.mp4 | Video with estimated 3D poses | 78 MB |
Related Publications
[1]
M. Marin-Jimenez, Francisco J. Romero, Rafael Muñoz-Salinas, Rafael Medina-Carnicer
3D Pose Estimation from Depth Maps using a Deep combination of Poses
Journal of Visual Communication and Image Representation (in press), 2018
@Article{Marin18ijvcr,
  author  = "Marin-Jimenez, M.J. and Romero, F.J. and Mu\~noz-Salinas, R. and Medina-Carnicer, R.",
  title   = "3D Pose Estimation from Depth Maps using a Deep combination of Poses",
  journal = "Journal of Visual Communication and Image Representation",
  year    = "2018",
  doi     = "https://doi.org/10.1016/j.jvcir.2018.07.010",
  note    = "In press"
}
Acknowledgements
This project has been funded under projects TIN2016-75279-P and IFI16/00033 (ISCIII) of the Spanish Ministry of Economy, Industry and Competitiveness, and FEDER. Thanks to NVIDIA for donating the Titan Xp GPU used for the experiments presented in this work.