Project: Deep Depth Pose

Deep Depth Pose (DDP) model: 3D Pose Estimation from Depth Maps using a Deep combination of Poses

Manuel J. Marín-Jiménez, Francisco J. Romero, Rafael Muñoz-Salinas, Rafael Medina-Carnicer

Overview

This work addresses the problem of 3D human pose estimation from depth maps using a Deep Learning approach. We propose a model, named Deep Depth Pose (DDP), which receives a depth map containing a person and a set of predefined 3D prototype poses, and returns the 3D positions of the person's body joints. In particular, DDP is defined as a ConvNet that computes the specific weights needed to linearly combine the prototypes for the given input. We have thoroughly evaluated DDP on the challenging 'ITOP' and 'UBC3V' datasets, which depict realistic and synthetic samples respectively, setting a new state of the art on both.

The following figure summarizes the main steps of our approach:

DDP pipeline
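
The central idea described in the overview, a pose obtained as a weighted linear combination of prototype poses, can be illustrated with a minimal sketch. This is not the released DDP code: the function name `combine_prototypes`, the nested-list layout of the prototypes (K poses, each with J joints given as x, y, z), and the hand-picked weights are all assumptions made for illustration. In the actual model the weights would be produced by the ConvNet from the input depth map, and the sketch deliberately leaves out any extra terms the full model may use.

```python
def combine_prototypes(weights, prototypes):
    """Linearly combine K prototype poses into a single 3D pose.

    weights:    sequence of K floats (hypothetical stand-in for the
                ConvNet output; here they are simply given by hand).
    prototypes: sequence of K poses, each a sequence of J joints,
                each joint an (x, y, z) triple.
    Returns a list of J joints, each a list [x, y, z].
    """
    if len(weights) != len(prototypes):
        raise ValueError("need exactly one weight per prototype pose")
    if not prototypes:
        raise ValueError("at least one prototype pose is required")

    n_joints = len(prototypes[0])
    pose = [[0.0, 0.0, 0.0] for _ in range(n_joints)]

    for w, proto in zip(weights, prototypes):
        if len(proto) != n_joints:
            raise ValueError("all prototypes must have the same number of joints")
        for j, (x, y, z) in enumerate(proto):
            # Accumulate the weighted contribution of this prototype's joint.
            pose[j][0] += w * x
            pose[j][1] += w * y
            pose[j][2] += w * z
    return pose


# Toy example: two prototype poses with two joints each (made-up coordinates).
toy_prototypes = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],  # prototype 0: second joint along y
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # prototype 1: second joint along x
]
toy_weights = [0.5, 0.5]  # in DDP these would come from the network

estimated_pose = combine_prototypes(toy_weights, toy_prototypes)
print(estimated_pose)
```

With equal weights, the second joint of the toy pose lands halfway between the two prototypes. A real implementation would normally express the same operation as a single tensor contraction in an array library rather than explicit loops.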

Results

Quantitative results

Results on ITOP (frontal and top views):

Results on UBC3V Hard-Pose (vs Shafaei'2016):

Qualitative results

The following video shows actual results on the test partition of the ITOP dataset. Each image has been processed independently.

Poses estimated on ITOP test samples: Download.

Downloads

Filename	Description	Size
Demo code at GitHub	Demo code for ITOP (contains sample data)	-- MB
itopresults.mp4	Video with estimated 3D poses	78 MB

Related Publications

[1] M. J. Marín-Jiménez, F. J. Romero, R. Muñoz-Salinas, R. Medina-Carnicer
3D Pose Estimation from Depth Maps using a Deep combination of Poses
Journal of Visual Communication and Image Representation (in press), 2018

@Article{Marin18ijvcr,
  author     = "Marin-Jimenez, M.J. and Romero, F.J. and Mu\~noz-Salinas, R. and Medina-Carnicer, R.",
  title      = "3D Pose Estimation from Depth Maps using a Deep combination of Poses",
  journal    = "Journal of Visual Communication and Image Representation",
  year       = "2018",
  doi  = "https://doi.org/10.1016/j.jvcir.2018.07.010",
  note = "In press"
}

Acknowledgements

This project has been funded under projects TIN2016-75279-P and IFI16/00033 (ISCIII) of the Spanish Ministry of Economy, Industry and Competitiveness, and FEDER. Thanks to NVIDIA for donating the Titan Xp GPU used for the experiments presented in this work.