Dynamics-based Human Motion Modeling for People Tracking (DARPA Mind’s Eye Program) |
| Human pose estimation using monocular vision is a challenging problem in computer vision. Past work has focused on developing efficient inference algorithms and probabilistic prior models based on captured kinematic/dynamic measurements. However, such algorithms face challenges in generalization beyond the learned dataset. |
|
Figure 1: Project Overview |
|
In this work, we propose a model-based generative approach for estimating human pose solely from uncalibrated monocular video in unconstrained environments, without any prior learning from motion capture or image annotation data. We propose a novel generalized heading estimation framework based on a Product of Heading Experts (PoHE), which probabilistically merges heading outputs (probabilistic or non-probabilistic) from a time-varying number of estimators. Our current implementation uses a motion-cue-based human heading estimator to bootstrap an integrated probabilistic-deterministic sequential optimization framework that robustly estimates human pose. We develop novel pixel-distance-based performance measures that penalize false human detections and ensure identity-maintained human tracking. We tested the framework with varied inputs (silhouettes and bounding boxes) and benchmarked it against ground-truth data (collected using our human annotation tool) for 52 video vignettes in the publicly available DARPA Mind’s Eye Year I dataset. Results show robust pose estimates on this challenging dataset of highly diverse activities.
Human tracking is typically formulated as a Bayesian filtering problem and solved with a Particle Filter (PF). In a PF, the posterior is approximated by a set of weighted samples (particles) and is computed recursively. In this work, we focus on developing a dynamics-based temporal prior that contributes to the posterior, in place of the first- or second-order linear dynamical system with Gaussian noise that is often adopted because more realistic priors are unavailable. To simulate the dynamics of the scene, we assume that the segment shapes, mass properties, collision geometries and other associated parameters (e.g. the direction of gravity) are known and remain constant throughout the motion sequence. We also model the human as a loop-free articulated structure. A Bayesian filtering technique such as the PF or the Annealed Particle Filter will finally be employed with the proposed dynamics-based prior. |
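The filtering recursion described above can be illustrated with a minimal bootstrap particle filter. This is only a sketch under stated assumptions, not the project's implementation: the temporal prior here is a hypothetical point mass falling under gravity (`propagate_ballistic`), standing in for the articulated-body dynamics, and the Gaussian likelihood, the frame interval `DT`, the noise levels and all function names are illustrative choices.

```python
import math
import random

G = -9.81   # assumed gravity along the vertical axis (m/s^2)
DT = 0.05   # assumed time between video frames (s)

def particle_filter_step(particles, weights, propagate, likelihood, observation, rng):
    """One predict-update-resample cycle of a bootstrap particle filter."""
    # Predict: push every particle through the temporal prior (the dynamics model).
    predicted = [propagate(p, rng) for p in particles]
    # Update: reweight by how well each predicted state explains the observation.
    new_weights = [w * likelihood(observation, p) for w, p in zip(weights, predicted)]
    total = sum(new_weights)
    if total == 0.0:
        # No particle explains the observation; fall back to uniform weights.
        new_weights = [1.0 / len(predicted)] * len(predicted)
    else:
        new_weights = [w / total for w in new_weights]
    # Resample in proportion to the weights, then reset the weights to uniform.
    resampled = rng.choices(predicted, weights=new_weights, k=len(predicted))
    return resampled, [1.0 / len(resampled)] * len(resampled)

def propagate_ballistic(state, rng):
    """Hypothetical dynamics prior: a point mass in free fall plus small noise."""
    height, velocity = state
    velocity = velocity + G * DT + rng.gauss(0.0, 0.05)
    height = height + velocity * DT + rng.gauss(0.0, 0.01)
    return (height, velocity)

def gaussian_likelihood(observed_height, state, sigma=0.1):
    """Unnormalized Gaussian likelihood of an observed height given a state."""
    err = observed_height - state[0]
    return math.exp(-0.5 * (err / sigma) ** 2)

def simulate_heights(initial_height, n_steps):
    """Noise-free heights produced by the same free-fall model, one per frame."""
    heights, height, velocity = [], initial_height, 0.0
    for _ in range(n_steps):
        velocity += G * DT
        height += velocity * DT
        heights.append(height)
    return heights

def track(observations, initial_height, n_particles=500, seed=0):
    """Run the filter over a sequence of height observations; return posterior means."""
    rng = random.Random(seed)
    particles = [(rng.gauss(initial_height, 0.2), rng.gauss(0.0, 0.5))
                 for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    estimates = []
    for z in observations:
        particles, weights = particle_filter_step(
            particles, weights, propagate_ballistic, gaussian_likelihood, z, rng)
        estimates.append(sum(p[0] for p in particles) / n_particles)
    return estimates
```

Calling `track(simulate_heights(2.0, 10), initial_height=2.0)` returns one height estimate per frame. Swapping `propagate_ballistic` for a first-order linear model with Gaussian noise would give the conventional prior that the text contrasts against; the rest of the recursion stays the same.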
|
Figure 2: Summary of optimization framework implemented for pose estimation on each frame
|
| Research Issue- Human Heading Estimation from Videos |
|
We model the heading estimation task independently of the features and types of the individual estimators, and focus on optimally fusing the information from all available estimators. Hence, we propose a generalized heading estimation framework based on a Product of Heading Experts (PoHE), which probabilistically merges heading outputs from a time-varying number of estimators to produce robust heading estimates under varied conditions in unconstrained scenarios. Further, we developed a novel generative model that estimates the heading direction of the subject in the video from motion-based cues, thus significantly reducing the pose search space. |
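One way to picture the product-of-experts fusion is the sketch below. It assumes each expert reports a von Mises distribution over heading, which is an illustrative choice and not stated on this page; under that assumption the product of the experts is again von Mises, with parameters obtained by summing concentration-weighted unit vectors. The function name `fuse_headings`, the tuple format and the `default_kappa` assigned to non-probabilistic estimators are all hypothetical.

```python
import math

def fuse_headings(estimates, default_kappa=1.0):
    """Fuse heading estimates (radians) as a product of von Mises experts.

    Each estimate is a (mean_heading, kappa) pair. kappa=None marks a
    non-probabilistic estimator, which is given default_kappa (an assumption).
    Returns (fused_heading, fused_kappa), or None if no usable estimate exists.
    """
    if not estimates:
        return None  # no estimator reported for this frame
    x = y = 0.0
    for mean, kappa in estimates:
        k = default_kappa if kappa is None else kappa
        # exp(k * cos(theta - mean)) contributes the vector k * (cos mean, sin mean).
        x += k * math.cos(mean)
        y += k * math.sin(mean)
    fused_kappa = math.hypot(x, y)
    if fused_kappa == 0.0:
        return None  # the experts cancel exactly, so the product is uniform
    return math.atan2(y, x), fused_kappa

# The number of available experts may change from frame to frame.
frames = [
    [(0.10, 4.0), (0.30, 2.0)],             # two probabilistic experts
    [(0.20, 4.0), (0.25, None), (0.15, 1.0)],  # one non-probabilistic expert joins
    [(0.40, 3.0)],                            # only one expert available
]
fused_per_frame = [fuse_headings(frame) for frame in frames]
```

Because the sum runs over whatever list is supplied, the same function handles any number of experts per frame, and a confident expert (large `kappa`) pulls the fused heading more strongly than a vague one.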
|
| Research Issue- Pose Estimation and Optimization | |||
|
To tackle this complex human pose estimation problem, we adopted a sequential optimization framework that determines the optimal, uncoupled pose states (camera/body location and body joint angles) separately, using a combination of deterministic and probabilistic optimization approaches to leverage the advantages of each. With this probabilistic-deterministic scheme, faster convergence to the global minimum was achieved. Initial guesses for the deterministic convex optimization scheme were estimated using a population-based global optimization technique. Finally, we introduced pose evaluation for videos with multiple humans, defining identity-maintained pose evaluation metrics to quantitatively evaluate the (optimal) pose estimates. |
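The two-stage idea (a population-based search that supplies an initial guess, followed by a deterministic refinement) can be sketched as follows. The specific techniques are stand-ins chosen for brevity and are not named on this page: uniform random sampling plays the role of the population-based stage, and a derivative-free compass search plays the role of the deterministic stage. The cost function `toy_cost`, the bounds and all function names are hypothetical; a real pose cost would compare a projected body model against image evidence.

```python
import math
import random

def global_stage(cost, bounds, population=200, seed=0):
    """Probabilistic stage: sample a population uniformly and keep the best member."""
    rng = random.Random(seed)
    candidates = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(population)]
    return min(candidates, key=cost)

def local_stage(cost, start, step=0.5, tol=1e-6, max_iter=10000):
    """Deterministic stage: compass search around the initial guess.

    Tries +/- step along each coordinate, accepts any improving move,
    and halves the step when no move improves the cost.
    """
    x = list(start)
    best = cost(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                value = cost(trial)
                if value < best:
                    x, best, improved = trial, value, True
        if not improved:
            step *= 0.5
    return x, best

def toy_cost(x):
    """Multimodal toy objective with many local minima; its global minimum is 0 at the origin."""
    return sum(xi * xi + 2.0 * (1.0 - math.cos(2.0 * math.pi * xi)) for xi in x)

def estimate(cost, bounds):
    """Run the global stage, then refine its best candidate deterministically."""
    guess = global_stage(cost, bounds)
    return local_stage(cost, guess)

solution, value = estimate(toy_cost, [(-3.0, 3.0), (-3.0, 3.0)])
```

The refinement never returns a cost higher than that of its starting guess, so the quality of the final estimate depends on the global stage landing in a good basin. In the sequential setting described above, such a pair of stages would be run separately for each uncoupled group of pose states rather than over all states at once.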
|
Figure 3: Manually Annotated Human Markers with Image Overlay |
|||
|
| Movies - Human Annotation GUI | |
|
|
- Tutorial for the MATLAB-based manual human annotation application, developed to obtain ground-truth pose data from the video datasets.
|
| Movies - Motion Detection based on Bounding Box (Results) |
|
| Students Involved: |
|
- Priyanshu Agarwal, MS Student, University at Buffalo
- Suren Kumar, PhD Student, University at Buffalo |
| Related Publications - Conference Proceedings: | ||
| [01] | P. Agarwal, S. Kumar, J. Ryde, J. Corso, and V. Krovi. Estimating Human Dynamics On-the-fly Using Monocular Video for Pose Estimation. Robotics: Science and Systems Conference, University of Sydney, Sydney, Australia, July 9-13, 2012. | [PDF] |
| [02] | P. Agarwal, S. Kumar, J. Corso, and V. Krovi. Estimating Dynamics On-the-fly Using Monocular Video. Dynamic Systems and Control Conference, California, October 12-14, 2011. | [PDF] |
| Related Publications - Theses | ||
| [01] | P. Agarwal, Dynamics-based Human Pose Estimation Using Monocular Vision, M.S. Thesis, Department of Mechanical & Aerospace Engineering, SUNY at Buffalo, Jun 2012. | [PDF] |
Last Updated: April 21, 2012