A Dataset and Evaluation for Complex 4D Markerless Human Motion Capture

Visual and Spatial AI Lab¹, VCCM Section College of PVFA², Department of ECEN³, Department of CSCE⁴,
Texas A&M University, College Station, Texas, USA
† Indicates Corresponding Author: suryanshkumar@exchange.tamu.edu
IEEE/CVF CVPR 2026, 4D World Models Workshop
44 MoCap sequences  •  6 synchronized RGB-D cameras  •  83,768 frames  •  3 interacting subjects  •  Ground-truth SMPL parameters (θ, β, t)

HUM4D provides synchronized multi-view RGB-D sequences aligned with professional Vicon motion capture ground truth, designed to benchmark markerless human motion capture under severe occlusion and multi-person interactions.

Overview

[Jittering] Single Spin
[Jittering] Single Jump
[Jittering] Group Spin
[Jittering] Group Jump
[ID Swap] Group Switch Location
[ID Swap] Group Switch Location
[ID Swap] Group Switch Location
[ID Swap] Group Walk Crosspath
[Occlusion] Group Huddle Blob
[Occlusion] Group Huddle
[Occlusion] Group Huddle
[Occlusion] Group Break Formation
[Near-Far Camera] Single RunInPlace Stop
[Near-Far Camera] Group Walk Toward Camera
[Occlusion] Single Furniture SitStand
[Occlusion] Group Huddle Blob

The dataset includes challenging scenarios such as Jittering, Identity Switching, Occlusion, and Near-Far Interaction.

Data Acquisition Pipeline

Pipeline figure

Pipeline. Multi-view RGB-D capture is synchronized with Vicon motion capture. Marker trajectories are reconstructed and retargeted to SMPL to produce pose (θ), shape (β), and translation (t), along with evaluation-ready annotations.
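To make the three ground-truth quantities concrete, here is a minimal sketch of a per-frame container for (θ, β, t). It assumes the standard SMPL layout of 24 joints in axis-angle form (72 pose values) and 10 shape coefficients; the page does not state the exact dimensions or file format used in HUM4D, and the `SmplFrame` class and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Dimensions of the standard SMPL body model. Whether HUM4D stores exactly
# these sizes is an assumption, not something the page states.
NUM_JOINTS = 24               # includes the root (global orientation)
POSE_DIM = NUM_JOINTS * 3     # axis-angle: 3 values per joint
SHAPE_DIM = 10                # commonly used number of shape coefficients


@dataclass(frozen=True)
class SmplFrame:
    """Hypothetical per-frame container for pose, shape, and translation."""
    theta: Sequence[float]    # pose, axis-angle per joint
    beta: Sequence[float]     # shape coefficients
    t: Sequence[float]        # root translation (x, y, z)

    def __post_init__(self) -> None:
        # Reject frames whose vectors do not match the assumed sizes.
        for name, value, expected in (
            ("theta", self.theta, POSE_DIM),
            ("beta", self.beta, SHAPE_DIM),
            ("t", self.t, 3),
        ):
            if len(value) != expected:
                raise ValueError(
                    f"{name} has {len(value)} values, expected {expected}"
                )

    def joint_rotation(self, joint_index: int) -> Tuple[float, ...]:
        """Return the axis-angle triple for one joint (0 is the root)."""
        if not 0 <= joint_index < NUM_JOINTS:
            raise IndexError(f"joint_index must be in [0, {NUM_JOINTS})")
        start = 3 * joint_index
        return tuple(self.theta[start:start + 3])


# An all-zero pose and shape placed at the origin.
rest_pose = SmplFrame(
    theta=[0.0] * POSE_DIM,
    beta=[0.0] * SHAPE_DIM,
    t=[0.0, 0.0, 0.0],
)
print(rest_pose.joint_rotation(0))
```

Validating the vector lengths at construction time is one way to catch a mismatched body model early, for example annotations exported with a different number of shape coefficients.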

Data Acquisition

Capture environment

Capture Environment. Professional motion capture studio with 44 synchronized infrared Vicon cameras and a multi-view RGB-D setup.

Hardware setup

Hardware Setup. From left to right: RGB-D camera perspective layout (1.45 m height), top-view circular arrangement (3 m radius), Intel RealSense D455 sensor, and the Vicon motion capture system.
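The ring geometry described above can be sketched numerically. The snippet below places 6 cameras on a circle of radius 3 m at a height of 1.45 m, using the values given on this page. The equal angular spacing, the z-up coordinate frame centered on the capture volume, and the function names are assumptions made for illustration, not the calibrated camera poses.

```python
import math


def ring_camera_positions(n_cameras=6, radius_m=3.0, height_m=1.45):
    """Return (x, y, z) positions for cameras spaced evenly on a circle.

    Assumptions (not from the source): equal angular spacing, z-up axes,
    ring centered at the origin, first camera on the +x axis.
    """
    positions = []
    for i in range(n_cameras):
        angle = 2.0 * math.pi * i / n_cameras
        positions.append(
            (radius_m * math.cos(angle), radius_m * math.sin(angle), height_m)
        )
    return positions


def yaw_toward_center(position):
    """Yaw in radians that points a camera at the ring's vertical axis."""
    x, y, _ = position
    return math.atan2(-y, -x)


cams = ring_camera_positions()
for idx, (x, y, z) in enumerate(cams):
    yaw_deg = math.degrees(yaw_toward_center((x, y, z)))
    print(f"cam{idx}: x={x:+.2f} m, y={y:+.2f} m, z={z:.2f} m, yaw={yaw_deg:+.1f} deg")
```

Under the even-spacing assumption, neighboring cameras are 60 degrees apart, so the straight-line distance between adjacent cameras equals the ring radius.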

HUM4D Dataset

Dataset examples
Dataset hierarchy figure 1
Dataset hierarchy figure 2

Dataset structure. HUM4D provides synchronized RGB-D sequences aligned with marker-based MoCap ground truth. We release evaluation-ready annotations and organized data in a hierarchical structure for easy navigation.
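As an illustration of how such annotations are typically used, the sketch below computes the mean per-joint position error (MPJPE), a widely used metric for 3D human pose estimation. This page does not say which metrics the HUM4D evaluation reports, so the choice of MPJPE, the function name, and the toy joint coordinates are assumptions for illustration only.

```python
import math


def mpjpe(pred_joints, gt_joints):
    """Mean Euclidean distance between predicted and ground-truth joints.

    Both arguments are sequences of (x, y, z) tuples in the same units and
    the same joint order. No alignment step is applied.
    """
    if len(pred_joints) != len(gt_joints) or not gt_joints:
        raise ValueError("joint lists must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_joints, gt_joints))
    return total / len(gt_joints)


# Toy example with two joints, coordinates in meters (made-up values).
gt = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(0.0, 0.0, 0.03), (0.04, 1.0, 0.0)]
print(f"MPJPE: {mpjpe(pred, gt) * 1000:.1f} mm")
```

In the multi-person sequences, predicted and ground-truth subjects would need to be matched before applying a per-joint error, which is where identity switches affect the score.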

Behind the Scenes

Behind the scenes. Footage from the HUM4D recording sessions, illustrating the multi-sensor setup and multi-person interactions.

Download

You can download the dataset from the following links:

  • https://
  • https://
  • https://
  • https://

For HUM4D data that are not publicly available, please contact us at cszghp [at] gmail.com.

Contact

For questions, please contact cszghp [at] gmail.com.

Acknowledgments

The authors thank Michael Walsh for his assistance with the human motion capture acquisition at the RELLIS Starlab facility at Texas A&M University (TAMU), and Morgan Jenks for managing and operating the Vicon motion capture system and overseeing all aspects of data acquisition. We also acknowledge valuable discussions with and feedback from John Keyser of the Department of CSCE at TAMU, and we thank Jyothi Naidu for her support in facilitating the IRB approval process.

Citation

@inproceedings{park2026hum4d,
  title={A Dataset and Evaluation for Complex 4D Markerless Human Motion Capture},
  author={Park, Yeeun and Naduthodi, Miqdad and Kumar, Suryansh},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}