Figure 1: Overview of H-RDT. A human-to-robotics diffusion transformer with two-stage training.
Figure 2: H-RDT framework.
This task involves manipulating a deformable towel with two sequential folds, where the first fold requires bimanual coordination to grasp the towel's bottom edges simultaneously.
This task requires spatial reasoning to select the appropriate arm (left or right) based on the cup's position relative to the coaster.
Water Pouring
Plates Stacking
Pen Capping
Figure 4: Task definition of real-world experiments.
Our real-world experiments encompass diverse bimanual manipulation tasks across multiple robotic platforms.
These tasks validate H-RDT's ability to handle complex real-world scenarios with varying degrees of dexterity and coordination requirements.
If you find our work helpful, please cite us:
@misc{bi2025hrdt,
title={H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation},
author={Hongzhe Bi and Lingxuan Wu and Tianwei Lin and Hengkai Tan and Zhizhong Su and Hang Su and Jun Zhu},
year={2025},
eprint={2507.23523},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://embodiedfoundation.github.io/hrdt},
}
Thank you!