RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception

¹ETH Zürich, Switzerland  ²HKUST (Guangzhou), China  ³HKUST, Hong Kong (China)
*Equal Contribution

Relying on single-view perception, our method achieves robust grasping (94.6% success rate) of 500+ unseen objects with various shapes, sizes, materials, masses, and random poses.
The video shows continuous grasping of diverse objects without failure.

Abstract

The ability to robustly grasp a variety of objects is essential for dexterous robots. In this paper, we present a framework for zero-shot dynamic dexterous grasping using single-view visual inputs, designed to be resilient to various disturbances. Our approach utilizes a hand-centric object shape representation based on dynamic distance vectors between finger joints and object surfaces. This representation captures the local shape around potential contact regions rather than focusing on detailed global object geometry, thereby enhancing generalization to shape variations and uncertainties. To address perception limitations, we integrate a privileged teacher policy with a mixed curriculum learning approach, allowing the student policy to effectively distill grasping capabilities and explore for adaptation to disturbances. Trained in simulation, our method achieves success rates of 97.0% across 247,786 simulated objects and 94.6% across 512 real objects, demonstrating remarkable generalization. Quantitative and qualitative results validate the robustness of our policy against various disturbances.
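As a rough illustration of the hand-centric representation described in the abstract, the sketch below computes, for each finger joint, the vector to its nearest point on a partial object point cloud. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `distance_vectors`, the brute-force nearest-neighbour search, and the clipping radius `max_range` are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def distance_vectors(joints: Sequence[Vec3],
                     surface_points: Sequence[Vec3],
                     max_range: float = 0.1) -> List[Vec3]:
    """For each joint, return the vector to its nearest surface point.

    Vectors longer than `max_range` are clipped to that length so that
    only the local shape near the hand influences the features. The
    clipping rule is an assumption of this sketch.
    """
    features: List[Vec3] = []
    for joint in joints:
        # Brute-force nearest neighbour over the (partial) point cloud.
        nearest = min(surface_points, key=lambda p: math.dist(joint, p))
        vec = tuple(p - j for p, j in zip(nearest, joint))
        norm = math.hypot(*vec)
        if norm > max_range:
            vec = tuple(c * max_range / norm for c in vec)
        features.append(vec)
    return features


# Hypothetical joint positions and a tiny single-view point cloud (metres).
joints = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
cloud = [(0.05, 0.0, 0.0), (0.0, 0.0, 0.5)]
features = distance_vectors(joints, cloud)
```

Recomputing these vectors at every control step as the hand moves is what would make the representation "dynamic". Because each vector depends only on the closest surface patch, unobserved or distant parts of the object do not change the features.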

Objects Used for Evaluation

Our method is evaluated on a diverse set of objects with various materials, shapes, and masses, which are all unseen during training.

Diverse set of test objects

500+ real-world objects used in our experiments.

Object attributes

Distribution of object attributes.

Generalization

Trained on only 35 objects in simulation, our method generalizes to 500+ unseen real objects with various physical properties and random poses.

Continuous robust grasping of diverse objects with varying shapes, weights, and materials.

Robust grasping of the same objects with random poses on the table.

Robustness

Trained with a reinforcement-learning-based framework, our method is robust and adaptive to internal noise and external disturbances.

Adaptive motions in response to observation noise and actuator inaccuracies.

Maintaining stable grasps despite external disturbances such as unexpected forces.

Method Comparison

With closed-loop RL-based dynamic control, our method demonstrates adaptive arm and finger motions, leading to robust grasping. We compare our method with:
1) executing grasp poses from DexGraspNet (with extra torques for firm grasps);
2) a naive controller that gradually closes the fingers to grasp objects.
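For intuition, a naive closing baseline of the kind listed in item 2 might look like the following open-loop sketch. The function name `close_fingers_step`, the per-step limit `step`, and the joint values are assumptions made for illustration; the controller actually used in the comparison is not specified on this page.

```python
from typing import List


def close_fingers_step(joint_targets: List[float],
                       closed_pose: List[float],
                       step: float = 0.02) -> List[float]:
    """Move each joint target toward its closed value by at most `step` radians."""
    new_targets = []
    for q, q_closed in zip(joint_targets, closed_pose):
        delta = q_closed - q
        # Clamp the per-step change so the fingers close gradually.
        delta = max(-step, min(step, delta))
        new_targets.append(q + delta)
    return new_targets


# Hypothetical three-joint hand: start fully open, close over repeated steps.
targets = [0.0, 0.0, 0.0]
closed = [1.0, 0.5, 0.03]
for _ in range(100):
    targets = close_fingers_step(targets, closed)
```

Note that this loop never looks at the object: it uses no contact or visual feedback, which is the main difference from a closed-loop policy that adapts arm and finger motions during the grasp.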

The baselines suffer from imperfect grasp poses or undesired object movements.

The baselines struggle with thin objects because they require precise contact points.

The baselines struggle with round objects, as misaligned torques lead to non-force-closure grasps.
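To make the force-closure remark concrete, the sketch below applies the classical antipodal test for two frictional contacts: the line joining the contacts must lie inside both friction cones. This is a simplified textbook check, assumed here purely for illustration (two soft-finger contacts, Coulomb friction with coefficient `mu`); it is not the analysis used in the paper, and the names `is_antipodal` and `_angle` are hypothetical.

```python
import math
from typing import Sequence


def _angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))


def is_antipodal(p1, n1, p2, n2, mu: float) -> bool:
    """Two-contact test: n1 and n2 are inward-pointing surface normals."""
    half_angle = math.atan(mu)  # half-angle of the friction cone
    d = tuple(b - a for a, b in zip(p1, p2))  # from contact 1 to contact 2
    d_rev = tuple(-c for c in d)
    return _angle(d, n1) < half_angle and _angle(d_rev, n2) < half_angle


# Unit sphere at the origin: opposite contacts versus contacts 90 degrees apart.
aligned = is_antipodal((-1, 0, 0), (1, 0, 0), (1, 0, 0), (-1, 0, 0), mu=0.5)
misaligned = is_antipodal((-1, 0, 0), (1, 0, 0), (0, 1, 0), (0, -1, 0), mu=0.5)
```

On a round object the surface normal changes quickly with contact position, so a small placement error can push the connecting line outside a friction cone, as in the second case.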

Application

Grasping target objects despite disturbances from other objects in a cluttered environment.

Language-guided grasping and failure recovery with a VLM planner.

BibTeX

@article{zhang2025RobustDexGrasp,
  title={{RobustDexGrasp}: Robust Dexterous Grasping of General Objects from Single-view Perception},
  author={Zhang, Hui and Wu, Zijian and Huang, Linyi and Christen, Sammy and Song, Jie},
  journal={arXiv preprint arXiv:2504.05287},
  year={2025}
}