RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception

1ETH Zürich, Switzerland 2HKUST (Guangzhou), China 3HKUST, Hong Kong (China)
*Equal Contribution

Relying on single-view perception, our method achieves robust grasping (94.6% success rate) of 500+ unseen objects with varied shapes, sizes, materials, and masses, presented in random poses.
The video shows continuous grasping of diverse objects without failure.

Abstract

Robust grasping of various objects from single-view perception is fundamental for dexterous robots. Previous works often rely on fully observable objects, expert demonstrations, or static grasping poses, which restricts their generalization ability and adaptability to external disturbances. In this paper, we present a reinforcement-learning-based framework that enables zero-shot dynamic dexterous grasping of a wide range of unseen objects from single-view perception, while adapting its motions to external disturbances. We use a hand-centric object representation for shape feature extraction that emphasizes interaction-relevant local shapes, enhancing robustness to shape variation and uncertainty. To enable effective hand adaptation to disturbances under limited observations, we propose a mixed curriculum learning strategy, which first uses imitation learning to distill a policy trained with privileged real-time visual-tactile feedback, and then gradually transitions to reinforcement learning to learn adaptive motions under disturbances caused by observation noise and dynamics randomization. Our experiments demonstrate strong generalization in grasping unseen objects with random poses, achieving success rates of 97.0% across 247,786 simulated objects and 94.6% across 512 real objects. We also demonstrate the robustness of our method to various disturbances, including unobserved object movement and external forces, through both quantitative and qualitative evaluations.
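To make the training recipe concrete, the following is a minimal sketch of such a mixed curriculum, assuming hypothetical network sizes, a stubbed RL objective, and dummy rollout data standing in for the simulator; it is not the authors' released implementation. The imitation weight starts at 1 and is annealed toward 0, so supervision shifts from a teacher with privileged visual-tactile feedback to pure reinforcement learning under observation noise and dynamics randomization.

# Hypothetical sketch of the mixed curriculum; names and shapes are illustrative.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

obs_dim, priv_dim, act_dim = 128, 64, 22            # illustrative dimensions
teacher = mlp(obs_dim + priv_dim, act_dim)          # sees privileged visual-tactile state
student = mlp(obs_dim, act_dim)                     # single-view observation only
optimizer = torch.optim.Adam(student.parameters(), lr=3e-4)

def rl_loss(actions, obs):
    # Placeholder for the actual RL objective (e.g. a PPO policy loss).
    return (actions ** 2).mean() * 0.0              # stub: replace with a real loss

num_iters, batch = 1000, 64
for it in range(num_iters):
    # Curriculum weight: starts at 1 (pure imitation), decays to 0 (pure RL).
    w_imit = max(0.0, 1.0 - it / (0.5 * num_iters))

    # Dummy rollout data; in practice these come from the simulator with
    # observation noise and dynamics randomization applied.
    obs = torch.randn(batch, obs_dim)
    priv = torch.randn(batch, priv_dim)

    with torch.no_grad():
        teacher_actions = teacher(torch.cat([obs, priv], dim=-1))
    student_actions = student(obs)

    imitation = ((student_actions - teacher_actions) ** 2).mean()
    loss = w_imit * imitation + (1.0 - w_imit) * rl_loss(student_actions, obs)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()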

Objects Used for Evaluation

Our method is evaluated on a diverse set of objects with various materials, shapes, and masses, which are all unseen during training.

Diverse set of test objects

500+ real-world objects used in our experiments.

Object attributes

Distribution of object attributes.

Generalization

Trained with only 35 objects in simulation, our method demonstrates exceptional generalization capability across 500+ unseen real objects with various physical properties and random poses.

Continual robust grasping of diverse objects with varying shapes, weights, and materials.

Robust grasping of the same objects with random poses on the table.

Robustness

Utilizing a reinforcement-learning-based training framework, our method shows strong robustness and adaptability to internal noise and external disturbances.

Adaptive motions under observation noise and actuator inaccuracies.

Real-time adaptation to environmental changes such as object movements.

Maintaining stable grasps despite external disturbances such as unexpected forces.

Method Comparison

With closed-loop RL-based dynamic control, our method demonstrates adaptive arm and finger motions, leading to robust grasping. We compare our method with:
1) executing grasping poses from DexGraspNet (with extra torques for firm grasps)
2) a naive controller that gradually closes the fingers to grasp objects (sketched below)
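As a reference for baseline 2), here is a hypothetical sketch of such a naive closing controller: all finger joints are driven toward closure in small increments until they stop moving (taken as contact) or a step budget is exhausted. The callback names, step size, and thresholds are illustrative assumptions, not the compared implementation.

import numpy as np

def naive_close_fingers(get_joint_positions, set_joint_targets,
                        close_step=0.01, stall_eps=1e-3, max_steps=300):
    # Hypothetical baseline controller: incrementally close every finger joint
    # until the joints stop moving (assumed contact) or the step budget runs out.
    # set_joint_targets is assumed to block until the low-level controller settles.
    targets = np.array(get_joint_positions(), dtype=float)
    for _ in range(max_steps):
        before = np.array(get_joint_positions(), dtype=float)
        targets = targets + close_step              # drive all joints toward closure
        set_joint_targets(targets)
        after = np.array(get_joint_positions(), dtype=float)
        if np.all(np.abs(after - before) < stall_eps):
            break                                   # joints stalled: likely in contact
    return targets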

The baselines suffer from imperfect grasp poses or undesired object movements.

The baselines struggle with thin objects, which require precise contact points.

The baselines struggle with round objects, as misaligned torques lead to non-force-closure grasps.

Application

Grasping target objects under disturbances caused by surrounding objects in cluttered environments.

Language-guided grasping and failure recovery with a VLM planner.

BibTeX

@article{zhang2025RobustDexGrasp,
  title={{RobustDexGrasp}: Robust Dexterous Grasping of General Objects from Single-view Perception},
  author={Zhang, Hui and Wu, Zijian and Huang, Linyi and Christen, Sammy and Song, Jie},
  journal={arXiv preprint arXiv:2504.05287},
  year={2025}
}