3D Vision Based Grasp Planning for Transparent Objects

Vision System Lab, Department of Mechanical Engineering
Thiagarajar College of Engineering (affiliated to Anna University), India

Abstract

Optical 3D sensors, such as RGB-D cameras and LiDAR, are widely used in robotic applications to create accurate 3D maps of environments ranging from dynamic scenes observed by self-driving cars to tabletop settings for autonomous manipulators. Despite their success in these complex scenarios, transparent objects pose a significant challenge for such sensors. These sensors typically assume that all surfaces are Lambertian, i.e., that they reflect light uniformly in all directions. Transparent objects violate this assumption: the distortions and reflections they produce disrupt the depth signal, leading to errors in detecting and localizing target objects. Enabling robots to sense and detect transparent surfaces improves safety and expands the range of applications, for example in biosynthesis, work cells, clean-room environments, and kitchenware handling. This project addresses the challenge with a custom Neural Radiance Field (NeRF) that learns implicit surface modeling and geometry estimation. The NeRF builds detailed 3D models of real-world scenes from sparse 2D RGB images, allowing transparent objects to be detected, localized, and geometrically reconstructed. This enables downstream tasks such as grasping, manipulation, and mapping without relying on prior information about the transparent objects, such as lighting conditions or camera positions. To achieve full autonomy, manipulators must handle tasks such as pick-and-place even from rough initial surface estimates, so task-generalizable algorithms that adapt to objects of varying geometries are critical. This work leverages NeRFs to estimate accurate 6-DoF poses of transparent objects by combining view-independent density modeling with transparency-aware depth rendering. The resulting occupancy map is then used to train a generalizable grasp planner capable of handling and reorienting objects with diverse geometries.
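As a concrete illustration of the transparency-aware depth rendering mentioned above, the sketch below volume-renders a depth value from NeRF densities sampled along a single camera ray. The density-threshold termination used as the "transparency-aware" variant is an assumption made for illustration, not necessarily the exact formulation used in this work.

```python
import numpy as np

def render_depth_along_ray(sigmas, t_vals, density_thresh=5.0):
    """Render depth from NeRF densities sampled along one camera ray.

    sigmas         : (N,) volume densities predicted by the NeRF.
    t_vals         : (N,) distances of the samples from the camera origin.
    density_thresh : heuristic cutoff (assumed) that terminates the ray on a
                     thin transparent surface instead of integrating past it.
    """
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)              # sample spacing
    alphas = 1.0 - np.exp(-sigmas * deltas)                         # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]  # transmittance
    weights = trans * alphas                                        # rendering weights

    # Standard expected depth: transmittance-weighted average of sample distances.
    expected_depth = np.sum(weights * t_vals) / (np.sum(weights) + 1e-8)

    # Transparency-aware alternative (assumption): stop at the first sample whose
    # density crosses the threshold, so the glass surface is not "seen through".
    hits = np.nonzero(sigmas > density_thresh)[0]
    surface_depth = t_vals[hits[0]] if hits.size else expected_depth
    return expected_depth, surface_depth
```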


Objective

The objective of this work is to develop a view-independent, 3D vision-based geometry and state estimation algorithm for detecting and handling transparent objects with a robotic manipulator. To this end, the work demonstrates grasping of a transparent object placed in a stable pose in a tabletop setting. This task presents a dual challenge: the rendered depth must be of sufficient quality to support both grasp planning and collision avoidance, and the final grasp sample point must be precise enough to distinguish between transparent objects placed in close proximity.

[Figure: Depth estimation with Kinect fails, as it is unable to detect transparent objects.]


Neural representation learning for grasping

[Figure: Pipeline overview.]

(1) We first lift the 2D monocular observation to an implicit 3D representation using a NeRF model. (2) Through raymarching, we obtain an explicit representation of the 3D scene. (3) We then use this 3D representation to estimate the 6-DoF pose of the object. (4) Finally, we use the estimated pose to plan a grasp for the object.
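As a rough sketch of step (2), the snippet below shows how a trained NeRF's density field could be queried on a regular grid and thresholded into an explicit occupancy map and point set. The `density_fn` callable, grid resolution, and threshold are illustrative assumptions rather than the project's actual interface.

```python
import numpy as np

def nerf_to_occupancy(density_fn, bounds, res=128, sigma_thresh=10.0):
    """Query a trained NeRF density field on a voxel grid and threshold it
    into an explicit occupancy map plus the corresponding 3D points.

    density_fn   : callable mapping (M, 3) xyz points -> (M,) densities;
                   this stands in for the trained NeRF (name assumed).
    bounds       : ((xmin, ymin, zmin), (xmax, ymax, zmax)) workspace box.
    sigma_thresh : heuristic density cutoff for calling a voxel occupied.
    """
    lo, hi = np.asarray(bounds[0]), np.asarray(bounds[1])
    axes = [np.linspace(lo[i], hi[i], res) for i in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    sigma = density_fn(grid).reshape(res, res, res)
    occupancy = sigma > sigma_thresh                         # boolean voxel grid
    points = grid.reshape(res, res, res, 3)[occupancy]       # explicit point set
    return occupancy, points
```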


Some Results

Sample of novel-view rendering (implicit rendering)
[GIF: novel-view rendering of the reconstructed scene]

Depth map of the scene
[Figure: depth map of the scene]

Pose estimation through ICP. We also evaluate learned pose estimation directly from the implicit scene representation!
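For reference, a minimal point-to-point ICP alignment of an object model against the NeRF-rendered scene points can be written with Open3D as below; the ICP backend, initialization, and correspondence threshold used in this project may differ.

```python
import numpy as np
import open3d as o3d

def estimate_pose_icp(model_points, scene_points, init=np.eye(4), max_dist=0.01):
    """Align a known object model (N, 3) to scene points (M, 3) rendered from
    the NeRF with point-to-point ICP, returning a 4x4 rigid transform that
    encodes the object's 6-DoF pose."""
    source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(model_points))
    target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(scene_points))
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_dist, init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation   # 4x4 homogeneous transform
```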

Grasp Planning

[Figures: Case A]
Case A: Test tube grasped by the rim and placed upright.
[Figures: Case B]
Case B: Test tube grasped at the bottom and placed upside down.

PyBullet Testbed. We evaluate the pick-and-place success rates in simulation, testing our approach with various shapes and novel objects.
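To give a flavor of the simulation testbed, the sketch below runs a single stripped-down pick-and-place trial in PyBullet. The object URDF, the omitted robot control, and the distance-based success criterion are placeholders standing in for the actual setup.

```python
import numpy as np
import pybullet as p
import pybullet_data

def run_grasp_trial(object_urdf, grasp_pose, place_pos):
    """Run one simulated pick-and-place trial and report success.

    object_urdf : path to the object's URDF model (placeholder).
    grasp_pose  : planned 6-DoF grasp (used by the omitted robot control).
    place_pos   : target (x, y, z) placement position.
    """
    p.connect(p.DIRECT)                                    # headless simulation
    p.setAdditionalSearchPath(pybullet_data.getDataPath())
    p.setGravity(0, 0, -9.81)
    p.loadURDF("plane.urdf")
    obj = p.loadURDF(object_urdf, basePosition=[0.5, 0.0, 0.05])

    # ... drive the manipulator to grasp_pose, close the gripper, lift,
    #     move above place_pos, and release (robot control omitted) ...
    for _ in range(240):                                   # ~1 s at 240 Hz
        p.stepSimulation()

    # Success criterion (assumed): the object settles near the target placement.
    pos, _ = p.getBasePositionAndOrientation(obj)
    success = np.linalg.norm(np.array(pos) - np.asarray(place_pos)) < 0.05
    p.disconnect()
    return success
```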


This webpage template was recycled from LION and LEAP.
