Optical 3D sensors, such as RGB-D cameras and LiDAR, are widely used in robotics to build accurate 3D maps of environments ranging from the dynamic scenes observed by self-driving cars to tabletop settings in autonomous manipulation. Despite their success in these complex scenarios, transparent objects pose a significant challenge: these sensors typically assume that all surfaces are Lambertian, i.e., that they reflect light uniformly in all directions. Transparent objects violate this assumption, producing distortions and reflections that corrupt depth measurements and lead to errors in detecting and localizing target objects. Enabling robots to sense and detect transparent surfaces improves safety and broadens the range of applications, for example in biosynthesis labs, industrial work cells, clean-room environments, and kitchenware handling.

This project addresses the challenge with custom Neural Radiance Fields (NeRF) for implicit surface modeling and geometry estimation. NeRF builds detailed 3D models of real-world scenes from sparse 2D RGB images, allowing transparent objects to be accurately detected, localized, and reconstructed without relying on prior information about them, such as lighting or camera positions. This in turn enables downstream tasks like grasping, manipulation, and mapping. To achieve full autonomy, manipulators must handle tasks such as pick-and-place even with rough initial surface estimates, so developing task-generalizable algorithms that adapt to objects of varying geometry is critical. This work leverages NeRFs to estimate accurate 6-DoF poses of transparent objects by combining view-independent density modeling with transparency-aware depth rendering. The resulting occupancy map is then used to train a generalizable grasp planner capable of handling and reorienting objects with diverse geometries.
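To make the depth-rendering idea concrete, the sketch below shows how per-ray densities from a view-independent NeRF can be composited into a depth estimate, together with a simple transparency-aware heuristic that picks the depth where transmittance first drops below a threshold. The function name, the NumPy interface, and the threshold rule are illustrative assumptions, not the exact formulation used in this project.

```python
import numpy as np

def render_depth_along_ray(t_vals, sigmas, trans_thresh=0.5):
    """Composite view-independent densities into a depth estimate for one ray.

    t_vals       : (N,) sample distances along the ray (monotonically increasing).
    sigmas       : (N,) densities predicted by the NeRF at those samples.
    trans_thresh : transmittance level treated as the "surface" crossing
                   (an assumed transparency-aware heuristic, not necessarily
                   the rule used in this project).
    """
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)                 # sample spacing
    alphas = 1.0 - np.exp(-sigmas * deltas)                            # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]     # transmittance T_i
    weights = trans * alphas                                           # volume-rendering weights

    expected_depth = np.sum(weights * t_vals)                          # standard NeRF depth

    # Transparency-aware alternative: take the first sample where the ray's
    # transmittance falls below a threshold, which is less biased by the
    # weak densities a transparent surface produces.
    below = np.nonzero(trans < trans_thresh)[0]
    surface_depth = t_vals[below[0]] if below.size else expected_depth
    return expected_depth, surface_depth
```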
The objective of this work is to develop a view-independent, 3D vision-based geometry and state estimation algorithm for detecting and handling transparent objects with a robotic manipulator. To this end, we demonstrate grasping of a transparent object placed in a stable pose in a tabletop setting. The task poses a dual challenge: the rendered depth must be of high enough quality to support both grasp planning and collision avoidance, and the final grasp sample point must be precise enough to distinguish transparent objects located in close proximity.
Depth estimation with a Kinect fails because the sensor cannot detect transparent objects.
(1) We first lift the 2D monocular observations to an implicit 3D representation using a NeRF model. (2) Through raymarching, we obtain an explicit representation of the 3D scene. (3) We then use this explicit representation to estimate the 6-DoF pose of the object. (4) Finally, we use the estimated pose to plan a grasp for the object.
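As an illustration of step (2), the following sketch queries the trained NeRF's density field on a regular voxel grid and thresholds it into an explicit set of occupied points. The `query_density` callable, the grid resolution, and the occupancy threshold are hypothetical stand-ins for the actual implementation.

```python
import numpy as np

def density_to_point_cloud(query_density, bounds, resolution=64, occ_thresh=10.0):
    """Convert an implicit NeRF density field into an explicit point cloud.

    query_density : callable mapping (M, 3) xyz points -> (M,) densities
                    (a stand-in for the trained NeRF; hypothetical interface).
    bounds        : ((xmin, ymin, zmin), (xmax, ymax, zmax)) workspace box.
    occ_thresh    : density above which a point is treated as occupied
                    (an assumed heuristic, tuned per scene in practice).
    """
    lo, hi = np.asarray(bounds[0]), np.asarray(bounds[1])
    axes = [np.linspace(lo[i], hi[i], resolution) for i in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

    sigma = query_density(grid)             # evaluate the implicit field
    occupied = grid[sigma > occ_thresh]     # explicit occupancy / surface points
    return occupied
```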
Depth map of the scene.
Pose estimation through ICP. We also evaluate learned pose estimation directly from the implicit scene representation!
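A minimal sketch of the ICP-based pose estimation, assuming the scene point cloud comes from the NeRF depth rendering and a point cloud of the object model is available. It uses Open3D's point-to-point ICP with an assumed correspondence threshold; the learned pose estimator is not shown.

```python
import numpy as np
import open3d as o3d

def estimate_pose_icp(scene_points, model_points, threshold=0.01):
    """Register a known object model to the NeRF-derived scene point cloud.

    scene_points, model_points : (N, 3) arrays of xyz points.
    threshold : max correspondence distance in meters (assumed value).
    Returns a 4x4 homogeneous transform, i.e. the object's 6-DoF pose estimate.
    """
    scene = o3d.geometry.PointCloud()
    scene.points = o3d.utility.Vector3dVector(np.asarray(scene_points))
    model = o3d.geometry.PointCloud()
    model.points = o3d.utility.Vector3dVector(np.asarray(model_points))

    result = o3d.pipelines.registration.registration_icp(
        model, scene, threshold, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```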
Grasp Planning
PyBullet Testbed. We evaluate pick-and-place success rates in simulation, testing our approach on objects of various shapes as well as novel objects.
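The following sketch outlines how such a success-rate evaluation could be scripted in PyBullet. The URDF paths, the lift-height success criterion, and the trial count are assumptions, and the actual grasp execution is elided.

```python
import pybullet as p
import pybullet_data

def evaluate_pick_and_place(object_urdfs, trials_per_object=10, lift_height=0.2):
    """Roughly measure pick-and-place success rates in a PyBullet testbed.

    object_urdfs : list of URDF paths for the test objects (placeholder paths).
    Success here is simply "object ends above lift_height", an assumed proxy
    for a completed pick; the real testbed may use a different criterion.
    """
    p.connect(p.DIRECT)                              # headless simulation
    p.setAdditionalSearchPath(pybullet_data.getDataPath())
    p.setGravity(0, 0, -9.81)
    p.loadURDF("plane.urdf")

    results = {}
    for urdf in object_urdfs:
        successes = 0
        for _ in range(trials_per_object):
            obj_id = p.loadURDF(urdf, basePosition=[0, 0, 0.05])
            # ... run the grasp planner and execute the grasp here ...
            for _ in range(240):                     # step ~1 s of simulation
                p.stepSimulation()
            pos, _ = p.getBasePositionAndOrientation(obj_id)
            successes += pos[2] > lift_height
            p.removeBody(obj_id)
        results[urdf] = successes / trials_per_object

    p.disconnect()
    return results
```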