Lessons from Figure AI’s Sorting Challenge: The Real Bottleneck of Humanoid Robots Lies in Their "Eyes"

Sensor LiDAR In-depth Insights
Theme: Warehouse deployment of humanoid robots, sorting perception solutions & application value of RGBD ToF cameras
 
The robotics industry has recently been closely following one live stream: the long-duration warehouse sorting endurance test that Figure AI is running with its F.03 humanoid robots.
 
Three robots work in shifts around the clock, sorting hundreds of thousands of parcels, and the field test data is fully disclosed. For the first time, humanoid robots have stepped out of exhibition halls, where they performed only simple demos, and into real industrial scenarios that involve repetitive tasks, complex interference and long-term stable operation.
 
However, behind the heated discussion over whether humanoid robots can replace manual labor and boost efficiency, there lies a more essential truth:
What restricts the large-scale commercial rollout of humanoid robots is not motors, computing power or motion algorithms, but unstable environmental perception.
 
1. Common Industry Pain Points Exposed in Figure AI’s Live Test
 
Even when equipped with the top-tier end-to-end vision model Helix-02, Figure AI’s robots still ran into many typical practical problems during the long-duration field test:
 
- Unstable recognition of thin soft packages, plastic bags and bubble wraps

- Empty grasping, misoperation and ineffective movement towards empty space when handling irregularly deformed parcels

- Positioning lag when tracking items on moving conveyor belts

- Recognition deviation caused by dim light, strong reflection and mixed ambient colors
 
These issues stem not from insufficient AI algorithm performance, but from the missing depth dimension in the acquired environmental data.
 
Traditional 2D RGB cameras capture only flat images. With them, robots can identify colors, textures and barcodes, yet cannot directly measure an object's actual height, thickness or softness, whether it is suspended or lying flat, or at what angle it is tilted.
 
2D vision works well in well-structured laboratory environments, but fails to adapt to real warehouse conditions with randomly placed parcels, diverse materials, moving targets and unstable lighting. Without depth information, even advanced AI models are prone to misjudgment.
 
To perform practical work steadily, humanoid robots must be equipped with vision systems that can perceive the 3D world accurately.
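
To make "perceiving the 3D world" concrete: once every pixel carries a depth value, its position in space can be recovered with the standard pinhole camera model. The snippet below is a minimal Python sketch of that back-projection, not Figure AI's pipeline. The intrinsics (fx, fy, cx, cy), the 640x480 resolution and the sample pixel are hypothetical values chosen for illustration.

```python
from typing import NamedTuple, Tuple


class Intrinsics(NamedTuple):
    """Pinhole camera intrinsics, all in pixels (hypothetical values below)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def deproject(u: float, v: float, depth_m: float,
              k: Intrinsics) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) with depth in meters to a camera-frame point."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


# Assumed intrinsics for an illustrative 640x480 depth sensor.
K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# A pixel 100 px right of the image center, measured at 0.8 m.
point = deproject(420, 240, 0.8, K)
print(point)
```

A plain RGB image supplies only (u, v). Without the depth term, the same pixel could correspond to any point along the viewing ray.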
 
2. Why RGBD ToF Cameras Are the Optimal Choice for Sorting & Precision Grasping
 
Many people tend to confuse LiDAR and ToF cameras. With LiDAR widely popularized, why do humanoid robots still need RGBD ToF cameras?
 
Here is how the industry typically positions the two:
LiDAR serves as the long-distance vision for robot navigation
Main functions: long-distance mapping, global positioning, obstacle avoidance and walking safety
Features: excellent long-range detection performance but sparse point cloud data, not suitable for close-range precision manipulation
 
RGBD ToF cameras act as the core visual eyes for close-range operation
Main functions: target recognition, pose estimation, deformation detection, precise fitting and flexible object handling
Features: high-density depth data within short range, high frame rate, low latency, cost-effectiveness and active light sensing capability
 
Over 90% of operational challenges during humanoid robot sorting work occur within the interaction range of 0.05 to 1.5 meters, which is exactly the dominant application scenario of RGBD ToF cameras.
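
As a small illustration of restricting perception to this interaction range, the sketch below masks depth readings outside 0.05 to 1.5 meters before they reach any grasp logic. The function names are hypothetical, and treating 0.0 as an invalid-pixel marker is an assumption; check how your own sensor reports invalid pixels.

```python
from typing import List, Optional

NEAR_M = 0.05  # near limit of the interaction range cited above
FAR_M = 1.5    # far limit of the interaction range cited above


def in_working_range(depth_m: float) -> bool:
    """True if a depth reading falls inside the close-range working zone."""
    return NEAR_M <= depth_m <= FAR_M


def filter_depth_row(row: List[float]) -> List[Optional[float]]:
    """Replace out-of-range readings with None so later stages skip them.

    Assumption: 0.0 means 'no valid return'; it is rejected here because
    it falls below NEAR_M.
    """
    return [d if in_working_range(d) else None for d in row]


row = [0.0, 0.03, 0.4, 1.2, 2.7]
filtered = filter_depth_row(row)
print(filtered)
```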
 
In the whole sorting workflow of Figure AI robots, a series of high-precision actions need to be completed in sequence: parcel identification, shape judgment, hardness distinction, grabbing point positioning, dynamic tracking of conveyor belts, barcode adjustment and stable placement.
 
All these delicate interactions rely on synchronized high-frame-rate color and depth data with millisecond-level response.
 
3. How RGBD ToF Cameras Solve Core Sorting Pain Points of Humanoid Robots
 
Taking Figure AI’s test scenario as the reference, the practical value of RGBD ToF cameras can be broken down as follows:
 
1. Effectively eliminate empty grasping and inaccurate picking
 
While RGB cameras capture only outlines, RGBD ToF cameras can measure the actual thickness, protrusions, depressions and suspended gaps of objects. For soft packages, deformed express parcels and irregular materials, they can localize effective grasp regions and greatly reduce the empty-grasp rate.
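
One simplified way to picture this: if the camera looks straight down at a belt at a known distance, the height of whatever lies on the belt is the belt distance minus the measured depth, and cells with too little height are poor grasp targets. The sketch below uses that assumption. The top-down geometry, the tiny 3x4 depth grid and the 3 cm threshold are all hypothetical, and real systems would work on full point clouds rather than a nested list.

```python
from typing import List, Tuple

DepthMap = List[List[float]]


def height_map(depth: DepthMap, belt_distance_m: float) -> DepthMap:
    """Height above the belt per cell, assuming a camera looking straight down."""
    return [[max(0.0, belt_distance_m - d) for d in row] for row in depth]


def grasp_candidates(depth: DepthMap, belt_distance_m: float,
                     min_height_m: float) -> List[Tuple[int, int, float]]:
    """Return (row, col, height) for cells tall enough to be worth grasping."""
    candidates = []
    for r, row in enumerate(height_map(depth, belt_distance_m)):
        for c, h in enumerate(row):
            if h >= min_height_m:
                candidates.append((r, c, h))
    return candidates


# Illustrative depth grid in meters: a thin flap (0.99) next to a thicker bulge.
depth = [
    [1.00, 1.00, 1.00, 1.00],
    [1.00, 0.99, 0.95, 1.00],
    [1.00, 0.99, 0.94, 1.00],
]

cands = grasp_candidates(depth, belt_distance_m=1.00, min_height_m=0.03)
best = max(cands, key=lambda t: t[2])  # tallest cell as a naive grasp target
print(cands, best)
```

The thin 1 cm flap is rejected and only the thicker region is kept. That distinction is not available from an outline alone.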
 
2. Real-time dynamic tracking with low latency
 
ToF cameras built on global-shutter sensors capture the whole depth frame at once, which avoids the ghosting and accumulated delay of line-by-line scanning. Robots can then update object poses in real time and keep pace with the running conveyor belt.
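
Why latency matters on a moving belt can be shown with simple arithmetic: a parcel travelling at 0.5 m/s moves 25 mm during a 50 ms delay between capture and actuation. The sketch below compensates for that with constant-velocity prediction. The belt speed, latency and positions are hypothetical numbers, and constant belt velocity is an assumption.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def predict_position(measured_xyz: Vec3, belt_velocity_mps: Vec3,
                     latency_s: float) -> Vec3:
    """Shift a measured position forward by velocity * latency.

    Assumes the belt (and the parcel on it) moves at constant velocity
    between the moment of capture and the moment the gripper acts.
    """
    x, y, z = measured_xyz
    vx, vy, vz = belt_velocity_mps
    return (x + vx * latency_s, y + vy * latency_s, z + vz * latency_s)


# Hypothetical values: parcel seen at (0.40, 0.10, 0.80) m in the camera frame,
# belt moving along +x at 0.5 m/s, 50 ms between capture and actuation.
predicted = predict_position((0.40, 0.10, 0.80), (0.5, 0.0, 0.0), 0.05)
lag_m = 0.5 * 0.05  # distance travelled during the delay
print(predicted, lag_m)
```

The lower the sensor latency, the smaller the correction and the less the result depends on the constant-velocity assumption.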
 
3. Strong anti-interference capability for 24-hour unmanned warehouse operation
 
Built-in active infrared light sources let ToF cameras work largely independently of ambient light. They maintain stable depth accuracy under complex lighting conditions, including flickering warehouse lights, alternating brightness and darkness, backlight and shadow areas, which meets the 24/7 continuous operation requirement of Figure AI’s robots.
 
4. Cost-effective for mass production and large-scale deployment
 
Industrial LiDAR units tend to be large, power-hungry and expensive, which makes it impractical to install them in large numbers on the head and arms of every humanoid robot. In contrast, RGBD ToF cameras are compact, low-power and cost-efficient, and they support dual-camera and multi-camera layouts, which is essential for the commercial mass production of humanoid robots.
 
4. Industry Consensus: Future Humanoid Robots Will Adopt a Dual-Perception Fusion System
 
Figure AI’s extreme field test points to the following mainstream perception architecture for next-generation humanoid robots:
 
- LiDAR: responsible for long-distance navigation and walking obstacle avoidance

- RGBD ToF cameras: dedicated to close-range operation, grabbing interaction and flexible object handling
 
Advanced large models give robots the ability to understand their environment, while ToF cameras supply solid 3D spatial perception.
 
Without high-quality RGBD depth data input, even the most powerful end-to-end AI algorithms are limited and cannot achieve accurate environmental cognition.
 
5. Conclusion: Competition Among Humanoid Robots Has Extended to Core Perception Hardware
 
From factory sorting, warehouse handling and supermarket shelf restocking to future household services, the turning point for large-scale adoption of humanoid robots lies not in improved mobility, but in the ability to perceive the world and interact with their surroundings stably, accurately and economically.
 
Figure AI’s field test has proven that humanoid robots capable of long-term stable work must be equipped with professional machine vision solutions.
 
Undoubtedly, RGBD ToF cameras have become indispensable core visual hardware for humanoid robots as they move from fancy demos to practical industrial application.
