Tech
Jun 17, 2026
Analyzed by Llama 4 Scout 17B 16E Instruct

The Dirty Work of Robot Training: XDOF Emerges to Fill the Data Gap

AI Summary
XDOF, a new startup, is addressing the bottleneck in robot training data by building data pipelines, collection tools, and annotation systems for frontier labs and robotics companies. The company has raised $70 million and is already working with 20 customers, including several frontier AI labs. XDOF aims to provide high-quality robot training data to enable robots to interact with the physical world.

The Emergence of XDOF

The race to teach machines to operate in the physical world has led to a new kind of infrastructure business. XDOF, emerging from stealth, is betting that the next great bottleneck in AI isn’t models or chips, but the data feedback loop needed to teach robots how to interact with their surroundings.

The Data Gap in Robotics

Unlike LLMs, which were trained on a vast sea of publicly available text, robots need data that captures physical interaction, and that kind of data barely exists. YouTube videos and footage captured by gig workers are low-fidelity and hard to reconcile with the physical world.

Building the Data Pipelines

XDOF aims to build the data pipelines, collection tools, and annotation systems that frontier labs and robotics companies can’t easily build themselves. The company has raised $70 million from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo.

The Data Ecosystem

  • XDOF has about 60 employees and is already working with 20 customers, including several frontier AI labs.
  • The company is partnering with UC Berkeley’s AI Research lab to release what they describe as the largest collection of high-quality robot training data ever assembled, dubbed ABC.
  • ABC includes 130,000 trajectories of robot manipulation data, 300 hours of simulation, and 100 hours of evaluations.

The Future of Robot Training

The team has already used the data to train robots on benchmark tasks such as folding T-shirts, flattening boxes, and loading AirPods into their cases. The company plans to work across three tiers of a data pyramid: teleoperation data, more general data gathered by teleoperated robots, and “egocentric” data gathered by humans performing everyday tasks.

The Labor-Intensive Model

The company plans to hire and train armies of teleoperators and egocentric data operators around the world — a labor-intensive model that raises an obvious question: Why aren’t the major labs doing this data production work themselves?

The Market Opportunity

It’s a build-out that requires focus, capital, and operational scale that most AI labs would rather outsource — which is precisely the market XDOF is betting on.