Reinforcement Learning Is Finally Getting Serious About Robot Navigation

A batch of new papers suggests the field is moving past toy problems, but I've seen this movie before.

2 days ago · 3 min read

Five papers crossed my desk this week, all on reinforcement learning for robot navigation, and I'll be honest, I almost didn't write this up. We've been hearing about RL solving navigation "any day now" since before I left Kuka. But something feels different this time, and I think it's worth talking about.

The common thread across these papers is hierarchical control. Split the problem into a high-level policy that figures out where to go, and a low-level policy that handles the actual motor commands. This isn't new (we were doing something similar with behaviour trees back in 2015), but the execution has gotten dramatically better.

Take the underwater vehicle work from arXiv. A high-level policy runs at 2 Hz, processing camera and sonar data and generating spatial subgoals for a low-level controller that runs at 10 Hz. The results are within 4% to 6% of RRT* planning baselines. That's not perfect, but for an end-to-end learned system? That's actually pretty good. When I was at Kuka, we would have killed for that kind of performance from a learning system.
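To make the two-rate split concrete, here is a minimal Python sketch. It is not the paper's code: only the 2 Hz and 10 Hz rates come from the write-up above. The function names, the 2D point-mass dynamics, the step size, and the proportional gain are all assumptions made for illustration, and both "policies" are hand-written stand-ins for what would be learned networks.

```python
import math
from dataclasses import dataclass

HIGH_LEVEL_HZ = 2   # rate quoted in the article
LOW_LEVEL_HZ = 10   # rate quoted in the article
STEPS_PER_SUBGOAL = LOW_LEVEL_HZ // HIGH_LEVEL_HZ  # low-level ticks per replan


@dataclass
class State:
    x: float
    y: float


def high_level_policy(state: State, goal: State, max_step: float = 1.0) -> State:
    """Hypothetical stand-in for a learned high-level policy.

    Returns a subgoal at most `max_step` away, on the straight line to the goal.
    """
    dx, dy = goal.x - state.x, goal.y - state.y
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return State(goal.x, goal.y)
    scale = max_step / dist
    return State(state.x + dx * scale, state.y + dy * scale)


def low_level_policy(state: State, subgoal: State, gain: float = 0.5):
    """Hypothetical stand-in for a learned low-level controller.

    A proportional rule that returns a per-tick displacement command.
    """
    return gain * (subgoal.x - state.x), gain * (subgoal.y - state.y)


def run(start: State, goal: State, ticks: int = 200, tol: float = 0.05):
    """Run the loop at the low-level rate, replanning every STEPS_PER_SUBGOAL ticks."""
    state = State(start.x, start.y)
    subgoal = state
    replans = 0
    for tick in range(ticks):
        if tick % STEPS_PER_SUBGOAL == 0:
            # Slow loop: pick a new subgoal from the current state.
            subgoal = high_level_policy(state, goal)
            replans += 1
        # Fast loop: track the current subgoal (toy dynamics, command applied directly).
        vx, vy = low_level_policy(state, subgoal)
        state = State(state.x + vx, state.y + vy)
        if math.hypot(goal.x - state.x, goal.y - state.y) < tol:
            return state, tick + 1, replans
    return state, ticks, replans


if __name__ == "__main__":
    final, ticks_used, replans = run(State(0.0, 0.0), State(5.0, 0.0))
    print(final, ticks_used, replans)
```

The point of the structure is that the expensive perception-heavy decision happens once for every five cheap control ticks, and the low-level controller never needs to know where the final goal is.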

The GHOST framework from another team takes this further for manipulation. Their insight is that subgoals (basically, where should the end-effector be next?) are largely embodiment-agnostic. So you can train the high-level policy on human video demonstrations, then let the low-level policy figure out how to actually execute those goals on whatever robot you've got. I called my old colleague at Siemens about this, and he pointed out that it could take a lot of weight off the data collection burden we've been complaining about for years.
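The embodiment-agnostic idea is easier to see as an interface. The sketch below is my own illustration, not GHOST's implementation: `SubGoal`, `GantryController`, and `TwoLinkArmController` are hypothetical names, the subgoal is reduced to a 2D end-effector position, and the arm uses textbook analytic inverse kinematics where a real system would use a learned low-level policy.

```python
import math
from dataclasses import dataclass
from typing import Protocol, Tuple


@dataclass(frozen=True)
class SubGoal:
    """Hypothetical embodiment-agnostic subgoal: a target end-effector position."""
    x: float
    y: float


class LowLevelController(Protocol):
    """Anything that can turn a subgoal into commands for one specific robot."""
    def command_for(self, subgoal: SubGoal) -> Tuple[float, float]: ...


class GantryController:
    """Cartesian gantry: axis setpoints are simply the target position."""
    def command_for(self, subgoal: SubGoal) -> Tuple[float, float]:
        return (subgoal.x, subgoal.y)


class TwoLinkArmController:
    """Planar two-link arm: joint angles from analytic inverse kinematics."""
    def __init__(self, l1: float = 1.0, l2: float = 1.0) -> None:
        self.l1, self.l2 = l1, l2

    def command_for(self, subgoal: SubGoal) -> Tuple[float, float]:
        x, y = subgoal.x, subgoal.y
        c2 = (x * x + y * y - self.l1 ** 2 - self.l2 ** 2) / (2 * self.l1 * self.l2)
        c2 = max(-1.0, min(1.0, c2))  # clamp so out-of-reach targets do not crash acos
        q2 = math.acos(c2)
        q1 = math.atan2(y, x) - math.atan2(self.l2 * math.sin(q2),
                                           self.l1 + self.l2 * math.cos(q2))
        return (q1, q2)

    def forward(self, q1: float, q2: float) -> Tuple[float, float]:
        """Forward kinematics, used here only to check the IK result."""
        return (self.l1 * math.cos(q1) + self.l2 * math.cos(q1 + q2),
                self.l1 * math.sin(q1) + self.l2 * math.sin(q1 + q2))


if __name__ == "__main__":
    goal = SubGoal(1.2, 0.5)  # the same subgoal is handed to both robots
    for controller in (GantryController(), TwoLinkArmController()):
        print(type(controller).__name__, controller.command_for(goal))
```

Whatever produces the `SubGoal` never sees joint angles or axis setpoints, which is why, in principle, it can be trained on data from a different body entirely.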

The SARM2 paper tackles what I think is the real bottleneck: reward design. They've built a multi-task reward model that can evaluate whether a robot is making progress on long-horizon manipulation tasks. On their benchmark, success rates went from the 50% range to 90% or higher on tasks like folding shorts (58% to 100%) and cleaning whiteboards (50% to 90%). Those are real improvements, though I'd want to see this replicated outside their specific benchmark before getting too excited.
