Zhancun Mu

ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment

Shaofei Cai, Zhancun Mu, Anji Liu, Yitao Liang

arXiv preprint • 2025


Abstract

We present ROCKET-2, a state-of-the-art Minecraft agent that supports cross-view goal specification: a goal can be given from a viewpoint other than the agent's own. This work builds on our previous ROCKET framework and introduces techniques for steering visuomotor policies through cross-view goal alignment.

Key Contributions

  1. Cross-View Goal Specification: A goal specification approach that works across different viewing angles and perspectives.

  2. Visuomotor Policy Steering: Techniques for steering a policy that maps visual input to motor actions.

  3. State-of-the-Art Performance: Superior performance on various Minecraft tasks compared to existing methods.

Methodology

Our approach combines several key innovations:

  • Cross-view alignment mechanisms
  • Robust goal conditioning
  • Multi-modal fusion techniques
  • Hierarchical policy learning
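
To make the idea of goal conditioning across views more concrete, here is a minimal toy sketch in Python. It is not the ROCKET-2 architecture. It assumes, purely for illustration, that the goal arrives as an image from a second camera plus a binary mask marking the target object, and that the policy fuses this with the agent's own observation by simple concatenation. All names, sizes, and the randomly initialised weights (`encode_observation`, `encode_goal`, `policy`, `W_obs`, `W_goal`, `W_head`) are hypothetical; a real system would learn its encoders and policy head.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C = 8, 8, 3          # toy image size (assumption)
FEAT, N_ACTIONS = 16, 4    # toy feature width and action count (assumption)

# Hypothetical randomly initialised weights; a real system would learn these.
W_obs = rng.normal(size=(H * W * C, FEAT)) * 0.01
W_goal = rng.normal(size=(H * W * (C + 1), FEAT)) * 0.01
W_head = rng.normal(size=(2 * FEAT, N_ACTIONS)) * 0.1


def encode_observation(obs):
    """Flatten the agent-view image and project it to a feature vector."""
    return np.tanh(obs.reshape(-1) @ W_obs)


def encode_goal(goal_image, goal_mask):
    """Stack the goal-view image with its binary mask as an extra channel."""
    stacked = np.concatenate([goal_image, goal_mask[..., None]], axis=-1)
    return np.tanh(stacked.reshape(-1) @ W_goal)


def policy(obs, goal_image, goal_mask):
    """Return action probabilities conditioned on a goal from another view."""
    # Fusion by concatenation: the simplest possible stand-in for a learned
    # cross-view alignment module.
    fused = np.concatenate(
        [encode_observation(obs), encode_goal(goal_image, goal_mask)]
    )
    logits = fused @ W_head
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


# Usage: the same observation paired with a goal from a different camera.
obs = rng.random((H, W, C))
goal_image = rng.random((H, W, C))
goal_mask = np.zeros((H, W))
goal_mask[2:5, 3:6] = 1.0  # mark the target object in the goal view
probs = policy(obs, goal_image, goal_mask)
```

Changing only `goal_mask` changes the resulting action distribution, which is the sense in which the goal steers the policy in this sketch. Hierarchical policy learning is not illustrated here.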

Results

ROCKET-2 achieves significant improvements over baseline methods on standard Minecraft benchmarks, demonstrating the effectiveness of our cross-view goal alignment approach.

Future Work

We plan to extend this work to other domains beyond Minecraft and explore applications in real-world robotics scenarios.
