ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment

Shaofei Cai, Zhancun Mu, Anji Liu, Yitao Liang
AAAI 2026

Figure 1: Overview of ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment.

Abstract

We present ROCKET-2, a state-of-the-art Minecraft agent that supports cross-view goal specification. It builds on our previous ROCKET framework and introduces techniques for steering visuomotor policies through cross-view goal alignment.

Key Contributions

  1. Cross-View Goal Specification: Novel approach to goal specification that works across different viewing angles and perspectives.

  2. Visuomotor Policy Steering: Advanced techniques for steering policies based on visual input and motor actions.

  3. State-of-the-Art Performance: ROCKET-2 outperforms existing methods on a range of Minecraft tasks.

Methodology

Our approach combines several key innovations:

  • Cross-view alignment mechanisms
  • Robust goal conditioning
  • Multi-modal fusion techniques
  • Hierarchical policy learning
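
As a rough illustration of what goal conditioning across views can mean in code, the toy sketch below feeds a policy both the agent's own observation and a goal described from a different view, so that changing the goal changes the chosen action. It is not the ROCKET-2 architecture: the names `CrossViewGoal`, `fuse`, and `select_action`, the precomputed feature vectors, the concatenation-based fusion, and the linear scoring are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CrossViewGoal:
    """Hypothetical container for a goal given from a view other than the agent's."""
    view_features: List[float]  # assumed to be precomputed features of the goal view


def fuse(obs_features: Sequence[float], goal: CrossViewGoal) -> List[float]:
    """Concatenate agent-view and goal-view features (the simplest possible fusion)."""
    return list(obs_features) + list(goal.view_features)


def select_action(fused: Sequence[float], weights: Sequence[Sequence[float]]) -> int:
    """Return the index of the action with the highest linear score."""
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    scores = [sum(w * x for w, x in zip(row, fused)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Two actions, four fused features (two from the observation, two from the goal).
    weights = [
        [1.0, 0.0, 0.0, 0.0],  # action 0 responds to the observation only
        [0.0, 0.0, 0.0, 2.0],  # action 1 responds to the goal only
    ]
    obs = [1.0, 0.0]
    print(select_action(fuse(obs, CrossViewGoal([0.0, 1.0])), weights))  # prints 1
    print(select_action(fuse(obs, CrossViewGoal([0.0, 0.0])), weights))  # prints 0
```

With the observation held fixed, only the goal features differ between the two calls, which is the sense in which the goal steers the policy in this toy setting.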

Results

ROCKET-2 achieves significant improvements over baseline methods on standard Minecraft benchmarks, demonstrating the effectiveness of our cross-view goal alignment approach.

Future Work

We plan to extend this work to other domains beyond Minecraft and explore applications in real-world robotics scenarios.

© 2026 Zhancun Mu. All rights reserved.