Shaofei Cai, Zhancun Mu, Anji Liu, Yitao Liang
arXiv preprint • 2025
We present ROCKET-2, a state-of-the-art Minecraft agent that supports cross-view goal specification. It builds on our previous ROCKET framework and introduces techniques for steering visuomotor policies through cross-view goal alignment.
Cross-View Goal Specification: A goal-specification approach that works across different camera viewpoints.
Visuomotor Policy Steering: Techniques for steering policies that map visual input to motor actions.
State-of-the-Art Performance: Outperforms existing methods on a range of Minecraft tasks.
Our approach combines several key innovations:
ROCKET-2 achieves significant improvements over baseline methods on standard Minecraft benchmarks, demonstrating the effectiveness of our cross-view goal alignment approach.
We plan to extend this work to other domains beyond Minecraft and explore applications in real-world robotics scenarios.