Abstract
We present ROCKET-2, a state-of-the-art Minecraft agent that supports cross-view goal specification. It builds on our earlier ROCKET framework and introduces techniques for steering visuomotor policies through cross-view goal alignment.
Key Contributions
- Cross-View Goal Specification: Novel approach to goal specification that works across different viewing angles and perspectives.
- Visuomotor Policy Steering: Advanced techniques for steering policies based on visual input and motor actions.
- State-of-the-Art Performance: Demonstrates superior performance on various Minecraft tasks compared to existing methods.
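To illustrate why goal specification across viewing angles is a non-trivial problem, the sketch below projects one 3D target into two pinhole cameras and shows that it lands on different pixels, or drops out of view entirely. This is only a geometric illustration of the problem setting, not the ROCKET-2 method. The `Camera` class, the `project` function, and all intrinsics (focal length, 128x128 image centre) are hypothetical names and values chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical pinhole camera; at yaw = 0 it looks along +z."""
    x: float
    y: float
    z: float
    yaw: float            # rotation about the vertical (y) axis, in radians
    focal: float = 100.0  # assumed focal length in pixels
    cx: float = 64.0      # assumed principal point for a 128x128 image
    cy: float = 64.0


def project(cam, point):
    """Return the pixel (u, v) of a world-space point, or None if it is behind the camera."""
    dx, dy, dz = point[0] - cam.x, point[1] - cam.y, point[2] - cam.z
    cos_y, sin_y = math.cos(cam.yaw), math.sin(cam.yaw)
    # Rotate the offset into the camera frame (inverse of the camera's yaw).
    xc = cos_y * dx - sin_y * dz
    zc = sin_y * dx + cos_y * dz
    if zc <= 0:
        return None  # target is not visible from this view
    u = cam.cx + cam.focal * xc / zc
    v = cam.cy - cam.focal * dy / zc  # image v axis points down
    return (u, v)


target = (0.0, 0.0, 10.0)
goal_view = Camera(0.0, 0.0, 0.0, yaw=0.0)    # view in which the goal is specified
agent_view = Camera(5.0, 0.0, 0.0, yaw=0.0)   # agent standing 5 units to the side
print(project(goal_view, target), project(agent_view, target))
```

The same target appears at the image centre in one view and well to the left in the other, so a goal marked in one view cannot simply be copied pixel-for-pixel into another.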
Methodology
Our approach combines several key innovations:
- Cross-view alignment mechanisms
- Robust goal conditioning
- Multi-modal fusion techniques
- Hierarchical policy learning
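As a rough illustration of what goal conditioning through fusion can look like, the sketch below lets observation tokens attend to goal tokens with scaled dot-product cross-attention and adds the result back as a residual. This is a generic technique assumed for illustration, not a description of the actual ROCKET-2 architecture. The function names (`cross_attention`, `condition_on_goal`) and the toy token vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query becomes a weighted mix of the values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


def condition_on_goal(obs_tokens, goal_tokens):
    """Fuse goal information into observation tokens (attention plus residual add)."""
    attended = cross_attention(obs_tokens, goal_tokens, goal_tokens)
    return [
        [o + a for o, a in zip(obs, att)]
        for obs, att in zip(obs_tokens, attended)
    ]


# Toy example: two observation tokens, one goal token (values are made up).
obs_tokens = [[1.0, 0.0], [0.0, 1.0]]
goal_tokens = [[1.0, 0.0]]
print(condition_on_goal(obs_tokens, goal_tokens))
```

In a learned model the tokens would come from image encoders and the attention would use trained projection matrices; both are omitted here to keep the sketch self-contained.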
Results
ROCKET-2 achieves significant improvements over baseline methods on standard Minecraft benchmarks, demonstrating the effectiveness of our cross-view goal alignment approach.
Future Work
We plan to extend this work to domains beyond Minecraft and to explore applications in real-world robotics.