Enable an embodied agent to perceive → reason about physics → predict and generate world states → plan → act → self-correct across long horizons in the real world, with language/speech as the interface and video as the primary substrate.
Long-Horizon Research Thoughts
My System-Level Mission: