VLA on LIBERO-Spatial (BC → GRPO)
A vision-language-action model built from scratch: SigLIP + Qwen2.5-3B + action-token head. Behavior cloning reached 29% action-token accuracy but 0% closed-loop success; GRPO then trained cleanly on a dense reaching reward but stayed flat. An honest negative result with a full bug log.
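For readers unfamiliar with GRPO, the core idea is that each rollout's reward is scored relative to the other rollouts sampled for the same task, rather than against a learned value function. The snippet below is a minimal sketch of that group-relative advantage step only; the function name, the `eps` constant, and the example rewards are illustrative assumptions, not taken from this project's code.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one group's rewards to zero mean and unit scale.

    `rewards` holds one scalar reward per rollout, all sampled for the
    same task/prompt. The name and `eps` are hypothetical, for illustration.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation within the group
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up dense rewards for a group of four rollouts.
advantages = group_relative_advantages([0.1, 0.2, 0.3, 0.4])

# A group whose rollouts all score the same yields zero advantage everywhere.
flat = group_relative_advantages([0.5, 0.5, 0.5, 0.5])
```

Because advantages are centered within each group, a group in which every rollout earns the same reward contributes no gradient signal. That is a general property of the method, not a diagnosis of the flat result reported here.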