-
stateful-agent: an agent that actually remembers
A from-scratch agent harness built around cross-session memory — a tool-use control loop over a real streaming backend (Kafka, Flink, Redis, Cassandra) with semantic recall and rolling-summary compaction, served as an API.
-
A VLA from scratch: 29% tokens, 0% grasps, and a GRPO that wouldn't budge
Building a vision-language-action model from scratch on LIBERO-Spatial: SigLIP + Qwen2.5-3B, behavior cloning to 29% action-token accuracy but 0% closed-loop success, and a GRPO run that trained cleanly without improving. An honest negative result, plus the six bugs in the way.
-
A haiku VLM: SFT did the work, KTO collapsed at λ=1.0
A LLaVA-pattern VLM that writes a 5-7-5 haiku for a ukiyo-e woodblock print. SigLIP (frozen) + trained projector + Qwen2.5-3B (LoRA), on 3,913 Met Museum prints, in English and Japanese. SFT delivered ~95% of the lift; preference optimization only helped where the chosen/rejected gap was real; KTO collapsed at its default λ_U = 1.0.
-
A 7B math fine-tune on 8× H100: SFT +6.4, DPO +0.6
Two-stage LoRA (SFT then DPO) on DeepSeek-R1-Distill-Qwen-7B, end-to-end on a single 8× H100 pod for ~$93. SFT lifted four math benchmarks by +6.4 pp on average; DPO at conservative defaults added only +0.6, and the training-time reward margin predicted it.
-
Eight A100s, $61, and 124M parameters
Full reproduction of GPT-2 124M on rented multi-GPU hardware. Validation loss 3.40 vs OpenAI's 3.29 (97% match) and HellaSwag 27% vs 29.45%, in 2.5 hours of training.