notes

one project per post — architecture, bugs, numbers

Birkhoff in 8.7 KB

An 8.71 KB prompt for SAIR's equational-theories competition (Tao + Davis, a follow-up to Honda-Murakami-Zhang 2025). It replaces free-form LLM reasoning with a 9-magma Birkhoff-sound decision procedure. A 31B model running this prompt beat a 120B model on the hardest set.

16 min read · 2026
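The soundness fact such a procedure leans on: if a finite magma satisfies one equation but not another, the first equation cannot imply the second. Below is a minimal Python sketch of that countermodel check. The nested-tuple term encoding and the example magma on {0, 1, 2} are my own illustrations, not the nine magmas from the post. Failing to find a countermodel proves nothing on its own.

```python
from itertools import product

# Hypothetical term encoding: a str is a variable; a 2-tuple (a, b) is the product a * b.

def variables(term):
    """Set of variable names occurring in a term."""
    if isinstance(term, str):
        return {term}
    left, right = term
    return variables(left) | variables(right)

def evaluate(term, table, env):
    """Value of a term in the finite magma whose operation is table[a][b]."""
    if isinstance(term, str):
        return env[term]
    left, right = term
    return table[evaluate(left, table, env)][evaluate(right, table, env)]

def satisfies(table, equation):
    """True if lhs = rhs holds under every assignment of magma elements to variables."""
    lhs, rhs = equation
    names = sorted(variables(lhs) | variables(rhs))
    for values in product(range(len(table)), repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(lhs, table, env) != evaluate(rhs, table, env):
            return False
    return True

def refutes(table, premise, conclusion):
    """A magma satisfying the premise but not the conclusion is a countermodel:
    by soundness, the premise cannot imply the conclusion."""
    return satisfies(table, premise) and not satisfies(table, conclusion)

commutative = (("x", "y"), ("y", "x"))                # x * y = y * x
associative = ((("x", "y"), "z"), ("x", ("y", "z")))  # (x * y) * z = x * (y * z)
# Illustrative magma on {0, 1, 2}: x * y = (2x + 2y) mod 3, commutative but not associative.
z3_table = [[(2 * a + 2 * b) % 3 for b in range(3)] for a in range(3)]

print(refutes(z3_table, commutative, associative))  # True
```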

stateful-agent: an agent that actually remembers

A from-scratch agent harness built around cross-session memory — a tool-use control loop over a real streaming backend (Kafka, Flink, Redis, Cassandra) with semantic recall and rolling-summary compaction, served as an API.

9 min read · June 07, 2026

2026 · agents llm memory systems infrastructure · deep-learning
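Rolling-summary compaction as a minimal sketch: keep the most recent turns verbatim and fold anything older into a running summary. The `RollingMemory` class, the window size, and `toy_summarize` are hypothetical. In a real harness the summarizer would be an LLM call, and the state would live in the streaming backend rather than in a Python list.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical summarizer signature: (previous_summary, evicted_turns) -> new_summary.
Summarizer = Callable[[str, List[str]], str]

@dataclass
class RollingMemory:
    """Keep the last `window` turns verbatim; fold older turns into one summary."""
    window: int = 4
    summary: str = ""
    turns: List[str] = field(default_factory=list)

    def add(self, turn: str, summarize: Summarizer) -> None:
        self.turns.append(turn)
        overflow = len(self.turns) - self.window
        if overflow > 0:
            # Evict the oldest turns and merge them into the running summary.
            evicted, self.turns = self.turns[:overflow], self.turns[overflow:]
            self.summary = summarize(self.summary, evicted)

    def context(self) -> List[str]:
        """Prompt context: the summary (if any) followed by the recent turns."""
        head = [f"[summary] {self.summary}"] if self.summary else []
        return head + self.turns

def toy_summarize(previous: str, evicted: List[str]) -> str:
    # Stand-in for an LLM summarizer: it only concatenates, for illustration.
    return " | ".join([p for p in [previous] if p] + evicted)

memory = RollingMemory(window=2)
for turn in ["t1", "t2", "t3", "t4"]:
    memory.add(turn, toy_summarize)
print(memory.context())  # ['[summary] t1 | t2', 't3', 't4']
```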

A VLA from scratch: 29% tokens, 0% grasps, and a GRPO that wouldn't budge

Building a vision-language-action model from scratch on LIBERO-Spatial: SigLIP + Qwen2.5-3B, behavior cloning to 29% action-token accuracy but 0% closed-loop success, and a GRPO run that trained cleanly without improving. An honest negative result, plus the six bugs in the way.

12 min read · June 07, 2026

2026 · vla robotics grpo reinforcement-learning libero · deep-learning
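For reference, GRPO scores each rollout against the other rollouts sampled for the same prompt. Below is a minimal sketch of that group-normalized advantage. It assumes population standard deviation and a small epsilon, and implementations differ on both. When every rollout in a group earns the same reward (for example, all zero), every advantage is zero, so the group contributes no reward-driven gradient. This is a general property of the estimator, not a diagnosis of this run.

```python
import statistics
from typing import List

def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: each reward standardized against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]

print(group_advantages([0.0, 0.0, 0.0, 0.0]))  # [0.0, 0.0, 0.0, 0.0]: nothing to learn from
print(group_advantages([1.0, 0.0, 0.0, 0.0]))  # the single success gets a positive advantage
```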

A haiku VLM: SFT did the work, KTO collapsed at λ=1.0

A LLaVA-pattern VLM that writes a 5-7-5 haiku for a ukiyo-e woodblock print. SigLIP (frozen) + trained projector + Qwen2.5-3B (LoRA), on 3,913 Met Museum prints, in English and Japanese. SFT delivered ~95% of the lift; preference optimization only helped where the chosen/rejected gap was real; KTO collapsed at its default λ_U.

14 min read · May 23, 2026

2026 · vlm multimodal lora orpo kto llava · deep-learning
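The KTO objective, sketched per example in Python following Ethayarajh et al. (2024). Desirable outputs are pushed above a reference point z0 and undesirable ones below it, with λ_D and λ_U weighting the two sides. The values `beta=0.1` and `lambda_d = lambda_u = 1.0` are assumed defaults for the sketch. The KTO paper suggests keeping λ_D·n_D / (λ_U·n_U) roughly within [1, 4/3], which `weight_ratio` computes.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(log_ratio: float, z0: float, desirable: bool,
             beta: float = 0.1, lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Per-example KTO loss, lambda_y - v(x, y).

    log_ratio: log pi_theta(y|x) - log pi_ref(y|x)
    z0:        reference point, an estimate of KL(pi_theta || pi_ref), treated as a constant
    """
    if desirable:
        # Loss falls as the policy raises y's likelihood above the reference point.
        return lambda_d - lambda_d * sigmoid(beta * (log_ratio - z0))
    # Loss falls as the policy pushes y's likelihood below the reference point.
    return lambda_u - lambda_u * sigmoid(beta * (z0 - log_ratio))

def weight_ratio(lambda_d: float, lambda_u: float, n_desirable: int, n_undesirable: int) -> float:
    """lambda_D * n_D / (lambda_U * n_U); the KTO paper suggests roughly [1, 4/3]."""
    return (lambda_d * n_desirable) / (lambda_u * n_undesirable)

print(kto_loss(0.0, 0.0, desirable=True))  # 0.5: an untrained policy sits at the midpoint
```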

A 7B math fine-tune on 8× H100: SFT +6.4, DPO +0.6

Two-stage LoRA (SFT then DPO) on DeepSeek-R1-Distill-Qwen-7B, run end to end on a single 8× H100 pod for ~$93. SFT lifted four math benchmarks by +6.4 pp on average; DPO at conservative defaults barely moved them (+0.6), and the training-time reward margin predicted as much.

16 min read · May 22, 2026

2026 · sft dpo lora reasoning deepspeed vllm · deep-learning
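The training-time reward margin is DPO's implicit reward gap. It is β times the policy-minus-reference log-probability of the chosen answer, minus the same quantity for the rejected answer. Below is a minimal Python sketch for one pair. It assumes summed sequence log-probs as inputs and uses `beta=0.1` as an illustrative value, and the log-prob numbers are made up. At initialization the policy equals the reference, so the margin is 0 and the loss is log 2. A margin that stays near zero therefore means DPO is not separating the pairs.

```python
import math

def dpo_pair(policy_chosen: float, ref_chosen: float,
             policy_rejected: float, ref_rejected: float, beta: float = 0.1):
    """Implicit reward margin and DPO loss for one preference pair.

    Inputs are summed log-probabilities of the full response under the
    policy being trained and under the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    loss = math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
    return margin, loss

# Made-up log-probs at step 0, where policy == reference: margin 0.0, loss log(2) ≈ 0.693.
print(dpo_pair(-42.0, -42.0, -57.0, -57.0))
```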

Eight A100s, $61, and 124M parameters

Full reproduction of GPT-2 124M on rented multi-GPU hardware. Val loss 3.40 vs OpenAI's 3.29 (97% match), HellaSwag 27% vs 29.45%, in 2.5 hours of training.

18 min read · May 17, 2026

2026 · gpt-2 reproduction training ddp · deep-learning
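The "124M" worked out as arithmetic. This minimal Python sketch assumes the standard GPT-2 small config (12 layers, width 768, 1,024-token context, 50,257-token vocabulary) and an output head tied to the token embedding. Reproductions that pad the vocabulary for throughput will count slightly more.

```python
def gpt2_param_count(vocab: int = 50257, ctx: int = 1024, d: int = 768, layers: int = 12) -> int:
    """Parameter count for a GPT-2-style decoder with tied input/output embeddings."""
    embeddings = vocab * d + ctx * d               # token (wte) + position (wpe) tables
    layer_norm = 2 * d                             # scale + bias
    attn = (d * 3 * d + 3 * d) + (d * d + d)       # fused QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)    # expand to 4d, project back to d
    per_layer = 2 * layer_norm + attn + mlp        # two LayerNorms per block
    return embeddings + layers * per_layer + layer_norm  # + final LayerNorm; lm_head reuses wte

print(gpt2_param_count())  # 124439808
```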
