shipped

GPT-2 (124M) reproduction
Full Karpathy-style reproduction. 8× A100, 19073 steps over 10B FineWeb-Edu tokens, $61, val loss 3.40 (97% of OpenAI baseline).

Math SLM (SFT + DPO)
Two-stage LoRA on DeepSeek-R1-Distill-Qwen-7B. SFT +6.4 pp across four math benchmarks; DPO a config-bottlenecked no-op. End-to-end on 8× H100 for ~$93.

stateful-agent (event-driven assistant)
A from-scratch agent harness centered on cross-session memory: a tool-use control loop over a streaming backend (Kafka / Flink / Redis / Cassandra) with semantic recall and rolling-summary compaction, served via FastAPI + Docker.

Equational theories cheatsheet
An 8.71 KB Birkhoff-sound prompt for SAIR's equational-theories competition (Tao + Davis). A 31B model running this prompt beat a 120B one on the hardest set.

Ukiyo-e Haiku VLM
LLaVA-pattern VLM that writes a haiku for a ukiyo-e print. SigLIP + trained projector + Qwen2.5-3B LoRA, in English and Japanese. SFT did ~95% of the lift; KTO collapsed at default λ_U.

C++ → Python transformer
16.4M-parameter encoder-decoder for code translation, trained on XLCoST on a GTX 1650. val_loss 2.0474.

VLA on LIBERO-Spatial (BC → GRPO)
Vision-language-action model from scratch: SigLIP + Qwen2.5-3B + action-token head. Behavior cloning to 29% action-token accuracy but 0% closed-loop success; GRPO trained cleanly on a dense reaching reward but stayed flat. An honest negative result with a full bug log.

BPE tokenizer
Pure-Python byte-pair encoding, plus a deep dive on why tokenization makes LLMs weird (SolidGoldMagikarp, spelling, arithmetic).

Tiny Shakespeare GPT
1.83M-parameter character-level decoder transformer on Tiny Shakespeare. Same architecture as GPT-2, scaled down.

makemore
Five character-level language models on 32K baby names: bigram counts, MLP, BatchNorm, manual backprop, WaveNet-style.

micrograd
Scalar-valued autograd engine in ~150 lines of pure Python. Supports +, *, **, tanh, exp, plus a tiny MLP.

in-progress