shipped

GPT-2 (124M) reproduction
Full Karpathy-style reproduction. 8× A100, 19073 steps over 10B FineWeb-Edu tokens, $61, val loss 3.40 (97% of OpenAI baseline).

Equational theories cheatsheet
An 8.71 KB Birkhoff-sound prompt for SAIR's equational-theories competition (Tao + Davis). A 31B model running this prompt beat a 120B one on the hardest set.

C++ → Python transformer
16.4M-parameter encoder-decoder for code translation, trained on XLCoST on a GTX 1650. val_loss 2.0474.

BPE tokenizer
Pure-Python byte-pair encoding, plus a deep dive on why tokenization makes LLMs weird (SolidGoldMagikarp, spelling, arithmetic).

Tiny Shakespeare GPT
1.83M-parameter character-level decoder transformer on Tiny Shakespeare. Same architecture as GPT-2, scaled down.

makemore
Five character-level language models on 32K baby names: bigram counts, MLP, BatchNorm, manual backprop, WaveNet-style.

micrograd
Scalar-valued autograd engine in ~150 lines of pure Python. Supports +, *, **, tanh, exp, plus a tiny MLP.

in-progress

Math SLM (in progress)
A 1.5B math-reasoning model. LoRA SFT of DeepSeek-R1-Distill-Qwen-1.5B on teacher-generated CoT data, BoN-32 inference with code voting. Target benchmarks include MathNet (ICLR 2026 text-only 500).

Ukiyo-e Haiku VLM (planned)
Vision-language model that writes haiku about Japanese ukiyo-e woodblock prints. SigLIP vision encoder + Qwen LLM, LoRA fine-tuned on Met Museum API images.
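
The "BoN-32 inference with code voting" step in the Math SLM entry can be read as: sample N candidate solutions, reduce each to a final answer, and keep the answer that most candidates agree on. The Python sketch below shows only that voting step and is not the project's actual code. The names `best_of_n_vote` and `last_line_answer` are hypothetical, and the extractor is a stand-in for whatever the project does to turn a candidate (for example, by running its generated code) into an answer.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def best_of_n_vote(
    candidates: Sequence[str],
    extract_answer: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the most common extracted answer, or None if nothing parses."""
    answers = [extract_answer(c) for c in candidates]
    # Candidates whose answer could not be extracted do not get a vote.
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None
    answer, _count = votes.most_common(1)[0]
    return answer


def last_line_answer(candidate: str) -> Optional[str]:
    """Hypothetical extractor: treat the last non-empty line as the answer."""
    lines = [ln.strip() for ln in candidate.splitlines() if ln.strip()]
    return lines[-1] if lines else None


if __name__ == "__main__":
    # Four toy samples standing in for the 32 that BoN-32 would draw.
    samples = ["reasoning...\n42", "another path\n42", "arithmetic slip\n41", ""]
    print(best_of_n_vote(samples, last_line_answer))  # prints 42
```

With N = 32 the same function applies unchanged; only the number of sampled candidates grows. Swapping `last_line_answer` for an extractor that executes each candidate's code in a sandbox would be one way to get closer to "code voting", under the assumption that this is what the term means here.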