-
BPE from scratch, and why your LLM can't count L's
Byte-pair encoding implemented in pure Python, plus SolidGoldMagikarp, the encode/decode asymmetry, and a list of LLM quirks that all trace back to the tokenizer.
-
Birkhoff in 8.7 KB
An 8.71 KB prompt for SAIR's equational-theories competition (Tao + Davis, follow-up to Honda-Murakami-Zhang 2025). It replaces free-form LLM reasoning with a 9-magma Birkhoff-sound decision procedure. A 31B model running this prompt beat a 120B one on the hardest set.
-
Tiny Shakespeare, tiny GPT
A 1.83M-parameter decoder-only transformer trained on 1 MB of Shakespeare. The architecture is the same as GPT-2's, just smaller.
-
makemore: from counting bigrams to a WaveNet
Five character-level language models trained on 32K baby names. Bigram → MLP → BatchNorm → manual backprop → hierarchical fusion.
-
micrograd: a scalar-valued autograd engine
A 150-line autograd engine that supports +, *, **, tanh, and exp, with a tiny MLP built on top.