GPT-2 (124M) reproduction

Full Karpathy-style reproduction: 8× A100, 19073 steps over 10B FineWeb-Edu tokens, $61 total compute cost, validation loss 3.40 (97% of the OpenAI GPT-2 124M baseline).
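The step count follows from the token budget: one pass over 10B tokens at a fixed number of tokens per optimizer step. The sketch below reproduces that arithmetic. It assumes a total batch of 2**19 = 524,288 tokens per step and a per-GPU micro-batch of 64 sequences of 1024 tokens. These values are assumptions chosen to mirror Karpathy's build-nanogpt script and are not stated in this summary. Under those assumptions, 19073 is simply floor(10B / 2**19), and 8 GPUs cover the whole batch in a single micro-step, so no gradient accumulation is needed.

```python
# Values from the summary above
TOTAL_TOKENS = 10_000_000_000   # 10B FineWeb-Edu tokens
MAX_STEPS = 19_073              # optimizer steps
NUM_GPUS = 8                    # 8x A100

# Assumed values (not stated in the summary; mirror build-nanogpt defaults)
TOTAL_BATCH = 2**19             # tokens per optimizer step (~0.5M)
MICRO_BATCH = 64                # sequences per GPU per forward/backward pass
SEQ_LEN = 1024                  # GPT-2 context length


def steps_for_one_epoch(total_tokens: int, total_batch: int) -> int:
    """Number of full optimizer steps that fit in one pass over the data."""
    return total_tokens // total_batch


def tokens_per_step(total_tokens: int, steps: int) -> float:
    """Average tokens each step must consume to finish one epoch in `steps`."""
    return total_tokens / steps


def grad_accum_steps(total_batch: int, micro_batch: int, seq_len: int, world_size: int) -> int:
    """Micro-steps each GPU accumulates before one optimizer step."""
    tokens_per_micro_step = micro_batch * seq_len * world_size
    if total_batch % tokens_per_micro_step != 0:
        raise ValueError("total batch must be divisible by micro_batch * seq_len * world_size")
    return total_batch // tokens_per_micro_step


if __name__ == "__main__":
    print(f"steps for one epoch at 2**19 tokens/step: {steps_for_one_epoch(TOTAL_TOKENS, TOTAL_BATCH)}")
    print(f"tokens/step implied by {MAX_STEPS} steps: {tokens_per_step(TOTAL_TOKENS, MAX_STEPS):,.0f}")
    print(f"tokens actually seen: {MAX_STEPS * TOTAL_BATCH:,}")
    print(f"gradient accumulation steps: {grad_accum_steps(TOTAL_BATCH, MICRO_BATCH, SEQ_LEN, NUM_GPUS)}")
```

With fewer GPUs the same total batch would still be reachable: for example, a single GPU under these assumptions would need 8 accumulation micro-steps per optimizer step, trading wall-clock time for identical optimization behavior.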